Hybrid CNN and Dictionary-Based Models for Scene Recognition and Domain Adaptation

2016-01-29
Convolutional neural networks (CNNs) have achieved state-of-the-art performance in many different visual tasks. Learned from a large-scale training dataset, CNN features are much more discriminative and accurate than hand-crafted features. Moreover, CNN features are also transferable among different domains. On the other hand, traditional dictionary-based features (such as BoW and SPM) contain much more local discriminative and structural information, which is implicitly embedded in the images. To further improve performance, in this paper we propose to combine CNNs with dictionary-based models for scene recognition and visual domain adaptation. Specifically, based on well-tuned CNN models (e.g., AlexNet and VGG Net), two dictionary-based representations are further constructed, namely the mid-level local representation (MLR) and the convolutional Fisher vector representation (CFV). In MLR, an efficient two-stage clustering method is used to generate a class-mixture or a class-specific part dictionary: weighted spectral clustering in spatial and feature space on the parts of a single image, followed by clustering of the representative parts of all images. The part dictionary is then applied to multi-scale image inputs to generate the mid-level representation. In CFV, a multi-scale and scale-proportional GMM training strategy is used to generate Fisher vectors from the last convolutional layer of the CNN. By integrating the complementary information of MLR, CFV and the CNN features of the fully connected layer, state-of-the-art performance can be achieved on scene recognition and domain adaptation problems. An interesting finding is that our proposed hybrid representation (from VGG Net trained on ImageNet) is also highly complementary to GoogLeNet and/or VGG-11 (trained on Places205).
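To make the CFV idea concrete, the following is a minimal sketch of standard Fisher vector encoding applied to a convolutional feature map, not the authors' implementation. It assumes a diagonal-covariance GMM whose parameters are already fitted (random placeholders here), treats each spatial position of the feature map as one local descriptor, and omits the multi-scale, scale-proportional GMM training described in the abstract. The function names `conv_map_to_descriptors` and `fisher_vector` are hypothetical.

```python
import numpy as np


def conv_map_to_descriptors(fmap):
    """Turn a (C, H, W) feature map into (H*W, C) local descriptors."""
    c = fmap.shape[0]
    return fmap.reshape(c, -1).T


def fisher_vector(X, weights, means, variances):
    """Fisher vector of descriptors X (T, D) under a diagonal GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a vector of length 2*K*D (gradients w.r.t. means and
    standard deviations), power- and L2-normalized.
    """
    T = X.shape[0]
    sigma = np.sqrt(variances)
    # Whitened differences (x_t - mu_k) / sigma_k, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # Log of w_k * N(x_t | mu_k, diag(var_k)), shape (T, K).
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    # Posteriors gamma_t(k), computed stably via max subtraction.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradient statistics for means and standard deviations.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv    # L2 normalization


# Toy usage with placeholder data (all values are assumptions).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 6, 6))   # stand-in for a conv layer output
X = conv_map_to_descriptors(fmap)       # 36 descriptors of dimension 8
K = 4
weights = np.full(K, 1.0 / K)
means = rng.standard_normal((K, 8))
variances = np.ones((K, 8))
fv = fisher_vector(X, weights, means, variances)
```

In a hybrid setup like the one the abstract describes, a vector such as `fv` would be combined with the fully connected layer features before classification; how the paper combines them is not specified here.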