St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems
St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems, 2015, Issue 2-3(217-222), Pages 105–114
DOI: https://doi.org/10.5862/JCSTCS.217-222.9
(Mi ntitu107)
 

System Analysis and Control

Using an ensemble of transforming autoencoders to represent 3D objects

A. A. Khurshudov

Kuban State Technological University
Abstract: One of the key goals of machine learning for computer vision is to obtain high-quality representations of visual data that are robust to changes in viewpoint, scale, lighting, object pose or texture. Current state-of-the-art convolutional networks, such as GoogLeNet or AlexNet, can successfully produce invariant representations sufficient for complex multiclass classification. Some researchers (Hinton, Krizhevsky, et al.), however, suggest that this approach, while quite suitable for classification tasks, is misguided in terms of what an efficient visual system should be capable of: namely, reflecting spatial transformations of learned objects in a predictable way. The key concept of their research is equivariance rather than invariance, i.e., the model's ability to change its representation parameters in response to different poses and transformations of a model-specific visual entity.
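The equivariance idea can be illustrated with a minimal, untrained sketch of one Hinton-style "capsule" from a transforming autoencoder. All layer sizes, weights, and names below are illustrative assumptions, not the paper's actual configuration: recognition units map a patch to a pose vector (the equivariant part) and a gate probability (the invariant "is the entity present?" part); a known shift is added to the pose before the generation units reconstruct the patch.

```python
# A minimal sketch of a single transforming-autoencoder capsule
# (hypothetical sizes and random weights; training is omitted).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Capsule:
    def __init__(self, n_pixels=100, n_hidden=20):
        # Recognition units: image patch -> hidden code
        self.W_rec = rng.normal(scale=0.1, size=(n_pixels, n_hidden))
        # Hidden code -> 2D pose (x, y) and gate probability p
        self.W_pose = rng.normal(scale=0.1, size=(n_hidden, 2))
        self.W_gate = rng.normal(scale=0.1, size=(n_hidden, 1))
        # Generation units: transformed pose -> reconstructed patch
        self.W_gen = rng.normal(scale=0.1, size=(2, n_pixels))

    def forward(self, patch, delta):
        """patch: flattened input; delta: externally known (dx, dy) shift."""
        h = sigmoid(patch @ self.W_rec)
        pose = h @ self.W_pose        # equivariant: should move with the input
        p = sigmoid(h @ self.W_gate)  # invariant: probability the entity is present
        shifted_pose = pose + delta   # apply the supplied transformation
        recon = (shifted_pose @ self.W_gen) * p  # gated reconstruction
        return pose, p, recon

cap = Capsule()
patch = rng.random(100)
pose, p, recon = cap.forward(patch, delta=np.array([1.0, -2.0]))
print(pose.shape, p.shape, recon.shape)  # (2,) (1,) (100,)
```

During training, the reconstruction target is the patch after the known shift, which forces the pose outputs to become genuinely equivariant rather than invariant.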
This paper employs Hinton's architecture of transforming autoencoder neural networks to identify low-level spatial feature descriptors. Applying a supervised SVM classifier to these descriptors, one can then represent a sufficiently complex object, such as a geometric shape or a human face, as a composition of spatially related features. Using the equivariance property, one can also draw distinctions between different object poses, e.g., a frontal face image versus a profile image, and then train another, higher-level transforming autoencoder on the same architecture. To obtain initial data for first-level feature learning, we use sequences of frames, or movies, and apply computer vision algorithms to detect regions of maximum interest and track their image patches across the movie. We argue that this way of learning features represents a more realistic approach to vision than generic naive feature learning from a supervised dataset. The initial idea comes from the concept of one-shot learning (Fei-Fei et al.), which suggests the possibility of obtaining meaningful features from just one image (or, as in this study, a rather limited set of images supervised by time and order).
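The data-collection step described above can be sketched as harvesting self-supervised training triples from a frame sequence: a tracked patch, the same patch in the next frame, and the known inter-frame shift that supervises the transforming autoencoder. The tracker is replaced here by synthetic ground-truth positions; in practice an interest-point detector and tracker would supply them, and all function names are hypothetical.

```python
# A hedged sketch of building (patch_t, patch_t+1, delta) training triples
# from a "movie" of a moving object (synthetic frames, ground-truth track).
import numpy as np

def make_frame(size, top_left):
    """Synthetic frame: a 4x4 bright square on a dark background."""
    f = np.zeros((size, size))
    r, c = top_left
    f[r:r + 4, c:c + 4] = 1.0
    return f

def training_triples(positions, size=32, patch=8):
    """Yield (patch_t, patch_t1, delta) along the tracked object's path."""
    triples = []
    for (r0, c0), (r1, c1) in zip(positions, positions[1:]):
        f0 = make_frame(size, (r0, c0))
        f1 = make_frame(size, (r1, c1))
        # Crop both windows at the *first* position, so the object's
        # motion is visible inside the second patch.
        win0 = f0[r0:r0 + patch, c0:c0 + patch]
        win1 = f1[r0:r0 + patch, c0:c0 + patch]
        delta = (r1 - r0, c1 - c0)  # known shift = the supervision signal
        triples.append((win0.ravel(), win1.ravel(), delta))
    return triples

track = [(5, 5), (6, 7), (8, 8)]  # object positions across three frames
triples = training_triples(track)
print(len(triples), triples[0][2])  # 2 (1, 2)
```

Each triple gives the autoencoder an input patch, a known transformation, and a reconstruction target, which is exactly the supervision the capsule architecture requires without any manual labels.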
Keywords: transforming autoencoder, one-shot learning, equivariant representation, capsules.
Document Type: Article
UDC: 004.923
Language: Russian
Citation: A. A. Khurshudov, “Using an ensemble of transforming autoencoders to represent 3D objects”, St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems, 2015, no. 2-3(217-222), 105–114
Citation in format AMSBIB
\Bibitem{Khu15}
\by A.~A.~Khurshudov
\paper Using an ensemble of transforming autoencoders to represent 3D objects
\jour St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems
\yr 2015
\issue 2-3(217-222)
\pages 105--114
\mathnet{http://mi.mathnet.ru/ntitu107}
\crossref{https://doi.org/10.5862/JCSTCS.217-222.9}
Linking options:
  • https://www.mathnet.ru/eng/ntitu107
  • https://www.mathnet.ru/eng/ntitu/y2015/i2/p105