St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems
St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems, 2015, Issue 2-3(217-222), Pages 105–114
DOI: https://doi.org/10.5862/JCSTCS.217-222.9
(Mi ntitu107)
 

System Analysis and Control

Using an ensemble of transforming autoencoders to represent 3D objects

A. A. Khurshudov

Kuban State Technological University
Abstract: One of the key goals of machine learning for computer vision is to obtain high-quality representations of visual data that are robust to changes in viewpoint, scale, lighting, object pose or texture. Current state-of-the-art convolutional networks, such as GoogLeNet or AlexNet, can successfully produce invariant representations sufficient for complex multiclass classification. Some researchers (Hinton, Krizhevsky, et al.), however, suggest that this approach, while quite suitable for classification tasks, is misguided in terms of what an efficient visual system should be capable of: namely, reflecting spatial transformations of learned objects in a predictable way. The key concept of their research is equivariance rather than invariance, i.e., the model's ability to change its representation parameters in response to different poses and transformations of a model-specific visual entity.
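The equivariance idea can be illustrated with a minimal, untrained sketch of one Hinton-style "capsule" from a transforming autoencoder. All layer sizes, weights, and names below are illustrative assumptions, not the paper's actual configuration: recognition units map a patch to a pose vector (the equivariant part) and a gate probability (the invariant "is the entity present?" part); a known shift is added to the pose before the generation units reconstruct the patch.

```python
# A minimal sketch of a single transforming-autoencoder capsule
# (hypothetical sizes and random weights; training is omitted).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Capsule:
    def __init__(self, n_pixels=100, n_hidden=20):
        # Recognition units: image patch -> hidden code
        self.W_rec = rng.normal(scale=0.1, size=(n_pixels, n_hidden))
        # Hidden code -> 2D pose (x, y) and gate probability p
        self.W_pose = rng.normal(scale=0.1, size=(n_hidden, 2))
        self.W_gate = rng.normal(scale=0.1, size=(n_hidden, 1))
        # Generation units: transformed pose -> reconstructed patch
        self.W_gen = rng.normal(scale=0.1, size=(2, n_pixels))

    def forward(self, patch, delta):
        """patch: flattened input; delta: externally known (dx, dy) shift."""
        h = sigmoid(patch @ self.W_rec)
        pose = h @ self.W_pose        # equivariant: should move with the input
        p = sigmoid(h @ self.W_gate)  # invariant: probability the entity is present
        shifted_pose = pose + delta   # apply the supplied transformation
        recon = (shifted_pose @ self.W_gen) * p  # gated reconstruction
        return pose, p, recon

cap = Capsule()
patch = rng.random(100)
pose, p, recon = cap.forward(patch, delta=np.array([1.0, -2.0]))
print(pose.shape, p.shape, recon.shape)  # (2,) (1,) (100,)
```

During training, the reconstruction target is the patch after the known shift, which forces the pose outputs to become genuinely equivariant rather than invariant.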
This paper employs Hinton's architecture of transforming autoencoder neural networks to identify low-level spatial feature descriptors. Applying a supervised SVM classifier to these descriptors, one can then represent a sufficiently complex object, such as a geometric shape or a human face, as a composition of spatially related features. Using the equivariance property, one can also draw distinctions between different object poses, e.g., a frontal face image versus a profile image, and then train another, higher-level transforming autoencoder on the same architecture. To obtain initial data for first-level feature learning, we use sequences of frames, or movies, and apply computer vision algorithms to detect regions of maximum interest and track their image patches across the movie. We argue that this way of learning features represents a more realistic approach to vision than generic naive feature learning from a supervised dataset. The initial idea comes from the concept of one-shot learning (Fei-Fei et al.), which suggests the possibility of obtaining meaningful features from just one image (or, as in this study, a rather limited set of images supervised by time and order).
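The data-collection step described above can be sketched as harvesting self-supervised training triples from a frame sequence: a tracked patch, the same patch in the next frame, and the known inter-frame shift that supervises the transforming autoencoder. The tracker is replaced here by synthetic ground-truth positions; in practice an interest-point detector and tracker would supply them, and all function names are hypothetical.

```python
# A hedged sketch of building (patch_t, patch_t+1, delta) training triples
# from a "movie" of a moving object (synthetic frames, ground-truth track).
import numpy as np

def make_frame(size, top_left):
    """Synthetic frame: a 4x4 bright square on a dark background."""
    f = np.zeros((size, size))
    r, c = top_left
    f[r:r + 4, c:c + 4] = 1.0
    return f

def training_triples(positions, size=32, patch=8):
    """Yield (patch_t, patch_t1, delta) along the tracked object's path."""
    triples = []
    for (r0, c0), (r1, c1) in zip(positions, positions[1:]):
        f0 = make_frame(size, (r0, c0))
        f1 = make_frame(size, (r1, c1))
        # Crop both windows at the *first* position, so the object's
        # motion is visible inside the second patch.
        win0 = f0[r0:r0 + patch, c0:c0 + patch]
        win1 = f1[r0:r0 + patch, c0:c0 + patch]
        delta = (r1 - r0, c1 - c0)  # known shift = the supervision signal
        triples.append((win0.ravel(), win1.ravel(), delta))
    return triples

track = [(5, 5), (6, 7), (8, 8)]  # object positions across three frames
triples = training_triples(track)
print(len(triples), triples[0][2])  # 2 (1, 2)
```

Each triple gives the autoencoder an input patch, a known transformation, and a reconstruction target, which is exactly the supervision the capsule architecture requires without any manual labels.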
Keywords: transforming autoencoder, one-shot learning, equivariant representation, capsules.
Document Type: Article
UDC: 004.923
Language: Russian
Citation: A. A. Khurshudov, “Using an ensemble of transforming autoencoders to represent 3D objects”, St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems, 2015, no. 2-3(217-222), 105–114
Citation in format AMSBIB
\Bibitem{Khu15}
\by A.~A.~Khurshudov
\paper Using an ensemble of transforming autoencoders to represent 3D objects
\jour St. Petersburg Polytechnical University Journal. Computer Science. Telecommunication and Control Systems
\yr 2015
\issue 2-3(217-222)
\pages 105--114
\mathnet{http://mi.mathnet.ru/ntitu107}
\crossref{https://doi.org/10.5862/JCSTCS.217-222.9}
Linking options:
  • https://www.mathnet.ru/eng/ntitu107
  • https://www.mathnet.ru/eng/ntitu/y2015/i2/p105