This article is cited in 1 scientific paper (total in 1 paper)
Motif based sequence classification
E. P. Ofitserov Tula State University
Abstract:
Sequence classification problems often arise in areas such as bioinformatics
and natural language processing. In the last few years, the best results in this
field have been achieved by deep learning methods, especially by architectures
based on recurrent neural networks (RNNs). However, a common problem of such
models is their lack of interpretability, i.e., the difficulty of extracting from
the data the key features that most affect the model's decision. Meanwhile, using
less complex neural networks reduces predictive performance, which limits the
use of state-of-the-art machine learning methods in many subject areas.
In this work we propose a novel interpretable deep learning architecture based
on the extraction of principal sets of short substrings, called sequence motifs.
The presence of an extracted motif in the input sequence is a marker for a
certain class. The key component of the proposed solution is a differentiable
alignment algorithm that we developed, which provides a smooth analog of
classical string comparison methods such as Levenshtein edit distance and
Smith–Waterman local alignment. Unlike previous work on motif-based
classification, which used CNNs for shift-invariant search, our model provides
shift- and gap-invariant extraction of motifs.
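As a rough illustration of what a "smooth analog" of Levenshtein edit distance can look like, the sketch below replaces the hard minimum in the classical dynamic-programming recurrence with a log-sum-exp soft minimum. This is a generic relaxation chosen for illustration, not the algorithm from the paper; the names `soft_min` and `soft_edit_distance` and the temperature parameter `gamma` are assumptions introduced here.

```python
import math


def soft_min(values, gamma):
    """Smooth approximation of min(values); approaches the hard min as gamma -> 0."""
    m = min(values)  # shift by the hard minimum for numerical stability
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_edit_distance(a, b, gamma=1.0):
    """Levenshtein recurrence with the hard min replaced by soft_min (illustrative only)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # cost of deleting the first i characters of a
    for j in range(1, m + 1):
        d[0][j] = float(j)  # cost of inserting the first j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = soft_min(
                [d[i - 1][j] + 1.0,       # deletion
                 d[i][j - 1] + 1.0,       # insertion
                 d[i - 1][j - 1] + sub],  # match or substitution
                gamma,
            )
    return d[n][m]
```

With a small `gamma` the value is close to the classical edit distance, while a larger `gamma` blends the costs of all alignments and always yields a smaller value. In this sketch the substitution cost is still discrete; a trainable model would presumably need real-valued costs so that gradients can reach the motif parameters.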
Keywords:
sequence classification, machine learning, neural network, motif extraction.
Citation:
E. P. Ofitserov, “Motif based sequence classification”, Chebyshevskii Sb., 19:1 (2018), 187–199
Linking options:
https://www.mathnet.ru/eng/cheb631 https://www.mathnet.ru/eng/cheb/v19/i1/p187