Matematicheskaya Biologiya i Bioinformatika
Matematicheskaya Biologiya i Bioinformatika, 2021, Volume 16, Issue 2, Pages 299–316
DOI: https://doi.org/10.17537/2021.16.299
(Mi mbb468)
 

Bioinformatics

Principal components of genetic sequences: correlations and significance

V. M. Efimov (a, b, c, d), K. V. Efimov (e), V. Yu. Kovaleva (b), Yu. G. Matushkin (a)

a Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
b Institute of Systematics and Ecology of Animals SB RAS, Novosibirsk, Russia
c Novosibirsk State University, Novosibirsk, Russia
d Tomsk State University, Tomsk, Russia
e HSE School of Economics, Moscow, Russia
Abstract: Any numerical series can be decomposed into principal components using singular spectrum analysis (SSA). We recently proposed a new analysis method, PCA-Seq, which allows numerical principal components to be calculated for a sequence of elements of any type. In particular, the sequence may be composed of nucleotide base pairs or amino acid residues. Two questions inevitably arise: how to interpret the obtained principal components and how to assess their reliability. To interpret the principal components of a symbolic sequence, it is reasonable to evaluate their correlations with numerical characteristics of the sequence elements. When assessing the significance of correlations between sequences, one should bear in mind that standard significance criteria assume independent observations, an assumption that, as a rule, does not hold for real sequences. The article discusses the use, for these purposes, of the anchor bootstrap technique, which was also developed earlier by the authors. This approach assumes that the objects can be represented by points of a metric space. Taken together, the points form a fixed structure in that space, in particular a sequence. The objects are assigned the same random integer weights as in the classical bootstrap. This is sufficient to obtain the bootstrap distribution of the correlation coefficients and to assess their significance. The coding sequence of the SLC9A1 gene (synonyms APNH, NHE1, PPP1R143) was taken as an example of applying the anchor bootstrap technique to genetic sequence analysis. The first principal component showed significant correlations with the hydrophobicity/“transmembraneity” of the corresponding fragments of the amino acid sequence, with their phenylalanine content, and with the difference between the T and A content of the corresponding nucleotide fragments. A similar pattern was found earlier by other authors for other genes. It is very likely of a more general nature.
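The weighting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how random integer weights, drawn as in the classical bootstrap, give a bootstrap distribution of a correlation coefficient. The function names `weighted_corr` and `bootstrap_corr`, the number of replicates, and the percentile interval are assumptions made for illustration, and the demo data are synthetic. The sketch does not reproduce the metric-space construction of the anchor bootstrap.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    dx = x - np.average(x, weights=w)
    dy = y - np.average(y, weights=w)
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))


def bootstrap_corr(x, y, n_boot=2000, seed=0):
    """Bootstrap distribution of the correlation using integer weights.

    Each replicate assigns every object an integer weight equal to the number
    of times it would be drawn in n draws with replacement (a multinomial
    count), which is equivalent to classical bootstrap resampling.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    weights = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    return np.array([weighted_corr(x, y, w) for w in weights])


if __name__ == "__main__":
    # Synthetic data for illustration only (hypothetical, not from the article).
    rng = np.random.default_rng(1)
    pc1 = rng.normal(size=200)                # stand-in for a principal component
    trait = 0.5 * pc1 + rng.normal(size=200)  # stand-in for a numerical characteristic
    r = bootstrap_corr(pc1, trait)
    low, high = np.percentile(r, [2.5, 97.5])
    # One common reading: the correlation is treated as significant
    # when the percentile interval excludes zero.
    print(f"95% bootstrap interval: [{low:.3f}, {high:.3f}]")
```

With unit weights, `weighted_corr` reduces to the ordinary Pearson coefficient, so the bootstrap replicates can be compared directly with the sample correlation.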
Key words: SSA, PCA-Seq, SLC9A1 (NHE1) gene, CDS, protein secondary structure, external factors, anchor bootstrap.
Received 10.05.2021, 30.07.2021, Published 10.09.2021
Document Type: Article
Language: Russian
Citation: V. M. Efimov, K. V. Efimov, V. Yu. Kovaleva, Yu. G. Matushkin, “Principal components of genetic sequences: correlations and significance”, Mat. Biolog. Bioinform., 16:2 (2021), 299–316
Citation in format AMSBIB
\Bibitem{EfiEfiKov21}
\by V.~M.~Efimov, K.~V.~Efimov, V.~Yu.~Kovaleva, Yu.~G.~Matushkin
\paper Principal components of genetic sequences: correlations and significance
\jour Mat. Biolog. Bioinform.
\yr 2021
\vol 16
\issue 2
\pages 299--316
\mathnet{http://mi.mathnet.ru/mbb468}
\crossref{https://doi.org/10.17537/2021.16.299}
\elib{https://elibrary.ru/item.asp?id=47918036}
Linking options:
  • https://www.mathnet.ru/eng/mbb468
  • https://www.mathnet.ru/eng/mbb/v16/i2/p299