Trudy SPIIRAN
Trudy SPIIRAN, 2018, Issue 58, Pages 5–26
DOI: https://doi.org/10.15622/sp.58.1
(Mi trspy1004)
 

Artificial Intelligence, Knowledge and Data Engineering

A high-performance genome-wide association study algorithm based on analysis of pairs of individuals

L. V. Utkin (a), I. L. Utkina (b, a)

a Peter the Great Saint-Petersburg Polytechnic University
b Skolkovo Institute of Science and Technology
Abstract: An extremely simple, high-performance genome-wide association study (GWAS) algorithm is proposed for estimating the main and epistatic effects of markers, or single nucleotide polymorphisms (SNPs). The algorithm compares the genotypes of pairs of individuals together with the corresponding phenotype values. It relies on the intuitive assumption that allele changes at important SNPs within a pair of individuals lead to a large difference between the phenotype values of those individuals. In other words, the algorithm considers pairs of individuals instead of SNPs or pairs of SNPs. Its main advantage is that it depends only weakly on the number of SNPs in the genotype matrix; it depends mainly on the number of individuals, which is typically very small compared with the number of SNPs. Another important advantage is that the algorithm can detect epistatic effects, viewed as gene-gene interactions, without additional computation. The algorithm can also be used when the phenotype takes only two values (the case-control study). Moreover, it can easily be extended from the analysis of binary genotype matrices to the analysis of microarray gene expression data. Numerical experiments with real data sets consisting of populations of double haploid lines of barley show that the proposed algorithm outperforms standard GWAS algorithms in computational terms, especially when detecting gene-gene interactions. Ways of improving the proposed algorithm are discussed in the paper.
Keywords: GWAS, ANOVA, machine learning, epistasis, SNP, quantitative trait, distance metric.
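The pairwise idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are given only in the full paper. It is an assumed scoring rule chosen for illustration: for each SNP, average the absolute phenotype difference over all pairs of individuals whose alleles differ at that SNP. The function name `pairwise_snp_scores` and the toy data are hypothetical.

```python
import numpy as np


def pairwise_snp_scores(genotypes, phenotype):
    """Illustrative pairwise score (an assumption, not the paper's method).

    genotypes: (n_individuals, n_snps) binary matrix.
    phenotype: (n_individuals,) vector of trait values.
    Returns, for each SNP, the mean |phenotype difference| over the
    pairs of individuals whose alleles differ at that SNP.
    """
    X = np.asarray(genotypes)
    y = np.asarray(phenotype, dtype=float)
    n, m = X.shape

    # Index every unordered pair of individuals (i < j) exactly once.
    i, j = np.triu_indices(n, k=1)

    # differs[p, k] is 1.0 when pair p has different alleles at SNP k.
    differs = (X[i] != X[j]).astype(float)

    # Absolute phenotype difference for each pair.
    dy = np.abs(y[i] - y[j])

    # Sum of phenotype differences and number of differing pairs per SNP.
    weighted = differs.T @ dy
    counts = differs.sum(axis=0)

    # Mean difference per SNP; SNPs with no differing pairs score 0.
    return np.divide(weighted, counts, out=np.zeros(m), where=counts > 0)


# Toy data (hypothetical): the phenotype is driven by SNP 0 only.
X_toy = np.array([[0, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1],
                  [1, 1, 0]])
y_toy = np.array([0.0, 0.0, 10.0, 10.0])

scores = pairwise_snp_scores(X_toy, y_toy)
print(scores)                  # SNP 0 receives the highest score
print(int(np.argmax(scores)))  # index of the top-ranked SNP
```

In this toy example every pair that differs at SNP 0 also differs in phenotype, so SNP 0 is ranked first. The sketch builds the full pair-by-SNP matrix for clarity and makes no attempt to reproduce the computational savings claimed in the abstract.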
Received: 30.09.2017
Document Type: Article
UDC: 006.72
Language: English
Citation: L. V. Utkin, I. L. Utkina, “A high-performance genome-wide association study algorithm based on analysis of pairs of individuals”, Tr. SPIIRAN, 58 (2018), 5–26
Citation in format AMSBIB
\Bibitem{UtkUtk18}
\by L.~V.~Utkin, I.~L.~Utkina
\paper A high-performance genome-wide association study algorithm based on analysis of pairs of individuals
\jour Tr. SPIIRAN
\yr 2018
\vol 58
\pages 5--26
\mathnet{http://mi.mathnet.ru/trspy1004}
\crossref{https://doi.org/10.15622/sp.58.1}
\elib{https://elibrary.ru/item.asp?id=35630301}
Linking options:
  • https://www.mathnet.ru/eng/trspy1004
  • https://www.mathnet.ru/eng/trspy/v58/p5