Artificial Intelligence, Knowledge and Data Engineering
A high-performance genome-wide association study algorithm based on analysis of pairs of individuals
L. V. Utkin (a), I. L. Utkina (b, a)
(a) Peter the Great Saint-Petersburg Polytechnic University
(b) Skolkovo Institute of Science and Technology
Abstract:
An extremely simple and high-performance genome-wide association study (GWAS) algorithm for estimating the main and epistatic effects of markers or single nucleotide polymorphisms (SNPs) is proposed. The main idea of the algorithm is to compare the genotypes of pairs of individuals together with the corresponding phenotype values. It relies on the intuitive assumption that differences in the alleles at important SNPs between the two individuals of a pair lead to a large difference in their phenotype values. In other words, the algorithm considers pairs of individuals instead of SNPs or pairs of SNPs. Its main advantage is that its computational cost depends only weakly on the number of SNPs in the genotype matrix and mainly on the number of individuals, which is typically very small compared with the number of SNPs. Another important advantage is that it can detect epistatic effects, viewed as gene-gene interactions, without additional computations. The algorithm can also be used when the phenotype takes only two values (a case-control study). Moreover, it can easily be extended from the analysis of binary genotype matrices to the analysis of microarray gene expression data. Numerical experiments with real data sets consisting of populations of doubled haploid lines of barley show that the proposed algorithm outperforms standard GWAS algorithms computationally, especially in detecting gene-gene interactions. Ways of improving the proposed algorithm are also discussed.
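The pair-based idea can be illustrated with a minimal sketch. It assumes a binary genotype matrix `X` (rows are individuals, columns are SNPs, as for doubled haploid lines) and a phenotype vector `y`. The scoring rule is a hypothetical choice made for illustration, not the authors' formula: for each SNP, compare the mean absolute phenotype difference over pairs of individuals whose alleles differ at that SNP with the mean over pairs whose alleles agree. The function name `pair_scores` is likewise hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def pair_scores(X: Sequence[Sequence[int]], y: Sequence[float]) -> List[float]:
    """Score each SNP by comparing phenotype differences over pairs of individuals.

    Hypothetical scoring rule (an assumption, not the authors' formula):
    score[k] = mean |y_i - y_j| over pairs whose alleles differ at SNP k
             - mean |y_i - y_j| over pairs whose alleles agree at SNP k.
    """
    n, m = len(X), len(X[0])
    diff_sum = [0.0] * m  # sum of |dy| over pairs that differ at SNP k
    diff_cnt = [0] * m    # number of pairs that differ at SNP k
    total, n_pairs = 0.0, 0
    # Iterate over all n(n-1)/2 pairs of individuals, not over SNPs or SNP pairs.
    for i, j in combinations(range(n), 2):
        dy = abs(y[i] - y[j])  # phenotype difference of this pair
        total += dy
        n_pairs += 1
        for k in range(m):
            if X[i][k] != X[j][k]:  # alleles differ at SNP k
                diff_sum[k] += dy
                diff_cnt[k] += 1
    scores = []
    for k in range(m):
        same_cnt = n_pairs - diff_cnt[k]
        mean_diff = diff_sum[k] / diff_cnt[k] if diff_cnt[k] else 0.0
        mean_same = (total - diff_sum[k]) / same_cnt if same_cnt else 0.0
        scores.append(mean_diff - mean_same)
    return scores


# Toy example: the phenotype is determined by SNP 0 only.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
y = [0.0, 0.0, 1.0, 1.0]
print(pair_scores(X, y))  # [1.0, -0.5, -0.5]
```

Because only `|y_i - y_j|` is used, the same code runs unchanged on a 0/1 case-control phenotype. This brute-force loop is linear in the number of SNPs, so it illustrates the pairwise comparison only and does not reproduce the computational properties claimed for the authors' algorithm.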
Keywords:
GWAS, ANOVA, machine learning, epistasis, SNP, quantitative trait, distance metric.
Received: 30.09.2017
Citation:
L. V. Utkin, I. L. Utkina, “A high-performance genome-wide association study algorithm based on analysis of pairs of individuals”, Tr. SPIIRAN, 58 (2018), 5–26
Linking options:
https://www.mathnet.ru/eng/trspy1004 https://www.mathnet.ru/eng/trspy/v58/p5