Bioinformatics
Number of overlaps in patterns
E. I. Furletova (a), M. A. Roytberg (a, b, c)
a Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
b Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
c Higher School of Economics, Moscow, Russia
Abstract:
The aim of the paper is to estimate the number of overlaps in a given pattern. A pattern is a set of words of the same length $m$ over an alphabet $A$. We present theoretical and experimental bounds on the number of overlaps for two types of patterns. First, we consider random patterns under the uniform probability model, i.e. all letters of the alphabet and, correspondingly, all words of the same length are equiprobable. We prove that the average number of overlaps $P$ for random patterns consisting of $n$ words of length $m$ depends linearly on the pattern size $n$ and does not depend on the word length $m$. In the computer experiments, the ratio $P/n$ ranged from $0.33$ to $1.06$; the theoretical estimates of this ratio do not exceed $1.67$. Second, we study patterns described by position weight matrices (PWMs) from the HOCOMOCO database with various cut-offs. For such patterns, the ratio $P/n$ in the experiments ranged from $0.004$ to $1$, and for most patterns it is smaller than $0.1$.
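For illustration only, a minimal Python sketch of the two kinds of patterns described above. It rests on assumptions the abstract does not state. First, an overlap is taken to be a non-empty word that is a proper prefix of some pattern word and a proper suffix of some (possibly different) pattern word; whether the empty word is also counted is left out here. Second, a PWM pattern is taken to be the set of all words whose additive score (sum of per-position entries) reaches the cut-off. All function names (overlaps, random_pattern, mean_overlap_ratio, pwm_pattern) and the toy matrix are hypothetical. This is neither the authors' code nor HOCOMOCO data.

```python
import random
from itertools import product
from typing import Dict, Iterable, List, Optional, Set


def overlaps(pattern: Iterable[str]) -> Set[str]:
    """Assumed definition: non-empty words that are a proper prefix of some
    pattern word and a proper suffix of some (possibly different) word."""
    words = list(pattern)
    prefixes = {w[:k] for w in words for k in range(1, len(w))}
    suffixes = {w[-k:] for w in words for k in range(1, len(w))}
    return prefixes & suffixes


def random_pattern(n: int, m: int, alphabet: str = "ACGT",
                   rng: Optional[random.Random] = None) -> Set[str]:
    """n distinct words of length m under the uniform letter model."""
    if n > len(alphabet) ** m:
        raise ValueError("pattern size exceeds the number of words of length m")
    if rng is None:
        rng = random.Random()
    words: Set[str] = set()
    while len(words) < n:  # rejecting duplicates gives a uniform random n-subset
        words.add("".join(rng.choice(alphabet) for _ in range(m)))
    return words


def mean_overlap_ratio(n: int, m: int, trials: int = 100,
                       alphabet: str = "ACGT", seed: int = 0) -> float:
    """Monte Carlo estimate of P/n for random patterns of n words of length m."""
    rng = random.Random(seed)
    total = sum(len(overlaps(random_pattern(n, m, alphabet, rng)))
                for _ in range(trials))
    return total / (trials * n)


def pwm_pattern(pwm: List[Dict[str, float]], cutoff: float,
                alphabet: str = "ACGT") -> Set[str]:
    """All words whose additive PWM score is >= cutoff (brute force over
    alphabet^m, so only practical for short matrices)."""
    return {"".join(w) for w in product(alphabet, repeat=len(pwm))
            if sum(col[c] for col, c in zip(pwm, w)) >= cutoff}


# Usage: a two-word pattern and a hypothetical toy PWM of length 2.
print(sorted(overlaps({"ACG", "CGA"})))  # ['A', 'CG']
toy_pwm = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
           {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0}]
print(sorted(pwm_pattern(toy_pwm, 2.0)))  # ['AC']
print(mean_overlap_ratio(n=50, m=8))
```

Lowering the cut-off enlarges the PWM pattern (with the toy matrix, a cut-off of 1.0 admits 7 words instead of 1), which in turn changes the overlap count. The value printed by mean_overlap_ratio depends on the assumed overlap definition and sampling scheme, so it is not meant to reproduce the ranges reported in the paper.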
Key words:
overlap, pattern, pattern occurrence in a sequence.
Received 19.11.2015, Published 27.01.2016
Citation:
E. I. Furletova, M. A. Roytberg, “Number of overlaps in patterns”, Mat. Biolog. Bioinform., 11:1 (2016), 14–23
Linking options:
https://www.mathnet.ru/eng/mbb248 https://www.mathnet.ru/eng/mbb/v11/i1/p14