On the generation of synthetic features based on support chains and arbitrary metrics within the framework of a topological approach to data analysis. Part 2. Experimental testing on pharmacoinformatics problems
I. Yu. Torshin Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Abstract:
Representing the precedent relationships between features and a target variable as sets of Boolean lattice elements suggests that synthetic features can be generated using metric distance functions. Approaches are formulated for ($i$) assessing the relevance (“informativeness”) of metrics for the problems being solved; ($ii$) generating synthetic features; and ($iii$) selecting synthetic features that are more informative than the original feature descriptions. Topological analysis of 2400 “molecule–property” data samples from ProteomicsDB yielded fairly effective algorithms for predicting the properties of molecules (rank correlation in cross-validation of 0.90$\pm$0.23). On this set of problems, the metrics that most often generate informative synthetic features were identified: the maximum Kolmogorov deviation, the “oblique” distance, and the $L_p$, Rényi, and von Mises metrics. For the studied set of problems, polynomial correctors are shown to have an advantage over neural network and random forest correctors.
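As a rough illustration of the general idea only, the sketch below builds a synthetic feature as the distance from each object to a reference object and scores each metric by the absolute Spearman rank correlation between that feature and the target. It does not reproduce the paper's support-chain construction on Boolean lattices. The helper names (`lp_metric`, `chebyshev`, `synthetic_feature`, `rank_metrics`), the choice of a single reference object, and the use of Spearman correlation as an informativeness score are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def lp_metric(p: float) -> Metric:
    """Return the L_p (Minkowski) distance for a given p >= 1."""
    def dist(a: Vector, b: Vector) -> float:
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    return dist


def chebyshev(a: Vector, b: Vector) -> float:
    """L_infinity distance: the largest coordinate-wise deviation."""
    return max(abs(x - y) for x, y in zip(a, b))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (vx * vy) ** 0.5


def synthetic_feature(objects: Sequence[Vector], reference: Vector,
                      metric: Metric) -> List[float]:
    """Hypothetical synthetic feature: distance of each object to a reference."""
    return [metric(obj, reference) for obj in objects]


def rank_metrics(objects: Sequence[Vector], target: Sequence[float],
                 reference: Vector,
                 metrics: Dict[str, Metric]) -> List[Tuple[str, float]]:
    """Order metrics by |Spearman| between their synthetic feature and the target."""
    scores = {
        name: abs(spearman(synthetic_feature(objects, reference, m), target))
        for name, m in metrics.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A call such as `rank_metrics(objects, target, objects[0], {"L1": lp_metric(1), "L2": lp_metric(2), "Linf": chebyshev})` returns the metric names ordered from most to least informative under this toy criterion. In practice the score would be estimated in cross-validation rather than on the full sample.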
Keywords:
topological data analysis, lattice theory, algebraic approach of Yu. I. Zhuravlev, pharmacoinformatics.
Received: 09.04.2024
Citation:
I. Yu. Torshin, “On the generation of synthetic features based on support chains and arbitrary metrics within the framework of a topological approach to data analysis. Part 2. Experimental testing on pharmacoinformatics problems”, Inform. Primen., 18:2 (2024), 47–53
Linking options:
https://www.mathnet.ru/eng/ia899 https://www.mathnet.ru/eng/ia/v18/i2/p47