|
Parallel frequent itemset mining on the Intel MIC accelerators
M. L. Tsymbler South Ural State University (pr. Lenina 76, Chelyabinsk, 454080 Russia)
Abstract:
Association rule mining is one of the basic problems of data mining, which supposes finding strong correlations between itemsets in large transaction database. Association rules are generated from frequent itemsets (itemset is frequent if its items frequent occur together in transactions). The DIC (Dynamic Itemset Counting) algorithm is modification of the classical Apriori algorithm of finding frequent itemsets. DIC tries to reduce the number of passes made over the transaction database while keeping the number of itemsets counted in a pass relatively low. The paper addresses the task of accelerating DIC on the Intel MIC (Many Integrated Core) systems in the case when the transaction database fits into the main memory. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit the bit-based internal layout for transactions and itemsets. This technique simplifies the support count via logical bitwise operation, and allows for vectorization of such a step. Experiments with large synthetic and real databases showed good performance and scalability of the proposed algorithm.
Keywords:
data mining, frequent itemset counting, OpenMP, Intel Many Integrated Core.
Received: 26.12.2018
Citation:
M. L. Tsymbler, “Parallel frequent itemset mining on the Intel MIC accelerators”, Vestn. YuUrGU. Ser. Vych. Matem. Inform., 8:1 (2019), 54–70
Linking options:
https://www.mathnet.ru/eng/vyurv206 https://www.mathnet.ru/eng/vyurv/v8/i1/p54
|
Statistics & downloads: |
Abstract page: | 175 | Full-text PDF : | 65 | References: | 21 |
|