Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets.
Degree: PhD, Computer Science, 2019, York University
The unprecedented growth in digitized data has created an ever-expanding demand for sophisticated storage and analysis methods capable of handling vast amounts of complex data, much of which is stored in databases. Because such databases are large, sophisticated analysis methods such as data mining and machine learning are needed to extract useful insights about the system under study. Frequent itemset mining and association rule mining are two key approaches to mining the knowledge stored in databases. However, processing large databases often involves time-consuming computations that require large amounts of memory. Consequently, developing methods that enable faster, less laborious search and pattern discovery remains a central focus of data mining research. Such methods could speed up processing and knowledge extraction, enabling new breakthroughs in how knowledge is acquired from data and applied in practice. However, real-world applications are often hindered by limitations inherent in currently available algorithms. For instance, many itemset mining algorithms first store the database in memory as a tree structure. Yet these algorithms do not provide a tight upper bound on the number of nodes generated while the tree is built; accordingly, there is no upper bound on the amount of memory needed to build such trees. As a result, practical implementations of frequent itemset mining algorithms are often restricted by memory consumption.
Despite the importance of memory consumption to the applicability of itemset mining, this factor has not drawn adequate attention from the data mining community and remains a key challenge. In addition, most of the algorithms widely used and studied to date require multiple database scans, which restricts their applicability to incremental mining. An algorithm capable of mining frequent patterns dynamically, on the fly, would therefore open new pathways in data mining, extending itemset mining to new real-world applications and substantially improving current ones.
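To make the memory concern concrete, here is a minimal Python sketch of an FP-tree-style prefix tree built in two passes over the data. It is not the thesis's code. It omits the header table and node links of a full FP-tree. The function name `build_fp_tree` and the toy transactions are illustrative assumptions. The two passes show why such algorithms need more than one database scan. The resulting node count depends on how often transactions share prefixes. The only bound it trivially satisfies is a loose one: one root plus the total number of frequent items across all transactions.

```python
from collections import Counter


class Node:
    """Prefix-tree node: one item plus the number of transactions sharing this prefix."""
    __slots__ = ("item", "count", "children")

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Illustrative FP-tree-style construction (no header table or node links).

    Returns (root, node_count). Scan 1 counts item frequencies; scan 2 inserts
    each transaction's frequent items in descending-frequency order, so that
    transactions with a common prefix share nodes.
    """
    # Scan 1: global item frequencies (each item counted once per transaction).
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in freq.items() if c >= min_support}

    root = Node()
    node_count = 1  # the root node
    # Scan 2: insert each filtered, frequency-ordered transaction.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-freq[i], i))  # ties broken by item name
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:  # no shared prefix here: allocate a new node
                child = Node(item)
                node.children[item] = child
                node_count += 1
            child.count += 1
            node = child
    return root, node_count


# Hypothetical toy database.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, n_nodes = build_fp_tree(transactions, min_support=2)
print(n_nodes)  # 6
```

On this toy data, item `d` is infrequent and dropped, and the tree has 6 nodes, against the loose bound of 1 + 3 + 2 + 2 + 1 = 9. On real data, the gap between such bounds and actual tree sizes can be large, which is the gap a tighter bound addresses.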
In this thesis, several approaches are proposed to address the limitations described above, which currently hamper further progress in this important area of data mining. First, an upper bound is presented on the number of nodes in well-known tree structures used in frequent itemset mining. Second, to overcome the memory consumption constraint, a memory-efficient method is proposed for storing the data processed by a frequent itemset mining algorithm: instead of a tree, the data is stored in a compact directed graph whose nodes represent items. Third, to avoid costly database scans, a novel single-pass algorithm, the SPFP-tree (single pass frequent pattern tree), is proposed. Lastly, approaches…
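For contrast with a tree, the sketch below shows why indexing nodes by item bounds their number by the number of distinct items. It is a hypothetical, deliberately naive item graph. It is not the compact directed graph proposed in the thesis, whose construction this abstract does not describe. Each node is an item with a transaction count. A weighted edge links consecutive items of each transaction under a fixed item order. The names `build_item_graph` and `order` are assumptions. This naive version is lossy, because different databases can produce the same graph. It therefore illustrates only the node bound.

```python
from collections import defaultdict


def build_item_graph(transactions, order):
    """Hypothetical naive item graph (NOT the thesis's data structure).

    One node per item, storing how many transactions contain it, and one
    weighted directed edge (u, v) per pair of consecutive items in a
    transaction sorted by `order`. Node count is at most len(order).
    """
    rank = {item: r for r, item in enumerate(order)}
    node_counts = defaultdict(int)  # item -> number of transactions containing it
    edge_counts = defaultdict(int)  # (u, v) -> times v directly followed u
    for t in transactions:
        # Keep only ranked items, sorted by the fixed global order.
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for item in items:
            node_counts[item] += 1
        for u, v in zip(items, items[1:]):
            edge_counts[(u, v)] += 1
    return dict(node_counts), dict(edge_counts)


# Same hypothetical toy database as a tree-based example would use.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
nodes, edges = build_item_graph(transactions, order=["a", "b", "c", "d"])
print(len(nodes), len(edges))  # 4 4
```

However many transactions are added, this graph never holds more nodes than there are distinct items. Only its edge set can grow, and that growth is bounded by the number of ordered item pairs. A practical structure must also preserve enough information to recover frequent itemsets exactly, which this sketch does not attempt.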
Advisors/Committee Members: Gryz, Jarek (advisor).
Subjects/Keywords: Computer engineering; Machine learning; Recommender systems; Frequent itemset mining; Association rule mining
APA (6th Edition):
Shahbazi, N. (2019). Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets. (Doctoral Dissertation). York University. Retrieved from http://hdl.handle.net/10315/35912
Chicago Manual of Style (16th Edition):
Shahbazi, Nima. “Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets.” 2019. Doctoral Dissertation, York University. Accessed March 20, 2019.
MLA Handbook (7th Edition):
Shahbazi, Nima. “Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets.” 2019. Web. 20 Mar 2019.
Vancouver:
Shahbazi N. Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets. [Internet] [Doctoral dissertation]. York University; 2019. [cited 2019 Mar 20]. Available from: http://hdl.handle.net/10315/35912.
Council of Science Editors:
Shahbazi N. Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets. [Doctoral Dissertation]. York University; 2019. Available from: http://hdl.handle.net/10315/35912