Compressing and Performing Algorithms on Massively Large Networks.
Networks are represented as a set of nodes (vertices) and the arcs (links) connecting them. Such networks can model various real-world structures such as social networks (e.g., Facebook), information networks (e.g., citation networks), technological networks (e.g., the Internet), and biological networks (e.g., gene-phenotype network). Analysis of such structures is a heavily studied area with many applications. However, in this era of big data, we find ourselves with networks so massive that the space requirements inhibit network analysis.
Since many of these networks have nodes and arcs on the order of billions to trillions, even basic data structures such as adjacency lists could cost petabytes to zettabytes of storage. Storing these networks in secondary memory would require I/O access (i.e., disk access) during analysis, thus drastically slowing analysis time. To perform analysis efficiently on such extensive data, we either need enough main memory for the data structures and algorithms, or we need to develop compressions that require much less space while still being able to answer queries efficiently.
In this dissertation, we develop several compression techniques that succinctly represent these real-world networks while still being able to efficiently query the network (e.g., check if an arc exists between two nodes). Furthermore, since many of these networks continue to grow over time, our compression techniques also support the ability to add and remove nodes and edges directly on the compressed structure. We also provide a way to compress the data quickly without any intermediate structure, thus giving minimal memory overhead. We provide detailed analysis and prove that our compression is indeed succinct (i.e., achieves the information-theoretic lower bound). Also, we empirically show that our compression rates outperform or are equal to existing compression algorithms on many benchmark datasets.
We also extend our technique to time-evolving networks. That is, we store the entire state of the network at each time frame. Studying time-evolving networks allows us to find patterns throughout the time that would not be available in regular, static network analysis. A succinct representation for time-evolving networks is arguably more important than static graphs, due to the extra dimension inflating the space requirements of basic data structures even more. Again, we manage to achieve succinctness while also providing fast encoding, minimal memory overhead during encoding, fast queries, and fast, direct modification. We also compare against several benchmarks and empirically show that we achieve compression rates better than or equal to the best performing benchmark for each dataset.
Finally, we also develop both static and time-evolving algorithms that run directly on our compressed structures. Using our static graph compression combined with our differential technique, we find that we can speed up matrix-vector multiplication by reusing previously computed products. We compare…
Advisors/Committee Members: Radhakrishnan, Sridhar (advisor), Changwook, Kim (committee member), Cheng, Qi (committee member), Grant, Christan (committee member), Nicholson, Charles (committee member).
to Zotero / EndNote / Reference
APA (6th Edition):
Nelson, M. (2019). Compressing and Performing Algorithms on Massively Large Networks. (Doctoral Dissertation). University of Oklahoma. Retrieved from http://hdl.handle.net/11244/323259
Chicago Manual of Style (16th Edition):
Nelson, Michael. “Compressing and Performing Algorithms on Massively Large Networks.” 2019. Doctoral Dissertation, University of Oklahoma. Accessed January 25, 2020.
MLA Handbook (7th Edition):
Nelson, Michael. “Compressing and Performing Algorithms on Massively Large Networks.” 2019. Web. 25 Jan 2020.
Nelson M. Compressing and Performing Algorithms on Massively Large Networks. [Internet] [Doctoral dissertation]. University of Oklahoma; 2019. [cited 2020 Jan 25].
Available from: http://hdl.handle.net/11244/323259.
Council of Science Editors:
Nelson M. Compressing and Performing Algorithms on Massively Large Networks. [Doctoral Dissertation]. University of Oklahoma; 2019. Available from: http://hdl.handle.net/11244/323259