Modeling text embedded information cascades.
Degree: PhD, 2019, Northeastern University
Networks mediate many aspects of society. For example, social networking services (SNS) like Twitter and Facebook have greatly helped people connect with family, friends, and the outside world. Public policy diffuses over institutional and social networks that connect political actors in different areas. Inferring network structure is thus essential for understanding the transmission of ideas and information, which in turn could answer questions about communities, collective actions, and influential social participants. Since many networks are not directly observed, we often rely on indirect evidence, such as the timing of messages between participants, to infer latent connections. The textual content of messages, especially the reuse of text originating elsewhere, is one source of such evidence. This thesis contributes techniques for detecting evidence of text reuse and modeling the underlying network structure. We propose methods to model text reuse with accidental and intentional lexical and semantic mutations. For lexical similarity detection, we propose an n-gram shingling algorithm that detects "locally" reused passages, rather than near-duplicate documents, embedded within the larger text output of network nodes. For semantic similarity, we use an attention-based neural network to likewise detect embedded reused texts. When modeling network structure, we are interested in inferring different levels of detail: individual links between participants, the structure of a specific information cascade, or global network properties. We propose a contrastive training objective for conditional models of edges in information cascades that has the flexibility to answer those questions and can also incorporate rich node and edge features. Finally, network embedding methods prove to be an effective way to learn representations of nodes while preserving structure, node and edge properties, and side information.
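To illustrate the n-gram shingling idea, here is a minimal sketch with invented documents and a word 5-gram shingle size; the thesis's actual algorithm, preprocessing, and parameters may differ. Two documents become candidates for locally reused passages when they share n-gram shingles, even if the documents as wholes are dissimilar:

```python
from collections import defaultdict

def shingles(tokens, n=5):
    """Return the set of word n-grams (shingles) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def candidate_reuse_pairs(docs, n=5):
    """Index every document by its shingles, then report document pairs
    that share at least one shingle -- candidates for local text reuse."""
    index = defaultdict(set)          # shingle -> ids of docs containing it
    for doc_id, tokens in docs.items():
        for sh in shingles(tokens, n):
            index[sh].add(doc_id)
    pairs = defaultdict(int)          # (doc_a, doc_b) -> shared-shingle count
    for ids in index.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs[(ids[i], ids[j])] += 1
    return dict(pairs)

# Hypothetical documents: "a" and "b" share a five-word passage; "c" is unrelated.
docs = {
    "a": "the quick brown fox jumps over the lazy dog today".split(),
    "b": "yesterday the quick brown fox jumps over a fence".split(),
    "c": "completely unrelated text about network inference methods".split(),
}
pairs = candidate_reuse_pairs(docs, n=5)
```

Only the pair ("a", "b") surfaces, via the shingles they share around "the quick brown fox jumps over"; shared-shingle counts can then seed a finer passage-alignment step.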
We propose a self-attention (Transformer-based) neural network, trained to predict the next activated node in a given cascade, to learn node embeddings.
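The training signal just described, predicting the next activated node from the cascade observed so far, can be sketched independently of the network architecture. Below is a hypothetical data-preparation step (node ids and cascade are invented; the thesis's actual setup may differ): each time-ordered cascade yields one (prefix, target) example per activation after the first.

```python
def next_node_examples(cascade):
    """Turn one time-ordered cascade [n1, n2, ...] into (prefix, target)
    training pairs: the model sees the already-activated prefix and must
    predict the next node to activate."""
    return [(cascade[:i], cascade[i]) for i in range(1, len(cascade))]

# Hypothetical cascade of user-node activations, in order of activation time.
cascade = ["u3", "u7", "u1", "u9"]
examples = next_node_examples(cascade)
```

A Transformer encoder over the prefix, with a softmax over candidate nodes as the output layer, would then be trained on these pairs; the learned input embeddings serve as the node representations.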
Advisors/Committee Members: Smith, David A (David Arthur) (advisor).
Subjects/Keywords: Information networks; Data processing; Soft computing; Neural networks (Computer science); Text data mining; natural language processing; network embedding; social network analysis; text reuse
APA (6th Edition):
Xu, S. (2019). Modeling text embedded information cascades. (Doctoral Dissertation). Northeastern University. Retrieved from http://hdl.handle.net/2047/D20328884
Chicago Manual of Style (16th Edition):
Xu, Shaobin. “Modeling text embedded information cascades.” 2019. Doctoral Dissertation, Northeastern University. Accessed August 03, 2020.
MLA Handbook (7th Edition):
Xu, Shaobin. “Modeling text embedded information cascades.” 2019. Web. 03 Aug 2020.
Vancouver:
Xu S. Modeling text embedded information cascades. [Internet] [Doctoral dissertation]. Northeastern University; 2019. [cited 2020 Aug 03]. Available from: http://hdl.handle.net/2047/D20328884.
Council of Science Editors:
Xu S. Modeling text embedded information cascades. [Doctoral Dissertation]. Northeastern University; 2019. Available from: http://hdl.handle.net/2047/D20328884