You searched for +publisher:"Virginia Tech" +contributor:("Ramakrishnan, Naren").
Showing records 1 – 30 of 162 total matches.

Virginia Tech
1.
Self, Nathan.
User Interfaces for an Open Source Indicators Forecasting System.
Degree: MS, Computer Science and Applications, 2015, Virginia Tech
URL: http://hdl.handle.net/10919/56696
Intelligence analysts today are faced with many challenges, chief among them being the need to fuse disparate streams of data and rapidly arrive at analytical decisions and quantitative predictions for use by policy makers. A forecasting tool to anticipate key events of interest is an invaluable aid in helping analysts cut through the chatter. We present the design of user interfaces for the EMBERS system, an anticipatory intelligence system that ingests myriad open source data streams (e.g., news, blogs, tweets, economic and financial indicators, search trends) to generate forecasts of significant societal-level events such as disease outbreaks, protests, and elections. A key research issue in EMBERS is not just to generate high-quality forecasts but to provide interfaces for analysts so they can understand the rationale behind these forecasts and pose why, what-if, and other exploratory questions.
This thesis presents the design and implementation of three visualization interfaces for EMBERS. First, we illustrate how the rationale behind forecasts can be presented to users through the use of an audit trail and its associated visualization. The audit trail enables an analyst to drill down from a final forecast to the raw (and processed) data sources that contributed to the forecast. Second, we present a forensics tool called Reverse OSI that enables analysts to investigate whether there was additional information, in existing or new data sources, that could have been used to improve forecasting. Unlike the audit trail, which captures the transduction of data from raw feeds into alerts, Reverse OSI enables us to posit connections from (missed) forecasts back to raw feeds. Finally, we present an interactive machine learning approach for analysts to steer the construction of machine learning models. This provides fine-grained control over tuning the tradeoffs underlying EMBERS. Together, these three interfaces support a range of functionality in EMBERS, from visualization of algorithm output to a complete framework for user feedback via a tight human-algorithm loop. They are currently being utilized by a range of user groups in EMBERS: analysts, social scientists, and machine learning developers, respectively.
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Luther, Kurt (committee member), North, Christopher L. (committee member).
Subjects/Keywords: Visualization; Forecasting; Intelligence Analysis; Open Source Indicators
APA (6th Edition):
Self, N. (2015). User Interfaces for an Open Source Indicators Forecasting System. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/56696
Chicago Manual of Style (16th Edition):
Self, Nathan. “User Interfaces for an Open Source Indicators Forecasting System.” 2015. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/56696.
MLA Handbook (7th Edition):
Self, Nathan. “User Interfaces for an Open Source Indicators Forecasting System.” 2015. Web. 07 Mar 2021.
Vancouver:
Self N. User Interfaces for an Open Source Indicators Forecasting System. [Internet] [Masters thesis]. Virginia Tech; 2015. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/56696.
Council of Science Editors:
Self N. User Interfaces for an Open Source Indicators Forecasting System. [Masters Thesis]. Virginia Tech; 2015. Available from: http://hdl.handle.net/10919/56696

Virginia Tech
2.
Wang, Ji.
Clustered Layout Word Cloud for User Generated Online Reviews.
Degree: MS, Computer Science and Applications, 2012, Virginia Tech
URL: http://hdl.handle.net/10919/19193
User generated reviews, like those found on Yelp and Amazon, have become important reference material in casual decision making, like dining, shopping and entertainment. However, the very large volume of reviews makes the review reading process time consuming. A text visualization can speed up the review reading process. In this thesis, we present the clustered layout word cloud – a text visualization that quickens decision making based on user generated reviews. We used a natural language processing approach, called grammatical dependency parsing, to analyze user generated review content and create a semantic graph. A force-directed graph layout was applied to the graph to create the clustered layout word cloud. We conducted a two-task user study to compare the clustered layout word cloud to two alternative review reading techniques: a random layout word cloud and normal block-text reviews. The results showed that the clustered layout word cloud offers faster task completion time and better user satisfaction than the other two alternative review reading techniques.
[Permission email from J. Huang removed at his request. GMc March 11, 2014]
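The pipeline sketched in this abstract (dependency-parse review text into a word graph, then position words with a force-directed layout so related terms cluster) can be roughed out with off-the-shelf tools. The snippet below is an illustrative sketch using spaCy and NetworkX on toy reviews, not the thesis's implementation; it assumes the en_core_web_sm model is installed.

```python
# Illustrative sketch only: build a rough semantic graph from dependency parses,
# then compute force-directed positions so related words land near each other.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")   # assumes this spaCy model is installed
reviews = [
    "The pasta was fresh and the service was friendly.",
    "Friendly staff, but the pasta arrived cold.",
]

graph = nx.Graph()
for doc in (nlp(text) for text in reviews):
    for token in doc:
        # Link each word to its syntactic head as a crude semantic association.
        if token.is_alpha and token.head.is_alpha and token.head is not token:
            graph.add_edge(token.lemma_.lower(), token.head.lemma_.lower())

# Force-directed layout: clusters in the graph become spatial clusters of words.
positions = nx.spring_layout(graph, seed=42)
for word, (x, y) in positions.items():
    print(f"{word:>10s}  ({x:+.2f}, {y:+.2f})")
```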
Advisors/Committee Members: North, Christopher L. (committeechair), Cao, Yong (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Word Cloud; Text Visualization; User Study
APA (6th Edition):
Wang, J. (2012). Clustered Layout Word Cloud for User Generated Online Reviews. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/19193
Chicago Manual of Style (16th Edition):
Wang, Ji. “Clustered Layout Word Cloud for User Generated Online Reviews.” 2012. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/19193.
MLA Handbook (7th Edition):
Wang, Ji. “Clustered Layout Word Cloud for User Generated Online Reviews.” 2012. Web. 07 Mar 2021.
Vancouver:
Wang J. Clustered Layout Word Cloud for User Generated Online Reviews. [Internet] [Masters thesis]. Virginia Tech; 2012. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/19193.
Council of Science Editors:
Wang J. Clustered Layout Word Cloud for User Generated Online Reviews. [Masters Thesis]. Virginia Tech; 2012. Available from: http://hdl.handle.net/10919/19193

Virginia Tech
3.
Arefiyan Khalilabad, Seyyed Mostafa.
Deep Learning Models for Context-Aware Object Detection.
Degree: MS, Computer Engineering, 2017, Virginia Tech
URL: http://hdl.handle.net/10919/88387
In this thesis, we present ContextNet, a novel general object detection framework for incorporating context cues into a detection pipeline. Current deep learning methods for object detection exploit state-of-the-art image recognition networks for classifying the given region-of-interest (ROI) into predefined classes and regressing a bounding box around it, without using any information about the corresponding scene. ContextNet is based on the intuitive idea that cues about the general scene (e.g., kitchen or library) change the priors about the presence or absence of some object classes. We provide a general means for integrating this notion into the decision process about a given ROI by using a network pretrained on scene recognition datasets in parallel with a pretrained network that extracts object-level features for the corresponding ROI. Using comprehensive experiments on PASCAL VOC 2007, we demonstrate the effectiveness of our design choices; the resulting system outperforms the baseline in most object classes and reaches 57.5 mAP (mean Average Precision) on the PASCAL VOC 2007 test set, compared with 55.6 mAP for the baseline.
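The described design pairs object-level ROI features with scene-level context features from a second pretrained network and fuses them for classification. The PyTorch module below is a minimal two-branch sketch of that idea with assumed backbones and layer sizes; it is not the ContextNet architecture itself (which, per the abstract, uses a backbone pretrained on a scene-recognition dataset for the context branch).

```python
# Minimal sketch of a two-branch, context-aware classification head.
# Backbones, sizes, and class count are illustrative placeholders.
import torch
import torch.nn as nn
import torchvision.models as models

class ContextAwareHead(nn.Module):
    def __init__(self, num_classes: int = 21):
        super().__init__()
        # Branch 1: object-level features for the cropped ROI.
        self.roi_backbone = models.resnet18()
        self.roi_backbone.fc = nn.Identity()
        # Branch 2: scene-level features for the whole image (stand-in for a
        # network pretrained on a scene-recognition dataset).
        self.scene_backbone = models.resnet18()
        self.scene_backbone.fc = nn.Identity()
        # Classifier over the concatenated object + context features.
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, roi: torch.Tensor, full_image: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.roi_backbone(roi), self.scene_backbone(full_image)], dim=1)
        return self.classifier(fused)

model = ContextAwareHead()
scores = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(scores.shape)  # torch.Size([2, 21])
```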
Advisors/Committee Members: Abbott, Amos L. (committeechair), Tokekar, Pratap (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Context-aware Detection; Object Detection; Context Modeling; Context Extraction; Convolutional Neural Network; Computer Vision; Deep Learning; Machine Learning
APA (6th Edition):
Arefiyan Khalilabad, S. M. (2017). Deep Learning Models for Context-Aware Object Detection. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/88387
Chicago Manual of Style (16th Edition):
Arefiyan Khalilabad, Seyyed Mostafa. “Deep Learning Models for Context-Aware Object Detection.” 2017. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/88387.
MLA Handbook (7th Edition):
Arefiyan Khalilabad, Seyyed Mostafa. “Deep Learning Models for Context-Aware Object Detection.” 2017. Web. 07 Mar 2021.
Vancouver:
Arefiyan Khalilabad SM. Deep Learning Models for Context-Aware Object Detection. [Internet] [Masters thesis]. Virginia Tech; 2017. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/88387.
Council of Science Editors:
Arefiyan Khalilabad SM. Deep Learning Models for Context-Aware Object Detection. [Masters Thesis]. Virginia Tech; 2017. Available from: http://hdl.handle.net/10919/88387

Virginia Tech
4.
Muthiah, Sathappan.
Forecasting Protests by Detecting Future Time Mentions in News and Social Media.
Degree: MS, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/49535
Civil unrest (protests, strikes, and "occupy" events) is a common occurrence in both democracies and authoritarian regimes. The study of civil unrest is a key topic for political scientists as it helps capture an important mechanism by which citizenry express themselves. In countries where civil unrest is lawful, qualitative analysis has revealed that more than 75% of the protests are planned, organized, and/or announced in advance; therefore detecting future time mentions in relevant news and social media is a simple way to develop a protest forecasting system. We develop such a system in this thesis, using a combination of key phrase learning to identify what to look for, probabilistic soft logic to reason about location occurrences in extracted results, and time normalization to resolve future tense mentions. We illustrate the application of our system to 10 countries in Latin America, viz. Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador, Mexico, Paraguay, Uruguay, and Venezuela. Results demonstrate our successes in capturing significant societal unrest in these countries with an average lead time of 4.08 days. We also study the selective superiorities of news media versus social media (Twitter, Facebook) to identify relevant tradeoffs.
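The core signal here is a future time mention: a date in an article that falls after the article's publication date. The snippet below is a toy stand-in for that time-normalization step, using a hand-rolled "Month Day" pattern; the thesis's actual system additionally uses key phrase learning and probabilistic soft logic, which are not shown.

```python
# Toy future-date normalization: find "Month Day" mentions that occur after the
# publication date. Illustrative only; not the thesis's extraction pipeline.
import re
from datetime import date

MONTHS = {m: i + 1 for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june",
     "july", "august", "september", "october", "november", "december"])}
PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})\b")

def future_mentions(text, published):
    """Return explicit date mentions that resolve to a day after publication."""
    hits = []
    for month, day in PATTERN.findall(text.lower()):
        mentioned = date(published.year, MONTHS[month], int(day))
        if mentioned < published:                       # roll past dates into next year
            mentioned = mentioned.replace(year=published.year + 1)
        if mentioned > published:
            hits.append(mentioned)
    return hits

article = "Union leaders called for a nationwide strike on March 21 in Buenos Aires."
print(future_mentions(article, published=date(2014, 3, 2)))  # [datetime.date(2014, 3, 21)]
```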
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Lu, Chang Tien (committee member), Katz, E. Graham (committee member).
Subjects/Keywords: Textmining; Information Retrieval; Social Media
APA (6th Edition):
Muthiah, S. (2014). Forecasting Protests by Detecting Future Time Mentions in News and Social Media. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/49535
Chicago Manual of Style (16th Edition):
Muthiah, Sathappan. “Forecasting Protests by Detecting Future Time Mentions in News and Social Media.” 2014. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/49535.
MLA Handbook (7th Edition):
Muthiah, Sathappan. “Forecasting Protests by Detecting Future Time Mentions in News and Social Media.” 2014. Web. 07 Mar 2021.
Vancouver:
Muthiah S. Forecasting Protests by Detecting Future Time Mentions in News and Social Media. [Internet] [Masters thesis]. Virginia Tech; 2014. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/49535.
Council of Science Editors:
Muthiah S. Forecasting Protests by Detecting Future Time Mentions in News and Social Media. [Masters Thesis]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/49535

Virginia Tech
5.
Choudhry, Arjun.
Narrative Generation to Support Causal Exploration of Directed Graphs.
Degree: MS, Computer Science and Applications, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/98670
Narrative generation is the art of creating coherent snippets of text that cumulatively describe a succession of events played across a period of time. These goals of narrative generation are also shared by causal graphs, models that encapsulate inferences between nodes through the strength and polarity of the connecting edges. Causal graphs are a useful mechanism to visualize changes propagating among nodes in the system. However, as the graph starts addressing real-world actors and their interactions, it becomes increasingly difficult to understand causal inferences between distant nodes, especially if the graph is cyclic. Moreover, if the value of more than a single node is altered and the cumulative effect of the change is to be perceived on a set of target nodes, the task becomes extremely difficult for the human eye. This thesis attempts to alleviate this problem by generating dynamic narratives detailing the effect of one or more interventions on one or more target nodes, incorporating time-series analysis, Wikification, and spike detection. Moreover, the narrative enhances the user's understanding of the change propagation occurring in the system. The efficacy of the narrative was further corroborated by the results of user studies, which concluded that the presence of the narrative aids the user's confidence level, correctness, and speed while exploring the causal network.
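At the heart of the problem is propagating one or more interventions through a weighted, signed causal graph to a set of target nodes. The sketch below shows that propagation step alone on a toy acyclic graph with NetworkX; the edge weights, node names, and single-pass scheme are illustrative assumptions, and the narrative generation, Wikification, and spike detection described in the abstract are not shown.

```python
# Toy propagation of intervention deltas through a signed, weighted causal DAG.
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from([
    ("fuel_price", "transport_cost", 0.8),
    ("transport_cost", "food_price", 0.6),
    ("food_price", "unrest", 0.5),
    ("subsidies", "food_price", -0.7),
])

def propagate(graph, interventions):
    """Single pass in topological order: each node accumulates weighted upstream deltas."""
    values = {n: interventions.get(n, 0.0) for n in graph.nodes}
    for u in nx.topological_sort(graph):
        for _, v, w in graph.out_edges(u, data="weight"):
            values[v] += w * values[u]
    return values

effects = propagate(g, {"fuel_price": 1.0, "subsidies": 0.5})
print(f"effect on unrest: {effects['unrest']:+.3f}")   # +0.065 with these toy weights
```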
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Reddy, Chandan K. (committee member), Lu, Chang Tien (committee member).
Subjects/Keywords: Narrative Generation; Causal Exploration; Natural Language Processing; Graph Inference
APA (6th Edition):
Choudhry, A. (2020). Narrative Generation to Support Causal Exploration of Directed Graphs. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/98670
Chicago Manual of Style (16th Edition):
Choudhry, Arjun. “Narrative Generation to Support Causal Exploration of Directed Graphs.” 2020. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/98670.
MLA Handbook (7th Edition):
Choudhry, Arjun. “Narrative Generation to Support Causal Exploration of Directed Graphs.” 2020. Web. 07 Mar 2021.
Vancouver:
Choudhry A. Narrative Generation to Support Causal Exploration of Directed Graphs. [Internet] [Masters thesis]. Virginia Tech; 2020. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/98670.
Council of Science Editors:
Choudhry A. Narrative Generation to Support Causal Exploration of Directed Graphs. [Masters Thesis]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/98670

Virginia Tech
6.
Fiaux, Patrick O.
Solving Intelligence Analysis Problems using Biclusters.
Degree: MS, Computer Science and Applications, 2012, Virginia Tech
URL: http://hdl.handle.net/10919/31293
Analysts must filter through an ever-growing amount of data to obtain information relevant to their investigations. Looking at every piece of information individually is in many cases not feasible; there is hence a growing need for new filtering tools and techniques to improve the analyst process with large datasets. We present MineVis, an analytics system that integrates biclustering algorithms and visual analytics tools in one seamless environment. The combination of biclusters and visual data glyphs in a visual analytics spatial environment enables a novel type of filtering. This design allows for rapid exploration and navigation across connected documents. Through a user study we conclude that our system has the potential to help analysts filter data by allowing them to i) form hypotheses before reading documents and subsequently ii) validate them by reading a reduced and focused set of documents.
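Biclustering simultaneously groups documents and the terms or entities they share, which is the computation the filtering builds on. A small scikit-learn illustration over an assumed document-by-entity count matrix is given below; it demonstrates the biclustering idea only, not MineVis or its actual algorithms.

```python
# Illustrative biclustering of a tiny document-by-entity count matrix.
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Rows: reports; columns: entities mentioned in them (toy counts).
counts = np.array([
    [3, 2, 0, 0],
    [2, 4, 0, 1],
    [0, 0, 5, 2],
    [0, 1, 3, 4],
])
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(counts)
print("report clusters:", model.row_labels_)     # which bicluster each report falls in
print("entity clusters:", model.column_labels_)  # which bicluster each entity falls in
```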
Advisors/Committee Members: Pérez-Quiñones, Manuel A. (committee member), Ramakrishnan, Naren (committeecochair), North, Christopher L. (committeecochair).
Subjects/Keywords: Visual Analytics; Biclustering
APA (6th Edition):
Fiaux, P. O. (2012). Solving Intelligence Analysis Problems using Biclusters. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/31293
Chicago Manual of Style (16th Edition):
Fiaux, Patrick O. “Solving Intelligence Analysis Problems using Biclusters.” 2012. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/31293.
MLA Handbook (7th Edition):
Fiaux, Patrick O. “Solving Intelligence Analysis Problems using Biclusters.” 2012. Web. 07 Mar 2021.
Vancouver:
Fiaux PO. Solving Intelligence Analysis Problems using Biclusters. [Internet] [Masters thesis]. Virginia Tech; 2012. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/31293.
Council of Science Editors:
Fiaux PO. Solving Intelligence Analysis Problems using Biclusters. [Masters Thesis]. Virginia Tech; 2012. Available from: http://hdl.handle.net/10919/31293

Virginia Tech
7.
Sethi, Iccha.
Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records.
Degree: MS, Computer Science, 2012, Virginia Tech
URL: http://hdl.handle.net/10919/41894
Medical records are rapidly being digitized to electronic medical records. Although Electronic Medical Records (EMRs) improve administration, billing, and logistics, an open research problem remains as to how doctors can leverage EMRs to enhance patient care. This thesis describes a system that analyzes a patient's evolving EMR in context with available biomedical knowledge and the accumulated experience recorded in various text sources, including the EMRs of other patients. The aim of the Clinician Decision Support (CDS) Dashboard is to provide interactive, automated, actionable EMR text-mining tools that help improve both the patient and clinical care staff experience. The CDS Dashboard, in a secure network, helps physicians find de-identified electronic medical records similar to their patient's medical record, thereby aiding them in diagnosis, treatment, prognosis, and outcomes. It is of particular value in cases involving complex disorders, and it also allows physicians to explore relevant medical literature, recent research findings, clinical trials, and medical cases. A pilot study done with medical students at the Virginia Tech Carilion School of Medicine and Research Institute (VTC) showed that 89% of them found the CDS Dashboard to be useful in aiding patient care for doctors and 81% of them found it useful for aiding medical students pedagogically. Additionally, over 81% of the medical students found the tool user friendly. The CDS Dashboard is constructed using a multidisciplinary approach including computer science, medicine, biomedical research, and human-machine interfacing. Our multidisciplinary approach, combined with the high usability scores obtained from VTC, indicates that the CDS Dashboard has high potential value to clinicians and medical students.
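Finding de-identified records similar to a patient's EMR is, at its simplest, a text-similarity retrieval problem. The sketch below prototypes that step with TF-IDF and cosine similarity over invented note snippets; it is an assumption-laden illustration, not the CDS Dashboard's actual retrieval engine.

```python
# Toy similar-record retrieval over de-identified note text using TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    "type 2 diabetes with neuropathy, metformin started",
    "community acquired pneumonia treated with azithromycin",
    "diabetes mellitus, poor glycemic control, insulin titration",
]
query = "new diagnosis of diabetes, considering metformin"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(records + [query])        # last row is the query
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for idx in scores.argsort()[::-1]:                           # most similar records first
    print(f"{scores[idx]:.2f}  {records[idx]}")
```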
Advisors/Committee Members: Garner, Harold Ray (committeechair), Ramakrishnan, Naren (committee member), Feng, Wu-Chun (committee member).
Subjects/Keywords: electronic medical records; clinical decision support systems; text data mining; medical informatics
APA (6th Edition):
Sethi, I. (2012). Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/41894
Chicago Manual of Style (16th Edition):
Sethi, Iccha. “Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records.” 2012. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/41894.
MLA Handbook (7th Edition):
Sethi, Iccha. “Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records.” 2012. Web. 07 Mar 2021.
Vancouver:
Sethi I. Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records. [Internet] [Masters thesis]. Virginia Tech; 2012. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/41894.
Council of Science Editors:
Sethi I. Clinician Decision Support Dashboard: Extracting value from Electronic Medical Records. [Masters Thesis]. Virginia Tech; 2012. Available from: http://hdl.handle.net/10919/41894

Virginia Tech
8.
Mahendiran, Aravindan.
Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics.
Degree: MS, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/25430
Twitter has become a popular data source in the recent decade and garnered a significant amount of attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient to all methods that use Twitter for forecasting is to agree on a domain-specific vocabulary to track the pertinent tweets, which is typically provided by subject matter experts (SMEs).
The language used in Twitter drastically differs from other forms of online discourse, such as news articles and blogs. It constantly evolves over time as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic in nature and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we demonstrate how we can achieve close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error.
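Dynamic vocabulary building starts from seed terms and repeatedly pulls in terms that co-occur strongly with the current vocabulary. The loop below is a heavily simplified co-occurrence version of that query-expansion idea over toy tweets; the thesis performs this reasoning with Probabilistic Soft Logic, which is not shown, and the seed term, threshold, and hashtag filter here are illustrative assumptions.

```python
# Simplified iterative query expansion by hashtag co-occurrence with seed terms.
from collections import Counter

tweets = [
    "#eleccion2013 vota por el cambio",
    "gran marcha hoy #eleccion2013 #votaciones",
    "#votaciones resultados en vivo",
    "partido de futbol esta noche",
]
vocabulary = {"#eleccion2013"}          # seed vocabulary (illustrative)

for _ in range(3):                      # a few expansion rounds
    cooccur = Counter()
    for tweet in tweets:
        tokens = set(tweet.split())
        if tokens & vocabulary:         # tweet matches the current vocabulary
            cooccur.update(tokens - vocabulary)
    # Add hashtags that co-occur with the vocabulary (threshold is illustrative).
    new_terms = {t for t, c in cooccur.items() if c >= 1 and t.startswith("#")}
    if not new_terms:
        break
    vocabulary |= new_terms

print(sorted(vocabulary))               # seed hashtag plus co-occurring hashtags
```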
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Ribbens, Calvin J. (committee member), Prakash, B. Aditya (committee member).
Subjects/Keywords: Election Forecasting; Twitter; Query Expansion; Social Group Modeling; Probabilistic Soft Logic
APA (6th Edition):
Mahendiran, A. (2014). Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/25430
Chicago Manual of Style (16th Edition):
Mahendiran, Aravindan. “Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics.” 2014. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/25430.
MLA Handbook (7th Edition):
Mahendiran, Aravindan. “Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics.” 2014. Web. 07 Mar 2021.
Vancouver:
Mahendiran A. Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics. [Internet] [Masters thesis]. Virginia Tech; 2014. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/25430.
Council of Science Editors:
Mahendiran A. Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics. [Masters Thesis]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/25430

Virginia Tech
9.
Sundareisan, Shashidhar.
Making diffusion work for you: Classification sans text, finding culprits and filling missing values.
Degree: MS, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/49678
Can we find people infected with the flu virus even though they did not visit a doctor? Can the temporal features of a trending hashtag or a keyword indicate which topic it belongs to without any textual information? Given a history of interactions between blogs and news websites, can we predict blog posts/news websites that are not in the sample but talk about "the state of the economy" in 2008?
These questions have two things in common: a network (social networks or human contact networks) and a virus (a meme, keyword, or the flu virus) diffusing over the network. We can think of interactions like memes, hashtags, influenza infections, and computer viruses as viruses spreading in a network. This treatment allows for the usage of epidemiologically inspired models to study or model these interactions. Understanding the complex propagation dynamics involved in information diffusion with the help of these models uncovers various non-trivial and interesting results.
In this thesis we propose (a) NetFill, a fast and efficient algorithm for finding quantitatively and qualitatively correct infected nodes that are not in the sample and for finding the culprits, and (b) SansText, a method for determining which topic a keyword/hashtag belongs to just by looking at the keyword's popularity graph, without textual analysis.
The results derived in this thesis can be used in various areas such as epidemiology, news and protest detection, and viral marketing, and they can also be used to reduce sampling errors in graphs.
Advisors/Committee Members: Prakash, B. Aditya (committeechair), Batra, Dhruv (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Data Mining; Social Networks; Epidemiology; Culprits; Missing nodes; Diffusion; Protests; Classification
APA (6th Edition):
Sundareisan, S. (2014). Making diffusion work for you: Classification sans text, finding culprits and filling missing values. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/49678
Chicago Manual of Style (16th Edition):
Sundareisan, Shashidhar. “Making diffusion work for you: Classification sans text, finding culprits and filling missing values.” 2014. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/49678.
MLA Handbook (7th Edition):
Sundareisan, Shashidhar. “Making diffusion work for you: Classification sans text, finding culprits and filling missing values.” 2014. Web. 07 Mar 2021.
Vancouver:
Sundareisan S. Making diffusion work for you: Classification sans text, finding culprits and filling missing values. [Internet] [Masters thesis]. Virginia Tech; 2014. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/49678.
Council of Science Editors:
Sundareisan S. Making diffusion work for you: Classification sans text, finding culprits and filling missing values. [Masters Thesis]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/49678
10.
Lokegaonkar, Sanket Avinash.
Continual Learning for Deep Dense Prediction.
Degree: MS, Computer Science and Applications, 2018, Virginia Tech
URL: http://hdl.handle.net/10919/83513
Transferring a deep learning model from old tasks to a new one is known to suffer from catastrophic forgetting. Such a forgetting mechanism is problematic as it does not allow us to accumulate knowledge sequentially and requires retaining and retraining on all the training data. Existing techniques for mitigating the abrupt performance degradation on previously trained tasks are mainly studied in the context of image classification. In this work, we present a simple method to alleviate catastrophic forgetting for pixel-wise dense labeling problems. We build upon the regularization technique using knowledge distillation to minimize the discrepancy between the posterior distributions of pixel class labels for old tasks predicted from 1) the original and 2) the updated networks. This technique, however, might fail in circumstances where the source and target distributions differ significantly. To handle the above scenario, we further propose an improvement to the distillation-based approach by adding adaptive L2 regularization depending upon the per-parameter importance to the older tasks. We train our model on FCN8s, but our training can be generalized to stronger models like DeepLab, PSPNet, etc. Through extensive evaluation and comparisons, we show that our technique can incrementally train dense prediction models for novel object classes, different visual domains, and different visual tasks.
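The regularizer described combines a distillation term, which keeps the updated network's per-pixel class posteriors close to the old network's, with an importance-weighted L2 penalty on parameter drift. The PyTorch function below is a compact sketch of such a combined loss; the temperature, loss weights, and the way importance weights are obtained are assumptions, not the thesis's exact formulation.

```python
# Sketch of a distillation + importance-weighted L2 loss for dense prediction.
import torch.nn.functional as F

def continual_seg_loss(new_logits, labels, old_logits, new_params, old_params,
                       importance, lam_distill=1.0, lam_l2=0.1, T=2.0):
    """new_logits/old_logits: (N, C, H, W); labels: (N, H, W); *_params/importance: tensor lists."""
    # Supervised cross-entropy on the new task's labels.
    ce = F.cross_entropy(new_logits, labels)
    # Distillation: keep new per-pixel posteriors close to the old network's.
    kd = F.kl_div(F.log_softmax(new_logits / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Adaptive L2: penalize drift on parameters important to the older tasks.
    l2 = sum((imp * (p_new - p_old).pow(2)).sum()
             for p_new, p_old, imp in zip(new_params, old_params, importance))
    return ce + lam_distill * kd + lam_l2 * l2
```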
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Huang, Jia-Bin (committeechair), Huang, Bert (committee member).
Subjects/Keywords: Computer Vision; Continual Learning; Image Segmentation; Dense Prediction
APA (6th Edition):
Lokegaonkar, S. A. (2018). Continual Learning for Deep Dense Prediction. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/83513
Chicago Manual of Style (16th Edition):
Lokegaonkar, Sanket Avinash. “Continual Learning for Deep Dense Prediction.” 2018. Masters Thesis, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/83513.
MLA Handbook (7th Edition):
Lokegaonkar, Sanket Avinash. “Continual Learning for Deep Dense Prediction.” 2018. Web. 07 Mar 2021.
Vancouver:
Lokegaonkar SA. Continual Learning for Deep Dense Prediction. [Internet] [Masters thesis]. Virginia Tech; 2018. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/83513.
Council of Science Editors:
Lokegaonkar SA. Continual Learning for Deep Dense Prediction. [Masters Thesis]. Virginia Tech; 2018. Available from: http://hdl.handle.net/10919/83513

Virginia Tech
11.
Cho, Yong Ju.
Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks.
Degree: PhD, Computer Science and Applications, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/19244
Recent advances in systems biology have uncovered detailed mechanisms of biological processes such as the cell cycle, circadian rhythms, and signaling pathways. These mechanisms are modeled by chemical reaction networks (CRNs), which are typically simulated by converting them to ordinary differential equations (ODEs), so that the goal is to closely reproduce the observed quantitative and qualitative behaviors of the modeled process. This thesis proposes two algorithmic problems related to the construction and comprehension of CRN models. The first problem focuses on reconstructing CRNs from given time series. Given multivariate time course data obtained by perturbing a given CRN, how can we systematically deduce the interconnections between the species of the network? We demonstrate how this problem can be modeled as, first, one of uncovering conditional independence relationships using buffering experiments and, second, of determining the properties of the individual chemical reactions. Experimental results demonstrate the effectiveness of our approach on both synthetic and real CRNs. The second problem this work focuses on is to aid in network comprehension, i.e., to understand the motifs underlying complex dynamical behaviors of CRNs. Specifically, we focus on bistability – an important dynamical property of a CRN – and propose algorithms to identify the core structures responsible for conferring bistability. The approach we take is to systematically infer the instability causing structures (ICSs) of a CRN and use machine learning techniques to relate properties of the CRN to the presence of such ICSs. This work has the potential to aid in not just network comprehension but also model simplification, by helping reduce the complexity of known bistable systems.
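The CRN-to-ODE conversion mentioned above follows the law of mass action: each reaction contributes a rate term to the ODEs of the species it consumes and produces. The snippet below shows that conversion for an assumed single reaction A + B -> C with an illustrative rate constant, integrated with SciPy; it is a minimal example of the simulation step, not of the thesis's reconstruction or bistability algorithms.

```python
# Mass-action example: the reaction A + B -> C (rate constant k) as ODEs.
import numpy as np
from scipy.integrate import odeint

k = 0.5                                 # illustrative rate constant

def mass_action(y, t):
    a, b, c = y
    rate = k * a * b                    # law of mass action for A + B -> C
    return [-rate, -rate, rate]         # dA/dt, dB/dt, dC/dt

t = np.linspace(0, 10, 50)
trajectory = odeint(mass_action, y0=[1.0, 0.8, 0.0], t=t)
print(trajectory[-1])                   # concentrations of A, B, C at t = 10
```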
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Vullikanti, Anil Kumar S. (committee member), Murali, T. M. (committee member), Bevan, David R. (committee member), Cao, Yang (committee member).
Subjects/Keywords: Chemical reaction networks; bistability; data mining; time series modeling
APA (6th Edition):
Cho, Y. J. (2013). Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/19244
Chicago Manual of Style (16th Edition):
Cho, Yong Ju. “Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks.” 2013. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/19244.
MLA Handbook (7th Edition):
Cho, Yong Ju. “Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks.” 2013. Web. 07 Mar 2021.
Vancouver:
Cho YJ. Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks. [Internet] [Doctoral dissertation]. Virginia Tech; 2013. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/19244.
Council of Science Editors:
Cho YJ. Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks. [Doctoral Dissertation]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/19244

Virginia Tech
12.
Zhao, Liang.
Spatio-temporal Event Detection and Forecasting in Social Media.
Degree: PhD, Computer Science and Applications, 2016, Virginia Tech
URL: http://hdl.handle.net/10919/81904
Nowadays, knowledge discovery on social media is attracting growing interest. Social media has become more than a communication tool, effectively functioning as a social sensor for our society.
This dissertation focuses on the development of methods for social media-based spatiotemporal event detection and forecasting for a variety of event topics and assumptions. Five methods are proposed, namely dynamic query expansion for event detection, a generative framework for event forecasting, multi-task learning for spatiotemporal event forecasting, multi-source spatiotemporal event forecasting, and deep learning based epidemic modeling for forecasting influenza outbreaks.
For the first of these methods, existing solutions for spatiotemporal event detection are mostly supervised and lack the flexibility to handle the dynamic keywords used in social media. The contributions of this work are: (1) Develop an unsupervised framework; (2) Design a novel dynamic query expansion method; and (3) Propose an innovative local modularity spatial scan algorithm.
For the second of these methods, traditional solutions are unable to capture the spatiotemporal context, model mixed-type observations, or utilize prior geographical knowledge. The contributions of this work include: (1) Propose a novel generative model for spatial event forecasting; (2) Design an effective algorithm for model parameter inference; and (3) Develop a new sequence likelihood calculation method.
For the third method, traditional solutions cannot deal with spatial heterogeneity or handle the dynamics of social media data effectively. This work's contributions include: (1) Formulate a multi-task learning framework for event forecasting; (2) Simultaneously model static and dynamic terms; and (3) Develop efficient parameter optimization algorithms.
For the fourth method, traditional multi-source solutions typically fail to consider the geographical hierarchy or cope with incomplete data blocks among different sources. The contributions here are: (1) Design a framework for event forecasting based on hierarchical multi-source indicators; (2) Propose a robust model for geo-hierarchical feature selection; and (3) Develop an efficient algorithm for model parameter optimization.
For the last method, existing work on epidemic modeling either cannot ensure timeliness, or cannot characterize the underlying epidemic propagation mechanisms. The contributions of this work include: (1) Propose a novel integrated framework for computational epidemiology and social media mining; (2) Develop a semi-supervised multilayer perceptron for mining epidemic features; and (3) Design an online training algorithm.
Advisors/Committee Members: Lu, Chang-Tien (committeechair), Ramakrishnan, Naren (committee member), Chen, Ing Ray (committee member), Ye, Jieping (committee member), Chen, Jiangzhuo (committee member).
Subjects/Keywords: event detection; event forecasting; social media
APA (6th Edition):
Zhao, L. (2016). Spatio-temporal Event Detection and Forecasting in Social Media. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/81904
Chicago Manual of Style (16th Edition):
Zhao, Liang. “Spatio-temporal Event Detection and Forecasting in Social Media.” 2016. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/81904.
MLA Handbook (7th Edition):
Zhao, Liang. “Spatio-temporal Event Detection and Forecasting in Social Media.” 2016. Web. 07 Mar 2021.
Vancouver:
Zhao L. Spatio-temporal Event Detection and Forecasting in Social Media. [Internet] [Doctoral dissertation]. Virginia Tech; 2016. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/81904.
Council of Science Editors:
Zhao L. Spatio-temporal Event Detection and Forecasting in Social Media. [Doctoral Dissertation]. Virginia Tech; 2016. Available from: http://hdl.handle.net/10919/81904

Virginia Tech
13.
Ramesh, Bharath.
Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems.
Degree: PhD, Computer Science and Applications, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/23687
Among the key challenges of computing today are the emergence of many-core architectures and the resulting need to effectively exploit explicit parallelism. Indeed, programmers are striving to exploit parallelism across virtually all platforms and application domains. The shared memory programming model effectively addresses the parallelism needs of mainstream computing (e.g., portable devices, laptops, desktops, servers), giving rise to a growing ecosystem of shared memory parallel techniques, tools, and design practices. However, to meet the extreme demands for processing and memory of critical problem domains, including scientific computation and data intensive computing, computing researchers continue to innovate in the high-end distributed memory architecture space to create cost-effective and scalable solutions. The emerging distributed memory architectures are both highly parallel and increasingly heterogeneous. As a result, they do not present the programmer with a cache-coherent view of shared memory, either across the entire system or even at the level of an individual node. Furthermore, it remains an open research question which programming model is best for the heterogeneous platforms that feature multiple traditional processors along with accelerators or co-processors. Hence, we have two contradicting trends. On the one hand, programming convenience and the presence of shared memory call for a shared memory programming model across the entire heterogeneous system. On the other hand, increasingly parallel and heterogeneous nodes lacking cache-coherent shared memory call for a message passing model. In this dissertation, we present the architecture of Samhita, a distributed shared memory (DSM) system that addresses the challenge of providing shared memory for non-cache-coherent systems. We define regional consistency (RegC), the memory consistency model implemented by Samhita. We present performance results for Samhita on several computational kernels and benchmarks, on both cluster supercomputers and heterogeneous systems. The results demonstrate the promising potential of Samhita and the RegC model, and include the largest scale evaluation by a significant margin for any DSM system reported to date.
Advisors/Committee Members: Varadarajan, Srinidhi (committeechair), Ribbens, Calvin J. (committeechair), Jones, Mark T. (committee member), Ramakrishnan, Naren (committee member), Kafura, Dennis G. (committee member).
Subjects/Keywords: Distributed Shared Memory; Virtual Shared Memory; Memory Consistency
APA (6th Edition):
Ramesh, B. (2013). Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/23687
Chicago Manual of Style (16th Edition):
Ramesh, Bharath. “Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems.” 2013. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/23687.
MLA Handbook (7th Edition):
Ramesh, Bharath. “Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems.” 2013. Web. 07 Mar 2021.
Vancouver:
Ramesh B. Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems. [Internet] [Doctoral dissertation]. Virginia Tech; 2013. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/23687.
Council of Science Editors:
Ramesh B. Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems. [Doctoral Dissertation]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/23687

Virginia Tech
14.
Cadena, Jose Eduardo.
Finding Interesting Subgraphs with Guarantees.
Degree: PhD, Computer Science and Applications, 2018, Virginia Tech
URL: http://hdl.handle.net/10919/81960
Networks are a mathematical abstraction of the interactions between a set of entities, with extensive applications in social science, epidemiology, bioinformatics, and cybersecurity, among others. There are many fundamental problems when analyzing network data, such as anomaly detection, dense subgraph mining, motif finding, information diffusion, and epidemic spread. A common underlying task in all these problems is finding an "interesting subgraph"; that is, finding a part of the graph – usually small relative to the whole – that optimizes a score function and has some property of interest, such as connectivity or a minimum density.
Finding subgraphs that satisfy common constraints of interest, such as the ones above, is computationally hard in general, and state-of-the-art algorithms for many problems in network analysis are heuristic in nature. These methods are fast and usually easy to implement. However, they come with no theoretical guarantees on the quality of the solution, which makes it difficult to assess how the discovered subgraphs compare to an optimal solution, which in turn affects the data mining task at hand. For instance, in anomaly detection, solutions with low anomaly score lead to sub-optimal detection power. On the other end of the spectrum, there have been significant advances on approximation algorithms for these challenging graph problems in the theoretical computer science community. However, these algorithms tend to be slow, difficult to implement, and they do not scale to the large datasets that are common nowadays.
The goal of this dissertation is to develop scalable algorithms with theoretical guarantees for various network analysis problems where the underlying task is to find subgraphs with constraints. We find interesting subgraphs with guarantees by adapting techniques from parameterized complexity, convex optimization, and submodular optimization. These techniques are well known in the algorithm design literature, but they lead to slow and impractical algorithms. One unifying theme in the problems that we study is that our methods are scalable without sacrificing the theoretical guarantees of these algorithm design techniques. We accomplish this combination of scalability and rigorous bounds by exploiting properties of the problems we are trying to optimize, decomposing or compressing the input graph to a manageable size, and parallelization.
We consider problems on network analysis for both static and dynamic network models, and we illustrate the power of our methods in applications such as public health, sensor data analysis, and event detection using social media data.
Advisors/Committee Members: Vullikanti, Anil Kumar S. (committeechair), Marathe, Madhav Vishnu (committee member), Lu, Chang Tien (committee member), Konjevod, Goran (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Graph Mining; Data Mining; Graph Algorithms; Anomaly Detection; Finding Subgraphs; Parameterized Complexity; Distributed Algorithms
APA (6th Edition):
Cadena, J. E. (2018). Finding Interesting Subgraphs with Guarantees. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/81960
Chicago Manual of Style (16th Edition):
Cadena, Jose Eduardo. “Finding Interesting Subgraphs with Guarantees.” 2018. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/81960.
MLA Handbook (7th Edition):
Cadena, Jose Eduardo. “Finding Interesting Subgraphs with Guarantees.” 2018. Web. 07 Mar 2021.
Vancouver:
Cadena JE. Finding Interesting Subgraphs with Guarantees. [Internet] [Doctoral dissertation]. Virginia Tech; 2018. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/81960.
Council of Science Editors:
Cadena JE. Finding Interesting Subgraphs with Guarantees. [Doctoral Dissertation]. Virginia Tech; 2018. Available from: http://hdl.handle.net/10919/81960

Virginia Tech
15.
Shukla, Manu.
Algorithmic Distribution of Applied Learning on Big Data.
Degree: PhD, Computer Science and Applications, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/100603
Distribution of machine learning and graph algorithms is commonly performed by modeling the core algorithm in the same way as the sequential technique, except implemented on a distributed framework. This approach is satisfactory in very few cases, such as depth-first search and subgraph enumerations in graphs, k nearest neighbors, and a few additional common methods. These techniques focus on stitching together the results from smaller data or compute chunks so that the outcome is as close as possible to the sequential result on the entire data. This approach is not feasible in numerous kernel, matrix, optimization, graph, and other techniques where the algorithm needs to perform exhaustive computations on all the data during execution. In this work, we propose key-value pair based distribution techniques that are exhaustive and widely applicable to statistical machine learning algorithms along with matrix, graph, and time series based operations. The crucial difference from previously proposed techniques is that all operations are modeled as key-value pair based fine or coarse-grained steps. This allows flexibility in distribution with no compounding error in each step. The distribution is applicable not only in robust disk-based frameworks but also in in-memory based systems without significant changes. Key-value pair based techniques also provide the ability to generate the same result as sequential techniques, with no edge or overlap effects to resolve in structures such as graphs or matrices.
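Modeling an operation as key-value pairs means every step emits (key, value) records that a framework can shuffle and reduce independently of where the data lives. As a concrete toy, the snippet below expresses a sparse matrix-vector product as a map step followed by a reduce-by-key step, in a single process; it is only an illustration of the key-value modeling idea, not the dissertation's distribution framework.

```python
# Toy key-value formulation of a sparse matrix-vector product (single process).
from collections import defaultdict

A = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 3.0}   # sparse matrix as {(row, col): value}
x = {0: 1.0, 1: 2.0}                           # vector as {index: value}

# Map: each matrix entry emits (row, partial product).
mapped = [(i, v * x[j]) for (i, j), v in A.items()]

# Reduce: sum partial products grouped by row key.
result = defaultdict(float)
for key, partial in mapped:
    result[key] += partial

print(dict(result))   # {0: 4.0, 1: 6.0}
```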
Advisors/Committee Members: Lu, Chang Tien (committeechair), Ramakrishnan, Naren (committee member), Chen, Ing Ray (committee member), Xuan, Jianhua (committee member), Zhang, Jianping (committee member).
Subjects/Keywords: Big Data; Distributed Machine Learning; In-Memory Distribution; Graph Distribution
APA (6th Edition):
Shukla, M. (2020). Algorithmic Distribution of Applied Learning on Big Data. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/100603
Chicago Manual of Style (16th Edition):
Shukla, Manu. “Algorithmic Distribution of Applied Learning on Big Data.” 2020. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021. http://hdl.handle.net/10919/100603.
MLA Handbook (7th Edition):
Shukla, Manu. “Algorithmic Distribution of Applied Learning on Big Data.” 2020. Web. 07 Mar 2021.
Vancouver:
Shukla M. Algorithmic Distribution of Applied Learning on Big Data. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Mar 07]. Available from: http://hdl.handle.net/10919/100603.
Council of Science Editors:
Shukla M. Algorithmic Distribution of Applied Learning on Big Data. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/100603

Virginia Tech
16.
Gad, Samah Hossam Aldin.
Expressive Forms of Topic Modeling to Support Digital Humanities.
Degree: PhD, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/65145
Unstructured textual data is rapidly growing and practitioners from diverse disciplines are experiencing a need to structure this massive amount of data. Topic modeling is one of the most used techniques for analyzing and understanding the latent structure of large text collections. Probabilistic graphical models are the main building block behind topic modeling and they are used to express assumptions about the latent structure of complex data. This dissertation addresses four problems related to drawing structure from high dimensional data and improving the text mining process.
Studying the ebb and flow of ideas during critical events, e.g., an epidemic, is very important to understanding the reporting or coverage around the event or the impact of the event on society. This can be accomplished by capturing the dynamic evolution of topics underlying a text corpus. We propose an approach to this problem by identifying segment boundaries that detect significant shifts of topic coverage. In order to identify segment boundaries, we wrap a temporal segmentation algorithm around a topic modeling algorithm to capture such significant shifts of coverage. A key advantage of our approach is that it integrates with existing topic modeling algorithms in a transparent manner; thus, more sophisticated algorithms can be readily plugged in as research in topic modeling evolves. We apply this algorithm to studying data from the iNeighbors system, and apply our algorithm to six neighborhoods (three economically advantaged and three economically disadvantaged) to evaluate differences in conversations for statistical significance. Our findings suggest that social technologies may afford opportunities for democratic engagement in contexts that are otherwise less likely to support opportunities for deliberation and participatory democracy. We also examine the progression in coverage of historical newspapers about the 1918 influenza epidemic by applying our algorithm on the Washington Times archives. The algorithm is successful in identifying important qualitative features of news coverage of the pandemic.
Visually convincing results of data mining algorithms and models are crucial to analyzing and drawing conclusions from the algorithms. We develop ThemeDelta, a visual analytics system for extracting and visualizing temporal trends, clustering, and reorganization in time-indexed textual datasets. ThemeDelta is supported by a dynamic temporal segmentation algorithm that integrates with topic modeling algorithms to identify change points where significant shifts in topics occur. This algorithm detects not only the clustering and associations of keywords in a time period, but also their convergence into topics (groups of keywords) that may later diverge into new groups. The visual representation of ThemeDelta uses sinuous, variable-width lines to show this evolution on a timeline, utilizing color for categories, and line width for keyword strength. We demonstrate how interaction with ThemeDelta helps capture the rise and fall of…
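The segmentation idea wraps a topic model with a check for points where the topic mixture shifts sharply between adjacent time windows. The sketch below fits a small LDA model with gensim over toy time windows and flags large total-variation shifts between consecutive windows; the windows, topic count, and threshold are illustrative assumptions, and this is a simplified stand-in for the dissertation's segmentation algorithm.

```python
# Sketch: per-window topic mixtures from LDA, then flag large shifts between windows.
import numpy as np
from gensim import corpora, models

windows = [                                   # toy documents, one per time window
    ["flu", "cases", "hospital", "ward"],
    ["flu", "vaccine", "clinic", "cases"],
    ["school", "closure", "board", "meeting"],
]
dictionary = corpora.Dictionary(windows)
corpus = [dictionary.doc2bow(w) for w in windows]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

def mixture(bow, num_topics=2):
    dist = np.zeros(num_topics)
    for topic, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        dist[topic] = prob
    return dist

mixes = [mixture(bow) for bow in corpus]
for i in range(1, len(mixes)):
    shift = 0.5 * np.abs(mixes[i] - mixes[i - 1]).sum()   # total variation distance
    flag = "  <- candidate segment boundary" if shift > 0.3 else ""
    print(f"window {i - 1} -> {i}: shift = {shift:.2f}{flag}")
```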
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Tilevich, Eli (committee member), Elmqvist, L. Niklas (committee member), Kavanaugh, Andrea L. (committee member), North, Christopher L. (committee member).
Subjects/Keywords: Topic Modeling; LDA; Segmentation
APA (6th Edition):
Gad, S. H. A. (2014). Expressive Forms of Topic Modeling to Support Digital Humanities. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/65145
Chicago Manual of Style (16th Edition):
Gad, Samah Hossam Aldin. “Expressive Forms of Topic Modeling to Support Digital Humanities.” 2014. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/65145.
MLA Handbook (7th Edition):
Gad, Samah Hossam Aldin. “Expressive Forms of Topic Modeling to Support Digital Humanities.” 2014. Web. 07 Mar 2021.
Vancouver:
Gad SHA. Expressive Forms of Topic Modeling to Support Digital Humanities. [Internet] [Doctoral dissertation]. Virginia Tech; 2014. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/65145.
Council of Science Editors:
Gad SHA. Expressive Forms of Topic Modeling to Support Digital Humanities. [Doctoral Dissertation]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/65145
17.
Ning, Yue.
Modeling Information Precursors for Event Forecasting.
Degree: PhD, Computer Science and Applications, 2018, Virginia Tech
URL: http://hdl.handle.net/10919/84486
► This dissertation is focused on the design and evaluation of machine learning algorithms for modeling information precursors for use in event modeling and forecasting. Given…
(more)
▼ This dissertation is focused on the design and evaluation of machine learning algorithms for modeling information precursors for use in event modeling and forecasting. Given an online stream of information (e.g., news articles, social media postings), how can we model and understand how events unfold, how they influence each other, and how they can act as determinants of future events?
First, we study information reciprocity in joint news and social media streams to capture how events evolve. We present an online story chaining algorithm that links related news articles together in a low-complexity manner, and a mechanism to classify the interaction between a news article and social media (Twitter) activity into four categories. This is followed by identification of major information sources for a given story chain based on the interaction states of news and Twitter. We demonstrate through this study that Twitter, as a social network platform, serves as a fast way to draw public attention to many social events such as sports, whereas news media is quicker to report events regarding political, economic, and business issues.
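A minimal sketch of low-complexity online story chaining, assuming a hashing vectorizer and a fixed similarity threshold: each incoming article is attached to the chain whose centroid it most resembles, or starts a new chain. This illustrates the general idea, not the specific algorithm described above.

```python
# Minimal online story chaining sketch (illustrative, not the dissertation's
# algorithm): attach each incoming article to the most similar chain centroid,
# or start a new chain. The hashing vectorizer and 0.3 threshold are assumptions.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = HashingVectorizer(n_features=2**12, stop_words="english", norm="l2")
chains = []   # each chain: {"centroid": sparse vector, "articles": [texts]}

def add_article(text, threshold=0.3):
    x = vectorizer.transform([text])
    best, best_sim = None, 0.0
    for chain in chains:
        sim = cosine_similarity(x, chain["centroid"])[0, 0]
        if sim > best_sim:
            best, best_sim = chain, sim
    if best is not None and best_sim >= threshold:
        best["articles"].append(text)
        # Drag the centroid toward the new article (approximate running mean).
        best["centroid"] = best["centroid"] + (x - best["centroid"]) / len(best["articles"])
    else:
        chains.append({"centroid": x, "articles": [text]})
```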
In the second problem we focus on forecasting and understanding large-scale societal events from open source datasets. Our goal here is to develop algorithms that can automatically reconstruct precursors to societal events. We develop a nested framework involving multi-instance learning for mining precursors by harnessing temporal constraints. We evaluate the proposed model for various event categories in multiple geo-locations with comprehensive experiments.
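A toy sketch of the multi-instance idea, under assumed data shapes: each bag holds the article feature vectors from the days preceding a candidate event and carries the event/no-event label; a shared logistic scorer rates instances, bag scores are aggregated by a simple mean, and the highest-scoring instances in positive bags are read off as candidate precursors. The mean aggregation and plain gradient descent are simplifications, not the nested framework of the dissertation.

```python
# Toy multi-instance precursor learning sketch (simplified, not the nested
# framework described above). bags: list of (n_i, dim) arrays of article
# features from the days before a candidate event; labels: 0/1 bag labels.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mil(bags, labels, dim, lr=0.1, epochs=200):
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for X, y in zip(bags, labels):
            p_inst = sigmoid(X @ w)              # instance-level precursor scores
            p_bag = p_inst.mean()                # bag-level event probability
            # Gradient of the bag-level log loss, chained through the mean.
            scale = (p_bag - y) / max(p_bag * (1.0 - p_bag), 1e-6)
            grad += scale * ((p_inst * (1.0 - p_inst)) @ X) / len(p_inst)
        w -= lr * grad / len(bags)
    return w

def precursors(bag, w, k=3):
    """Indices of the k highest-scoring instances (candidate precursor articles)."""
    return np.argsort(-sigmoid(bag @ w))[:k]
```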
Next, to reinforce the fact that events are typically inter-connected and influenced by events in other locations, we develop an approach that creates personalized models for exploring spatio-temporal event correlations; this approach also helps tackle data/label sparsity problems across geolocations.
Finally, this dissertation demonstrates how our algorithms can be used to study key characteristics of mass events such as protests. Some mass gatherings run the risk of turning violent, causing damage to both property and people. We propose a tailored solution for uncovering triggers from both news media and social media for violent event analysis.
This work was partially supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC000337, the Office of Naval Research under contract N00014-16-C-1054, and the U.S. Department of Homeland Security under Grant Award Number 2017-ST-061-CINA01. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF, IARPA, DoI/NBC, or the US Government.
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Reddy, Chandan K. (committee member), North, Christopher L. (committee member), Lu, Chang Tien (committee member), Rangwala, Huzefa (committee member).
Subjects/Keywords: Information Reciprocity; Precursor Learning; Event Modeling; Event Forecasting
APA (6th Edition):
Ning, Y. (2018). Modeling Information Precursors for Event Forecasting. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/84486
Chicago Manual of Style (16th Edition):
Ning, Yue. “Modeling Information Precursors for Event Forecasting.” 2018. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/84486.
MLA Handbook (7th Edition):
Ning, Yue. “Modeling Information Precursors for Event Forecasting.” 2018. Web. 07 Mar 2021.
Vancouver:
Ning Y. Modeling Information Precursors for Event Forecasting. [Internet] [Doctoral dissertation]. Virginia Tech; 2018. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/84486.
Council of Science Editors:
Ning Y. Modeling Information Precursors for Event Forecasting. [Doctoral Dissertation]. Virginia Tech; 2018. Available from: http://hdl.handle.net/10919/84486

Virginia Tech
18.
Afzalan, Milad.
Data-driven customer energy behavior characterization for distributed energy management.
Degree: PhD, Civil Engineering, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/99210
► Buildings account for more than 70% of electricity consumption in the U.S., in which more than 40% is associated with the residential sector. During recent…
(more)
▼ Buildings account for more than 70% of electricity consumption in the U.S., of which more than 40% is associated with the residential sector. In recent years, with the advancement of Information and Communication Technologies (ICT) and the proliferation of data from consumers and devices, data-driven methods have received increasing attention for improving energy-efficiency initiatives.
With the increased adoption of renewable and distributed resources in buildings (e.g., solar panels and storage systems), an important aspect of improving efficiency by matching demand and supply is to add flexibility to energy consumption patterns (e.g., aligning the times of high energy demand from buildings with renewable generation). In this dissertation, we introduce data-driven solutions that use consumers' historical energy data, with application to flexibility provision. Specific problems include: (1) introducing a ranking score for buildings in a community to detect the candidates that can provide higher energy savings in future events, (2) estimating the operation times of major energy-intensive appliances by analyzing whole-house energy data using machine learning models, and (3) investigating the potential of achieving demand-supply balance in communities of buildings under the impact of different levels of solar panels, battery systems, and occupants' energy consumption behavior.
In the first study, a ranking score was introduced that analyzes the historical energy data from major loads, such as washing machines and dishwashers, in individual buildings and groups the buildings based on their potential for energy saving at different times of the day. The proposed approach was investigated on real data from 400 buildings. The results for EV, washing machine, dishwasher, dryer, and AC show that the approach could successfully rank buildings by their demand reduction potential at critical times of the day.
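A simplified version of such a ranking score, with an assumed data layout (columns building_id, timestamp, appliance, kw) and an assumed critical window of 17:00-20:00: rank buildings by their average flexible-appliance demand during that window, as a proxy for demand-reduction potential.

```python
# Simplified building ranking sketch under an assumed data layout; column
# names, the appliance list, and the critical window are illustrative.
import pandas as pd

def rank_buildings(df, appliances=("ev", "dishwasher", "dryer"),
                   window=("17:00", "20:00")):
    """df: long-format readings with columns
       ['building_id', 'timestamp', 'appliance', 'kw']."""
    df = df[df["appliance"].isin(appliances)].copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.set_index("timestamp").between_time(*window)
    # Higher average flexible load in the critical window -> higher score.
    return df.groupby("building_id")["kw"].mean().sort_values(ascending=False)
```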
In the second study, machine learning (ML) frameworks were introduced to identify the times of day at which major energy-intensive appliances are operated. The input of the model was the main-circuit electricity information of the whole building, either as lower-resolution data (smart meter data) or higher-resolution data (60Hz). Unlike previous studies that required considerable effort for training the model (e.g., defining specific parameters for a mathematical formulation of the appliance model), the aim was to develop data-driven approaches that learn the model either from the same building itself or from neighbors that have appliance-level metering devices. For the lower-resolution data, the objective was that, if a few buildings already have access to plug meters (i.e., appliance-level data), one could estimate the operation times of major appliances through ML models by matching the energy behavior of the buildings, reflected in their smart meter information, with the ones in the neighborhood that have similar behaviors. For the higher-resolution data,…
Advisors/Committee Members: Jazizadeh Karimi, Farrokh (committeechair), de la Garza, Jesus M. (committee member), Huang, Bert (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Distributed energy management; Smart grid; Machine learning; Human-building interaction; Segmentation.
APA (6th Edition):
Afzalan, M. (2020). Data-driven customer energy behavior characterization for distributed energy management. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/99210
Chicago Manual of Style (16th Edition):
Afzalan, Milad. “Data-driven customer energy behavior characterization for distributed energy management.” 2020. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/99210.
MLA Handbook (7th Edition):
Afzalan, Milad. “Data-driven customer energy behavior characterization for distributed energy management.” 2020. Web. 07 Mar 2021.
Vancouver:
Afzalan M. Data-driven customer energy behavior characterization for distributed energy management. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/99210.
Council of Science Editors:
Afzalan M. Data-driven customer energy behavior characterization for distributed energy management. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/99210

Virginia Tech
19.
Chen, Zhiqian.
Graph Neural Networks: Techniques and Applications.
Degree: PhD, Computer Science and Applications, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/99848
► Graph data is pervasive throughout most fields, including pandemic spread network, social network, transportation roads, internet, and chemical structure. Therefore, the applications modeled by graph…
(more)
▼ Graph data is pervasive throughout most fields, including pandemic spread networks, social networks, transportation roads, the internet, and chemical structures. Applications modeled by graphs therefore benefit people's everyday lives, and graph mining derives insights from this complex topology.
This dissertation investigates an emerging technique called graph neural networks (GNNs), which is designed for graph data mining.
There are two primary goals of this thesis: (1) understanding GNNs in theory, and (2) applying GNNs to unexplored and valuable real-world scenarios.
For the first goal, we investigate spectral theory and approximation theory, and propose a unified framework that summarizes most GNNs. This framework makes it possible to compare existing or newly proposed models and to measure the quality of their approximations. Specifically, it demonstrates that most GNNs approximate either a function of the graph adjacency matrix or a function of its eigenvalues. Different types of approximations are analyzed in terms of their physical meaning, and their advantages and disadvantages are discussed. Beyond that, we propose a new optimization for a highly accurate but inefficient approximation. Evaluation on synthetic data demonstrates its theoretical power, and tests on two transportation networks show its potential in real-world graphs.
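A small numerical illustration of the claim that spectral GNN layers approximate a function of the graph matrix: the exact filter g(L)x, computed via the Laplacian's eigendecomposition, is compared against a low-order polynomial in L evaluated directly (the mechanism behind ChebNet-style layers; a plain polynomial fit stands in for the Chebyshev basis here). The example graph and the filter g are arbitrary choices.

```python
# Exact spectral filtering versus a low-order polynomial approximation in L.
# The 5-node path graph and the filter g are arbitrary illustrative choices.
import numpy as np

A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)   # 5-node path graph
d = A.sum(axis=1)
L = np.eye(5) - A / np.sqrt(np.outer(d, d))             # normalized Laplacian

g = lambda lam: np.exp(-2.0 * lam)                       # example spectral response

# Exact filtering: U g(Lambda) U^T x.
lam, U = np.linalg.eigh(L)
x = np.random.default_rng(0).normal(size=5)
exact = U @ (g(lam) * (U.T @ x))

# Order-4 polynomial approximation of g on [0, 2], evaluated in L itself,
# so no eigendecomposition is needed at filtering time.
grid = np.linspace(0.0, 2.0, 200)
coeffs = np.polyfit(grid, g(grid), deg=4)                # highest degree first
approx = sum(c * np.linalg.matrix_power(L, k)
             for k, c in zip(range(4, -1, -1), coeffs)) @ x

print(np.max(np.abs(exact - approx)))                    # small residual
```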
For the second goal, circuits are selected as a novel application domain, since they are crucial yet little prior work addresses them. Specifically, we focus on a security problem (circuit deobfuscation), a high-value real-world problem for industry companies such as Nvidia, Apple, and AMD. The problem is defined on a circuit graph, and GNNs are applied to learn representations with respect to prediction targets such as attack runtime. Experiments on several benchmark circuits show superior effectiveness and efficiency compared with competitive baselines.
This work explores GNNs in both theory and application, showing a promising direction for graph mining tasks and opening up a wide range of innovations in graph-based problems.
Advisors/Committee Members: Lu, Chang Tien (committeechair), Chen, Feng (committee member), Chen, Ing Ray (committee member), Ramakrishnan, Naren (committee member), Haghighat, Alireza (committee member).
Subjects/Keywords: graph neural network; graph mining; approximation theory; spectral graph; circuit deobfuscation
APA (6th Edition):
Chen, Z. (2020). Graph Neural Networks: Techniques and Applications. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/99848
Chicago Manual of Style (16th Edition):
Chen, Zhiqian. “Graph Neural Networks: Techniques and Applications.” 2020. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/99848.
MLA Handbook (7th Edition):
Chen, Zhiqian. “Graph Neural Networks: Techniques and Applications.” 2020. Web. 07 Mar 2021.
Vancouver:
Chen Z. Graph Neural Networks: Techniques and Applications. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/99848.
Council of Science Editors:
Chen Z. Graph Neural Networks: Techniques and Applications. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/99848

Virginia Tech
20.
Chhabra, Meenal.
Studies in the Algorithmic Pricing of Information Goods and Services.
Degree: PhD, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/25874
► This thesis makes a contribution to the algorithmic pricing literature by proposing and analyzing techniques for automatically pricing digital and information goods in order to…
(more)
▼ This thesis makes a contribution to the algorithmic pricing literature by proposing and analyzing techniques for automatically pricing digital and information goods in order to maximize profit in different settings. We also consider the effect on social welfare when agents use these pricing algorithms. The digital goods considered in this thesis are electronic commodities that have zero marginal cost and unlimited supply e.g., iTunes apps. On the other hand, an information good is an entity that bridges the knowledge gap about a product between the consumer and the seller when the consumer cannot assess the utility of owning that product accurately e.g., Carfax provides vehicle history and can be used by a potential buyer of a vehicle to get information about the vehicle.
With the emergence of e-commerce, customers are increasingly price sensitive and search for the best opportunities anywhere. It is almost impossible to manually adjust prices with rapidly changing demand and competition. Moreover, online shopping platforms enable sellers to change prices easily and quickly, as opposed to updating price labels in brick-and-mortar stores, so they can experiment with different prices to maximize their revenue. Therefore, e-marketplaces have created a need for designing sophisticated practical algorithms for pricing. This need has evoked interest in algorithmic pricing in the computer science, economics, and operations research communities.
In this thesis, we seek solutions to the following two algorithmic pricing problems:
(1) In the first problem, a seller launches a new digital good (this good has unlimited supply and zero marginal cost) but is unaware of its demand in a posted-price setting (i.e., the seller quotes a price to a buyer, and the buyer makes a decision depending on her willingness to pay); we ask how the seller should set prices in order to maximize her infinite-horizon discounted revenue. This is a classic problem of learning while earning. We propose a few algorithms for this problem and demonstrate their effectiveness using rigorous empirical tests on both synthetic datasets and real-world datasets from auctions at eBay and Yahoo!, and ratings on jokes from Jester, an online joke recommender system. We also show that under certain conditions the myopic Bayesian strategy is Bayes-optimal. Moreover, this strategy has finite regret (independent of time), which means that it also learns very fast. A toy sketch of this posted-price setting appears after this list.
(2) The second problem is based on search markets: a consumer is searching for a product sequentially (i.e., she examines possible options one by one and on observing them decides whether to buy or not). However, merely observing a good, although partially informative, does not typically provide the potential purchaser with the complete information set necessary to execute her buying decision. This lack of perfect information about the good creates a market for intermediaries (we refer to them as experts) who can conduct research on behalf of the buyer and sell…
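The sketch below is the toy posted-price example referred to in problem (1): a Thompson-sampling seller keeps a Beta posterior over the acceptance probability at each candidate price and posts the price with the highest sampled expected revenue. It is an illustrative baseline only, not the algorithms or the myopic Bayesian strategy analyzed in the dissertation; the price grid and hidden demand curve are made up.

```python
# Hedged sketch of "learning while earning" with posted prices via Thompson
# sampling. Not the dissertation's algorithm; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha = np.ones(len(prices))          # Beta(1, 1) priors on acceptance probability
beta = np.ones(len(prices))

def true_accept_prob(p):              # hidden demand curve, unknown to the seller
    return np.clip(1.2 - 0.2 * p, 0.0, 1.0)

revenue = 0.0
for t in range(5000):
    theta = rng.beta(alpha, beta)                 # sample acceptance prob per price
    i = int(np.argmax(prices * theta))            # post the best sampled price
    sold = rng.random() < true_accept_prob(prices[i])
    revenue += prices[i] * sold
    alpha[i] += sold                              # update that price's posterior
    beta[i] += 1 - sold

print(revenue / 5000)                             # average per-round revenue
```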
Advisors/Committee Members: Das, Sanmay (committeechair), Vullikanti, Anil Kumar S. (committeechair), Sarne, David (committee member), Ryzhov, Ilya O. (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Non-linear pricing; Sequential Search; Algorithm pricing; Information goods; Dynamic pricing; Revenue maximization; Reinforcement learning; Search markets
APA (6th Edition):
Chhabra, M. (2014). Studies in the Algorithmic Pricing of Information Goods and Services. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/25874
Chicago Manual of Style (16th Edition):
Chhabra, Meenal. “Studies in the Algorithmic Pricing of Information Goods and Services.” 2014. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/25874.
MLA Handbook (7th Edition):
Chhabra, Meenal. “Studies in the Algorithmic Pricing of Information Goods and Services.” 2014. Web. 07 Mar 2021.
Vancouver:
Chhabra M. Studies in the Algorithmic Pricing of Information Goods and Services. [Internet] [Doctoral dissertation]. Virginia Tech; 2014. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/25874.
Council of Science Editors:
Chhabra M. Studies in the Algorithmic Pricing of Information Goods and Services. [Doctoral Dissertation]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/25874

Virginia Tech
21.
Khandpur, Rupinder Paul.
Augmenting Dynamic Query Expansion in Microblog Texts.
Degree: PhD, Computer Science and Applications, 2018, Virginia Tech
URL: http://hdl.handle.net/10919/84852
► Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth…
(more)
▼ Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems that use open source indicators to find meaningful information rapidly and accurately. Information overload and semantic mismatch are systemic problems in the Information Retrieval (IR) tasks undertaken by such systems.
In this dissertation, we develop dynamic query expansion algorithms that can help improve the efficacy of such systems, using only a small set of seed queries and requiring no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz. (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can incorporate users' feedback into the search result refinement process? This is a crucial step for efficiently integrating real-time 'situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the method can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? Here we mainly aim to model high-order, relational aspects of event-related information from microblog texts.
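A minimal, unsupervised dynamic query expansion loop in the spirit described above, starting from a few seed terms and requiring no labeled data: repeatedly pull matching posts, score co-occurring terms, and grow the query vocabulary until it stabilizes. Scoring by raw co-occurrence counts is a simplification used only for illustration.

```python
# Minimal dynamic query expansion sketch; co-occurrence counting is an
# illustrative simplification, not the dissertation's scoring method.
from collections import Counter

def expand_query(seeds, posts, top_k=5, max_iters=10):
    """seeds: iterable of seed terms; posts: list of token lists (e.g., tweets)."""
    query = set(seeds)
    for _ in range(max_iters):
        matched = [p for p in posts if query & set(p)]
        counts = Counter(tok for p in matched for tok in p if tok not in query)
        new_terms = {tok for tok, _ in counts.most_common(top_k)}
        if new_terms <= query:        # converged: nothing new to add
            break
        query |= new_terms
    return query
```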
Advisors/Committee Members: Ramakrishnan, Naren (committeechair), Lu, Chang Tien (committeechair), Han, Eui-Hong (committee member), North, Christopher L. (committee member), Reddy, Chandan K. (committee member).
Subjects/Keywords: Dynamic Query Expansion; Microblog Event Retrieval; Social Media Analytics; Visual Knowledge Discovery
APA (6th Edition):
Khandpur, R. P. (2018). Augmenting Dynamic Query Expansion in Microblog Texts. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/84852
Chicago Manual of Style (16th Edition):
Khandpur, Rupinder Paul. “Augmenting Dynamic Query Expansion in Microblog Texts.” 2018. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/84852.
MLA Handbook (7th Edition):
Khandpur, Rupinder Paul. “Augmenting Dynamic Query Expansion in Microblog Texts.” 2018. Web. 07 Mar 2021.
Vancouver:
Khandpur RP. Augmenting Dynamic Query Expansion in Microblog Texts. [Internet] [Doctoral dissertation]. Virginia Tech; 2018. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/84852.
Council of Science Editors:
Khandpur RP. Augmenting Dynamic Query Expansion in Microblog Texts. [Doctoral Dissertation]. Virginia Tech; 2018. Available from: http://hdl.handle.net/10919/84852

Virginia Tech
22.
Chen, Feng.
Efficient Algorithms for Mining Large Spatio-Temporal Data.
Degree: PhD, Computer Science and Applications, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/19220
► Knowledge discovery on spatio-temporal datasets has attracted growing interests. Recent advances on remote sensing technology mean that massive amounts of spatio-temporal data are being collected,…
(more)
▼ Knowledge discovery on spatio-temporal datasets has attracted growing interest. Recent advances in remote sensing technology mean that massive amounts of spatio-temporal data are being collected, and the volume keeps increasing at an ever faster pace. It becomes critical to design efficient algorithms for identifying novel and meaningful patterns in massive spatio-temporal datasets. Unlike other data sources, this data exhibits significant space-time statistical dependence, and the assumption of i.i.d. observations is no longer valid. Exact modeling of space-time dependence leads to exponential growth in model complexity as the data size increases. This research focuses on the construction of efficient and effective approaches using approximate inference techniques for three main mining tasks: spatial outlier detection, robust spatio-temporal prediction, and novel applications to real-world problems. Spatial novelty patterns, or spatial outliers, are data points whose characteristics are markedly different from those of their spatial neighbors. There are two major branches of spatial outlier detection methodologies, which are either global Kriging based or local Laplacian smoothing based. The former approach requires exact modeling of spatial dependence, which is computationally expensive; the latter requires the i.i.d. assumption for the smoothed observations, which is not statistically sound. These two approaches are constrained to numerical data, but in real-world applications we are often faced with a variety of non-numerical data types, such as count, binary, nominal, and ordinal. To summarize, the main research challenges are: 1) how much spatial dependence can be eliminated via Laplace smoothing; 2) how to effectively and efficiently detect outliers in large numerical spatial datasets; 3) how to generalize numerical detection methods and develop a unified outlier detection framework suitable for large non-numerical datasets; 4) how to achieve accurate spatial prediction even when the training data has been contaminated by outliers; 5) how to handle spatio-temporal data in the preceding problems. To address the first and second challenges, we mathematically validated the effectiveness of Laplacian smoothing for eliminating spatial autocorrelation. This work provides fundamental support for existing Laplacian smoothing based methods. We also discovered a nontrivial side effect of Laplacian smoothing, which introduces additional spatial variation into the data due to convolution effects. To capture this extra variability, we proposed a generalized local statistical model, and designed two fast forward and backward outlier detection methods that achieve a better balance between computational efficiency and accuracy than most existing methods, and are…
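A compact sketch of the local-neighborhood smoothing idea behind spatial outlier detection: compare each observation to the average of its spatial neighbors and flag large standardized residuals. The k-nearest-neighbor neighborhood and the 3-sigma cutoff are illustrative assumptions, not the generalized local statistical model developed in the dissertation.

```python
# Neighborhood-smoothing spatial outlier sketch; the k-NN neighborhood and
# z-score cutoff are assumptions for illustration only.
import numpy as np
from scipy.spatial import cKDTree

def spatial_outliers(coords, values, k=5, z_cut=3.0):
    """coords: (n, 2) array of locations; values: (n,) numeric attribute."""
    tree = cKDTree(coords)
    _, idx = tree.query(coords, k=k + 1)          # k+1: each point is its own neighbor
    neighbor_mean = values[idx[:, 1:]].mean(axis=1)
    residual = values - neighbor_mean             # local smoothing residual
    z = (residual - residual.mean()) / residual.std()
    return np.where(np.abs(z) > z_cut)[0]         # indices of flagged spatial outliers
```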
Advisors/Committee Members: Lu, Chang-Tien (committeechair), Ramakrishnan, Naren (committee member), Wang, Yue J. (committee member), Lou, Wenjing (committee member), Chen, Ing Ray (committee member).
Subjects/Keywords: Spatio-Temporal Analysis; Outlier Detection; Robust Prediction; Energy Disaggregation
APA (6th Edition):
Chen, F. (2013). Efficient Algorithms for Mining Large Spatio-Temporal Data. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/19220
Chicago Manual of Style (16th Edition):
Chen, Feng. “Efficient Algorithms for Mining Large Spatio-Temporal Data.” 2013. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/19220.
MLA Handbook (7th Edition):
Chen, Feng. “Efficient Algorithms for Mining Large Spatio-Temporal Data.” 2013. Web. 07 Mar 2021.
Vancouver:
Chen F. Efficient Algorithms for Mining Large Spatio-Temporal Data. [Internet] [Doctoral dissertation]. Virginia Tech; 2013. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/19220.
Council of Science Editors:
Chen F. Efficient Algorithms for Mining Large Spatio-Temporal Data. [Doctoral Dissertation]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/19220
23.
Zhang, Xuchao.
Scalable Robust Models Under Adversarial Data Corruption.
Degree: PhD, Computer Science and Applications, 2019, Virginia Tech
URL: http://hdl.handle.net/10919/88833
► Social media has experienced a rapid growth during the past decade. Millions of users of sites such as Twitter have been generating and sharing a…
(more)
▼ Social media has experienced rapid growth during the past decade. Millions of users of sites such as Twitter have been generating and sharing a wide variety of content including texts, images, and other metadata. In addition, social media can be treated as a social sensor that reflects different aspects of our society. Event analytics in social media have enormous significance for applications like disease surveillance, business intelligence, and disaster management. Social media data possesses a number of important characteristics, including dynamics, heterogeneity, noisiness, timeliness, big volume, and network properties. These characteristics cause various new challenges and hence invoke many interesting research topics, which will be addressed here. This dissertation focuses on the development of five novel methods for social media-based spatiotemporal event detection and forecasting. The first of these is a novel unsupervised approach for detecting the dynamic keywords of spatial events in targeted domains. This method has been deployed in a practical project for monitoring civil unrest events in several Latin American regions. The second builds on this by discovering the underlying development progress of events, jointly considering the structural contexts and spatiotemporal burstiness. The third seeks to forecast future events using social media data. The basic idea here is to search for subtle patterns in specific cities as indicators of ongoing or future events, where each pattern is defined as a burst of context features (keywords) that are relevant to a specific event. For instance, an initial expression of discontent about gas price increases could be a potential precursor to a more general protest about government policies. Beyond social media data, in the fourth method proposed here, multiple data sources are leveraged to reflect different aspects of the society for event forecasting. This addresses several important problems, including the common phenomenon that different sources may come from different geographical levels and have different available time periods. The fifth study is a novel flu forecasting method based on epidemic modeling and social media mining. A new framework is proposed to integrate prior knowledge of disease propagation mechanisms and real-time information from social media.
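A small sketch of the "burst of context features" idea used as an event indicator: flag days on which a keyword's count jumps well above its recent rolling baseline. The 14-day window and 3-sigma threshold are arbitrary illustrative settings.

```python
# Keyword burst detection sketch; window length and threshold are assumptions.
import pandas as pd

def keyword_bursts(daily_counts, window=14, z_cut=3.0):
    """daily_counts: pd.Series of daily keyword counts indexed by date."""
    baseline = daily_counts.rolling(window, min_periods=window).mean().shift(1)
    spread = daily_counts.rolling(window, min_periods=window).std().shift(1)
    z = (daily_counts - baseline) / spread
    return daily_counts.index[z > z_cut]          # burst days: candidate indicators
```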
Advisors/Committee Members: Lu, Chang Tien (committeechair), Chen, Ing Ray (committee member), Ramakrishnan, Naren (committee member), Boedihardjo, Arnold P. (committee member), Reddy, Chandan K. (committee member).
Subjects/Keywords: Robust Model; Adversarial Data Corruption; Scalability
APA (6th Edition):
Zhang, X. (2019). Scalable Robust Models Under Adversarial Data Corruption. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/88833
Chicago Manual of Style (16th Edition):
Zhang, Xuchao. “Scalable Robust Models Under Adversarial Data Corruption.” 2019. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/88833.
MLA Handbook (7th Edition):
Zhang, Xuchao. “Scalable Robust Models Under Adversarial Data Corruption.” 2019. Web. 07 Mar 2021.
Vancouver:
Zhang X. Scalable Robust Models Under Adversarial Data Corruption. [Internet] [Doctoral dissertation]. Virginia Tech; 2019. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/88833.
Council of Science Editors:
Zhang X. Scalable Robust Models Under Adversarial Data Corruption. [Doctoral Dissertation]. Virginia Tech; 2019. Available from: http://hdl.handle.net/10919/88833
24.
Akupatni, Vivek Bharath.
My4Sight: A Human Computation Platform for Improving Flu Predictions.
Degree: MS, Computer Science and Applications, 2015, Virginia Tech
URL: http://hdl.handle.net/10919/56579
► While many human computation (human-in-the-loop) systems exist in the field of Artificial Intelligence (AI) to solve problems that can't be solved by computers alone, comparatively…
(more)
▼ While many human computation (human-in-the-loop) systems exist in the field of Artificial Intelligence (AI) to solve problems that can't be solved by computers alone, comparatively few platforms exist for collecting human knowledge and for evaluating techniques that harness human insights to improve forecasting models for infectious diseases such as influenza and Ebola.
In this thesis, we present the design and implementation of My4Sight, a human computation system developed to harness human insights and intelligence to improve forecasting models. This web-accessible system simplifies the collection of human insights through the careful design of two tasks: (i) asking users to rank system-generated forecasts in order of likelihood; and (ii) allowing users to improve upon an existing system-generated prediction. The structured output collected from querying human computers can then be used to build better forecasting models. My4Sight is designed to be a complete end-to-end analytical platform, and provides access to data collection features and statistical tools that are applied to the collected data. The results are communicated to the user, wherever applicable, in the form of visualizations for easier data comprehension. With My4Sight, this thesis makes a valuable contribution to the field of epidemiology by providing the data and infrastructure platform needed to improve forecasts in real time by harnessing the wisdom of the crowd.
Advisors/Committee Members: Marathe, Madhav Vishnu (committeechair), Ramakrishnan, Naren (committee member), Chen, Jiangzhuo (committee member), Bisset, Keith R. (committee member).
Subjects/Keywords: human computation; human-in-the-loop; crowd sourcing; my4sight; influenza forecasting
APA (6th Edition):
Akupatni, V. B. (2015). My4Sight: A Human Computation Platform for Improving Flu Predictions. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/56579
Chicago Manual of Style (16th Edition):
Akupatni, Vivek Bharath. “My4Sight: A Human Computation Platform for Improving Flu Predictions.” 2015. Masters Thesis, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/56579.
MLA Handbook (7th Edition):
Akupatni, Vivek Bharath. “My4Sight: A Human Computation Platform for Improving Flu Predictions.” 2015. Web. 07 Mar 2021.
Vancouver:
Akupatni VB. My4Sight: A Human Computation Platform for Improving Flu Predictions. [Internet] [Masters thesis]. Virginia Tech; 2015. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/56579.
Council of Science Editors:
Akupatni VB. My4Sight: A Human Computation Platform for Improving Flu Predictions. [Masters Thesis]. Virginia Tech; 2015. Available from: http://hdl.handle.net/10919/56579

Virginia Tech
25.
Poirel, Christopher L.
Bridging Methodological Gaps in Network-Based Systems Biology.
Degree: PhD, Computer Science and Applications, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/23899
► Functioning of the living cell is controlled by a complex network of interactions among genes, proteins, and other molecules. A major goal of systems biology…
(more)
▼ Functioning of the living cell is controlled by a complex network of interactions among genes, proteins, and other molecules. A major goal of systems biology is to understand and explain the mechanisms by which these interactions govern the cell's response to various conditions. Molecular interaction networks have proven to be a powerful representation for studying cellular behavior. Numerous algorithms have been developed to unravel the complexity of these networks. Our work addresses the drawbacks of existing techniques. This thesis includes three related research efforts that introduce network-based approaches to bridge current methodological gaps in systems biology.
i. Functional enrichment methods provide a summary of biological functions that are overrepresented in an interesting collection of genes (e.g., highly differentially expressed genes between a diseased cell and a healthy cell). Standard functional enrichment algorithms ignore the known interactions among proteins. We propose a novel network-based approach to functional enrichment that explicitly accounts for these underlying molecular interactions. Through this work, we close the gap between set-based functional enrichment and topological analysis of molecular interaction networks.
ii. Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression p-values such that two genes connected by an interaction show similar changes in their gene expression values.
iii. Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models of cellular processes. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present Linker, an efficient and automated data-driven method that analyzes molecular interactomes. Linker combines teleporting random walks and k-shortest path computations to discover connections from a set of source proteins to a set of target proteins. We demonstrate the efficacy of Linker through two applications: proposing extensions to an existing model of cell cycle regulation in budding yeast and automated reconstruction of human signaling pathways. Linker achieves superior precision and recall compared to state-of-the-art algorithms from the literature.
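An illustrative recombination of the two ingredients named above, teleporting random walks and k-shortest paths, on a toy interaction network; it is not the actual Linker implementation. The edge confidence weights and node names are invented for the example.

```python
# Toy sketch: personalized PageRank (teleporting random walk) from source
# proteins plus k-shortest paths to target proteins; not the real Linker code.
import math
from itertools import islice
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("SRC1", "A", 0.9), ("A", "B", 0.8), ("B", "TGT1", 0.7),
    ("SRC1", "C", 0.4), ("C", "TGT1", 0.6), ("SRC2", "B", 0.5),
])
sources, targets = {"SRC1", "SRC2"}, {"TGT1"}

# 1) Teleporting random walk: personalized PageRank restarted at the sources.
walk_scores = nx.pagerank(G, alpha=0.85, personalization={s: 1.0 for s in sources})

# 2) k-shortest paths from sources to targets, with edge cost -log(confidence)
#    so that high-confidence interactions are cheap to traverse.
for u, v, data in G.edges(data=True):
    data["cost"] = -math.log(data["weight"])

for s in sources:
    for t in targets:
        for path in islice(nx.shortest_simple_paths(G, s, t, weight="cost"), 2):
            print(s, "->", t, path, [round(walk_scores[n], 3) for n in path])
```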
Advisors/Committee Members: Murali, T. M. (committeechair), Vullikanti, Anil Kumar S. (committee member), Grama, Ananth (committee member), Tyson, John J. (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Computational Biology; Functional Enrichment; Graph Theory; Network; Random Walk; Signaling Pathways; Top-Down Analysis
APA (6th Edition):
Poirel, C. L. (2013). Bridging Methodological Gaps in Network-Based Systems Biology. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/23899
Chicago Manual of Style (16th Edition):
Poirel, Christopher L. “Bridging Methodological Gaps in Network-Based Systems Biology.” 2013. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/23899.
MLA Handbook (7th Edition):
Poirel, Christopher L. “Bridging Methodological Gaps in Network-Based Systems Biology.” 2013. Web. 07 Mar 2021.
Vancouver:
Poirel CL. Bridging Methodological Gaps in Network-Based Systems Biology. [Internet] [Doctoral dissertation]. Virginia Tech; 2013. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/23899.
Council of Science Editors:
Poirel CL. Bridging Methodological Gaps in Network-Based Systems Biology. [Doctoral Dissertation]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/23899
26.
Parikh, Nidhi Kiranbhai.
Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach.
Degree: PhD, Computer Science and Applications, 2017, Virginia Tech
URL: http://hdl.handle.net/10919/84967
► The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big…
(more)
▼ The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big challenge because cities are complex systems consisting of human populations, infrastructures, and interactions and interdependence among them. This dissertation focuses on a synthetic information-based approach for modeling human activities and behaviors for two urban science applications, epidemiology and disaster planning, and with associated analytics. Synthetic information is a data-driven approach to create a detailed, high fidelity representation of human populations, infrastructural systems and their behavioral and interaction aspects. It is used in developing large-scale simulations to model what-if scenarios and for policy making.
Big cities have a large number of visitors visiting them every day. They often visit crowded areas in the city and come into contact with each other and the area residents. However, most epidemiological studies have ignored their role in spreading epidemics. We extend the synthetic population model of the Washington DC metro area to include transient populations, consisting of tourists and business travelers, along with their demographics and activities, by combining data from multiple sources. We evaluate the effect of including this population in epidemic forecasts, and the potential benefits of multiple interventions that target transients.
In the next study, we model human behavior in the aftermath of the detonation of an improvised nuclear device in Washington DC. Previous studies of this scenario have mostly focused on modeling physical impact and simple behaviors like sheltering and evacuation. However, these models have focused on optimal behavior, not naturalistic behavior. In other words, prior work is focused on whether it is better to shelter-in-place or evacuate, but has not been informed by the literature on what people actually do in the aftermath of disasters. Natural human behaviors in disasters, such as looking for family members or seeking healthcare, are supported by infrastructures such as cell-phone communication and transportation systems. We model a range of behaviors such as looking for family members, evacuation, sheltering, healthcare-seeking, worry, and search and rescue and their interactions with infrastructural systems.
Large-scale and complex agent-based simulations generate a large amount of data in each run of the simulation, making it hard to make sense of results. This leads us to formulate two new problems in simulation analytics. First, we develop algorithms to summarize simulation results by extracting causally-relevant state sequences - state sequences that have a measurable effect on the outcome of interest. Second, in order to develop effective interventions, it is important to understand which behaviors lead to positive and negative outcomes. It may happen that the same behavior may lead to different outcomes, depending upon the context. Hence, we develop an…
Advisors/Committee Members: Marathe, Madhav Vishnu (committeechair), Swarup, Samarth (committeechair), Vullikanti, Anil Kumar S. (committee member), Sukthankar, Gita Reese (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: Behavior Modeling; Simulation Analytics; Social Simulations; Synthetic Information; Transient Population; Urban Computing
APA (6th Edition):
Parikh, N. K. (2017). Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/84967
Chicago Manual of Style (16th Edition):
Parikh, Nidhi Kiranbhai. “Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach.” 2017. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/84967.
MLA Handbook (7th Edition):
Parikh, Nidhi Kiranbhai. “Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach.” 2017. Web. 07 Mar 2021.
Vancouver:
Parikh NK. Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach. [Internet] [Doctoral dissertation]. Virginia Tech; 2017. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/84967.
Council of Science Editors:
Parikh NK. Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach. [Doctoral Dissertation]. Virginia Tech; 2017. Available from: http://hdl.handle.net/10919/84967
27.
Fan, Shuangfei.
Deep Representation Learning on Labeled Graphs.
Degree: PhD, Computer Science and Applications, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/96596
► Graphs are one of the most important and powerful data structures for conveying the complex and correlated information among data points. In this research, we…
(more)
▼ Graphs are one of the most important and powerful data structures for conveying the complex and correlated information among data points. In this research, we aim to provide more robust and accurate models for graph-specific tasks, such as collective classification and graph generation, by designing deep learning models that learn better task-specific representations of graphs. First, we studied the collective classification problem in graphs and proposed recurrent collective classification, a variant of the iterative classification algorithm that is more robust to situations where predictions are noisy or inaccurate. Then we studied the problem of graph generation using deep generative models. We first proposed a deep generative model using the GAN framework that generates labeled graphs. Then, in order to support more applications and gain more control over the generated graphs, we extended the problem of graph generation to conditional graph generation, which can then be applied to various applications for modeling graph evolution and transformation.
Advisors/Committee Members: Huang, Bert (committeechair), Neville, Jennifer (committee member), Abbott, Amos L. (committee member), Ramakrishnan, Naren (committee member), Reddy, Chandan K. (committee member).
Subjects/Keywords: Machine learning
APA (6th Edition):
Fan, S. (2020). Deep Representation Learning on Labeled Graphs. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/96596
Chicago Manual of Style (16th Edition):
Fan, Shuangfei. “Deep Representation Learning on Labeled Graphs.” 2020. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/96596.
MLA Handbook (7th Edition):
Fan, Shuangfei. “Deep Representation Learning on Labeled Graphs.” 2020. Web. 07 Mar 2021.
Vancouver:
Fan S. Deep Representation Learning on Labeled Graphs. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/96596.
Council of Science Editors:
Fan S. Deep Representation Learning on Labeled Graphs. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/96596

Virginia Tech
28.
Semaan, Marie.
A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs.
Degree: PhD, Environmental Design and Planning, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/97580
► An emerging field in rainwater harvesting (RWH) is the application of communal rainwater harvesting system. This system's main advantage compared to individual RWH is the…
(more)
▼ An emerging field in rainwater harvesting (RWH) is the application of communal rainwater harvesting systems. The main advantage of such a system compared to individual RWH is the centralization of water treatment, which some users of individual RWH find difficult to maintain. Despite alleviating one concern, this communal approach does not increase the RWH system's (RWHS) reliability nor necessarily satisfy all water demands, and hence is not a major improvement in terms of system performance.
This research tackles this challenge with a novel approach to communal RWH for single-family houses. Instead of the traditional communal approach to RWH which uses only one storage location, we propose connecting multiple single-family homes' RWHSs to a communal backup tank, i.e., capturing overflow from multiple RWHS, which will increase reliability and water demand met in a way that will significantly improve the current performance of communal RWH. The proposed system will potentially maximize the availability of potable water while limiting spillage and overflow.
We simulated the performance of the system in two cities, Houston and Jacksonville, for multiple private and communal storage combinations. Results show that volumetric reliability (VR) gains of 1.5% to 6% and 1.5% to 4% can be achieved for seven to ten and six to seven connected households in Houston and Jacksonville, respectively, if the emphasis is on volumetric reliability. With respect to total storage capacity, the system achieves higher VR gains at lower total storage capacities in Houston and at higher total storage capacities in Jacksonville.
With regard to the total cost of ownership per household for the individual system and for the communal storage system, the lifecycle cost analysis was performed using the Net Present Value (NPV) method, with an interest rate of 7% over 30 years. The NPV of the total system cost per household in the city of Houston is lowest for nine to ten connected households, and for seven and eight connected households it is comparable to the base case of a rainwater harvesting system that is not connected to a communal tank.
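A worked sketch of that lifecycle-cost comparison: the net present value of ownership at a 7% discount rate over 30 years, per household. The capital and maintenance figures below are placeholder inputs, not values from the study.

```python
# NPV lifecycle-cost sketch; all dollar figures are placeholder assumptions.
def npv_cost(capital, annual_om, rate=0.07, years=30):
    """Upfront capital plus discounted yearly operation & maintenance costs."""
    return capital + sum(annual_om / (1 + rate) ** t for t in range(1, years + 1))

individual = npv_cost(capital=3000, annual_om=120)            # stand-alone RWHS
communal = npv_cost(capital=2500, annual_om=100) \
           + npv_cost(capital=8000, annual_om=300) / 9        # shared tank, 9 households
print(round(individual), round(communal))
```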
This communal system is more resilient and can be a worthy addition to water and stormwater infrastructures, especially in the face of climate change.
Advisors/Committee Members: Pearce, Annie R. (committeechair), Garvin, Michael J. (committee member), Day, Susan D. (committee member), Ramakrishnan, Naren (committee member).
Subjects/Keywords: rainwater harvesting; communal; simulation; modeling distributed rainwater
APA (6th Edition):
Semaan, M. (2020). A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/97580
Chicago Manual of Style (16th Edition):
Semaan, Marie. “A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs.” 2020. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/97580.
MLA Handbook (7th Edition):
Semaan, Marie. “A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs.” 2020. Web. 07 Mar 2021.
Vancouver:
Semaan M. A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/97580.
Council of Science Editors:
Semaan M. A Novel Approach to Communal Rainwater Harvesting for Single-Family Housing: A Study of Tank Size, Reliability, and Costs. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/97580

Virginia Tech
29.
Liu, Xiaomo.
Online Knowledge Community Mining and Modeling for Effective Knowledge Management.
Degree: PhD, Computer Science and Applications, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/50646
► More and more in recent years, activities that people once did in the real world they now do in virtual space. In particular, online communities…
(more)
▼ More and more in recent years, activities that people once did in the real world they now do in virtual space. In particular, online communities have become popular and efficient media for people all over the world to seek and share knowledge in domains that interest them. Such communities are called online knowledge communities (OKCs). Large-scale OKCs may comprise thousands of community members and archive many more online messages. As a result, problems such as how to identify and manage the knowledge collected and how to understand people's knowledge-sharing behaviors have become major challenges for leveraging online knowledge to sustain community growth.
In this dissertation I examine three important factors of managing knowledge in OKCs. First, I focus on how to build successful profiles for community members that describe their domain expertise. These expertise profiles are potentially important for directing questions to the right people and, thus, can improve the community's overall efficiency and efficacy. To address this issue, I present a comparative study of models of expertise profiling in online communities and identify the model combination that delivers the best results.
Next, I investigate how to automatically assess the information helpfulness of user postings. Due to the voluntary nature of online participation, there is no guarantee that all user-generated content (UGC) will be helpful. It is also difficult, given the sheer amount of online postings, for knowledge seekers to find information quickly that satisfies their informational needs. Therefore, I propose a theory-driven text classification framework based on the knowledge adoption model (KAM) for predicting the helpfulness of UGC in OKCs. I test the effectiveness of this framework at both the thread level and the post level of online messages.
Any given OKC generally has a huge number of individuals participating in online discussions, but exactly what, where, when and how they seek and share knowledge are still not fully understood or documented. In the last part of the dissertation, I describe a multi-level study of the knowledge-sharing behaviors of users in OKCs. Both exploratory data analysis and network analysis are applied to thread, forum and community levels of online data. I present a number of interesting findings on social dynamics in knowledge sharing and diffusion. These findings potentially have important implications for both the theory and practice of online community knowledge management.
Advisors/Committee Members: Fan, Weiguo Patrick (committeechair), Fox, Edward A. (committee member), Ramakrishnan, Naren (committee member), Wang, Gang (committee member), Du, Pang (committee member).
Subjects/Keywords: Online Communities; Knowledge Management; Expertise Profiling; Knowledge Helpfulness Prediction; Knowledge Sharing & Diffusion
APA (6th Edition):
Liu, X. (2013). Online Knowledge Community Mining and Modeling for Effective Knowledge Management. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/50646
Chicago Manual of Style (16th Edition):
Liu, Xiaomo. “Online Knowledge Community Mining and Modeling for Effective Knowledge Management.” 2013. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/50646.
MLA Handbook (7th Edition):
Liu, Xiaomo. “Online Knowledge Community Mining and Modeling for Effective Knowledge Management.” 2013. Web. 07 Mar 2021.
Vancouver:
Liu X. Online Knowledge Community Mining and Modeling for Effective Knowledge Management. [Internet] [Doctoral dissertation]. Virginia Tech; 2013. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/50646.
Council of Science Editors:
Liu X. Online Knowledge Community Mining and Modeling for Effective Knowledge Management. [Doctoral Dissertation]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/50646

Virginia Tech
30.
Yang, Seungwon.
Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach.
Degree: PhD, Computer Science and Applications, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/25111
► Identifying topics of a textual document is useful for many purposes. We can organize the documents by topics in digital libraries. Then, we could browse…
(more)
▼ Identifying the topics of a textual document is useful for many purposes. We can organize documents by topic in digital libraries and then browse and search for documents on specific topics. By examining the topics of a document, we can quickly understand what the document is about. To augment the traditional manual approach to topic tagging, which is labor-intensive, computer-based solutions have been developed.
This dissertation describes the design and development of a topic identification approach, in this case applied to disaster events. In a sense, this study represents the marriage of research analysis with an engineering effort, in that it combines inspiration from Cognitive Informatics with a practical model from Information Retrieval. One design constraint, however, is that the approach relies on the Web as a universal knowledge source, which is essential for accessing the information required to infer topics from texts.
Retrieving specific information of interest from such a vast source was achieved by querying a search engine's application programming interface. The information gathered was then processed mainly by applying the Vector Space Model from the Information Retrieval field. As a proof of concept, we subsequently developed and evaluated a prototype tool, Xpantrac, which can run in batch mode to process text documents automatically. A user interface for Xpantrac was also constructed to support interactive, semi-automatic topic tagging, and it was subsequently assessed via a usability study.
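To illustrate the general idea of ranking candidate topic terms with the Vector Space Model (Xpantrac's actual expansion-extraction pipeline and the search engine API it queries are not reproduced here; the snippets below stand in for text such an API might return), a minimal sketch might be:

# Illustrative sketch only: ranking candidate topic terms with the Vector Space Model.
# The expansion texts are hypothetical stand-ins for snippets a search API might return.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document = "Flooding after the hurricane displaced thousands of coastal residents."
expansion = [
    "Hurricane season brings storm surge and coastal flooding.",
    "Residents were evacuated as floodwaters rose.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([document] + expansion)

# Weight each expansion snippet by its cosine similarity to the document,
# then score vocabulary terms by their similarity-weighted tf-idf mass.
similarity = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
term_scores = np.asarray(tfidf[1:].multiply(similarity[:, None]).sum(axis=0)).ravel()
terms = np.array(vectorizer.get_feature_names_out())
print(terms[np.argsort(term_scores)[::-1][:5]])  # top-ranked candidate topic tags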
Throughout the design, development, and evaluation of these study components, we detail how the hypotheses and research questions of this dissertation have been supported and answered. We also show that our overarching goal, the identification of topics in a human-comparable way without depending on a large training set or corpus, has been achieved.
Advisors/Committee Members: Fox, Edward A. (committeechair), Wildemuth, Barbara Marie (committee member), Ramakrishnan, Naren (committee member), Moore, John F. (committee member), Fan, Weiguo (committee member).
Subjects/Keywords: topic identification; tagging; cognitive informatics; vector space model; knowledge sources; natural language processing; digital libraries; usability study
APA (6th Edition):
Yang, S. (2014). Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/25111
Chicago Manual of Style (16th Edition):
Yang, Seungwon. “Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach.” 2014. Doctoral Dissertation, Virginia Tech. Accessed March 07, 2021.
http://hdl.handle.net/10919/25111.
MLA Handbook (7th Edition):
Yang, Seungwon. “Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach.” 2014. Web. 07 Mar 2021.
Vancouver:
Yang S. Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach. [Internet] [Doctoral dissertation]. Virginia Tech; 2014. [cited 2021 Mar 07].
Available from: http://hdl.handle.net/10919/25111.
Council of Science Editors:
Yang S. Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach. [Doctoral Dissertation]. Virginia Tech; 2014. Available from: http://hdl.handle.net/10919/25111
◁ [1] [2] [3] [4] [5] [6] ▶