You searched for subject:(data extraction). Showing records 1 – 30 of 354 total matches (page 1 of 12).

University of Waterloo
1.
Zhou, Menglan.
Automated Extraction of 3D Building Windows from Mobile LiDAR Data.
Degree: 2016, University of Waterloo
URL: http://hdl.handle.net/10012/10342
Three-dimensional (3D) city models have attracted increasing attention because of their considerable potential applications. In particular, the demand for Level-of-Detail (LoD) building models has become urgent. Mobile Laser Scanning (MLS) provides a new technology for acquiring and updating 3D information on urban off-terrain features, particularly building façade details. Accordingly, generating LoD3 building models from MLS point clouds has become a new trend in recent studies.
Consequently, this thesis presents a method that accurately and automatically extracts 3D windows from raw MLS point clouds. To provide solid and credible information for LoD3 building models, the automated method identifies window frames on building façades from MLS point clouds. The algorithm is a stepwise procedure that interprets MLS point clouds as semantic features. A voxel-based upward-growing method is first applied to separate non-ground points from ground points. Noise is then filtered out of the non-ground points by statistical analysis. To segment out the building façades, the remaining non-ground points are clustered with a conditional Euclidean clustering algorithm; clusters whose density and width exceed given thresholds are designated as building-façade points. After a building façade is extracted, a volumetric box is created to contain the façade points so that the neighbours of each point can be operated on. A manipulator is finally applied, according to the structural characteristics of window frames, to extract the potential window points.
The experimental results demonstrate that the proposed algorithm successfully extracts the rectangular and curved windows in the test datasets with promising accuracy. Both 2D and 3D validation were conducted. In the 2D validation, the lowest F-measure across the test datasets is 0.740 and the highest 0.977; in the 3D validation, the lowest correctness is 79.58% and the highest 97.96%. Further analysis showed that performance degrades for windows recessed into walls or with curtains drawn, and that large holes caused by system errors in the raw point clouds also had a negative impact.
In conclusion, this thesis makes a considerable contribution to extracting 3D rectangular, irregular, and arc-rounded windows from noisy MLS point clouds with high accuracy and efficiency, and supplies a promising method for generating LoD3 building models.
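The clustering step of this pipeline is easy to illustrate. Below is a minimal sketch, not the thesis's implementation: it uses KD-tree region growing in place of the exact conditional Euclidean clustering, and the radius, cluster-size, width, and density thresholds are all hypothetical values.

```python
# Sketch of Euclidean clustering of non-ground points followed by
# density/width filtering to keep facade-like clusters.
import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, radius=0.5, min_size=100):
    """Group points whose mutual distance is below `radius` (region growing)."""
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        frontier, members = [seed], [seed]
        while frontier:
            neighbours = tree.query_ball_point(points[frontier.pop()], radius)
            fresh = [i for i in neighbours if i in unvisited]
            unvisited.difference_update(fresh)
            frontier.extend(fresh)
            members.extend(fresh)
        if len(members) >= min_size:
            clusters.append(np.array(members))
    return clusters

def facade_candidates(points, clusters, min_width=5.0, min_density=50.0):
    """Keep clusters wide and dense enough to be building facades."""
    keep = []
    for idx in clusters:
        pts = points[idx]
        width = np.ptp(pts[:, 0])       # crude horizontal extent
        height = np.ptp(pts[:, 2])      # vertical extent
        density = len(pts) / max(width * height, 1e-6)
        if width >= min_width and density >= min_density:
            keep.append(idx)
    return keep
```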
Subjects/Keywords: MLS data; building facade extraction; window extraction

University of Waterloo
2.
Farid, Mina.
Extracting and Cleaning RDF Data.
Degree: 2020, University of Waterloo
URL: http://hdl.handle.net/10012/15934
The RDF data model has become a prevalent format for representing heterogeneous data because of its versatility. The capability of dismantling information from its native formats and representing it as triples offers a simple yet powerful way of modelling data obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make RDF data easy to process as labeled, directed graphs.
This graph representation of RDF data supports higher-level analytics by enabling querying with different techniques and query languages, e.g., SPARQL. Analytics that require structured data are supported by transforming the graph data on the fly to populate the target schema needed for downstream analysis. These target schemas are defined by downstream applications according to their information needs.
The flexibility of RDF data brings two main challenges. First, the extraction of RDF data is a complex task that may involve domain expertise about the information to be extracted for different applications. Another significant aspect of analyzing RDF data is its quality, which depends on multiple factors including the reliability of the data sources and the accuracy of the extraction systems. The quality of the analysis depends mainly on the quality of the underlying data; therefore, evaluating and improving the quality of RDF data has a direct effect on the correctness of downstream analytics.
This work presents multiple approaches to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that need to be extracted, we present DSTLR, a scalable framework to extract RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction; for this problem, we present an approach to estimate property values for long-tail entities. We also present multiple algorithms and approaches that focus on the quality of RDF data, including discovering quality constraints from RDF data and utilizing machine learning techniques to repair errors in RDF data.
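As a concrete illustration of the triple model and SPARQL querying described above (this is not the DSTLR system itself), the following sketch builds a tiny labeled, directed graph with rdflib and transforms it on the fly into a tabular result; the entities and namespace are invented.

```python
# Each fact is a (subject, predicate, object) triple in a labeled, directed graph.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.alice, FOAF.knows, EX.bob))
g.add((EX.bob, FOAF.name, Literal("Bob")))

# A SPARQL query populates a target (tabular) schema from the graph on the fly.
results = g.query("""
    SELECT ?name ?friendName WHERE {
        ?p foaf:name ?name .
        ?p foaf:knows ?f .
        ?f foaf:name ?friendName .
    }""", initNs={"foaf": FOAF})

for name, friend in results:
    print(name, "knows", friend)
```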
Subjects/Keywords: rdf; data quality; information extraction

University of Manchester
3.
Tran, Thy Thy.
Exploiting Unlabelled Data for Relation Extraction.
Degree: 2020, University of Manchester
URL: http://www.manchester.ac.uk/escholar/uk-ac-man-scw:327033
Information extraction transforms unstructured text into structured data by annotating semantic information on raw data. A crucial step in information extraction is relation extraction, which identifies semantic relationships between named entities in text. The resulting relations can be used to construct and populate knowledge bases, and in applications such as information retrieval and question answering. Relation extraction has been widely studied using fully supervised and distantly supervised approaches, which require either manually or automatically annotated data. In contrast, the massive amount of freely available unlabelled text is underused. We therefore focus on leveraging unlabelled data to improve and extend relation extraction. We approach the use of unlabelled text from three directions: (i) pre-training word representations, (ii) unsupervised learning, and (iii) weak supervision. In the first direction, we want to leverage syntactic information for relation extraction. Instead of directly tuning such information on a relation extraction corpus, we propose a novel graph neural model for learning syntactically-informed word representations. The proposed method enriches pretrained word representations with syntactic information rather than re-training language models from scratch as in previous work. We confirm that these representations are beneficial for relations in two different domains. In the second direction, we study unsupervised relation extraction, a promising approach because it requires neither manually nor automatically labelled data. We hypothesise that inductive biases are extremely important for directing unsupervised relation extraction, and we employ two simple methods that use only entity types to infer relations. Despite their simplicity, our methods outperform existing approaches on two popular datasets. These surprising results suggest that entity types provide a strong inductive bias for unsupervised relation extraction. The last direction is inspired by recent evidence that large-scale pretrained language models capture relational facts. We investigate whether these pretrained language models can serve as weak annotators. To this end, we evaluate three large pretrained language models by matching sentences against relations' exemplars; the matching scores decide how likely a given sentence is to express a relation. The top relations are then used as weak annotations to train a relation classifier. We observe that pretrained language models are confused by highly similar relations, so we propose a method that models the labelling confusion to correct relation predictions. We validate the proposed method on two datasets with different characteristics, showing that it can effectively model labelling noise from our weak annotator. Overall, we illustrate that exploring the use of unlabelled data is an important step towards improving relation…
Advisors/Committee Members: Batista-Navarro, Riza Theresa; Ananiadou, Sophia.
Subjects/Keywords: Relation Extraction; Unlabelled Data

University of Manchester
4.
Tran, Thy.
Exploiting unlabelled data for relation extraction.
Degree: PhD, 2021, University of Manchester
URL: https://www.research.manchester.ac.uk/portal/en/theses/exploiting-unlabelled-data-for-relation-extraction(dfa41a6b-25bb-4035-9a44-ce1635cf2646).html ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.823301
Information extraction transforms unstructured text into structured data by annotating semantic information on raw data. A crucial step in information extraction is relation extraction, which identifies semantic relationships between named entities in text. The resulting relations can be used to construct and populate knowledge bases, and in applications such as information retrieval and question answering. Relation extraction has been widely studied using fully supervised and distantly supervised approaches, which require either manually or automatically annotated data. In contrast, the massive amount of freely available unlabelled text is underused. We therefore focus on leveraging unlabelled data to improve and extend relation extraction. We approach the use of unlabelled text from three directions: (i) pre-training word representations, (ii) unsupervised learning, and (iii) weak supervision. In the first direction, we want to leverage syntactic information for relation extraction. Instead of directly tuning such information on a relation extraction corpus, we propose a novel graph neural model for learning syntactically-informed word representations. The proposed method enriches pretrained word representations with syntactic information rather than re-training language models from scratch as in previous work. We confirm that these representations are beneficial for relations in two different domains. In the second direction, we study unsupervised relation extraction, a promising approach because it requires neither manually nor automatically labelled data. We hypothesise that inductive biases are extremely important for directing unsupervised relation extraction, and we employ two simple methods that use only entity types to infer relations. Despite their simplicity, our methods outperform existing approaches on two popular datasets. These surprising results suggest that entity types provide a strong inductive bias for unsupervised relation extraction. The last direction is inspired by recent evidence that large-scale pretrained language models capture relational facts. We investigate whether these pretrained language models can serve as weak annotators. To this end, we evaluate three large pretrained language models by matching sentences against relations' exemplars; the matching scores decide how likely a given sentence is to express a relation. The top relations are then used as weak annotations to train a relation classifier. We observe that pretrained language models are confused by highly similar relations, so we propose a method that models the labelling confusion to correct relation predictions. We validate the proposed method on two datasets with different characteristics, showing that it can effectively model labelling noise from our weak annotator. Overall, we illustrate that exploring the use of unlabelled data is an important step towards improving relation…
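The weak-annotation scheme in the last direction admits a brief sketch. The thesis scores sentences against relation exemplars with large pretrained language models; in the stand-in below, a TF-IDF cosine similarity plays the scorer's role, and the relations, exemplars, and sentences are all invented.

```python
# Exemplar matching as a weak annotator: the best-matching exemplar yields a
# weak relation label for each sentence; those labels would then train a
# relation classifier, with a noise model correcting confusion between
# highly similar relations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

exemplars = {
    "founded_by":  "The company was founded by the entrepreneur.",
    "born_in":     "The author was born in the city.",
    "employee_of": "The researcher works for the institute.",
}
sentences = [
    "Ada Lovelace was born in London.",
    "The startup was founded by two students.",
]

vec = TfidfVectorizer().fit(list(exemplars.values()) + sentences)
E = vec.transform(list(exemplars.values()))
S = vec.transform(sentences)

scores = cosine_similarity(S, E)
labels = list(exemplars)
for sent, row in zip(sentences, scores):
    print(sent, "->", labels[row.argmax()], f"(score {row.max():.2f})")
```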
Subjects/Keywords: Unlabelled Data; Relation Extraction

University of Nairobi
5.
Didas, Malekia.
Holistic approach for efficient extraction of web data.
Degree: 2011, University of Nairobi
URL: http://erepository.uonbi.ac.ke:8080/xmlui/handle/123456789/13136
There is tremendous growth in the volume of information available on the internet, in digital libraries, news sources, and company databases or intranets. Information from the World Wide Web serves sectors ranging from the social and political to the economic, for decision making. Such information would be more valuable if it were available to end users and other application systems in the required formats. This has created the need for tools that assist users in extracting relevant information quickly and effectively. We explore an efficient mechanism for extracting web data through analysis of HTML tags and patterns. HTML constitutes a large percentage of web content; however, much of this content lacks strict structure and a proper schema. Additionally, web content has a high update frequency and greater semantic heterogeneity than formats such as XML that are firmer in structure. We have produced a customised generic model that can be used to extract unstructured data from the web and populate a database with it. The main contribution is an automated process for locating, extracting, and storing data from HTML web sources. Such data is then available to other application software for analysis and further processing.
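A minimal sketch of the kind of tag-and-pattern extraction described above follows; it parses a repeated HTML structure and populates a database. The class names and schema are illustrative only, not the thesis's model.

```python
# Parse HTML, locate a repeated pattern, and populate a database table.
import sqlite3
from bs4 import BeautifulSoup

html = """
<div class="product"><span class="name">Pen</span><span class="price">1.50</span></div>
<div class="product"><span class="name">Book</span><span class="price">7.20</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [(d.find(class_="name").get_text(), float(d.find(class_="price").get_text()))
        for d in soup.find_all("div", class_="product")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, price REAL)")
con.executemany("INSERT INTO items VALUES (?, ?)", rows)
print(con.execute("SELECT * FROM items").fetchall())
```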
Subjects/Keywords: Web data extraction; structured data; semi-structured and unstructured data

NSYSU
6.
Yang, Cheng-Ju.
Image classification via successive core tensor selection procedure.
Degree: Master, Applied Mathematics, 2018, NSYSU
URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0606118-151922
In the field of artificial intelligence, high-order tensor data are studied and analyzed in settings such as automated optical inspection and MRI. Tensor decompositions and classification algorithms have therefore become an important research topic.
In a traditional neural network or machine learning method, the classification algorithm takes training data in the form of vectors, and the trained model identifies and classifies the testing data. To conform to these input constraints, high-order tensor data are often flattened into high-dimensional vectors. However, this loses the spatial relationships between adjacent elements across different orders and thus damages classification performance.
This thesis proposes a classification model combining non-negative Tucker decomposition with high-order tensor principal component analysis, successively extracting feature core tensors to improve classification accuracy. Compared with neural network classifiers, we replace affine transformations with tensor transformations, optimizing tensor projections to avoid losing the information that represents spatial relationships in different orders, so that more complete features are extracted. In signal processing and medical imaging, data lose their physical significance at negative values, so non-negative decomposition and analysis methods have also become important research topics. The non-negative Tucker decomposition used in this thesis is one of them, and it is a classic high-order extension of non-negative matrix factorization. In the classification model, non-negative Tucker decomposition not only maintains non-negative physical meaning but can also ignore within-class differences, which increases classification accuracy.
This study explores the model's computational cost and classification accuracy. In image-recognition experiments, the training time of the high-order tensor principal component analysis was halved after combining it with non-negative Tucker decomposition. In terms of accuracy, the smaller the amount of training data, the more pronounced our model's lead.
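The feature-extraction idea can be sketched briefly: decompose a stack of images with non-negative Tucker decomposition and use the sample-mode factor matrix as non-negative, low-dimensional features for a classifier. The ranks and data below are illustrative, and this is not the thesis's exact successive core tensor selection procedure.

```python
# Non-negative Tucker features for classification on toy data.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((60, 16, 16))         # 60 tiny "images", non-negative
y = rng.integers(0, 2, size=60)      # toy binary labels

# Decompose the (samples x height x width) tensor directly; this keeps
# spatial structure instead of flattening each image into a long vector.
core, factors = non_negative_tucker(tl.tensor(X), rank=[10, 4, 4], n_iter_max=200)

features = tl.to_numpy(factors[0])   # one 10-dim non-negative row per sample
clf = LogisticRegression(max_iter=1000).fit(features, y)
print("train accuracy:", clf.score(features, y))
```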
Advisors/Committee Members: Yueh-Cheng Kuo (chair), Tzon-Tzer Lu (chair), Chieh-Sen Huang (chair), Tsung-Lin Lee (committee member).
Subjects/Keywords: data feature extraction; image classification; tensor decomposition
7.
Pham, Thanh Van.
Detection of plant characteristics and a comparison of effectiveness between 2d and 3d data visualization in supporting human perception of plant characteristics.
Degree: 2018, Texas A&M University – Corpus Christi
URL: https://tamucc-ir.tdl.org/handle/1969.6/87018
Efficient agriculture requires the assessment of plant characteristics; a higher crop yield can be achieved with good-quality plant characteristic data. In this research, a system was developed using the algorithms presented here to automatically extract plant characteristics. The automatically extracted values were compared with ground-truth data to evaluate the accuracy of the system. In addition, the effectiveness of 2- and 3-dimensional data visualization for determining these characteristics was studied: an experiment investigated how effectively plant characteristics are evaluated when using 2D or 3D data visualizations. Participants were presented with either plant pictures (2D) or 3D plant models and tasked with identifying plant height and the number of leaves. Task completion times and accuracy rates were gathered for performance analysis.
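One of the extracted characteristics, plant height, admits a toy sketch: take the spread between a robust ground level and the canopy top in a 3D point cloud. The percentile cut-offs are illustrative assumptions, not the thesis's algorithm.

```python
# Estimate plant height from a point cloud as canopy top minus ground level.
import numpy as np

def plant_height(points: np.ndarray) -> float:
    """points: (n, 3) array of x, y, z samples of a single plant."""
    z = points[:, 2]
    ground = np.percentile(z, 1)   # robust floor instead of min()
    top = np.percentile(z, 99)     # robust canopy top instead of max()
    return float(top - ground)

cloud = np.random.default_rng(1).normal([0, 0, 0.4], [0.05, 0.05, 0.2], (5000, 3))
print(f"estimated height: {plant_height(cloud):.2f} m")
```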
Advisors/Committee Members: King, Scott A (advisor), Lee, Byung Cheol (committeeMember), Sheta, Alaa (committeeMember).
Subjects/Keywords: Agriculture; Characteristic; Data Visualization; Feature extraction; Plant

Delft University of Technology
8.
Kreuk, Laura (author).
Sentiment Analysis: a comparison of feature sets for social data and reviews.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:eca6e7b5-a846-424b-ba44-84c060c29d97
Consumers nowadays share their experiences and opinions about products and brands through various channels, for example review websites and social media. Sentiment analysis is used to predict the sentiment of such consumer text in order to understand the tone of customers towards these products and brands. This thesis addresses sentence-level sentiment analysis in the product domain. Three data types collected by Unilever are used: review data, text containing a customer's opinion of a specific product; social data, such as tweets, Facebook messages, and Instagram messages; and phone data, summaries of customers' phone calls about specific products. One approach to sentiment analysis is to extract features from the data and give them, together with sentiment labels from human annotators, to a machine learning algorithm, which generates a classifier that predicts labels for sentences. In the sentiment analysis literature it is often unclear why certain features are chosen or for which data type certain features work well. In this research we compare several feature sets across the different data types: we propose three feature sets for review data and three for social data, and we focus on two aspects, comparing the feature sets and comparing the data types. In our results we do not find significant differences in performance between the feature sets. The results suggest there may be feature sets that improve sentiment analysis for a specific data type, but a general feature set with standard features can perform comparably.
Computer Science | Data Science and Technology | Information Architecture
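The evaluation methodology, fitting a classifier on alternative feature sets and comparing performance, can be sketched as follows; the two feature sets, toy sentences, and labels are invented and are not the thesis's Unilever data.

```python
# Compare alternative feature sets for sentence-level sentiment with
# cross-validation.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

sentences = ["great shampoo, will buy again", "smells awful", "does the job",
             "terrible customer service", "love this brand", "not worth it"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

feature_sets = {
    "bag-of-words":   CountVectorizer(ngram_range=(1, 1)),
    "tf-idf bigrams": TfidfVectorizer(ngram_range=(1, 2)),
}

for name, extractor in feature_sets.items():
    pipe = make_pipeline(extractor, LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, sentences, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```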
Advisors/Committee Members: Tintarev, Nava (mentor), Houben, Geert-Jan (graduation committee), Urbano Merino, Julian (graduation committee), Delft University of Technology (degree granting institution).
Subjects/Keywords: Sentiment Analysis; feature extraction; reviews; social data

University of Waterloo
9.
Farid, Mina H.
Query Optimization for On-Demand Information Extraction Tasks over Text Databases.
Degree: 2012, University of Waterloo
URL: http://hdl.handle.net/10012/6593
Many modern applications involve analyzing large amounts of data that come from unstructured text documents. In its original format, the data contains information that, if extracted, can give more insight and help in decision making. The ability to answer structured SQL queries over unstructured data allows more complex data analysis. Querying unstructured data can be accomplished with the help of information extraction (IE) techniques. The traditional way is the Extract-Transform-Load (ETL) approach, which performs all possible extractions over the document corpus, stores the extracted relational results in a data warehouse, and then queries the extracted data. The ETL approach produces results that are out of date and causes an explosion in the number of possible relations and attributes to extract. Therefore, new approaches were developed to perform extraction on the fly; however, previous efforts relied on specialized extraction operators or particular IE algorithms, which limited the optimization opportunities for such queries.
In this work, we propose an online approach that integrates the engine of the database management system with IE systems using a new type of view called extraction views. Queries on text documents are evaluated using these extraction views, which are populated at query time with newly extracted data. Our approach enables the optimizer to apply all well-defined optimization techniques. The optimizer selects the best execution plan using a defined cost model that considers a user-defined balance between the cost and the quality of extraction, and we explain the trade-off between the two factors. The main contribution is the ability to run on-demand information extraction that reflects the latest changes in the data, while avoiding unnecessary extraction from irrelevant text documents.
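A toy illustration of the extraction-view idea follows: a relational table over text documents that an extractor populates at query time, so answers reflect the latest documents. The regex "extractor" and the schema are invented; the thesis integrates this into the DBMS optimizer itself.

```python
# On-demand extraction: the "view" is (re)populated just before it is queried.
import re
import sqlite3

documents = [
    "Contact Alice Smith at alice@example.org for details.",
    "Bob Jones (bob@example.org) joined the project in 2012.",
]

def populate_extraction_view(con):
    """Run extraction on demand, just before the view is queried."""
    con.execute("CREATE TABLE IF NOT EXISTS emails (doc_id INT, email TEXT)")
    con.execute("DELETE FROM emails")  # re-extract so results are never stale
    for doc_id, text in enumerate(documents):
        for email in re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
            con.execute("INSERT INTO emails VALUES (?, ?)", (doc_id, email))

con = sqlite3.connect(":memory:")
populate_extraction_view(con)          # extraction happens at query time
print(con.execute("SELECT * FROM emails").fetchall())
```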
Subjects/Keywords: Database; Query Optimization; Information Extraction; Data Quality

University of Georgia
10.
Kale, Sayali Shashikant.
Tracking mental disorders across Twitter users.
Degree: 2016, University of Georgia
URL: http://hdl.handle.net/10724/35355
The prevalence of mental health disorders is often undetected, a serious issue which continues to affect all parts of society. Recurrent psychological patterns can be identified with the help of popular social networking websites; these patterns can depict one's thoughts and feelings in everyday life. Our research targets Twitter data to identify users who could potentially suffer from mental disorders, and to classify them based on the intensity of linguistic usage and different behavioral features using sentiment analysis techniques. To confront the growing problem of mental disorders, we demonstrate a novel approach for the extraction of data and focus on the analysis of depression, schizophrenia, anxiety disorders, drug abuse, and seasonal affective disorders. Our system can be used not only to identify, but also to quantify, users' progression by following them on Twitter for a certain period of time. This can eventually help medical professionals and public health experts to monitor symptoms and progression patterns of mental disorders in social media users.
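A heavily simplified sketch of the kind of per-user signal described above follows; the term lists and tweets are invented, and a real system would rely on trained sentiment and behavioral models rather than raw keyword counts.

```python
# Score tweets by occurrences of condition-related terms and aggregate a
# user's signal over a period of time.
from collections import Counter

LEXICON = {
    "depression": {"hopeless", "worthless", "exhausted"},
    "anxiety": {"panic", "worried", "restless"},
}

def score(tweet: str) -> Counter:
    words = set(tweet.lower().split())
    return Counter({cond: len(words & terms) for cond, terms in LEXICON.items()})

timeline = ["feeling hopeless and exhausted today", "panic before the exam",
            "good day with friends"]
running = Counter()
for tweet in timeline:
    running += score(tweet)
print(running)   # aggregate signal for one user over the period
```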
Subjects/Keywords: Twitter; Sentiment Analysis; Mental Health; Data Extraction

University of Windsor
11.
Peravali, Bindu.
Comparative Mining of B2C Web Sites by Discovering Web Database Schemas.
Degree: MS, Computer Science, 2016, University of Windsor
URL: https://scholar.uwindsor.ca/etd/5861
Discovering potentially useful and previously unknown historical knowledge from heterogeneous E-Commerce (B2C) web site contents, to answer comparative queries such as "list all laptop prices from Walmart and Staples between 2013 and 2015 including make, type, screen size, CPU power, year of make", requires the difficult tasks of finding the schema of web documents from different web pages, extracting target information, performing web content data integration, building a virtual or physical data warehouse, and mining from it. Automatic data extractors (wrappers) such as the WebOMiner system use data extraction techniques based on parsing the web page HTML source code into a document object model (DOM) tree and traversing the DOM for pattern discovery to recognize and extract different web data types (e.g., text, images, links, and lists). Limitations of existing systems include the use of complicated matching techniques such as tree matching, non-deterministic finite state automata (NFA), and domain ontologies, and the inability to answer complex comparative historical and derived queries. This thesis proposes the WebOMiner_S system, which uses web structure and content mining approaches on the DOM-tree HTML code to simplify the WebOMiner system's data extraction process and make it more easily extendable. We propose to replace the NFA in WebOMiner with a frequent structure finder algorithm, which uses regular expression matching in the Java XPath parser to dynamically discover the most frequent structure (the most frequently repeated block in the HTML code, represented as tags such as <div class="...">) in the DOM tree. This approach eliminates the need for supervised training or updating the wrapper for each new B2C web page, making the approach simpler, more easily extendable, and automated. Experiments show that WebOMiner_S achieves 100% precision and 100% recall in identifying product records, and 95.55% precision and 100% recall in identifying data columns.
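The frequent structure finder is easy to illustrate: count the <div class="..."> values on a page and take the most repeated one as the record block. The thesis does this with regular expression matching in the Java XPath parser; the Python regex version below only shows the principle.

```python
# Find the most frequently repeated block tag in a page's HTML.
import re
from collections import Counter

html = """
<div class="product">A</div><div class="product">B</div>
<div class="banner">ad</div><div class="product">C</div>
"""

classes = Counter(re.findall(r'<div\s+class="([^"]+)"', html))
block, count = classes.most_common(1)[0]
print(f'most frequent structure: <div class="{block}"> ({count} occurrences)')
```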
Advisors/Committee Members: Ezeife, Christie.
Subjects/Keywords: Automatic Web Data Extraction; Data integration; Web Content Mining; Wrappers
12.
Ales, Zacharie.
Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues : Extraction and clustering for regularities identification : application to dialogues analysis.
Degree: Docteur es, Mathématiques/Informatique, 2014, Rouen, INSA
URL: http://www.theses.fr/2014ISAM0015
In the context of dialogue analysis, a corpus of dialogues can be represented as a set of arrays of annotations encoding the dialogue utterances. In order to identify frequently used dialogue schemes, we design a two-step methodology in which recurrent patterns are first extracted and then partitioned into homogeneous classes constituting the regularities. Two methods are developed to extract recurrent patterns: LPCA-DC and SABRE. The former is an adaptation of a dynamic programming algorithm, whereas the latter is obtained from a formal modelling of the local alignment extraction problem in annotation arrays. The partitioning of recurrent patterns is realised using various heuristics from the literature as well as two original formulations of the K-partitioning problem as mixed integer linear programs. Through a polyhedral study of a polyhedron associated with these formulations, facets are characterized (in particular, 2-chorded cycle inequalities, 2-partition inequalities, and general clique inequalities). These theoretical results allow the establishment of an efficient cutting-plane algorithm. We developed a decision-support software called VIESA, which implements these different methods and allows their evaluation during two experiments realised by an expert psychologist. Regularities corresponding to dialogue strategies that manual extraction had failed to obtain are thus identified.
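For reference, a standard integer-programming formulation of K-partitioning consistent with this abstract is sketched below; the thesis's exact formulations and the role of its valid inequalities may differ. Here x_ij = 1 when patterns i and j share a class and d_ij is their dissimilarity.

```latex
\begin{align*}
\min \quad & \sum_{i<j} d_{ij}\, x_{ij} \\
\text{s.t.} \quad
  & x_{ij} + x_{jk} - x_{ik} \le 1
    && \text{for all distinct } i, j, k \quad \text{(classes are transitive)} \\
  & \sum_{\substack{i,j \in S \\ i<j}} x_{ij} \ge 1
    && \text{for all } S \text{ with } |S| = K+1 \quad \text{(at most } K \text{ classes)} \\
  & x_{ij} \in \{0,1\}
\end{align*}
```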
Advisors/Committee Members: Vercouter, Laurent (thesis director), Gout, Christian (thesis director).
Subjects/Keywords: Regularity extraction; K-partitioning; Polyhedral approach; Combinatorial optimization; Data mining

University of Oxford
13.
Cheng, Wang.
AMBER : a domain-aware template based system for data extraction.
Degree: PhD, 2015, University of Oxford
URL: http://ora.ox.ac.uk/objects/uuid:ff49d786-bfd8-4cd4-a69c-19e81cb95920 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.667031
The web is the greatest information source in human history, yet finding all offers for flats with gardens in London, Paris, and Berlin, or all restaurants open after a screening of the latest blockbuster, remain hard tasks, as that data is not easily amenable to processing. Extracting web data into databases for easier processing has been a resource-intensive process, requiring human supervision for every source from which to extract. This has been changing with approaches that replace human annotators with automated annotations, which have been applied successfully in restricted settings such as single-attribute extraction or domains with significant redundancy among sources.
Multi-attribute objects are often presented on (i) result pages, where multiple objects appear on a single page as lists, tables, or grids with their most important attributes and a summary description, and (ii) detail pages, where each page provides a detailed attribute list and a long description for a single entity, often in rich format. Each kind of page has its own advantages: extracting objects from result pages is orders of magnitude faster than from detail pages, and the links to detail pages are often only accessible through result pages, while detail pages carry the complete attribute list and full description of the entity.
Early web data extraction approaches required manual annotations for each web site to reach high accuracy, while a number of domain-independent approaches focused only on unsupervised repeated-structure segmentation; the former are limited in scaling and automation, the latter in accuracy. Recent automated data extraction systems are often informed by an ontology and a set of object and attribute recognizers; however, they have focused on extracting simple objects with few attributes from single-entity pages and have avoided result pages.
We present AMBER, an automatic ontology-based multi-attribute object extraction system that handles both result and detail pages, achieves very high accuracy (>96%) with zero site-specific supervision, and solves practical issues that arise in real-life data extraction tasks. AMBER is also an important component of DIADEM, the first automatic full-site extraction system able to extract structured data from different domains without site-specific supervision, which has been tested in a large-scale evaluation of more than 10,000 sites.
On the result-page side, AMBER achieves high accuracy through a novel domain-aware, path-based template discovery algorithm, and integrates annotations for all parts of the extraction, from identifying the primary list of objects, over segmenting the individual objects, to aligning the attributes. AMBER tolerates significant noise in the annotations by combining them with a novel algorithm that finds regular structures based on XPath expressions capturing regular tree structures. On the detail-page side, AMBER integrates boilerplate removal, dynamic…
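The path-based template discovery idea admits a short sketch: compute a root-to-node tag path for every element of a result page and treat the most repeated path as the record template. This is only the core intuition; AMBER layers domain annotations and noise tolerance on top, and the page below is invented.

```python
# Count root-to-node tag paths; the repeated path marks the record structure.
from collections import Counter
from lxml import html

page = html.fromstring("""
<ul>
  <li class="offer"><b>Flat A</b><span>500</span></li>
  <li class="offer"><b>Flat B</b><span>650</span></li>
  <li class="nav">next page</li>
</ul>""")

def tag_path(el):
    path = []
    while el is not None:
        path.append(el.tag)
        el = el.getparent()
    return "/".join(reversed(path))

paths = Counter(tag_path(el) for el in page.iter())
print(paths.most_common(3))
```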
Subjects/Keywords: 006.3; Applications and algorithms; Program development and tools; data extraction; web extraction; result page analysis

University of Cincinnati
14.
Ghanem, Amer G.
Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora.
Degree: PhD, Engineering and Applied Science: Computer Science and
Engineering, 2015, University of Cincinnati
URL: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706
► The growth of on-line textual content has exploded in recent years, creating truly massive text corpora. As the quantity of text available on-line increases,…
(more)
▼ The growth of on-line textual content has exploded in recent years, creating truly massive text corpora. As the quantity of text available on-line increases, professionals from different industries such as marketing and politics are realizing the importance of extracting useful information and insights from this treasure trove of data. It is also clear, however, that doing so requires methods that go beyond those developed for classical data processing or even natural language processing. In particular, there is great need for efficient methods that can make sense of the semantic content of this data and allow new knowledge to be inferred from it.

The research in this dissertation describes a new method for identifying latent structures (topics) in texts through the application of community extraction techniques on associative networks of words. Since humans represent knowledge in terms of associations, it is asserted that deriving topics from associative networks represents a more cognitively meaningful approach than using purely statistical patterns.

The topic identification method proposed in this thesis is called Topic Extraction through Partitioning of Lexical Associative Networks (TExPLAN). It begins by constructing an associative network of words where the strength of their association indicates the frequency of their co-occurrence in documents. Once the word network is constructed, the algorithm proceeds in two stages. In the first stage, the word network is partitioned using a community extraction method to obtain disjoint seed topics. The second stage of TExPLAN uses the connectivity of words across the boundaries of seed topics to assign a relevance measure to each word in each topic, thus generating a set of topics where each one covers all the words in the vocabulary, as is the case with LDA.

The topics extracted by TExPLAN are used to define an epistemic metric space in which epistemic entities such as words, texts, documents, and collections of documents can be embedded and compared. Once the dimensions are defined, the entities are visualized in two-dimensional space using multidimensional scaling. Because of its generality, different types of entities can be analyzed jointly in the epistemic space. For this part of the thesis, we demonstrate the capabilities of the approach by applying it to the DBLP dataset, identifying similar conferences based on their locations in the epistemic space and deriving areas of interest associated with each conference. We are also able to analyze the epistemic diversity of conferences and determine which ones tend to attract more diverse authors and publications. Another part of the analysis focuses on authors and their participation in conferences. We define prominent status and answer questions about authors that have this status. We also look at the different ways an author can become prominent, and tie that to their epistemic diversity. Finally, we look at prominent authors who tend to publish documents that are relatively far from the…
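As a toy illustration of the two-stage procedure described above (a sketch, not the dissertation's implementation), the snippet below builds a word co-occurrence network, extracts disjoint seed topics using networkx's greedy modularity communities as a stand-in for the community extraction method, and then scores every vocabulary word against every seed topic. The four-document corpus is invented.

    from itertools import combinations
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    docs = [["network", "graph", "community"],
            ["graph", "community", "partition"],
            ["topic", "word", "document"],
            ["word", "document", "corpus"]]

    # Associative network: edge weight = number of co-occurrences in documents.
    G = nx.Graph()
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            w = G.get_edge_data(u, v, {"weight": 0})["weight"]
            G.add_edge(u, v, weight=w + 1)

    # Stage 1: disjoint seed topics via community extraction.
    seeds = [set(c) for c in greedy_modularity_communities(G, weight="weight")]

    # Stage 2: every word gets a relevance score in every topic, based on the
    # weight of its edges into that topic, so each topic covers the vocabulary.
    topics = []
    for seed in seeds:
        scores = {word: sum(d["weight"] for _, nbr, d in G.edges(word, data=True)
                            if nbr in seed)
                  for word in G.nodes}
        topics.append(scores)

    for i, t in enumerate(topics):
        print(f"topic {i}:", sorted(t, key=t.get, reverse=True)[:3])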
Advisors/Committee Members: Minai, Ali (Committee Chair).
Subjects/Keywords: Computer Science; Data Mining; Text Mining; Topic Extraction; Semantic Analysis; Community Extraction; Semantic Spaces
APA (6th Edition):
Ghanem, A. G. (2015). Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora. (Doctoral Dissertation). University of Cincinnati. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706
Chicago Manual of Style (16th Edition):
Ghanem, Amer G. “Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora.” 2015. Doctoral Dissertation, University of Cincinnati. Accessed April 11, 2021.
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706.
MLA Handbook (7th Edition):
Ghanem, Amer G. “Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora.” 2015. Web. 11 Apr 2021.
Vancouver:
Ghanem AG. Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora. [Internet] [Doctoral dissertation]. University of Cincinnati; 2015. [cited 2021 Apr 11].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706.
Council of Science Editors:
Ghanem AG. Identifying Patterns of Epistemic Organization through
Network-Based Analysis of Text Corpora. [Doctoral Dissertation]. University of Cincinnati; 2015. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706

INP Toulouse
15.
Poulain, Vincent.
Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases.
Degree: Docteur es, Signal, Image, Acoustique et Optimisation, 2010, INP Toulouse
URL: http://www.theses.fr/2010INPT0093
► This thesis is set in the context of the interpretation of high-resolution satellite images, and more specifically concerns the updating of databases…
(more)
▼ This work takes place in the framework of high-resolution remote sensing image analysis. It focuses on the issue of cartographic database creation or updating with optical and SAR images. The goal of this work is to build a generic processing chain to create or update a cartographic database representing roads and buildings in built-up areas. Depending on the available data, various scenarios are foreseen. The proposed processing chain is composed of two steps. First, if a database is available, the presence of each database object is checked in the images. The second step consists of looking for new objects that should be included in the database. To determine whether an object should be present in the updated database, relevant features are extracted from the images in the neighborhood of the considered object. Those features are based on the characteristics of roads and buildings in SAR and optical images. The removal or inclusion of an object in the database is based on a score obtained by fusing the features in the framework of Dempster-Shafer evidence theory. Results highlight the interest of multi-sensor fusion. Moreover, the chosen framework allows the easy integration of new features, making the processing chain improvable and adaptable to other objects.
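The fusion step described above can be illustrated with a minimal Dempster-Shafer combination over the frame {present, absent}. This is a generic sketch of Dempster's rule; the mass values are invented, not the thesis's actual feature scores.

    def dempster_combine(m1, m2):
        """Combine two mass functions given as dicts over frozenset hypotheses."""
        combined, conflict = {}, 0.0
        for a, wa in m1.items():
            for b, wb in m2.items():
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + wa * wb
                else:
                    conflict += wa * wb
        # Normalize by the non-conflicting mass.
        return {h: w / (1.0 - conflict) for h, w in combined.items()}

    P, A = frozenset({"present"}), frozenset({"absent"})
    theta = P | A                              # total ignorance
    optical = {P: 0.6, A: 0.1, theta: 0.3}     # evidence from the optical image
    sar     = {P: 0.5, A: 0.2, theta: 0.3}     # evidence from the SAR image
    fused = dempster_combine(optical, sar)
    print({tuple(sorted(h)): round(w, 3) for h, w in fused.items()})
    # belief mass concentrates on 'present' after fusing the two sensors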
Advisors/Committee Members: Marthon, Philippe (thesis director), Tourneret, Jean-Yves (thesis director).
Subjects/Keywords: Image processing; Data fusion; Road extraction; Building extraction; Optical image; SAR image
APA (6th Edition):
Poulain, V. (2010). Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases. (Doctoral Dissertation). INP Toulouse. Retrieved from http://www.theses.fr/2010INPT0093
Chicago Manual of Style (16th Edition):
Poulain, Vincent. “Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases.” 2010. Doctoral Dissertation, INP Toulouse. Accessed April 11, 2021.
http://www.theses.fr/2010INPT0093.
MLA Handbook (7th Edition):
Poulain, Vincent. “Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases.” 2010. Web. 11 Apr 2021.
Vancouver:
Poulain V. Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases. [Internet] [Doctoral dissertation]. INP Toulouse; 2010. [cited 2021 Apr 11].
Available from: http://www.theses.fr/2010INPT0093.
Council of Science Editors:
Poulain V. Fusion d'images optique et radar à haute résolution pour la mise à jour de bases de données cartographiques : Fusion of high resolution optical and SAR images to update cartographic databases. [Doctoral Dissertation]. INP Toulouse; 2010. Available from: http://www.theses.fr/2010INPT0093

Vilnius Gediminas Technical University
16.
Barauskas,
Antanas.
Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas.
Degree: Master, Informatics, 2014, Vilnius Gediminas Technical University
URL: http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2014~D_20140619_092257-64025
► The idea of this work is the creation of a system operating on the Extract-Transform-Load (ETL) principle. The system extracts data from different types of sources, transforms it appropriately and only then loads…
(more)
▼ The aim of this work is to create an ETL (Extract-Transform-Load) system for extracting data from different types of data sources, properly transforming the extracted data, and loading the transformed data into a selected place of storage. The main techniques of data extraction and the most popular ETL tools available today have been analyzed. An architecture based on cloud computing, as well as a prototype of a multi-component system for extracting data from multiple sources and structuring it in a uniform format, have been created. Unlike traditional data-storing systems, the proposed system extracts data only when it is needed for analysis. The graph database employed for data storage makes it possible to store not only the data but also information about the relations between entities. Structure: 48 pages, 19 figures, 10 tables and 30 references.
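A toy Extract-Transform-Load flow in the spirit of the system described above might look as follows; the two sources, the field names, and the use of networkx as a stand-in for a graph database are all illustrative assumptions.

    import csv, io, json
    import networkx as nx

    csv_source = "id,name\n1,Alice\n2,Bob\n"
    json_source = '[{"person": 1, "knows": 2}]'

    def extract():
        # Pull records from two differently shaped sources.
        people = list(csv.DictReader(io.StringIO(csv_source)))
        links = json.loads(json_source)
        return people, links

    def transform(people, links):
        # Normalize both sources into a uniform node/edge format.
        nodes = [(int(p["id"]), {"name": p["name"]}) for p in people]
        edges = [(l["person"], l["knows"], {"rel": "knows"}) for l in links]
        return nodes, edges

    def load(nodes, edges):
        # Load into a graph store that keeps entity relations.
        g = nx.DiGraph()
        g.add_nodes_from(nodes)
        g.add_edges_from(edges)
        return g

    graph = load(*transform(*extract()))
    print(graph.nodes(data=True), graph.edges(data=True))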
Advisors/Committee Members: Kulvietis, Genadijus (Master’s thesis supervisor), Mamčenko, Jelena (Master’s thesis reviewer), Šileikienė, Irma (Master’s thesis reviewer), Krupovnickas, Algirdas (Master’s degree committee chair), Kulvietis, Genadijus (Master’s degree committee member), Ostašius, Egidijus (Master’s degree committee member), Šileikienė, Irma (Master’s degree committee member), Mamčenko, Jelena (Master’s degree committee member), Šešok, Dmitrij (Master’s degree committee member), Krylovas, Aleksandras (Master’s degree committee member).
Subjects/Keywords: Data extraction; Data transformation; Data loading; Data warehouse
APA (6th Edition):
Barauskas,
Antanas. (2014). Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas. (Masters Thesis). Vilnius Gediminas Technical University. Retrieved from http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2014~D_20140619_092257-64025 ;
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
Barauskas,
Antanas. “Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas.” 2014. Masters Thesis, Vilnius Gediminas Technical University. Accessed April 11, 2021.
http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2014~D_20140619_092257-64025 ;.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
Barauskas,
Antanas. “Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas.” 2014. Web. 11 Apr 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
Barauskas,
Antanas. Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas. [Internet] [Masters thesis]. Vilnius Gediminas Technical University; 2014. [cited 2021 Apr 11].
Available from: http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2014~D_20140619_092257-64025 ;.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
Barauskas,
Antanas. Duomenų gavimas iš daugialypių šaltinių ir jų
struktūrizavimas. [Masters Thesis]. Vilnius Gediminas Technical University; 2014. Available from: http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2014~D_20140619_092257-64025 ;
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Utah State University
17.
Raza, Ali.
Test Data Extraction and Comparison with Test Data Generation.
Degree: MS, Computer Science, 2011, Utah State University
URL: https://digitalcommons.usu.edu/etd/982
► Testing an integrated information system that relies on data from multiple sources can be a challenge, particularly when the data is confidential. This thesis…
(more)
▼ Testing an integrated information system that relies on data from multiple sources can be a challenge, particularly when the data is confidential. This thesis describes a novel test data extraction approach, called semantic-based test data extraction for integrated systems (iSTDE), that solves many of the problems associated with creating realistic test data for integrated information systems containing confidential data. iSTDE reads a consistent cross-section of data from the production databases, manipulates that data to obscure individual identities while still preserving overall semantic data characteristics that are critical to thorough system testing, and then moves that test data to an external test environment.

This thesis also presents a theoretical study that compares test-data extraction with a competing technique, named test-data generation. Specifically, this thesis a) describes a comparison method that includes a comprehensive list of characteristics essential for testing database applications, organized into seven different areas, b) presents an analysis of the relative strengths and weaknesses of the different test-data creation techniques, and c) reports a number of specific conclusions that will help testers make appropriate choices.
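One idea behind such semantic-preserving extraction can be sketched as deterministic pseudonymization that keeps referential integrity across tables, so joins still line up. This is an illustrative sketch only, with invented tables and fields, not the iSTDE implementation.

    import hashlib

    def pseudonym(value, salt="test-env"):
        """Deterministic pseudonym: equal inputs map to equal outputs."""
        return hashlib.sha256((salt + str(value)).encode()).hexdigest()[:8]

    patients = [{"id": 1, "name": "Ann Lee", "zip": "84321"},
                {"id": 2, "name": "Bo Kim",  "zip": "84322"}]
    visits   = [{"patient_id": 1, "diagnosis": "J45"},
                {"patient_id": 2, "diagnosis": "E11"}]

    # Identities are obscured; the id join key is transformed the same way in
    # both tables, and coarsened attributes (3-digit zip) keep their distribution.
    test_patients = [{"id": pseudonym(p["id"]), "name": pseudonym(p["name"]),
                      "zip": p["zip"][:3] + "xx"} for p in patients]
    test_visits = [{"patient_id": pseudonym(v["patient_id"]),
                    "diagnosis": v["diagnosis"]} for v in visits]

    print(test_patients[0]["id"] == test_visits[0]["patient_id"])  # True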
Advisors/Committee Members: Stephen W. Clyde, Vicki Allan, Renée Bryce.
Subjects/Keywords: Data Integration; Data Sensitization/Anonymization; Health Informatics; Software Engineering; Test Data Extraction; Testing Data-Centric Applications; Computer Sciences
APA (6th Edition):
Raza, A. (2011). Test Data Extraction and Comparison with Test Data Generation. (Masters Thesis). Utah State University. Retrieved from https://digitalcommons.usu.edu/etd/982
Chicago Manual of Style (16th Edition):
Raza, Ali. “Test Data Extraction and Comparison with Test Data Generation.” 2011. Masters Thesis, Utah State University. Accessed April 11, 2021.
https://digitalcommons.usu.edu/etd/982.
MLA Handbook (7th Edition):
Raza, Ali. “Test Data Extraction and Comparison with Test Data Generation.” 2011. Web. 11 Apr 2021.
Vancouver:
Raza A. Test Data Extraction and Comparison with Test Data Generation. [Internet] [Masters thesis]. Utah State University; 2011. [cited 2021 Apr 11].
Available from: https://digitalcommons.usu.edu/etd/982.
Council of Science Editors:
Raza A. Test Data Extraction and Comparison with Test Data Generation. [Masters Thesis]. Utah State University; 2011. Available from: https://digitalcommons.usu.edu/etd/982
18.
Patil, Preeti S.
Improved extraction mechanism in ETL process for building
of a data warehouse;.
Degree: Computer Sciences, 2013, INFLIBNET
URL: http://shodhganga.inflibnet.ac.in/handle/10603/7023
► Today the data warehouse plays an important role in decision making, data analysis, strategic information, and so on. The Extraction, Transformation and Loading (ETL) process…
(more)
▼ Today the data warehouse plays an important role in decision making, data analysis, strategic information, and so on. The Extraction, Transformation and Loading (ETL) process is very popularly used in building up a data warehouse. In today's competitive business world, mergers and acquisitions are very common, and they require the extraction, transformation and loading of huge amounts of organizational data. This work proposes improvements in both the time and space domains: in extraction speed and in file size. The existing approach is modified to apply both standard methods of extraction, the full extraction and the incremental extraction procedures, in the ETL process. Flat files have been widely used to simplify database files, yet existing systems do not provide any security for them. In this thesis we propose various security methods to apply over a flat file during the extraction process in ETL. These security methods can be highly recommended for very sensitive data, and the security measures presented in this thesis are effective in reducing the possibility of data leakage. To further reduce the size of the file, the use of delimited flat files rather than fixed-length flat files converted from database files is suggested, which in turn reduces the extraction time of the data file. Such a conversion would certainly improve the speed of the extraction, as presented in this thesis. This work has provided preparatory inputs for the process of building a data warehouse and shows how ETL plays a major role in that process. As these flat files are created from the database files, problems like redundancy, inconsistency, and long search times do not occur, and we can exploit the advantages of flat files during the extraction process. This work provides security to the flat files before extraction, and the data is then extracted from the source to the destination system.
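The two standard extraction modes mentioned above can be contrasted in a few lines. The in-memory "table" and its fields are invented; a real implementation would read from the source database and persist the last-extracted watermark (a simple form of change data capture).

    from datetime import datetime

    rows = [{"id": 1, "city": "Pune",   "updated": datetime(2013, 1, 5)},
            {"id": 2, "city": "Mumbai", "updated": datetime(2013, 2, 9)},
            {"id": 3, "city": "Nashik", "updated": datetime(2013, 3, 2)}]

    def full_extract(table):
        # Full extraction: every row, every run.
        return list(table)

    def incremental_extract(table, since):
        # Incremental extraction: only rows changed after the last run.
        return [r for r in table if r["updated"] > since]

    last_run = datetime(2013, 2, 1)
    print(len(full_extract(rows)))                   # 3 rows every run
    print(len(incremental_extract(rows, last_run)))  # only the 2 changed rows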
Appendix p. 113-118, References p. 119-133, List of publications p. 134-136
Advisors/Committee Members: Rao, Srikantha.
Subjects/Keywords: Computer Sciences; Change Data Capture; Data warehouse; Extraction Transformation and Loading Process
APA (6th Edition):
Patil, P. S. (2013). Improved extraction mechanism in ETL process for building
of a data warehouse;. (Thesis). INFLIBNET. Retrieved from http://shodhganga.inflibnet.ac.in/handle/10603/7023
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Patil, Preeti S. “Improved extraction mechanism in ETL process for building
of a data warehouse;.” 2013. Thesis, INFLIBNET. Accessed April 11, 2021.
http://shodhganga.inflibnet.ac.in/handle/10603/7023.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Patil, Preeti S. “Improved extraction mechanism in ETL process for building
of a data warehouse;.” 2013. Web. 11 Apr 2021.
Vancouver:
Patil PS. Improved extraction mechanism in ETL process for building
of a data warehouse;. [Internet] [Thesis]. INFLIBNET; 2013. [cited 2021 Apr 11].
Available from: http://shodhganga.inflibnet.ac.in/handle/10603/7023.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Patil PS. Improved extraction mechanism in ETL process for building
of a data warehouse;. [Thesis]. INFLIBNET; 2013. Available from: http://shodhganga.inflibnet.ac.in/handle/10603/7023
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Uppsala University
19.
Wrede, Fredrik.
An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations.
Degree: Biology Education Centre, 2016, Uppsala University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287
► Stochastic reaction-diffusion simulations have become an efficient approach for modelling spatial aspects of intracellular biochemical reaction networks. By accounting for intrinsic noise due to…
(more)
▼ Stochastic reaction-diffusion simulations have become an efficient approach for modelling spatial aspects of intracellular biochemical reaction networks. By accounting for the intrinsic noise due to low copy numbers of chemical species, stochastic reaction-diffusion simulations are able to predict and model biological systems more accurately. As with much simulation software, exploration of the parameters associated with a model can be needed to yield new knowledge about the underlying system. The exploration can be conducted by executing parameter sweeps for a model. However, with little or no prior knowledge about the modelled system, the effort required of practitioners to explore the parameter space can become overwhelming. To address this problem we perform a feasibility study of explorative behavioural analysis of stochastic reaction-diffusion simulations by applying spatial-temporal data mining to large parameter sweeps. By reducing individual simulation outputs to a feature space involving simple time-series and distribution analytics, we were able to find similarly behaving simulations after performing agglomerative hierarchical clustering.
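The pipeline described here (summary features per simulation, then agglomerative hierarchical clustering) can be sketched as follows; synthetic series stand in for simulation output, and numpy/scipy are assumed.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # 20 fake parameter-sweep outputs: 10 noisy rising series, 10 flat ones.
    series = np.vstack(
        [np.linspace(0, 1, 50) + rng.normal(0, 0.05, 50) for _ in range(10)] +
        [rng.normal(0, 0.05, 50) for _ in range(10)])

    def features(ts):
        """Simple time-series/distribution summary for one simulation."""
        return [ts.mean(), ts.std(), ts[-1] - ts[0]]

    X = np.array([features(ts) for ts in series])
    Z = linkage(X, method="ward")            # agglomerative hierarchy
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)  # the two behaviours should fall into separate clusters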
Subjects/Keywords: Big data; feature extraction; clustering; stochastic reaction-diffusion simulation; spatial-temporal; data mining; cloud computing
APA (6th Edition):
Wrede, F. (2016). An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations. (Thesis). Uppsala University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Wrede, Fredrik. “An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations.” 2016. Thesis, Uppsala University. Accessed April 11, 2021.
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Wrede, Fredrik. “An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations.” 2016. Web. 11 Apr 2021.
Vancouver:
Wrede F. An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations. [Internet] [Thesis]. Uppsala University; 2016. [cited 2021 Apr 11].
Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Wrede F. An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations. [Thesis]. Uppsala University; 2016. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280287
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Sydney
20.
Wang, Ying.
On the visual similarity analysis and visualization of art image data
.
Degree: 2014, University of Sydney
URL: http://hdl.handle.net/2123/11734
► This thesis introduces a framework for analyzing the underlying visual similarities among art images. Three types of art related images are investigated. They are: (1)…
(more)
▼ This thesis introduces a framework for analyzing the underlying visual similarities among art images. Three types of art-related images are investigated: (1) painterly rendered images, which are "painting-like" images generated by painterly rendering algorithms; (2) painting images, which are digitized copies of paintings; (3) graphic design images, which are graphic art used in various logos, trademarks and symbols. The main focus of this thesis, however, is on the latter two types of "real" art images, i.e., painting and graphic design images. In the proposed framework, image features used for analyzing art images are defined to "translate" qualitative art concepts or principles into quantitative numerical form. A series of Self-Organizing Map (SOM) based methods is then introduced to discover, analyze and visualize the underlying visual similarities of art images. Furthermore, two Self-Organizing Map Best Matching Unit Entropy (SOM-BMU Entropy) based approaches are proposed to compare and visualize the data similarity of multiple sets of images. Based on the proposed framework, applications such as reverse painterly rendering, painting artistic influence analysis and logo similarity retrieval are also presented. Different from previous art imaging systems, the proposed framework aims to bridge the gaps between artistic concepts, image features and numeric solutions, which allows the numeric analysis results to be explained by art concepts and visualized easily. Therefore, using the proposed framework, art image data can be better organized and explored, and visual similarities of art images can be better understood and explained.
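For readers unfamiliar with SOMs, a minimal training loop in plain numpy conveys the kind of mapping such a framework builds over image features. The random 3-D "feature vectors" are placeholders for real art-image descriptors, and the grid size and schedules are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.random((200, 3))          # 200 toy feature vectors
    grid = rng.random((5, 5, 3))         # 5x5 map of weight vectors
    coords = np.dstack(np.meshgrid(np.arange(5), np.arange(5), indexing="ij"))

    for t in range(500):
        lr = 0.5 * (1 - t / 500)               # decaying learning rate
        radius = 2.5 * (1 - t / 500) + 0.5     # decaying neighbourhood width
        x = data[rng.integers(len(data))]
        # Best matching unit (BMU): the closest weight vector on the grid.
        d = np.linalg.norm(grid - x, axis=2)
        bmu = np.unravel_index(d.argmin(), d.shape)
        # Pull the BMU and its grid neighbours toward the sample.
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-dist2 / (2 * radius ** 2))[..., None]
        grid += lr * h * (x - grid)

    print(grid.shape)  # trained 5x5x3 map; similar inputs land on nearby units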
Subjects/Keywords: Visual similarity analysis; Data visualization; Art image data; Image feature extraction; Self-organizing map (SOM)
APA (6th Edition):
Wang, Y. (2014). On the visual similarity analysis and visualization of art image data
. (Thesis). University of Sydney. Retrieved from http://hdl.handle.net/2123/11734
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Wang, Ying. “On the visual similarity analysis and visualization of art image data
.” 2014. Thesis, University of Sydney. Accessed April 11, 2021.
http://hdl.handle.net/2123/11734.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Wang, Ying. “On the visual similarity analysis and visualization of art image data
.” 2014. Web. 11 Apr 2021.
Vancouver:
Wang Y. On the visual similarity analysis and visualization of art image data
. [Internet] [Thesis]. University of Sydney; 2014. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/2123/11734.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Wang Y. On the visual similarity analysis and visualization of art image data
. [Thesis]. University of Sydney; 2014. Available from: http://hdl.handle.net/2123/11734
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Alberta
21.
Hasan, Maryam.
Extracting Structured Knowledge from Textual Data in
Software Repositories.
Degree: MS, Computing Science, 2011, University of Alberta
URL: https://era.library.ualberta.ca/files/cf95jb62q
► Software team members, as they communicate and coordinate their work with others throughout the life-cycle of their projects, generate different kinds of textual artifacts. Despite…
(more)
▼ Software team members, as they communicate and coordinate their work with others throughout the life-cycle of their projects, generate different kinds of textual artifacts. Despite the variety of work in the area of mining software artifacts, relatively little research has focused on communication artifacts. Software communication artifacts, in addition to source code artifacts, contain useful semantic information that is not fully explored by existing approaches. This thesis presents the development of a text analysis method and tool to extract and represent useful pieces of information from a wide range of textual data sources associated with software projects. Our text analysis system integrates Natural Language Processing techniques and statistical text analysis methods with software domain knowledge. The extracted information is represented as RDF-style triples which constitute interesting relations between developers and software products. We applied the developed system to analyze five different kinds of textual data, i.e., source code commits, bug reports, email messages, chat logs, and wiki pages. In the evaluation of our system, we found its precision to be 82%, its recall 58%, and its F-measure 68%.
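A tiny illustration of representing mined facts as RDF-style (subject, predicate, object) triples follows; the single regex and sample commit messages are invented, and the actual system uses NLP techniques rather than one pattern.

    import re

    commits = ["Alice fixed bug #42 in parser.py",
               "Bob reviewed parser.py"]

    triples = []
    for msg in commits:
        # Toy rule: developer -> action -> source-code artifact.
        m = re.match(r"(\w+) (fixed|reviewed) .*?(\S+\.py)", msg)
        if m:
            dev, action, artifact = m.groups()
            triples.append((dev, action, artifact))

    for t in triples:
        print(t)   # ('Alice', 'fixed', 'parser.py'), ('Bob', 'reviewed', 'parser.py')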
Subjects/Keywords: Mining Software Repositories, Textual Data, Text Mining, Knowledge Extraction
APA (6th Edition):
Hasan, M. (2011). Extracting Structured Knowledge from Textual Data in
Software Repositories. (Masters Thesis). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/cf95jb62q
Chicago Manual of Style (16th Edition):
Hasan, Maryam. “Extracting Structured Knowledge from Textual Data in
Software Repositories.” 2011. Masters Thesis, University of Alberta. Accessed April 11, 2021.
https://era.library.ualberta.ca/files/cf95jb62q.
MLA Handbook (7th Edition):
Hasan, Maryam. “Extracting Structured Knowledge from Textual Data in
Software Repositories.” 2011. Web. 11 Apr 2021.
Vancouver:
Hasan M. Extracting Structured Knowledge from Textual Data in
Software Repositories. [Internet] [Masters thesis]. University of Alberta; 2011. [cited 2021 Apr 11].
Available from: https://era.library.ualberta.ca/files/cf95jb62q.
Council of Science Editors:
Hasan M. Extracting Structured Knowledge from Textual Data in
Software Repositories. [Masters Thesis]. University of Alberta; 2011. Available from: https://era.library.ualberta.ca/files/cf95jb62q

University of Michigan
22.
Chen, Zhe.
Information Extraction on Para-Relational Data.
Degree: PhD, Computer Science and Engineering, 2016, University of Michigan
URL: http://hdl.handle.net/2027.42/120853
► Para-relational data (such as spreadsheets and diagrams) refers to a type of nearly relational data that shares the important qualities of relational data but does…
(more)
▼ Para-relational data (such as spreadsheets and diagrams) refers to a type of nearly relational data that shares the important qualities of relational data but does not present itself in a relational format. Para-relational data often conveys highly valuable information and is widely used in many different areas. If we can convert para-relational data into the relational format, many existing tools can be leveraged for a variety of interesting applications, such as data analysis with relational query systems and data integration applications.

This dissertation aims to convert para-relational data into a high-quality relational form with little user assistance. We have developed four standalone systems, each addressing a specific type of para-relational data. Senbazuru is a prototype spreadsheet database management system that extracts relational information from a large number of spreadsheets. Anthias is an extension of the Senbazuru system to convert a broader range of spreadsheets into a relational format. Lyretail is an extraction system to detect long-tail dictionary entities on webpages. Finally, DiagramFlyer is a web-based search system that obtains a large number of diagrams automatically extracted from web-crawled PDFs. Together, these four systems demonstrate that converting para-relational data into the relational format is possible today, and also suggest directions for future systems.
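The core conversion idea for spreadsheet-style para-relational data can be sketched as flattening hierarchical row labels into relational tuples. The spreadsheet fragment and the given-in-advance hierarchy below are illustrative assumptions; systems like those above infer the structure automatically.

    sheet = [
        ("Population", None, None),      # governing header row (level 0)
        ("  Urban", 2000, 5000),         # indented data rows (level 1)
        ("  Rural", 1500, 1800),
    ]
    years = (1990, 2000)

    tuples = []
    context = None
    for label, *values in sheet:
        if all(v is None for v in values):
            context = label.strip()          # remember the governing header
        else:
            for year, v in zip(years, values):
                tuples.append((context, label.strip(), year, v))

    print(tuples)
    # [('Population', 'Urban', 1990, 2000), ('Population', 'Urban', 2000, 5000), ...]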
Advisors/Committee Members: Cafarella, Michael John (committee member), Mei, Qiaozhu (committee member), Adar, Eytan (committee member), Jagadish, Hosagrahar V (committee member), Mozafari, Barzan (committee member).
Subjects/Keywords: information extraction; data mining; text mining; Computer Science; Engineering
APA (6th Edition):
Chen, Z. (2016). Information Extraction on Para-Relational Data. (Doctoral Dissertation). University of Michigan. Retrieved from http://hdl.handle.net/2027.42/120853
Chicago Manual of Style (16th Edition):
Chen, Zhe. “Information Extraction on Para-Relational Data.” 2016. Doctoral Dissertation, University of Michigan. Accessed April 11, 2021.
http://hdl.handle.net/2027.42/120853.
MLA Handbook (7th Edition):
Chen, Zhe. “Information Extraction on Para-Relational Data.” 2016. Web. 11 Apr 2021.
Vancouver:
Chen Z. Information Extraction on Para-Relational Data. [Internet] [Doctoral dissertation]. University of Michigan; 2016. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/2027.42/120853.
Council of Science Editors:
Chen Z. Information Extraction on Para-Relational Data. [Doctoral Dissertation]. University of Michigan; 2016. Available from: http://hdl.handle.net/2027.42/120853

Vanderbilt University
23.
Osterman, Travis John.
Extracting Detailed Tobacco Exposure From The Electronic Health Record.
Degree: MS, Biomedical Informatics, 2017, Vanderbilt University
URL: http://hdl.handle.net/1803/13004
► Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status…
(more)
▼ Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status (ever-smoker vs. never-smoker) from electronic health record data, but no system to date extracts the detailed smoking data needed to assess a patient's eligibility for lung cancer screening. Here we describe the Smoking History And Pack-year Extraction System (SHAPES), a rules-based NLP system to quantify tobacco exposure from electronic clinical notes.

SHAPES was developed on 261 patient records with 9,573 clinical notes and validated on 352 randomly selected patient records with 4,040 notes. F-measures for never-smoking status, ever-smoking status, rate of smoking, duration of smoking, quantity of cigarettes, and years quit were 0.86, 0.82, 0.79, 0.62, 0.64, and 0.61, respectively. Sixteen of 22 individuals eligible for lung cancer screening were identified (precision = 0.94, recall = 0.73).

SHAPES was compared to a previously validated smoking classification system using a phenome-wide association study (PheWAS). SHAPES predicted similar significant associations with 66% less sample size (10,000 vs. 35,788), and detected 411 (268%) more associations in the full dataset than when using just ever/never smoking status.

Using smoking data from SHAPES, a smoking genome-by-environment interaction study found 57 statistically significant interactions between smoking and diseases, including previously described interactions between ischemic heart disease and rs1746537, obesity and rs10871777, and type 2 diabetes and rs2943641.

These studies support the use of SHAPES for lung cancer screening and other research requiring quantitative smoking history. External validation needs to be performed prior to implementation at other medical centers.
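A toy rules-based extractor conveys the flavor of such a system; the two patterns and clinical notes below are invented and far simpler than SHAPES itself.

    import re

    notes = ["Patient smoked 1 ppd for 30 years, quit 2010.",
             "Denies tobacco use."]

    PACKS = re.compile(r"(\d+(?:\.\d+)?)\s*ppd", re.I)      # packs per day
    YEARS = re.compile(r"for\s+(\d+)\s+years", re.I)         # smoking duration

    for note in notes:
        if re.search(r"denies tobacco", note, re.I):
            print("never-smoker")
            continue
        packs, years = PACKS.search(note), YEARS.search(note)
        if packs and years:
            # Pack-years = packs per day x years smoked.
            pack_years = float(packs.group(1)) * int(years.group(1))
            print(f"ever-smoker, {pack_years:.0f} pack-years")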
Advisors/Committee Members: Mia Levy, M.D., Ph.D. (committee member), Pierre Massion, M.D. (committee member), Josh Denny, M.D., M.S. (Committee Chair).
Subjects/Keywords: data extraction; lung cancer screening; phewas; gxe; smoking; natural language processing
APA (6th Edition):
Osterman, T. J. (2017). Extracting Detailed Tobacco Exposure From The Electronic Health Record. (Thesis). Vanderbilt University. Retrieved from http://hdl.handle.net/1803/13004
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Osterman, Travis John. “Extracting Detailed Tobacco Exposure From The Electronic Health Record.” 2017. Thesis, Vanderbilt University. Accessed April 11, 2021.
http://hdl.handle.net/1803/13004.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Osterman, Travis John. “Extracting Detailed Tobacco Exposure From The Electronic Health Record.” 2017. Web. 11 Apr 2021.
Vancouver:
Osterman TJ. Extracting Detailed Tobacco Exposure From The Electronic Health Record. [Internet] [Thesis]. Vanderbilt University; 2017. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/1803/13004.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Osterman TJ. Extracting Detailed Tobacco Exposure From The Electronic Health Record. [Thesis]. Vanderbilt University; 2017. Available from: http://hdl.handle.net/1803/13004
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of North Texas
24.
Liu, Siyuan.
Learning from small data set for object recognition in mobile platforms.
Degree: 2016, University of North Texas
URL: https://digital.library.unt.edu/ark:/67531/metadc849633/
► Did you ever stand at a door with a bunch of keys, trying to find the right one to unlock the door? Did you ever hold…
(more)
▼ Did you ever stand at a door with a bunch of keys, trying to find the right one to unlock the door? Did you ever hold a flower and wonder what its name was? A need for object recognition can arise anytime and anywhere in our daily lives. With the development of mobile devices, object recognition applications that provide immediate assistance have become possible. However, performing complex tasks on even the most advanced mobile platforms still faces great challenges due to limited computing resources and computing power.

In this thesis, we present an object recognition system that resides and executes within a mobile device and can efficiently extract image features and perform learning and classification. To account for the computing constraints, a novel feature extraction method that minimizes the data size and maintains data consistency is proposed. The system leverages the principal component analysis method and is able to update the trained classifier when new examples become available. Our system relieves users from creating a large number of examples and is user friendly.

The experimental results demonstrate that a learning method trained with a very small number of examples can achieve recognition accuracy above 90% in various acquisition conditions. In addition, the system is able to perform learning efficiently.
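A compact sketch of small-sample learning with PCA features, in the spirit of the system described above; scikit-learn is assumed, and random vectors stand in for camera frames.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(2)
    X = rng.random((20, 64))            # 20 tiny flattened "images"
    y = np.repeat([0, 1], 10)           # two object classes

    pca = PCA(n_components=5).fit(X)    # compress features for a mobile budget
    clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X), y)

    # When a new labelled example arrives, retraining stays cheap at this scale.
    x_new, y_new = rng.random((1, 64)), [0]
    X, y = np.vstack([X, x_new]), np.append(y, y_new)
    clf.fit(pca.transform(X), y)
    print(clf.predict(pca.transform(x_new)))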
Advisors/Committee Members: Yuan, Xiaohui, Fu, Song, Takabi, Hassan.
Subjects/Keywords: object recognition; machine learning; mobile platforms; small data set; feature extraction

McMaster University
25.
Jacob, Anand.
IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL.
Degree: MSc, 2015, McMaster University
URL: http://hdl.handle.net/11375/17410
► This thesis was undertaken to investigate if an eForm-based extractor interface would improve the efficiency of the baseline risk extraction process for BRiskeT (Baseline Risk…
(more)
▼ This thesis was undertaken to investigate if an eForm-based extractor interface would improve the efficiency of the baseline risk extraction process for BRiskeT (Baseline Risk e-Tool). The BRiskeT database will contain the extracted baseline risk data from top prognostic research articles. BRiskeT utilizes McMaster University’s PLUS (Premium Literature Service) database to thoroughly vet articles prior to their inclusion in BRiskeT. The articles that have met inclusion criteria are then passed into the extractor interface that was developed for the purpose of this thesis, which has been called MacPrognosis. MacPrognosis displays these articles to a data extractor who fills out an electronic form which gives an overview of the baseline risk information in an article. The baseline risk information is subsequently saved to the BRiskeT database, which can then be queried according to the end user’s needs.
One of the goals in switching from a paper-based extraction system to an eForm-based system was to save time in the extraction process. Another goal for MacPrognosis was to create an eForm that allowed baseline risk information to be extracted from as many disciplines as possible. To test whether MacPrognosis succeeded in saving extraction time and improving the proportion of articles from which baseline risk data could be extracted, it was subsequently utilized to extract data from a large test set of articles. The results of the extraction process were then compared with results from a previously conducted data extraction pilot utilizing a paper-based system which was created during the feasibility analysis for BRiskeT in 2012.
The new eForm-based extractor interface not only sped up the process of data extraction, but, with minor future alterations, may also increase the proportion of articles from which data can be successfully extracted, compared to a paper-based model of extraction.
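The eForm-to-database idea can be sketched minimally as form fields stored as a queryable row; the table and field names below are invented, not the BRiskeT schema.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE baseline_risk (
        article_id TEXT, outcome TEXT, n_patients INTEGER, events INTEGER)""")

    # One completed eForm becomes one row in the database.
    eform = {"article_id": "PMID:12345", "outcome": "stroke",
             "n_patients": 500, "events": 25}
    conn.execute("INSERT INTO baseline_risk VALUES "
                 "(:article_id, :outcome, :n_patients, :events)", eform)

    # End users then query the stored baseline risks, e.g. event rate per article.
    for row in conn.execute(
            "SELECT article_id, 1.0 * events / n_patients FROM baseline_risk"):
        print(row)   # ('PMID:12345', 0.05)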
Thesis
Master of Science (MSc)
Advisors/Committee Members: Iorio, Iorio, eHealth.
Subjects/Keywords: Baseline Risk; Data Extraction; Database; Primary Literature; McMaster PLUS Database
APA (6th Edition):
Jacob, A. (2015). IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL. (Masters Thesis). McMaster University. Retrieved from http://hdl.handle.net/11375/17410
Chicago Manual of Style (16th Edition):
Jacob, Anand. “IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL.” 2015. Masters Thesis, McMaster University. Accessed April 11, 2021.
http://hdl.handle.net/11375/17410.
MLA Handbook (7th Edition):
Jacob, Anand. “IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL.” 2015. Web. 11 Apr 2021.
Vancouver:
Jacob A. IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL. [Internet] [Masters thesis]. McMaster University; 2015. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/11375/17410.
Council of Science Editors:
Jacob A. IMPLEMENTING EFORM-BASED BASELINE RISK DATA EXTRACTION FROM HIGH QUALITY PAPERS FOR THE BRISKET DATABASE AND TOOL. [Masters Thesis]. McMaster University; 2015. Available from: http://hdl.handle.net/11375/17410

Universidade Nova
26.
Alves, Ricardo João de Freitas.
Declarative approach to data extraction of web pages.
Degree: 2009, Universidade Nova
URL: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/5822
► Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master…
(more)
▼ Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
In the last few years, we have been witnessing a noticeable Web evolution with the introduction of significant improvements at the technological level, such as the emergence of XHTML, CSS, Javascript, and Web 2.0, just to name a few. This, combined with other factors such as the physical expansion of the Web and its low cost, has been a great motivator for organizations and the general public to join, with a consequent growth in the number of users, thus influencing the volume of the largest global data repository.

In consequence, there is an increasing need for regular data acquisition from the Web, and because of its frequency, scale, or complexity, such acquisition is only viable through automatic extractors. However, two main difficulties are inherent to automatic extractors. First, much of the Web's information is presented in visual formats mainly directed at human reading. Second, dynamic webpages are assembled in local memory from different sources, causing some pages not to have a source file.

Therefore, this thesis proposes a new, more modern extractor, capable of supporting the Web's evolution, generic enough to be used in any situation, and capable of being extended and easily adapted to more particular uses. This project is an extension of an earlier one which could perform extractions on semi-structured text files; it has evolved into a modular extraction system capable of extracting data from webpages and semi-structured text files, and expandable to support other data source types. It also contains a more complete and generic validation system and a new data delivery system capable of performing the earlier deliveries as well as new, generic ones.

A graphical editor was also developed to support the extraction system's features and to allow a domain expert without computer knowledge to create extractions with only a few simple and intuitive interactions on the rendered webpage.
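A declarative wrapper in this spirit specifies the extraction as data (field-to-selector rules) rather than code. The sketch below uses BeautifulSoup with an invented rule set and page; it is not the thesis's system.

    from bs4 import BeautifulSoup

    rules = {"title": "h1.name", "price": "span.price"}   # declarative spec

    page = """<html><body>
      <h1 class="name">Flat with garden</h1>
      <span class="price">1200</span>
    </body></html>"""

    def run_wrapper(html, spec):
        # The engine interprets the spec; changing the extraction means
        # editing the rules, not the code.
        soup = BeautifulSoup(html, "html.parser")
        return {field: el.get_text(strip=True)
                for field, sel in spec.items()
                if (el := soup.select_one(sel)) is not None}

    print(run_wrapper(page, rules))   # {'title': 'Flat with garden', 'price': '1200'}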
Advisors/Committee Members: Pires, João.
Subjects/Keywords: Web wrappers; Web data extraction; Graphical wrapper construction
APA (6th Edition):
Alves, R. J. d. F. (2009). Declarative approach to data extraction of web pages. (Thesis). Universidade Nova. Retrieved from http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/5822
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Alves, Ricardo João de Freitas. “Declarative approach to data extraction of web pages.” 2009. Thesis, Universidade Nova. Accessed April 11, 2021.
http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/5822.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Alves, Ricardo João de Freitas. “Declarative approach to data extraction of web pages.” 2009. Web. 11 Apr 2021.
Vancouver:
Alves RJdF. Declarative approach to data extraction of web pages. [Internet] [Thesis]. Universidade Nova; 2009. [cited 2021 Apr 11].
Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/5822.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Alves RJdF. Declarative approach to data extraction of web pages. [Thesis]. Universidade Nova; 2009. Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/5822
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Rice University
27.
Shu, Anhei.
Data Mining of Chinese Social Media.
Degree: PhD, Engineering, 2014, Rice University
URL: http://hdl.handle.net/1911/88119
► We present measurements and analysis of censorship on Weibo, a popular microblogging site in China. Since we were limited in the rate at which we…
(more)
▼ We present measurements and analysis of censorship on Weibo, a popular microblogging site in China. Since we were limited in the rate at which we could download posts, we identified users likely to participate in sensitive topics and recursively followed their social contacts, biasing our search toward a subset of Weibo where we hoped to be more likely to observe censorship. Our architecture enables us to detect post deletions within one minute of the deletion event, giving us a high-fidelity view of what is being deleted by the censors and when.
We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5-30 minutes. Nearly 90% of the deletions happen within the first 24 hours.
Leveraging our data, we also consider a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which they use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship.
By leveraging natural language processing techniques we also perform a topical analysis of the deleted posts, overcoming the neologisms, named entities, and informal language that typify Chinese social media. Using Independent Component Analysis, we find that the topics where mass removal happens the fastest are those that combine events that are hot topics in Weibo as a whole (e.g., the Beijing rainstorms or a sex scandal) with themes common to sensitive posts (e.g., Beijing, government, China, and policeman).
Air pollution is a pressing concern for industrialized countries. Air quality measurements and their interpretations often take on political overtones. Similar concerns affect our understanding of what levels of measured pollution correspond to different levels of human nuisance, impairment, or injury. Here, we consider air pollution metrics from four large Chinese cities (U.S. embassy/consulate data, and Chinese domestic measurements) and compare them to a large volume of discussions on Weibo (a popular Chinese microblogging system). In the city with the worst PM2.5, Beijing, we found a strong correlation (R=0.82) between Chinese use of pollution-related terms and the ambient pollution. In other Chinese cities with lower pollution, the correlation was weaker. Nonetheless, our results show that social media may be a valuable proxy measurement for pollution, which may be quite valuable when traditional measurement stations are unavailable (or when their output is censored or misreported).
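The correlation analysis can be sketched in a few lines; the daily counts below are made up (the thesis reports R=0.82 for Beijing on real data).

    import numpy as np

    pm25 = np.array([35, 80, 160, 220, 90, 40, 300])   # daily PM2.5 readings
    posts = np.array([12, 30, 70, 95, 33, 15, 140])    # pollution-term post counts

    # Pearson correlation between term usage and ambient pollution.
    r = np.corrcoef(pm25, posts)[0, 1]
    print(round(r, 2))   # strong positive correlation on this toy data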
Advisors/Committee Members: Wallach, Daniel S. (advisor), Jermaine, Christopher M (committee member), Bronk, Chris (committee member).
Subjects/Keywords: Social network; microblog; data mining; topic extraction; censorship
APA (6th Edition):
Shu, A. (2014). Data Mining of Chinese Social Media. (Doctoral Dissertation). Rice University. Retrieved from http://hdl.handle.net/1911/88119
Chicago Manual of Style (16th Edition):
Shu, Anhei. “Data Mining of Chinese Social Media.” 2014. Doctoral Dissertation, Rice University. Accessed April 11, 2021.
http://hdl.handle.net/1911/88119.
MLA Handbook (7th Edition):
Shu, Anhei. “Data Mining of Chinese Social Media.” 2014. Web. 11 Apr 2021.
Vancouver:
Shu A. Data Mining of Chinese Social Media. [Internet] [Doctoral dissertation]. Rice University; 2014. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/1911/88119.
Council of Science Editors:
Shu A. Data Mining of Chinese Social Media. [Doctoral Dissertation]. Rice University; 2014. Available from: http://hdl.handle.net/1911/88119

University of Victoria
28.
Taherimakhsousi, Nina.
Context aware face recognition.
Degree: Department of Computer Science, 2020, University of Victoria
URL: http://hdl.handle.net/1828/11441
► In common face recognition systems the recognition rate is not sufficient for today's applications, and systems work only on controlled databases and fail under unconstrained…
(more)
In common face recognition systems the recognition rate is not sufficient for today's applications; systems work only on constrained databases and fail under unconstrained conditions. The problem addressed in this dissertation is how to exploit context information to enhance face recognition. The dissertation therefore investigates dynamic context management and adaptivity to: (i) improve context awareness and exploit the value of contextual information to raise the recognition rate of face recognition systems, and (ii) improve the dynamic adaptivity of face recognition systems by controlling the relevance of contextual information while collecting, analyzing, and searching context. Context awareness and adaptivity pose significant challenges for face recognition systems. Regarding context awareness, the first challenge addressed in this dissertation is data collection that can automatically analyze images in order to categorize and summarize contextual information. The second challenge arises from data extraction, due to the large size of face databases. Concerning adaptivity, the third challenge is to improve adaptive learning and classification methods with respect to variations. The fourth challenge, also related to adaptivity, concerns the high rate of videos generated by users, moving from a dense urban area to a decentralized cloud infrastructure. The fifth and sixth challenges concern the human visual system as a source of contextual information for face recognition. Given these challenges, we made four contributions to improving context awareness and adaptivity in face recognition systems. First, we proposed a framework for location-based face recognition; the framework comprises location-centric image databases to recognize faces in images taken at nearby locations frequently visited by individuals. Second, we defined contextual information and an architectural design for context-aware face recognition systems. Third, we designed a contextual-information extraction algorithm with an architecture for context-aware video-based face recognition, which decentralizes cloud computing on the SAVI network infrastructure. Fourth, we designed an experimental study of face recognition by humans. The study provided insights into the cues the human visual system relies upon for its impressive performance, which served as building blocks for the developed context-aware face recognition system.
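To make the location-based idea concrete, here is a minimal sketch (not the dissertation's actual system) of how a candidate gallery can be pruned by capture location before any face matching runs; all names, the data layout, and the radius parameter are hypothetical illustrations.

```python
# Sketch: restrict the candidate gallery to identities frequently seen near
# the capture location, then hand the reduced set to any face matcher.
# Everything here (haversine_km, candidates_near, the gallery format) is a
# hypothetical illustration, not the dissertation's API.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidates_near(gallery, lat, lon, radius_km=1.0):
    """Keep only gallery entries with a habitual location within radius_km."""
    return [
        person for person in gallery
        if any(haversine_km(lat, lon, g_lat, g_lon) <= radius_km
               for (g_lat, g_lon) in person["locations"])
    ]

# Usage: shrinking the search space first is what lets location context
# raise the recognition rate of whatever matcher runs afterwards.
gallery = [
    {"name": "alice", "locations": [(48.4284, -123.3656)]},  # downtown Victoria
    {"name": "bob",   "locations": [(49.2827, -123.1207)]},  # Vancouver
]
nearby = candidates_near(gallery, 48.4292, -123.3644, radius_km=2.0)
print([p["name"] for p in nearby])  # -> ['alice']
```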
Advisors/Committee Members: Muller, Hausi A. (supervisor).
Subjects/Keywords: Context Aware; Self-adaptive Systems; Dynamic context management; Data extraction
APA (6th Edition):
Taherimakhsousi, N. (2020). Context aware face recognition. (Thesis). University of Victoria. Retrieved from http://hdl.handle.net/1828/11441
Chicago Manual of Style (16th Edition):
Taherimakhsousi, Nina. “Context aware face recognition.” 2020. Thesis, University of Victoria. Accessed April 11, 2021.
http://hdl.handle.net/1828/11441.
MLA Handbook (7th Edition):
Taherimakhsousi, Nina. “Context aware face recognition.” 2020. Web. 11 Apr 2021.
Vancouver:
Taherimakhsousi N. Context aware face recognition. [Internet] [Thesis]. University of Victoria; 2020. [cited 2021 Apr 11].
Available from: http://hdl.handle.net/1828/11441.
Council of Science Editors:
Taherimakhsousi N. Context aware face recognition. [Thesis]. University of Victoria; 2020. Available from: http://hdl.handle.net/1828/11441
29.
Balan, Shilpa.
A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm.
Degree: PhD, Management Information Systems, 2014, University of Mississippi
URL: https://egrove.olemiss.edu/etd/914
Frequent pattern mining (FPM) has become extremely popular among data mining researchers because it yields interesting and valuable patterns from large datasets. The decreasing cost of storage devices and the increasing availability of processing power make it possible for researchers to build and analyze gigantic datasets in various scientific and business domains. A filtering process is needed, however, to generate patterns that are relevant; this dissertation contributes to addressing that need. An experimental system named FPMIES (Frequent Pattern Mining Information Extraction System) was built to extract information from electronic documents automatically. Collocation analysis was used to analyze the relationships between words, and template mining was used to build the experimental system that forms the foundation of FPMIES. With the rising need for improved environmental performance, a dataset based on the green supply chain practices of three companies was used to test FPMIES. The new system was also tested by users, yielding a recall of 83.4%. The new algorithm's combination of semantic relationships with template mining significantly improves the recall of FPMIES, and the study's results show that FPMIES is far more efficient than manual information extraction. Finally, the performance of FPMIES was compared with the most popular FPM algorithm, Apriori, yielding significantly improved recall and precision for FPMIES (76.7% and 74.6%, respectively) compared with Apriori (30% recall and 24.6% precision).
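For reference, the recall and precision figures quoted above follow the standard information-extraction definitions, recall = TP/(TP+FN) and precision = TP/(TP+FP). The counts in this quick sketch are made up (the abstract reports only the ratios); they are chosen merely to reproduce numbers close to the reported ones.

```python
# Reference implementation of the two metrics quoted in the abstract.
# The counts below are hypothetical: the dissertation does not report raw
# true/false positive counts, so these simply land near the stated ratios.

def precision(tp, fp):
    """Fraction of extracted items that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of correct items that were extracted."""
    return tp / (tp + fn)

tp, fp, fn = 230, 78, 70  # hypothetical: 230 / (230 + 70) = 76.7% recall
print(f"recall    = {recall(tp, fn):.1%}")     # 76.7%
print(f"precision = {precision(tp, fp):.1%}")  # 74.7%, close to the reported 74.6%
```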
Advisors/Committee Members: Sumali Conlon, Tony Ammeter, Milam Aiken.
Subjects/Keywords: Collocation; Data Mining; Fpmies; Information Extraction; Business Administration, Management, and Operations
APA (6th Edition):
Balan, S. (2014). A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm. (Doctoral Dissertation). University of Mississippi. Retrieved from https://egrove.olemiss.edu/etd/914
Chicago Manual of Style (16th Edition):
Balan, Shilpa. “A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm.” 2014. Doctoral Dissertation, University of Mississippi. Accessed April 11, 2021.
https://egrove.olemiss.edu/etd/914.
MLA Handbook (7th Edition):
Balan, Shilpa. “A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm.” 2014. Web. 11 Apr 2021.
Vancouver:
Balan S. A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm. [Internet] [Doctoral dissertation]. University of Mississippi; 2014. [cited 2021 Apr 11].
Available from: https://egrove.olemiss.edu/etd/914.
Council of Science Editors:
Balan S. A Study Of Data Informatics: Data Analysis And Knowledge Discovery Via A Novel Data Mining Algorithm. [Doctoral Dissertation]. University of Mississippi; 2014. Available from: https://egrove.olemiss.edu/etd/914

University of Notre Dame
30.
Zavodny, Alexandri Gregor.
Analysis of Large-Scale Unstructured Urban Range Scan Data.
Degree: Computer Science and Engineering, 2009, University of Notre Dame
URL: https://curate.nd.edu/show/3n203x8339h
Efficient 3D scanning technology has led to the acquisition of very large datasets for application areas as diverse as terrain and urban modeling, but relatively few techniques exist to automatically extract meaningful regions from this data, and the largest datasets examined in the literature rarely exceed millions of points in size. In this thesis, we present an efficient algorithm for identifying locally planar regions in large-scale GPS-registered scan data containing hundreds of millions of points. We define hard, statistical measures of accuracy and examine the performance of our algorithm on two real-world datasets with vastly varying characteristics. We examine the usefulness of a number of extensions to the algorithm, and confirm our assumptions regarding the accuracy of the parallel nature of our approach. Simulating a high-performance distributed computing platform, we are able to process scan data of approximately 100 million points in under 20 minutes, and a much larger dataset of over three billion points in under 14 hours.
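As background for the abstract above, one standard way to test local planarity in a point cloud is a PCA test on each point's neighborhood: fit a plane via the covariance eigendecomposition and accept the point when the out-of-plane variance is tiny. The sketch below illustrates that general technique; it is not claimed to be the thesis's exact algorithm, and the flatness threshold is an illustrative assumption.

```python
# Sketch: PCA-based local planarity test on a point's k-nearest-neighbor
# patch. A common technique, not necessarily the thesis's algorithm; the
# flatness threshold is a hypothetical choice.
import numpy as np

def is_locally_planar(neighborhood, flatness=0.01):
    """neighborhood: (k, 3) array of a point's k nearest neighbors."""
    pts = neighborhood - neighborhood.mean(axis=0)  # center the patch
    cov = pts.T @ pts / len(pts)                    # 3x3 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)               # ascending eigenvalues
    # The smallest eigenvalue is the out-of-plane variance; a planar patch
    # has it as a tiny fraction of the total variance.
    return eigvals[0] / eigvals.sum() < flatness

# Usage: a thin noisy planar patch passes, an isotropic blob does not.
rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(-1, 1, (200, 2)), rng.normal(0, 0.005, 200)]
blob = rng.normal(0, 1, (200, 3))
print(is_locally_planar(plane))  # True
print(is_locally_planar(blob))   # False
```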
Advisors/Committee Members: Scott Emrich (committee member), Kevin W. Bowyer (committee member), Patrick J. Flynn (committee chair).
Subjects/Keywords: range scan data; object extraction
APA (6th Edition):
Zavodny, A. G. (2009). Analysis of Large-Scale Unstructured Urban Range Scan Data. (Thesis). University of Notre Dame. Retrieved from https://curate.nd.edu/show/3n203x8339h
Chicago Manual of Style (16th Edition):
Zavodny, Alexandri Gregor. “Analysis of Large-Scale Unstructured Urban Range Scan Data.” 2009. Thesis, University of Notre Dame. Accessed April 11, 2021.
https://curate.nd.edu/show/3n203x8339h.
MLA Handbook (7th Edition):
Zavodny, Alexandri Gregor. “Analysis of Large-Scale Unstructured Urban Range Scan Data.” 2009. Web. 11 Apr 2021.
Vancouver:
Zavodny AG. Analysis of Large-Scale Unstructured Urban Range Scan Data. [Internet] [Thesis]. University of Notre Dame; 2009. [cited 2021 Apr 11].
Available from: https://curate.nd.edu/show/3n203x8339h.
Council of Science Editors:
Zavodny AG. Analysis of Large-Scale Unstructured Urban Range Scan Data. [Thesis]. University of Notre Dame; 2009. Available from: https://curate.nd.edu/show/3n203x8339h
◁ [1] [2] [3] [4] [5] … [12] ▶