You searched for subject:(data integration)
Showing records 1 – 30 of 627 total matches.

University of Ottawa
1.
Awada, Rana.
Data Sharing and Exchange: Semantics and Query Answering.
Degree: 2015, University of Ottawa
URL: http://hdl.handle.net/10393/32080
▼ Exchanging and integrating data that belong to worlds of different vocabularies are two prominent problems in the database literature. While data coordination deals with managing and integrating data between autonomous yet related sources with possibly distinct vocabularies, data exchange is defined as the problem of extracting data from a source and materializing it in an independent target to conform to the target schema. These two problems, however, have never been studied in a unified setting which allows both the exchange of the data as well as the coordination of different vocabularies between different sources. Our thesis shows that such a unified setting exhibits data integration capabilities that are beyond the ones provided by data exchange and data coordination separately. In this thesis, we propose a new setting – called DSE, for Data Sharing and Exchange – which allows the exchange of data between independent source and target applications that possess independent schemas, as well as independent yet related domains of constants. To facilitate this type of exchange, we extend the source-to-target dependencies used in the ordinary data exchange setting which allow the association between the source and the target at the schema level, with the mapping table construct introduced in the classical data coordination setting which defines the association between the source and the target at the instance level. A mapping table construct defines for each source element, the set of associated (or corresponding) elements in the domain of the target. The semantics of this association relationship between source and target elements change with different requirements of different applications. Ordinary DE settings can represent DSE settings; however, we show that there exist DSE settings with particular semantics of related values in mapping tables where DE is not the best exchange solution to adopt. The thesis introduces two DSE settings with such a property. We call the first DSE with unique identity semantics. The semantics of a mapping table in this DSE setting specifies that each source element should be uniquely mapped to at least one target element that is associated with it in the mapping table.
In this setting, classical DE is one method to perform a data exchange; however, it is not the best method to adopt, since it cannot represent exchange applications that require, as DC applications do, computing both portions as well as complete sets of certain answers for conjunctive queries. In addition, we show that adopting known DE universal solutions as semantics for such DSE settings is not the best in terms of efficiency when computing certain answers for conjunctive queries. The second DSE setting that the thesis introduces with the same property is called DSE with equality semantics. This setting captures an interesting meaning of related data in a mapping table. Such semantics impose that each source element in a mapping table is related to a target element only if both elements are equivalent (i.e., they have the…
Subjects/Keywords: Data Exchange; Data Integration
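The mapping-table mechanism described in this abstract can be pictured with a minimal sketch (hypothetical relations and values, not the thesis's formal DSE semantics): a schema-level rule rewrites source tuples into the target relation, while the mapping table translates each source constant into its associated target constants.

```python
# Illustrative sketch of exchange with a mapping table (hypothetical schema):
# source relation Employee(name, city) is exchanged into target relation
# Person(name, region); the mapping table associates source cities with regions.

mapping_table = {               # instance-level association: source value -> target values
    "Ottawa": ["Ontario"],
    "Gatineau": ["Quebec", "Outaouais"],
}

source_employee = [("Ana", "Ottawa"), ("Raj", "Gatineau")]

def exchange(source_rows, mapping):
    """Apply a schema-level rule Employee(n, c) -> Person(n, r), where r ranges
    over the target values associated with c in the mapping table."""
    target_person = []
    for name, city in source_rows:
        for region in mapping.get(city, []):   # unmapped constants yield no target tuple
            target_person.append((name, region))
    return target_person

print(exchange(source_employee, mapping_table))
# [('Ana', 'Ontario'), ('Raj', 'Quebec'), ('Raj', 'Outaouais')]
```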
APA (6th Edition):
Awada, R. (2015). Data Sharing and Exchange: Semantics and Query Answering. (Thesis). University of Ottawa. Retrieved from http://hdl.handle.net/10393/32080
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Awada, Rana. “Data Sharing and Exchange: Semantics and Query Answering.” 2015. Thesis, University of Ottawa. Accessed January 17, 2021.
http://hdl.handle.net/10393/32080.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Awada, Rana. “Data Sharing and Exchange: Semantics and Query Answering.” 2015. Web. 17 Jan 2021.
Vancouver:
Awada R. Data Sharing and Exchange: Semantics and Query Answering. [Internet] [Thesis]. University of Ottawa; 2015. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10393/32080.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Awada R. Data Sharing and Exchange: Semantics and Query Answering. [Thesis]. University of Ottawa; 2015. Available from: http://hdl.handle.net/10393/32080
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Houston
2.
Kundooru, Shyam Sunder Reddy 1991-.
Generalized Framework for Time-Sensitive Decision Support Systems.
Degree: MS, Computer Science, 2015, University of Houston
URL: http://hdl.handle.net/10657/1739
▼ The University of Houston has various monitoring systems that feed information to Emergency Operations Centers for emergency decisions. Emergency decisions must be made utilizing the information available to the Emergency Operations Center, which is received from the different sensors feeding the information. The information systems in the University are disparate and do not communicate with each other. These enterprise systems are independent, with no integrated view. Most of the systems operate on and store the same information about the campus and people while using various models appropriate to operational needs. As the monitoring systems are not integrated with each other, it becomes difficult to estimate the level of an emergency by considering the incidents identified by the individual systems.
In this thesis, we realized a data model design to enable an aggregation framework in time-sensitive decision support systems. We propose a framework for aggregating the data available through the information sources towards achieving an integrated view. The standard used for integrating the various systems is the IF-MAP (Interface for Metadata Access Point) standard. Using the Publish-Subscribe communication paradigm and aggregating the data based on time window and location, disparate data sources can be integrated and effective communication between the systems can be achieved, which helps in making emergency decisions. The decision can now be based on effective utilization of all the systems involved in the particular incident. Aggregation has been performed on various sources of information such as the Police Dispatch system, card access system, video feeds, and the general facilities system at the University of Houston.
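The publish-subscribe aggregation idea can be sketched in a few lines (this is a generic illustration, not an IF-MAP client; the event fields and window size are hypothetical): events from disparate publishers are grouped by location and a fixed time window so that co-located, near-simultaneous incidents can be assessed together.

```python
# Minimal publish-subscribe aggregation sketch: group events by (location, time window).
from collections import defaultdict

WINDOW_SECONDS = 300  # hypothetical 5-minute aggregation window

class Broker:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)

def make_aggregator(store):
    """Store each incoming event under its (location, time window) key."""
    def on_event(event):
        window = int(event["timestamp"] // WINDOW_SECONDS)
        store[(event["location"], window)].append(event)
    return on_event

store = defaultdict(list)
broker = Broker()
broker.subscribe(make_aggregator(store))

broker.publish({"source": "police_dispatch", "location": "Lot 4A", "timestamp": 1010.0})
broker.publish({"source": "card_access",     "location": "Lot 4A", "timestamp": 1150.0})
print({key: [e["source"] for e in events] for key, events in store.items()})
# both events land in the same (location, window) bucket
```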
Advisors/Committee Members: Subhlok, Jaspal (advisor), Subhlok, Jaspal (committee member), Gurkan, Deniz (committee member), Chapman, Barbara M. (committee member).
Subjects/Keywords: Data Integration; Data Aggregation
APA (6th Edition):
Kundooru, S. S. R. 1. (2015). Generalized Framework for Time-Sensitive Decision Support Systems. (Masters Thesis). University of Houston. Retrieved from http://hdl.handle.net/10657/1739
Chicago Manual of Style (16th Edition):
Kundooru, Shyam Sunder Reddy 1991-. “Generalized Framework for Time-Sensitive Decision Support Systems.” 2015. Masters Thesis, University of Houston. Accessed January 17, 2021.
http://hdl.handle.net/10657/1739.
MLA Handbook (7th Edition):
Kundooru, Shyam Sunder Reddy 1991-. “Generalized Framework for Time-Sensitive Decision Support Systems.” 2015. Web. 17 Jan 2021.
Vancouver:
Kundooru SSR1. Generalized Framework for Time-Sensitive Decision Support Systems. [Internet] [Masters thesis]. University of Houston; 2015. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10657/1739.
Council of Science Editors:
Kundooru SSR1. Generalized Framework for Time-Sensitive Decision Support Systems. [Masters Thesis]. University of Houston; 2015. Available from: http://hdl.handle.net/10657/1739

Anna University
3.
Amshakala K.
Information theory based data dependency extraction and its application in data integration;.
Degree: 2015, Anna University
URL: http://shodhganga.inflibnet.ac.in/handle/10603/54547
▼ As a huge volume of data is generated every day, data integration becomes important to provide a uniform view over the data collected from multiple resources. Effective data analysis methods are required to work with such massive integrated data, and efficient data quality tools are also vital to ensure the correctness of the integrated data. The data store for many organizations is the database, and integrity constraints are the primary means for ensuring data integrity in databases. Functional dependencies are the common type of integrity constraint specified by the database designer when the database schema is designed. In the quest for capturing more information from data in the form of constraints, functional dependencies have been extended in several ways. Among various data dependencies, conditional functional dependencies, fuzzy functional dependencies, and matching dependencies are widely used for data cleaning operations. In addition to the constraints specified by the designer, there are other constraints hidden in the data, and it becomes essential to mine dependencies from data. Several dependency discovery methods are proposed in this work to extract functional dependencies and their extensions. The proposed work also uses an information theory based measure to extract various types of data dependencies from data.
References: pp. 168-180.
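The information-theoretic idea behind dependency discovery can be illustrated generically (this is not the thesis's algorithm): a functional dependency X → Y holds on an instance exactly when the conditional entropy H(Y | X) = H(X, Y) − H(X) is zero.

```python
# Generic entropy-based test for a candidate functional dependency X -> Y.
import math
from collections import Counter

def entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def fd_holds(rows, lhs, rhs):
    """rows: list of dicts; lhs/rhs: tuples of attribute names.
    The FD holds iff H(lhs ∪ rhs) - H(lhs) == 0 on this instance."""
    x = [tuple(r[a] for a in lhs) for r in rows]
    xy = [tuple(r[a] for a in lhs + rhs) for r in rows]
    return math.isclose(entropy(xy) - entropy(x), 0.0, abs_tol=1e-12)

rows = [
    {"zip": "30602", "city": "Athens", "temp": 21},
    {"zip": "30602", "city": "Athens", "temp": 25},
    {"zip": "30605", "city": "Athens", "temp": 19},
]
print(fd_holds(rows, ("zip",), ("city",)))   # True:  zip -> city
print(fd_holds(rows, ("city",), ("zip",)))   # False: city does not determine zip
```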
Advisors/Committee Members: Nedunchezhian R.
Subjects/Keywords: Data integration; Fuzzy functional dependencies
APA (6th Edition):
K, A. (2015). Information theory based data dependency extraction and its application in data integration;. (Thesis). Anna University. Retrieved from http://shodhganga.inflibnet.ac.in/handle/10603/54547
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
K, Amshakala. “Information theory based data dependency extraction and its application in data integration;.” 2015. Thesis, Anna University. Accessed January 17, 2021.
http://shodhganga.inflibnet.ac.in/handle/10603/54547.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
K, Amshakala. “Information theory based data dependency extraction and its application in data integration;.” 2015. Web. 17 Jan 2021.
Vancouver:
K A. Information theory based data dependency extraction and its application in data integration;. [Internet] [Thesis]. Anna University; 2015. [cited 2021 Jan 17].
Available from: http://shodhganga.inflibnet.ac.in/handle/10603/54547.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
K A. Information theory based data dependency extraction and its application in data integration;. [Thesis]. Anna University; 2015. Available from: http://shodhganga.inflibnet.ac.in/handle/10603/54547
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Texas A&M University
4.
Feng, Shuo.
A Likelihood Based Framework for Data Integration with Application to eQTL Mapping.
Degree: PhD, Statistics, 2014, Texas A&M University
URL: http://hdl.handle.net/1969.1/153318
▼ We develop a new way of thinking about and integrating gene expression data (continuous) and genomic information data (binary) by jointly compressing the two data sets and embedding their signals in low-dimensional feature spaces with an information-sharing mechanism, which connects the continuous data to the binary data, under the penalized log-likelihood framework. In particular, the continuous data are modeled by a Gaussian likelihood and the binary data are modeled by a Bernoulli likelihood, which is formed by transforming the feature space of the genomic information with a logit link. The smoothly clipped absolute deviation (SCAD) penalty is added on the basis vectors of the low-dimensional feature spaces for both data sets, which is based on the assumption that only a small set of genetic variants are associated with a small fraction of gene expression and the fact that those basis vectors can be interpreted as weights assigned to the genetic variants and gene expression, similar to the way the loading vectors of principal component analysis (PCA) or canonical correlation analysis (CCA) are interpreted. Algorithmically, a Majorization-Minimization (MM) algorithm with a local linear approximation (LLA) to the SCAD penalty is developed to effectively and efficiently solve the optimization problem involved, which produces closed-form updating rules. The effectiveness of our method is demonstrated by simulations in various setups with comparisons to some popular competing methods and an application to eQTL mapping with real data.
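For readers unfamiliar with the SCAD/LLA machinery, the sketch below shows the generic ingredients (the SCAD penalty derivative and the soft-thresholding step that a local linear approximation produces inside an MM-style loop); it is an illustration under simplified assumptions, not the dissertation's closed-form updating rules.

```python
import numpy as np

def scad_derivative(beta_abs, lam, a=3.7):
    """Derivative p'_lambda(|beta|) of the SCAD penalty (Fan & Li, 2001)."""
    small = beta_abs <= lam
    deriv = np.empty_like(beta_abs, dtype=float)
    deriv[small] = lam
    deriv[~small] = np.maximum(a * lam - beta_abs[~small], 0.0) / (a - 1.0)
    return deriv

def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form solution of a lasso step."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lla_step(beta, smooth_gradient, step_size, lam):
    """One local-linear-approximation update: penalty weights come from the
    current estimate, then each coordinate is soft-thresholded."""
    weights = scad_derivative(np.abs(beta), lam)
    z = beta - step_size * smooth_gradient   # gradient step on the smooth loss
    return soft_threshold(z, step_size * weights)
```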
Advisors/Committee Members: Huang, Jianhua (advisor), Hu, Jianhua (advisor), Wu, Guoyao (committee member), Sherman, Michael (committee member).
Subjects/Keywords: Data integration; eQTL; GWAS; CCA
APA (6th Edition):
Feng, S. (2014). A Likelihood Based Framework for Data Integration with Application to eQTL Mapping. (Doctoral Dissertation). Texas A&M University. Retrieved from http://hdl.handle.net/1969.1/153318
Chicago Manual of Style (16th Edition):
Feng, Shuo. “A Likelihood Based Framework for Data Integration with Application to eQTL Mapping.” 2014. Doctoral Dissertation, Texas A&M University. Accessed January 17, 2021.
http://hdl.handle.net/1969.1/153318.
MLA Handbook (7th Edition):
Feng, Shuo. “A Likelihood Based Framework for Data Integration with Application to eQTL Mapping.” 2014. Web. 17 Jan 2021.
Vancouver:
Feng S. A Likelihood Based Framework for Data Integration with Application to eQTL Mapping. [Internet] [Doctoral dissertation]. Texas A&M University; 2014. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/1969.1/153318.
Council of Science Editors:
Feng S. A Likelihood Based Framework for Data Integration with Application to eQTL Mapping. [Doctoral Dissertation]. Texas A&M University; 2014. Available from: http://hdl.handle.net/1969.1/153318

Linköping University
5.
Elfving, Eric.
Automated annotation of protein families.
Degree: Bioinformatics, 2011, Linköping University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69393
▼ Introduction: The great challenge in bioinformatics is data integration. The amount of available data is always increasing and there are no common unified standards of where, or how, the data should be stored. The aim of this work is to build an automated tool to annotate the different member families within the protein superfamily of medium-chain dehydrogenases/reductases (MDR), by finding common properties among the member proteins. The goal is to increase the understanding of the MDR superfamily as well as the different member families. This will add to the amount of knowledge gained for free when a new, unannotated protein is matched as a member to a specific MDR member family.
Method: The different types of data available all needed different handling. Textual data was mainly compared as strings while numeric data needed some special handling such as statistical calculations. Ontological data was handled as tree nodes where ancestry between terms had to be considered. This was implemented as a plugin-based system to make the tool easy to extend with additional data sources of different types.
Results: The biggest challenge was data incompleteness, yielding few (or no) results for some families and thus decreasing the statistical significance of the results. Results show that all the human and mouse MDR members have a Pfam ADH domain (ADH_N and/or ADH_zinc_N) and take part in an oxidation-reduction process, often with NAD or NADP as cofactor. Many of the proteins contain zinc and are expressed in liver tissue.
Conclusions: A Python-based tool for automatic annotation has been created to annotate the different MDR member families. The tool is easily extendable to be used with new databases, and much of the results agree with information found in the literature. The utility and necessity of this system, as well as the quality of its produced results, are expected to only increase over time, even if no additional extensions are produced, as the system itself is able to make further and more detailed inferences as more and more data become available.
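The plugin-based design described in the Method section can be sketched roughly as follows: each data type registers its own comparison handler and the annotator dispatches on type. Handler names and behaviour here are illustrative, not the tool's actual API.

```python
# Rough sketch of a plugin-style dispatcher: one handler per data type; new data
# sources can be supported by registering another handler.
HANDLERS = {}

def handler(data_type):
    def register(func):
        HANDLERS[data_type] = func
        return func
    return register

@handler("text")
def compare_text(values):
    # a property is reported for the family only if all members share the same string
    return values[0] if len(set(values)) == 1 else None

@handler("numeric")
def compare_numeric(values):
    # a simple summary statistic stands in for the thesis's statistical calculations
    return {"mean": sum(values) / len(values), "n": len(values)}

def annotate(records):
    """records: mapping of attribute -> (data_type, list of member values)."""
    return {attr: HANDLERS[dtype](vals) for attr, (dtype, vals) in records.items()}

family = {
    "pfam_domain": ("text", ["ADH_N", "ADH_N", "ADH_N"]),
    "sequence_length": ("numeric", [374, 380, 377]),
}
print(annotate(family))
```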
Subjects/Keywords: data integration; Bioinformatics; Bioinformatik
APA (6th Edition):
Elfving, E. (2011). Automated annotation of protein families. (Thesis). Linköping University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69393
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Elfving, Eric. “Automated annotation of protein families.” 2011. Thesis, Linköping University. Accessed January 17, 2021.
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69393.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Elfving, Eric. “Automated annotation of protein families.” 2011. Web. 17 Jan 2021.
Vancouver:
Elfving E. Automated annotation of protein families. [Internet] [Thesis]. Linköping University; 2011. [cited 2021 Jan 17].
Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69393.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Elfving E. Automated annotation of protein families. [Thesis]. Linköping University; 2011. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69393
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Montana State University
6.
Liu, Chang.
Heterogeneous data integration for operations and travel information sharing.
Degree: MS, College of Engineering, 2014, Montana State University
URL: https://scholarworks.montana.edu/xmlui/handle/1/8783
▼ The North/West Passage (N/WP) corridor follows I-90 and I-94 from Washington to Wisconsin. The Operations and Travel Information Integration Sharing (OTIIS) system provides traveler information at an eight-state, corridor-wide scale in a single website. This work presents the approach to ingest the heterogeneous data from the Departments of Transportation (DOT) of the eight states along the N/WP corridor, and from third-party sources such as the NOAA Forecast Database. This thesis details the process of fetching and parsing the data feeds and updating the database; introduces all the web services used in the project; and describes how to resolve related geometry data issues. The valuable potential benefits of such data injectors would be not only feeding the OTIIS website with well-formatted, real-time travel information, but also facilitating the development of the API for potential use by other approved systems or developers.
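A generic fetch-parse-upsert loop of the kind described above might look like the following; the feed URL, JSON layout, and table schema are placeholders, not the OTIIS interfaces (real DOT feeds differ in format and authentication).

```python
# Generic fetch -> parse -> upsert loop for a road-condition feed (placeholder schema).
import json
import sqlite3
from urllib.request import urlopen

DB = sqlite3.connect("otiis_demo.db")
DB.execute("""CREATE TABLE IF NOT EXISTS events
              (id TEXT PRIMARY KEY, route TEXT, description TEXT, updated TEXT)""")

def ingest(feed_url):
    with urlopen(feed_url, timeout=30) as response:
        payload = json.load(response)
    rows = [(e["id"], e["route"], e["description"], e["updated"])
            for e in payload.get("events", [])]
    # upsert keeps the website fed with the latest version of each event
    DB.executemany("INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?)", rows)
    DB.commit()
    return len(rows)

# ingest("https://example-dot.example/feeds/events.json")  # hypothetical endpoint
```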
Advisors/Committee Members: Chairperson, Graduate Committee: Qing Yang (advisor).
Subjects/Keywords: Data integration (Computer science).
APA (6th Edition):
Liu, C. (2014). Heterogeneous data integration for operations and travel information sharing. (Masters Thesis). Montana State University. Retrieved from https://scholarworks.montana.edu/xmlui/handle/1/8783
Chicago Manual of Style (16th Edition):
Liu, Chang. “Heterogeneous data integration for operations and travel information sharing.” 2014. Masters Thesis, Montana State University. Accessed January 17, 2021.
https://scholarworks.montana.edu/xmlui/handle/1/8783.
MLA Handbook (7th Edition):
Liu, Chang. “Heterogeneous data integration for operations and travel information sharing.” 2014. Web. 17 Jan 2021.
Vancouver:
Liu C. Heterogeneous data integration for operations and travel information sharing. [Internet] [Masters thesis]. Montana State University; 2014. [cited 2021 Jan 17].
Available from: https://scholarworks.montana.edu/xmlui/handle/1/8783.
Council of Science Editors:
Liu C. Heterogeneous data integration for operations and travel information sharing. [Masters Thesis]. Montana State University; 2014. Available from: https://scholarworks.montana.edu/xmlui/handle/1/8783
7.
BATISTA, Maria da Conceição Moraes.
Schema quality analysis in a data integration system.
Degree: 2008, Universidade Federal de Pernambuco
URL: http://repositorio.ufpe.br/handle/123456789/1335
▼ Information Quality (IQ) has become a critical concern in organizations and in information systems research. Poor-quality information can have negative effects on an organization's effectiveness. The growing use of data warehouses and the direct access of managers and users to information drawn from multiple sources have increased the need for quality in corporate information. The notion of IQ in information systems has emerged in recent years and has attracted growing interest. There is still no common agreement on a definition of IQ, only a consensus that it is a concept of fitness for use: information is considered fit for use from the perspective of a user's requirements and needs, that is, information quality depends on its usefulness.
Integrated access to information spread over multiple heterogeneous, distributed, and autonomous data sources is an important problem to be solved in many application domains. Typically there are several ways to answer global queries over data in different sources with different combinations, but it is quite costly to obtain all possible answers. While much research has addressed query processing and plan selection under cost criteria, little is known about the problem of incorporating IQ aspects into the global schemas of data integration systems.
In this work, we propose the analysis of IQ in a data integration system, more specifically the quality of the system's schemas. Our main goal is to improve the quality of query execution. Our proposal is based on the hypothesis that one way to optimize query processing is to build schemas with high IQ scores.
The focus of this work is therefore the development of IQ analysis mechanisms for data integration schemas, especially the global schema. We first build a list of IQ criteria and relate these criteria to the elements of data integration systems. We then focus on the integrated schema and formally specify schema quality criteria: minimality, schema completeness, and type consistency. We also specify an adjustment algorithm to improve minimality, and algorithms to measure type consistency in schemas. Our experiments show that the execution time of a query in a data integration system can decrease if the query is submitted to a schema with high minimality and type-consistency scores.
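As a loose illustration of the kind of schema-quality scoring discussed above (not the dissertation's formal definitions), a minimality score can be read as the share of global-schema elements that are not redundant mappings of the same source attribute.

```python
# Toy minimality score for an integrated (global) schema; illustration only.
def minimality(global_schema, mappings):
    """global_schema: list of global attribute names.
    mappings: dict global attribute -> the source attribute it represents."""
    seen, redundant = set(), 0
    for attr in global_schema:
        source = mappings.get(attr)
        if source in seen:
            redundant += 1          # a second global attribute for the same source data
        elif source is not None:
            seen.add(source)
    return 1.0 - redundant / len(global_schema)

schema = ["name", "customer_name", "city"]
maps = {"name": "src.cust_name", "customer_name": "src.cust_name", "city": "src.city"}
print(minimality(schema, maps))   # ~0.667: one of three elements is redundant
```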
Advisors/Committee Members: SALGADO, Ana Carolina Brandão (advisor).
Subjects/Keywords: Information Quality; Data Quality; Data Integration
APA (6th Edition):
BATISTA, M. d. C. M. (2008). Schema quality analysis in a data integration system. (Thesis). Universidade Federal de Pernambuco. Retrieved from http://repositorio.ufpe.br/handle/123456789/1335
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
BATISTA, Maria da Conceição Moraes. “Schema quality analysis in a data integration system.” 2008. Thesis, Universidade Federal de Pernambuco. Accessed January 17, 2021.
http://repositorio.ufpe.br/handle/123456789/1335.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
BATISTA, Maria da Conceição Moraes. “Schema quality analysis in a data integration system.” 2008. Web. 17 Jan 2021.
Vancouver:
BATISTA MdCM. Schema quality analysis in a data integration system. [Internet] [Thesis]. Universidade Federal de Pernambuco; 2008. [cited 2021 Jan 17].
Available from: http://repositorio.ufpe.br/handle/123456789/1335.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
BATISTA MdCM. Schema quality analysis in a data integration system. [Thesis]. Universidade Federal de Pernambuco; 2008. Available from: http://repositorio.ufpe.br/handle/123456789/1335
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
8.
Zhi, Shi.
Integrating multiple conflicting sources by truth discovery and source quality estimation.
Degree: MS, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/50493
▼ Multiple descriptions about the same entity from different sources will inevitably result in data or information inconsistency. Among conflicting pieces of information, which one is the most trustworthy? How can the fraudulence of a rumor be detected? Obviously, it is unrealistic to curate and validate the trustworthiness of every piece of information because of the high cost of human labeling and the lack of experts. To find the truth of each entity, much research work has shown that considering the quality of information providers can improve the performance of data integration. Due to the different quality of data sources, it is hard to find a general solution that works for every case. Therefore, we start from a general setting of truth analysis at first and narrow down to two basic problems in data integration. We first propose a general framework to deal with numerical data with flexibility in defining the loss function. Source quality is represented by a vector to model the source credibility in different error intervals. Then we propose a new method called the No Truth Truth Model (NTTM) to deal with the truth existence problem in low-quality data. Preliminary experiments on real stock data and slot filling data show promising results.
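A generic iterative truth-discovery loop for numerical claims, in the spirit of the framework described above, alternates between estimating each entity's value as a source-weighted average and re-weighting sources by their error. This is the standard scheme, not the thesis's NTTM method; the claim data are invented.

```python
# Generic iterative truth discovery for numerical claims (illustration only).
def truth_discovery(claims, iterations=20):
    """claims: dict entity -> dict source -> claimed value."""
    sources = {s for per_entity in claims.values() for s in per_entity}
    weights = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(iterations):
        # estimate truths from current source weights
        for entity, per_source in claims.items():
            total = sum(weights[s] for s in per_source)
            truths[entity] = sum(weights[s] * v for s, v in per_source.items()) / total
        # re-weight sources: smaller total squared error -> larger weight
        for s in sources:
            err = sum((v - truths[e]) ** 2
                      for e, per_source in claims.items()
                      for src, v in per_source.items() if src == s)
            weights[s] = 1.0 / (err + 1e-9)
    return truths, weights

claims = {"AAPL_close": {"feed_a": 150.1, "feed_b": 150.0, "feed_c": 162.0}}
print(truth_discovery(claims))   # the outlying feed_c is down-weighted
```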
Advisors/Committee Members: Han, Jiawei (advisor).
Subjects/Keywords: Truth Discovery; Data Integration; Data Quality
APA (6th Edition):
Zhi, S. (2014). Integrating multiple conflicting sources by truth discovery and source quality estimation. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/50493
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Zhi, Shi. “Integrating multiple conflicting sources by truth discovery and source quality estimation.” 2014. Thesis, University of Illinois – Urbana-Champaign. Accessed January 17, 2021.
http://hdl.handle.net/2142/50493.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Zhi, Shi. “Integrating multiple conflicting sources by truth discovery and source quality estimation.” 2014. Web. 17 Jan 2021.
Vancouver:
Zhi S. Integrating multiple conflicting sources by truth discovery and source quality estimation. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/2142/50493.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Zhi S. Integrating multiple conflicting sources by truth discovery and source quality estimation. [Thesis]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/50493
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Georgia
9.
Khalil, George Magdy.
Data fusion.
Degree: 2016, University of Georgia
URL: http://hdl.handle.net/10724/35357"
▼ Maximizing the utility of surveys while not adding questions is of utmost importance to surveillance systems. Public health agencies need to keep the ever-decreasing number of participants from breaking off after an interview is started. A common reason a participant breaks off is the length of the survey. It is therefore important that organizations conducting surveillance investigate innovative techniques for combining data from multiple, less extensive surveys. Data fusion is one such technique that has been used to integrate databases to save time and money. Health insurance status is a good topic to use for the validation of data fusion because this variable is common to many data sources and has a body of literature documenting factors associated with being insured. Besides data availability, respondents are thought to be accurate in reporting health insurance status and type (Call et al., 2008a). The goal of this research was to create "statistical twins" based on health insurance status from two data sources. Matched respondents were considered "statistical twins" and used to test whether data fusion is an effective method of predicting a variable not originally asked in the survey, given the respondent's profile. Data from the Behavioral Risk Factor Surveillance System (BRFSS) survey and the National Health Interview Survey (NHIS) were matched by first harmonizing the variables from the two data sources. A propensity score was calculated, which was then used to perform Mahalanobis and Nearest Neighbor matching across the two surveys. The efficiency of the match was then validated: 88.2% of the 297,734 BRFSS respondents reported being covered by health insurance, while 83.0% of the 27,921 NHIS respondents reported currently being insured. Propensity scores were left-modal for both the NHIS and the BRFSS. Quantile-Quantile (QQ) plots, which plot the quantiles of one data set against another, revealed that after the match the empirical distributions were similar in the BRFSS and NHIS groups. Compared to the original BRFSS dataset, the 2-to-1 Nearest Neighbor (NN) algorithm was the closest to the BRFSS respondents (86.2% [86.0, 86.5] versus 88.2% [88.1, 88.3], respectively). This is quite good considering national estimates differ by a few percentage points from survey to survey. Our imputed estimates are not within the confidence interval of the BRFSS. However, being within the narrow BRFSS confidence interval may be too rigorous a standard because of the very large sample size of the BRFSS. Sensitivities and specificities reveal that 2-to-1 NN with replacement and Mahalanobis matching were more accurate than Nearest Neighbor methods with caliper, without replacement, and 1-to-1 matching.
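The matching pipeline described above can be sketched in a few lines with scikit-learn; this is a generic propensity-score / nearest-neighbor illustration, not the study's actual workflow, and the covariate arrays are assumed to be already harmonized.

```python
# Sketch of propensity-score nearest-neighbor matching between two surveys,
# assuming X_brfss and X_nhis are arrays of harmonized covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def match(X_brfss, X_nhis):
    # 1. Propensity of belonging to the NHIS sample, given covariates.
    X = np.vstack([X_brfss, X_nhis])
    y = np.concatenate([np.zeros(len(X_brfss)), np.ones(len(X_nhis))])
    propensity = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    p_brfss = propensity[: len(X_brfss)].reshape(-1, 1)
    p_nhis = propensity[len(X_brfss):].reshape(-1, 1)
    # 2. For each BRFSS respondent, find the nearest NHIS "statistical twin".
    nn = NearestNeighbors(n_neighbors=1).fit(p_nhis)
    _, idx = nn.kneighbors(p_brfss)
    return idx.ravel()   # index of the matched NHIS record for each BRFSS record

# A donor variable asked only in NHIS can then be copied onto BRFSS records:
# brfss["imputed_var"] = nhis_var[match(X_brfss, X_nhis)]
```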
Subjects/Keywords: Data Fusion; Data Integration, Matching, BRFSS, NHIS
APA (6th Edition):
Khalil, G. M. (2016). Data fusion. (Thesis). University of Georgia. Retrieved from http://hdl.handle.net/10724/35357"
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Khalil, George Magdy. “Data fusion.” 2016. Thesis, University of Georgia. Accessed January 17, 2021.
http://hdl.handle.net/10724/35357".
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Khalil, George Magdy. “Data fusion.” 2016. Web. 17 Jan 2021.
Vancouver:
Khalil GM. Data fusion. [Internet] [Thesis]. University of Georgia; 2016. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10724/35357".
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Khalil GM. Data fusion. [Thesis]. University of Georgia; 2016. Available from: http://hdl.handle.net/10724/35357"
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
10.
El-Roby, Ahmed.
Web Data Integration for Non-Expert Users.
Degree: 2018, University of Waterloo
URL: http://hdl.handle.net/10012/13179
▼ Today, there is an abundance of structured data available on the web in the form of RDF graphs and relational (i.e., tabular) data. This data comes from heterogeneous sources, and realizing its full value requires integrating these sources so that they can be queried together. Due to the scale and heterogeneity of the data sources on the web, integrating them is typically an automatic process. However, automatic data integration approaches are not completely accurate since they infer semantics from syntax in data sources with a high degree of heterogeneity. Therefore, these automatic approaches can be considered as a first step to quickly get reasonable quality data integration output that can be used in issuing queries over the data sources. A second step is refining this output over time while it is being used. Interacting with the data sources through the output of the data integration system and refining this output requires expertise in data management, which limits the scope of this activity to power users and consequently limits the usability of data integration systems.
This thesis focuses on helping non-expert users to access heterogeneous data sources through data integration systems, without requiring the users to have prior knowledge of the queried data sources or exposing them to the details of the output of the data integration system. In addition, the users can provide feedback over the answers to their queries, which can then be used to refine and improve the quality of the data integration output. The thesis studies both RDF and relational data. For RDF data, the thesis focuses on helping non-expert users to query heterogeneous RDF data sources, and utilizing their feedback over query answers to improve the quality of the interlinking between these data sources. For relational data, the thesis focuses on improving the quality of the mediated schema for a set of relational data sources and the semantic mappings between these sources based on user feedback over query answers.
Subjects/Keywords: Data Integration; Data Management; User Feedback; Pay-As-You-Go Data Integration; RDF; Linked Data
APA (6th Edition):
El-Roby, A. (2018). Web Data Integration for Non-Expert Users. (Thesis). University of Waterloo. Retrieved from http://hdl.handle.net/10012/13179
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
El-Roby, Ahmed. “Web Data Integration for Non-Expert Users.” 2018. Thesis, University of Waterloo. Accessed January 17, 2021.
http://hdl.handle.net/10012/13179.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
El-Roby, Ahmed. “Web Data Integration for Non-Expert Users.” 2018. Web. 17 Jan 2021.
Vancouver:
El-Roby A. Web Data Integration for Non-Expert Users. [Internet] [Thesis]. University of Waterloo; 2018. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10012/13179.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
El-Roby A. Web Data Integration for Non-Expert Users. [Thesis]. University of Waterloo; 2018. Available from: http://hdl.handle.net/10012/13179
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Universidade de Lisboa
11.
Carreira, Paulo J.F.
Mapper: An Efficient Data Transformation Operator.
Degree: 2008, Universidade de Lisboa
URL: http://www.rcaap.pt/detail.jsp?id=oai:repositorio.ul.pt:10451/14295
▼ Data transformations are fundamental operations in legacy data migration, data integration, data cleaning, and data warehousing. These operations are often implemented as relational queries that aim at leveraging the optimization capabilities of most DBMSs. However, relational query languages like SQL are not expressive enough to specify one-to-many data transformations, an important class of data transformations that produce several output tuples for a single input tuple. These transformations are required for solving several types of data heterogeneities, like those that occur when the source data represents aggregations of the target data. This thesis proposes a new relational operator, named data mapper, as an extension to the relational algebra to address one-to-many data transformations, and focuses on its optimization. It also provides algebraic rewriting rules and execution algorithms for the logical and physical optimization, respectively. As a result, queries may be expressed as a combination of standard relational operators and mappers. The proposed optimizations have been experimentally validated and the key factors that influence the obtained performance gains identified.
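The one-to-many behaviour of a mapper can be illustrated outside the relational engine with a tiny sketch: each mapper function consumes one input tuple and may emit several output tuples (the relation and attribute names are made up for the example; this is not the thesis's operator implementation).

```python
# Illustrative one-to-many "mapper" over tuples: one input row can yield several
# output rows, something a plain SQL SELECT cannot express directly.
def mapper(rows, functions):
    """Apply each mapper function to every input tuple and emit all outputs."""
    for row in rows:
        for fn in functions:
            yield from fn(row)

# Example: a source row stores a yearly total that the target models as 4 quarters.
def split_into_quarters(row):
    account, yearly_total = row
    for quarter in range(1, 5):
        yield (account, quarter, yearly_total / 4)

source = [("acct-1", 1000.0), ("acct-2", 400.0)]
print(list(mapper(source, [split_into_quarters])))
# 8 output tuples are produced from 2 input tuples
```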
Advisors/Committee Members: Galhardas, Helena Isabel de Jesus, Silva, Mário Jorge Costa Gaspar da.
Subjects/Keywords: Relational Algebra; Data Transformation; Data Integration; Data Cleaning; Data Warehousing
APA (6th Edition):
Carreira, P. J. F. (2008). Mapper: An Efficient Data Transformation Operator. (Thesis). Universidade de Lisboa. Retrieved from http://www.rcaap.pt/detail.jsp?id=oai:repositorio.ul.pt:10451/14295
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Carreira, Paulo J F. “Mapper: An Efficient Data Transformation Operator.” 2008. Thesis, Universidade de Lisboa. Accessed January 17, 2021.
http://www.rcaap.pt/detail.jsp?id=oai:repositorio.ul.pt:10451/14295.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Carreira, Paulo J F. “Mapper: An Efficient Data Transformation Operator.” 2008. Web. 17 Jan 2021.
Vancouver:
Carreira PJF. Mapper: An Efficient Data Transformation Operator. [Internet] [Thesis]. Universidade de Lisboa; 2008. [cited 2021 Jan 17].
Available from: http://www.rcaap.pt/detail.jsp?id=oai:repositorio.ul.pt:10451/14295.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Carreira PJF. Mapper: An Efficient Data Transformation Operator. [Thesis]. Universidade de Lisboa; 2008. Available from: http://www.rcaap.pt/detail.jsp?id=oai:repositorio.ul.pt:10451/14295
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Dalhousie University
12.
Rahman, Md Abdur.
A Data Mining Framework for Automatic Online Customer Lead Generation.
Degree: Master of Computer Science, Faculty of Computer Science, 2012, Dalhousie University
URL: http://hdl.handle.net/10222/14587
▼ Customer lead generation is a crucial and challenging task for online real estate service providers. The business model of online real estate service differs from typical B2B or B2C e-commerce because it acts like a broker between the real estate companies and the potential home buyers. Currently, there is no suitable automatic customer lead generation system available for online real estate service providers. This thesis aims at developing a systematic solution framework of automatic customer lead generation for online real estate service providers. This framework includes data modeling, data integration from multiple online web data streams, as well as data mining and system evaluation for lead pattern discovery and lead prediction. Extensive experiments were conducted based on a case study. The results demonstrate that the proposed approach is able to empower online real estate service providers for lead data analysis and automatically generate targeted customer leads.
Advisors/Committee Members: n/a (external-examiner), Dr. Qigang Gao (graduate-coordinator), Dr. Vlado Keselj (thesis-reader), Dr. Hai Wang and Dr. Qigang Gao (thesis-supervisor), Not Applicable (ethics-approval), Not Applicable (manuscripts), Not Applicable (copyright-release).
Subjects/Keywords: Data mining; Customer lead generation; Business Intelligence; Data modeling; Data integration
APA (6th Edition):
Rahman, M. A. (2012). A Data Mining Framework for Automatic Online Customer Lead Generation. (Masters Thesis). Dalhousie University. Retrieved from http://hdl.handle.net/10222/14587
Chicago Manual of Style (16th Edition):
Rahman, Md Abdur. “A Data Mining Framework for Automatic Online Customer Lead Generation.” 2012. Masters Thesis, Dalhousie University. Accessed January 17, 2021.
http://hdl.handle.net/10222/14587.
MLA Handbook (7th Edition):
Rahman, Md Abdur. “A Data Mining Framework for Automatic Online Customer Lead Generation.” 2012. Web. 17 Jan 2021.
Vancouver:
Rahman MA. A Data Mining Framework for Automatic Online Customer Lead Generation. [Internet] [Masters thesis]. Dalhousie University; 2012. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10222/14587.
Council of Science Editors:
Rahman MA. A Data Mining Framework for Automatic Online Customer Lead Generation. [Masters Thesis]. Dalhousie University; 2012. Available from: http://hdl.handle.net/10222/14587

University of Toronto
13.
Zhu, Sirui.
Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data.
Degree: 2017, University of Toronto
URL: http://hdl.handle.net/1807/79465
▼ Technologies such as smartphones and GPS collect spatial-temporal data with sample sizes that far exceed conventional survey methods, but lack the flexibility that a conventional survey offers. This study develops a data fusion method to impute new variables of interest for a large GPS data set by establishing a link to a different data set that has the variables of interest and shares common data with the GPS data set. As a case study, this study uses Ontario's Roadside Commercial Vehicle Survey (CVS) to enrich GPS data from Xata Inc. The enrichment process has three parts: converting raw GPS data into GPS trips, matching CVS trips to GPS trips, and imputing the missing variables for GPS trips. The research concluded that imputation methods can produce a synthetic dataset with a large sample size and rich information from roadside interview data, with good accuracy at an aggregate level such as a corridor.
M.A.S.
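The enrichment step reads roughly as the following sketch: GPS trips are matched to CVS trips on attributes both data sets share (here, hypothetical origin/destination zones and a departure-time window), and a survey-only variable is copied onto the matched GPS trip. Field names are placeholders, not the study's actual variables.

```python
# Sketch of linking GPS trips to survey (CVS) trips and imputing a survey-only variable.
def impute_commodity(gps_trips, cvs_trips, time_tolerance_hours=2.0):
    enriched = []
    for trip in gps_trips:
        candidates = [c for c in cvs_trips
                      if c["origin_zone"] == trip["origin_zone"]
                      and c["dest_zone"] == trip["dest_zone"]
                      and abs(c["depart_hour"] - trip["depart_hour"]) <= time_tolerance_hours]
        best = min(candidates,
                   key=lambda c: abs(c["depart_hour"] - trip["depart_hour"]),
                   default=None)
        enriched.append({**trip, "commodity": best["commodity"] if best else None})
    return enriched

gps = [{"origin_zone": "Z1", "dest_zone": "Z9", "depart_hour": 8.5}]
cvs = [{"origin_zone": "Z1", "dest_zone": "Z9", "depart_hour": 9.0, "commodity": "lumber"}]
print(impute_commodity(gps, cvs))
```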
Advisors/Committee Members: Roorda, Matthew J, Civil Engineering.
Subjects/Keywords: Commercial Vehicle Survey; Data fusion; Data integration; Freight data; GPS; 0709
APA (6th Edition):
Zhu, S. (2017). Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data. (Masters Thesis). University of Toronto. Retrieved from http://hdl.handle.net/1807/79465
Chicago Manual of Style (16th Edition):
Zhu, Sirui. “Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data.” 2017. Masters Thesis, University of Toronto. Accessed January 17, 2021.
http://hdl.handle.net/1807/79465.
MLA Handbook (7th Edition):
Zhu, Sirui. “Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data.” 2017. Web. 17 Jan 2021.
Vancouver:
Zhu S. Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data. [Internet] [Masters thesis]. University of Toronto; 2017. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/1807/79465.
Council of Science Editors:
Zhu S. Integration of Commercial Vehicle GPS and Roadside Intercept Survey Data. [Masters Thesis]. University of Toronto; 2017. Available from: http://hdl.handle.net/1807/79465

University of Sydney
14.
Naiwala Pathirannehelage, Kaushala Samudini Jayawardana.
Prognostic Methods for Integrating Data from Complex Diseases.
Degree: 2016, University of Sydney
URL: http://hdl.handle.net/2123/14315
▼ Statistics in medical research gained a vast surge with the development of high-throughput biotechnologies that provide thousands of measurements for each patient. These multi-layered data have the clear potential to improve disease prognosis. Data integration is increasingly becoming essential in this context, to address problems such as increasing power, resolving inconsistencies between studies, obtaining more reliable biomarkers, and gaining a broader understanding of the disease. This thesis focuses on addressing the challenges in the development of statistical methods while contributing to the methodological advancements in this field. We propose a clinical data analysis framework to obtain a model with good prediction accuracy while addressing missing data and model instability. A detailed pre-processing pipeline is proposed for miRNA data that removes unwanted noise and offers improved concordance with qRT-PCR data. Platform-specific models are developed to uncover biomarkers using mRNA, protein, and miRNA data, to identify the source with the most important prognostic information. This thesis explores two types of data integration: horizontal, the integration of the same type of data, and vertical, the integration of data from different platforms for the same patient. We use multiple miRNA datasets to develop a meta-analysis framework addressing the challenges in horizontal data integration using a multi-step validation protocol. In vertical data integration, we extend the pre-validation principle and derive platform-dependent weights to utilise the weighted Lasso. Our study revealed that integration of multi-layered data is instrumental in improving prediction accuracy and in obtaining more biologically relevant biomarkers. A novel visualisation technique to look at prediction accuracy at the patient level revealed vital findings with translational impact in personalised medicine.
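A platform-weighted Lasso of the kind mentioned above can be approximated with the usual rescaling trick: scale each feature by the inverse of its penalty weight, fit an ordinary Lasso, then rescale the coefficients. The weights below are placeholders, not the pre-validation-derived weights of the thesis.

```python
# Weighted Lasso via rescaling: penalizing feature j with weight w_j is equivalent
# to fitting an ordinary Lasso on X[:, j] / w_j and dividing the coefficient by w_j.
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, weights, alpha=0.1):
    weights = np.asarray(weights, dtype=float)
    model = Lasso(alpha=alpha, max_iter=10000).fit(X / weights, y)
    return model.coef_ / weights

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))                          # e.g., 3 mRNA + 3 miRNA features
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=60)
platform_weights = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]     # hypothetical: penalize miRNA less
print(np.round(weighted_lasso(X, y, platform_weights), 3))
```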
Subjects/Keywords: Prediction; Data integration; biomarker; melanoma; high-throughput data; Clinical data
APA (6th Edition):
Naiwala Pathirannehelage, K. S. J. (2016). Prognostic Methods for Integrating Data from Complex Diseases. (Thesis). University of Sydney. Retrieved from http://hdl.handle.net/2123/14315
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Naiwala Pathirannehelage, Kaushala Samudini Jayawardana. “Prognostic Methods for Integrating Data from Complex Diseases.” 2016. Thesis, University of Sydney. Accessed January 17, 2021.
http://hdl.handle.net/2123/14315.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Naiwala Pathirannehelage, Kaushala Samudini Jayawardana. “Prognostic Methods for Integrating Data from Complex Diseases.” 2016. Web. 17 Jan 2021.
Vancouver:
Naiwala Pathirannehelage KSJ. Prognostic Methods for Integrating Data from Complex Diseases. [Internet] [Thesis]. University of Sydney; 2016. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/2123/14315.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Naiwala Pathirannehelage KSJ. Prognostic Methods for Integrating Data from Complex Diseases. [Thesis]. University of Sydney; 2016. Available from: http://hdl.handle.net/2123/14315
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Georgia
15.
Weaver, Bryan Douglas.
Implementation of the National Map road database with consideration for organizational and technical integration constraints.
Degree: 2014, University of Georgia
URL: http://hdl.handle.net/10724/22020
▼ The National Map (TNM) concept allows for a more effective and regenerative public mapping program, but not without overcoming some significant challenges. Numerous federal initiatives attempting to consolidate spatial information continue to proceed with poor coordination. This thesis assesses the TNM road database implementation plan. The major organizational and technical constraints to spatial data integration are presented. Literature and sample data are used to develop indices that measure road data integration complexity, potentially assisting policy-makers in the development of relative cost models for various integration strategies. Results suggest an overwhelming need for a comprehensive requirements analysis for the TNM transportation theme. The establishment of an overarching authority over federal, domestic mapping agencies is recommended. Integrated public road GIS data and systems are a daunting goal, but one that remains in the best interest of the nation and should be pursued by the administration in the spirit of responsible governance.
Subjects/Keywords: The National Map; Data integration; Institutional integration; Information sharing; Transportation; Federal mapping programs; Integration measures
APA (6th Edition):
Weaver, B. D. (2014). Implementation of the National Map road database with consideration for organizational and technical integration constraints. (Thesis). University of Georgia. Retrieved from http://hdl.handle.net/10724/22020
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Weaver, Bryan Douglas. “Implementation of the National Map road database with consideration for organizational and technical integration constraints.” 2014. Thesis, University of Georgia. Accessed January 17, 2021.
http://hdl.handle.net/10724/22020.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Weaver, Bryan Douglas. “Implementation of the National Map road database with consideration for organizational and technical integration constraints.” 2014. Web. 17 Jan 2021.
Vancouver:
Weaver BD. Implementation of the National Map road database with consideration for organizational and technical integration constraints. [Internet] [Thesis]. University of Georgia; 2014. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10724/22020.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Weaver BD. Implementation of the National Map road database with consideration for organizational and technical integration constraints. [Thesis]. University of Georgia; 2014. Available from: http://hdl.handle.net/10724/22020
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
16.
Jeanmougin, Marine.
Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique.
Degree: Docteur es, Mathématiques appliquées, 2012, Evry-Val d'Essonne
URL: http://www.theses.fr/2012EVRY0029
► Au cours de la dernière décennie, les progrès en Biologie Moléculaire ont accéléré le développement de techniques d'investigation à haut-débit. En particulier, l'étude du transcriptome…
(more)
▼ Au cours de la dernière décennie, les progrès en Biologie Moléculaire ont accéléré le développement de techniques d'investigation à haut-débit. En particulier, l'étude du transcriptome a permis des avancées majeures dans la recherche médicale. Dans cette thèse, nous nous intéressons au développement de méthodes statistiques dédiées au traitement et à l'analyse de données transcriptomiques à grande échelle. Nous abordons le problème de sélection de signatures de gènes à partir de méthodes d'analyse de l'expression différentielle et proposons une étude de comparaison de différentes approches, basée sur plusieurs stratégies de simulations et sur des données réelles. Afin de pallier les limites de ces méthodes classiques qui s'avèrent peu reproductibles, nous présentons un nouvel outil, DiAMS (DIsease Associated Modules Selection), dédié à la sélection de modules de gènes significatifs. DiAMS repose sur une extension du score-local et permet l'intégration de données d'expressions et de données d'interactions protéiques. Par la suite, nous nous intéressons au problème d'inférence de réseaux de régulation de gènes. Nous proposons une méthode de reconstruction à partir de modèles graphiques Gaussiens, basée sur l'introduction d'a priori biologique sur la structure des réseaux. Cette approche nous permet d'étudier les interactions entre gènes et d'identifier des altérations dans les mécanismes de régulation, qui peuvent conduire à l'apparition ou à la progression d'une maladie. Enfin l'ensemble de ces développements méthodologiques sont intégrés dans un pipeline d'analyse que nous appliquons à l'étude de la rechute métastatique dans le cancer du sein.
Recent advances in Molecular Biology have led biologists toward high-throughput genomic studies. In particular, the investigation of the human transcriptome offers unprecedented opportunities for understanding cellular and disease mechanisms. In this PhD, we put our focus on providing robust statistical methods dedicated to the treatment and the analysis of high-throughput transcriptome data. We discuss the differential analysis approaches available in the literature for identifying genes associated with a phenotype of interest and propose a comparison study. We provide practical recommendations on the appropriate method to be used based on various simulation models and real datasets. With the eventual goal of overcoming the inherent instability of differential analysis strategies, we have developed an innovative approach called DiAMS, for DIsease Associated Modules Selection. This method was applied to select significant modules of genes rather than individual genes and involves the integration of both transcriptome and protein interactions data in a local-score strategy. We then focus on the development of a framework to infer gene regulatory networks by integration of a biological informative prior over network structures using Gaussian graphical models. This approach offers the possibility of exploring the molecular relationships between genes, leading to the…
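As a rough illustration of how a biological prior can be injected into Gaussian graphical model inference of the kind the abstract describes, one common device is to down-weight the sparsity penalty on gene pairs supported by prior knowledge; this is the generic prior-weighted graphical-lasso objective, not necessarily the exact formulation used in the thesis:

    \hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\,\Theta) - \lambda \sum_{i \neq j} w_{ij}\,\lvert\Theta_{ij}\rvert,
    \qquad w_{ij} < 1 \text{ if edge } (i,j) \text{ has prior support}, \quad w_{ij} = 1 \text{ otherwise},

where S is the empirical covariance of the expression data and nonzero entries of the estimated precision matrix \hat{\Theta} are read as candidate regulatory interactions.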
Advisors/Committee Members: Ambroise, Christophe (thesis director), Guedj, Mickaël (thesis director).
Subjects/Keywords: Intégration de données hétérogènes; Heterogeneous data integration
APA (6th Edition):
Jeanmougin, M. (2012). Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique. (Doctoral Dissertation). Evry-Val d'Essonne. Retrieved from http://www.theses.fr/2012EVRY0029
Chicago Manual of Style (16th Edition):
Jeanmougin, Marine. “Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique.” 2012. Doctoral Dissertation, Evry-Val d'Essonne. Accessed January 17, 2021.
http://www.theses.fr/2012EVRY0029.
MLA Handbook (7th Edition):
Jeanmougin, Marine. “Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique.” 2012. Web. 17 Jan 2021.
Vancouver:
Jeanmougin M. Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique. [Internet] [Doctoral dissertation]. Evry-Val d'Essonne; 2012. [cited 2021 Jan 17].
Available from: http://www.theses.fr/2012EVRY0029.
Council of Science Editors:
Jeanmougin M. Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge : Méthodes statistiques pour une analyse robuste du transcriptome à travers l'intégration d'a priori biologique. [Doctoral Dissertation]. Evry-Val d'Essonne; 2012. Available from: http://www.theses.fr/2012EVRY0029

Delft University of Technology
17.
Feliksik, E.P. (author).
RDF Gears, a data integration framework for the Semantic Web.
Degree: 2011, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:6e8f2999-71cf-420b-b5fc-235de3569285
► This thesis describes the design and implementation of RDF Gears, a data integration framework for the Semantic Web. The RDF Gears Language combines the Semantic…
(more)
▼ This thesis describes the design and implementation of RDF Gears, a data integration framework for the Semantic Web. The RDF Gears Language combines Semantic Web technologies with the Nested Relational Algebra. It provides an expressive Domain Specific Language for the development of workflows integrating RDF data with other sources. It allows Semantic Web developers and researchers to work on new, domain-specific algorithms without wasting time on the implementation details of data transformation, storage and optimization. A web-based user interface is presented to create RGL workflows that are visualized with a graphical syntax. An execution engine is developed to function as a workflow interpreter. It implements aggressive pipelining, lazy evaluation and other optimizations. A comparison with the Silk Framework shows that this first implementation is already quite efficient.
Web Information Systems
Software Technology
Electrical Engineering, Mathematics and Computer Science
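The pipelining and lazy evaluation mentioned in the abstract can be pictured with ordinary Python generators; the operator names below are illustrative and do not correspond to the actual RGL primitives:

    # Minimal sketch of a lazily evaluated, pipelined workflow over RDF-like triples.
    def scan(triples):
        for t in triples:                      # yields one (subject, predicate, object) at a time
            yield t

    def filter_predicate(stream, predicate):
        for s, p, o in stream:
            if p == predicate:
                yield (s, p, o)

    def project_objects(stream):
        for _, _, o in stream:
            yield o

    triples = [("ex:alice", "ex:knows", "ex:bob"),
               ("ex:alice", "ex:age", "34"),
               ("ex:bob", "ex:knows", "ex:carol")]
    # Nothing is materialized until the final iteration pulls values through the pipeline.
    pipeline = project_objects(filter_predicate(scan(triples), "ex:knows"))
    print(list(pipeline))                      # ['ex:bob', 'ex:carol']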
Advisors/Committee Members: Hidders, A.J.H. (mentor).
Subjects/Keywords: rdf; semantic web; data integration; nrc
APA (6th Edition):
Feliksik, E. P. (. (2011). RDF Gears, a data integration framework for the Semantic Web. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:6e8f2999-71cf-420b-b5fc-235de3569285
Chicago Manual of Style (16th Edition):
Feliksik, E P (author). “RDF Gears, a data integration framework for the Semantic Web.” 2011. Masters Thesis, Delft University of Technology. Accessed January 17, 2021.
http://resolver.tudelft.nl/uuid:6e8f2999-71cf-420b-b5fc-235de3569285.
MLA Handbook (7th Edition):
Feliksik, E P (author). “RDF Gears, a data integration framework for the Semantic Web.” 2011. Web. 17 Jan 2021.
Vancouver:
Feliksik EP(. RDF Gears, a data integration framework for the Semantic Web. [Internet] [Masters thesis]. Delft University of Technology; 2011. [cited 2021 Jan 17].
Available from: http://resolver.tudelft.nl/uuid:6e8f2999-71cf-420b-b5fc-235de3569285.
Council of Science Editors:
Feliksik EP(. RDF Gears, a data integration framework for the Semantic Web. [Masters Thesis]. Delft University of Technology; 2011. Available from: http://resolver.tudelft.nl/uuid:6e8f2999-71cf-420b-b5fc-235de3569285

University of Minnesota
18.
Orreggio, Giordi.
Translational Cancer Research Data Quality – The Context Factor.
Degree: PhD, Health Informatics, 2017, University of Minnesota
URL: http://hdl.handle.net/11299/191477
► Cronbach’s alpha indicates that as the count of items in a set increases, so does the level of relationship between them. Translational cancer research (TCR)…
(more)
▼ Cronbach’s alpha indicates that as the count of items in a set increases, so does the level of relationship between them. Translational cancer research (TCR) data is an example of increasing items within a set. As a national priority, TCR is well funded, contributing to a continued increase in the data organizations produce, the number of organizations producing data, and the amount of sharing in which each organization participates. However, rather than leveraging the data relationships – a contextual approach – intrinsic measures such as accuracy and completeness remain the ones referenced most often in data quality (DQ) articles and conceptual frameworks. The purpose of this set of studies is to expand our knowledge of TCR DQ by examining context-sensitive DQ methods. The knowledge gained could be incorporated into future TCR DQ efforts, leading to more informative and actionable data and quicker development of better clinical treatments.
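For reference, the opening claim corresponds to the usual behaviour of Cronbach's alpha. For k items with item variances \sigma^2_{Y_i} and total-score variance \sigma^2_X,

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right),
    \qquad\text{which for standardized items reduces to}\qquad
    \alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}},

where \bar{r} is the average inter-item correlation; for a fixed positive \bar{r}, alpha grows with the number of items k, which is the relationship the abstract appeals to when treating TCR data as an ever-growing item set.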
Subjects/Keywords: cancer; context; data; integration; quality; research
APA (6th Edition):
Orreggio, G. (2017). Translational Cancer Research Data Quality – The Context Factor. (Doctoral Dissertation). University of Minnesota. Retrieved from http://hdl.handle.net/11299/191477
Chicago Manual of Style (16th Edition):
Orreggio, Giordi. “Translational Cancer Research Data Quality – The Context Factor.” 2017. Doctoral Dissertation, University of Minnesota. Accessed January 17, 2021.
http://hdl.handle.net/11299/191477.
MLA Handbook (7th Edition):
Orreggio, Giordi. “Translational Cancer Research Data Quality – The Context Factor.” 2017. Web. 17 Jan 2021.
Vancouver:
Orreggio G. Translational Cancer Research Data Quality – The Context Factor. [Internet] [Doctoral dissertation]. University of Minnesota; 2017. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/11299/191477.
Council of Science Editors:
Orreggio G. Translational Cancer Research Data Quality – The Context Factor. [Doctoral Dissertation]. University of Minnesota; 2017. Available from: http://hdl.handle.net/11299/191477

Virginia Tech
19.
Fu, Yi.
Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers.
Degree: PhD, Electrical Engineering, 2020, Virginia Tech
URL: http://hdl.handle.net/10919/96634
► We witnessed the start of the human genome project decades ago and stepped into the era of omics since then. Omics are comprehensive approaches for…
(more)
▼ We witnessed the start of the human genome project decades ago and have since stepped into the era of omics. Omics are comprehensive approaches for analyzing genome-wide biomolecular profiles. The rapid development of high-throughput technologies enables us to produce an enormous amount of omics data such as genomics, transcriptomics, and proteomics data, leaving researchers swimming in a sea of omics information once never imagined. Yet the era of omics brings new challenges: to process the huge volumes of data, to summarize the data, to reveal the interactions between entities, to link various types of omics data, and to discover the mechanisms hidden behind omics data.
In processing omics data, one factor that weakens the strength of follow-up data analysis is sample impurity. We call impure tumor samples contaminated by normal cells heterogeneous samples. The genomic signals measured from heterogeneous samples are a mixture of signals from both tumor cells and normal cells. To correct the mixed signals and obtain true signals from pure tumor cells, we propose a computational approach called BACOM 2.0 to estimate the normal cell fraction and correct the genomic signals accordingly. By introducing a novel normalization method that identifies the neutral component in mixed signals of genomic copy number data, BACOM 2.0 can accurately detect gene deletion types and abnormal chromosome numbers in tumor cells.
In cells, genes connect to other genes and form complex biological networks to perform their functions. Dysregulated genes can cause structural change in biological networks, also known as network rewiring. In a biological network with network rewiring events, a large quantity of network rewiring linked to a single hub gene suggests concentrated gene dysregulation. This hub gene has more impact on the network and hence is more likely to be associated with the functional change of the network, which ultimately leads to abnormal phenotypes such as cancer. Therefore, the hub genes linked with network rewiring are potential indicators of disease status, also known as biomarkers. The differential dependency network (DDN) method was proposed to detect network rewiring events and biomarkers from omics data.
However, the DDN method still has a few drawbacks. First, for two groups of data with unequal sample sizes, DDN consistently detects false targets of network rewiring. The permutation test, which applies the same method to randomly shuffled samples and is supposed to distinguish true targets from random effects, suffers from the same problem and can let those false targets pass. We propose a new formulation that corrects the mistakes caused by unequal group sizes and design a simulation study to test the new formulation's correctness. Second, the computation time for solving DDN problems is unbearably long when processing omics data with a large number of samples or a large number of genes. We propose several strategies to increase DDN's computation speed, including three…
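The tumor-purity correction described above can be summarised in a deliberately simplified two-component form; the thesis's BACOM 2.0 estimator of the normal fraction from copy-number data is more involved than this:

    x_{\text{obs}} = \alpha\,x_{\text{normal}} + (1-\alpha)\,x_{\text{tumor}}
    \quad\Longrightarrow\quad
    \hat{x}_{\text{tumor}} = \frac{x_{\text{obs}} - \hat{\alpha}\,x_{\text{normal}}}{1-\hat{\alpha}},

where \alpha is the estimated normal-cell fraction and x_{\text{normal}} is the signal expected from the contaminating normal cells (for copy-number data, the copy-neutral level).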
Advisors/Committee Members: Wang, Yue J. (committeechair), Haghighat, Alireza (committee member), Zhang, Zhen (committee member), Clancy, Thomas Charles (committee member), Yu, Guoqiang (committee member).
Subjects/Keywords: molecular data integration; differential network analysis; biomarker
APA (6th Edition):
Fu, Y. (2020). Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers. (Doctoral Dissertation). Virginia Tech. Retrieved from http://hdl.handle.net/10919/96634
Chicago Manual of Style (16th Edition):
Fu, Yi. “Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers.” 2020. Doctoral Dissertation, Virginia Tech. Accessed January 17, 2021.
http://hdl.handle.net/10919/96634.
MLA Handbook (7th Edition):
Fu, Yi. “Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers.” 2020. Web. 17 Jan 2021.
Vancouver:
Fu Y. Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers. [Internet] [Doctoral dissertation]. Virginia Tech; 2020. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10919/96634.
Council of Science Editors:
Fu Y. Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers. [Doctoral Dissertation]. Virginia Tech; 2020. Available from: http://hdl.handle.net/10919/96634

University of Georgia
20.
Pennington, Cary Marcus.
Building federated bioinformatics databases using Web services.
Degree: 2014, University of Georgia
URL: http://hdl.handle.net/10724/26120
► Bioinformatics laboratories around the world continue to generate massive amounts of genomic and functional genomic data. Access to these data resources via Federated Databases through…
(more)
▼ Bioinformatics laboratories around the world continue to generate massive amounts of genomic and functional genomic data. Access to these data resources via federated databases through the Web is an important emerging technology. Federated databases allow several databases to be integrated without discarding existing databases or losing local control over their administration. In order to achieve this desirable goal, technology needed to emerge to handle the heterogeneity and local autonomy of distributed database systems. Web service technology, including semantic Web service technology, provides a new opportunity to make federated databases a practical reality. This thesis presents an architecture and an implementation for building federated databases using Web services. In particular, it demonstrates how Web service technology can provide the flexibility to create a dynamic federation of databases with sufficient abstraction to maintain the autonomy of the component systems and robustness to handle the heterogeneous nature of the disparate databases. A case study is conducted that involves federating six existing bioinformatics databases, CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB, to create the EuPathDB (formerly ApiDB) federated database.
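A federation layer of this kind can be sketched as a thin client that fans a query out to per-database web services and tags each result with its origin; the endpoint URLs below are placeholders, not the real EuPathDB service addresses:

    import json
    import urllib.request

    # Hypothetical REST endpoints for two of the component databases.
    SOURCES = {
        "CryptoDB": "https://example.org/cryptodb/genes?term={}",
        "PlasmoDB": "https://example.org/plasmodb/genes?term={}",
    }

    def federated_gene_query(term):
        """Send the same query to every member database and merge the answers,
        keeping track of which autonomous source produced each record."""
        merged = []
        for name, template in SOURCES.items():
            with urllib.request.urlopen(template.format(term)) as response:
                for record in json.load(response):
                    record["source"] = name
                    merged.append(record)
        return merged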
Subjects/Keywords: Web Services; Data Integration; Database; Federation; Bioinformatics
APA (6th Edition):
Pennington, C. M. (2014). Building federated bioinformatics databases using Web services. (Thesis). University of Georgia. Retrieved from http://hdl.handle.net/10724/26120
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Pennington, Cary Marcus. “Building federated bioinformatics databases using Web services.” 2014. Thesis, University of Georgia. Accessed January 17, 2021.
http://hdl.handle.net/10724/26120.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Pennington, Cary Marcus. “Building federated bioinformatics databases using Web services.” 2014. Web. 17 Jan 2021.
Vancouver:
Pennington CM. Building federated bioinformatics databases using Web services. [Internet] [Thesis]. University of Georgia; 2014. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10724/26120.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Pennington CM. Building federated bioinformatics databases using Web services. [Thesis]. University of Georgia; 2014. Available from: http://hdl.handle.net/10724/26120
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Uppsala University
21.
Kuipers, Wietse.
Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems.
Degree: Business Studies, 2016, Uppsala University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296723
► Enterprise systems play a central role in the business processes and management of data within an organization. However it is not uncommon for organizations…
(more)
▼ Enterprise systems play a central role in the business processes and management of data within an organization. However, it is not uncommon for organizations to possess a multitude of autonomous systems. This thesis examines the way organizations can integrate financial data from different autonomous source systems and examines different factors that can have an impact on data integration processes. The empirical findings were gathered through a case study at Sandvik, a large Swedish industrial firm, making use of qualitative research techniques. The findings contribute to an in-depth understanding of financial data integration processes. The empirical findings show how an organization can accomplish financial data integration without tight coupling of autonomous systems. Moreover, the research contributes by describing various organizational and technological factors that impact data integration. The findings indicate that a decentralized organizational structure and a singular system architecture play an important role in financial data integration processes. Hereby the research helps to further explore the topic of integration within enterprise systems research and provides context for the organizational and technological factors that influence financial data integration processes.
Subjects/Keywords: financial data integration; system integration; enterprise system; autonomous systems; financial reporting
APA (6th Edition):
Kuipers, W. (2016). Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems. (Thesis). Uppsala University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296723
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Kuipers, Wietse. “Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems.” 2016. Thesis, Uppsala University. Accessed January 17, 2021.
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296723.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Kuipers, Wietse. “Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems.” 2016. Web. 17 Jan 2021.
Vancouver:
Kuipers W. Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems. [Internet] [Thesis]. Uppsala University; 2016. [cited 2021 Jan 17].
Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296723.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Kuipers W. Financial Data Integration : A case study analyzing factors that impact the integration of financial data between systems. [Thesis]. Uppsala University; 2016. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296723
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Colorado State University
22.
Mitra, Saptashwa.
Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets.
Degree: MS(M.S.), Computer Science, 2018, Colorado State University
URL: http://hdl.handle.net/10217/191379
► Combining data from disparate sources enhances the opportunity to explore different aspects of the phenomena under consideration. However, there are several challenges in doing so…
(more)
▼ Combining data from disparate sources enhances the opportunity to explore different aspects of the phenomena under consideration. However, there are several challenges in doing so effectively, including, inter alia, the heterogeneity in data representation and format, collection patterns, and the integration of foreign data attributes into a ready-to-use condition. In this study, we propose a scalable query-oriented data integration framework that provides estimations for spatiotemporally aligned data points. We have designed Confluence, a distributed data integration framework that dynamically generates accurate interpolations for the targeted spatiotemporal scopes along with an estimate of the uncertainty involved in such estimation. Confluence orchestrates computations to evaluate spatial and temporal query joins and to interpolate values. Our methodology facilitates distributed query evaluations with dynamic relaxation of query constraints. Query evaluations are locality-aware, and we leverage model-based dynamic parameter selection to provide accurate estimation for data points. We have included empirical benchmarks that profile the suitability of our approach in terms of accuracy, latency, and throughput at scale.
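As a toy stand-in for the interpolation-with-uncertainty step (the thesis uses model-based dynamic parameter selection; the inverse-distance weighting below is only an illustration), a spatiotemporally aligned estimate might look like:

    import math

    def idw_estimate(target, observations, power=2):
        """Estimate a value at target = (x, y, t) from observations
        [((x, y, t), value), ...], returning (estimate, rough_uncertainty)."""
        weights, values = [], []
        for (x, y, t), v in observations:
            d = math.sqrt((x - target[0])**2 + (y - target[1])**2 + (t - target[2])**2)
            if d == 0.0:
                return v, 0.0                     # exact hit: no interpolation needed
            weights.append(1.0 / d**power)
            values.append(v)
        total = sum(weights)
        estimate = sum(w * v for w, v in zip(weights, values)) / total
        spread = sum(w * (v - estimate)**2 for w, v in zip(weights, values)) / total
        return estimate, math.sqrt(spread)

    print(idw_estimate((0, 0, 0), [((1, 0, 0), 10.0), ((0, 2, 0), 14.0), ((0, 0, 3), 20.0)]))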
Advisors/Committee Members: Pallickara, Sangmi Lee (advisor), Pallickara, Shrideep (committee member), Li, Kaigang (committee member).
Subjects/Keywords: data integration; real time queries; vector data; raster data; data fusion; spatiotemporal
APA (6th Edition):
Mitra, S. (2018). Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets. (Masters Thesis). Colorado State University. Retrieved from http://hdl.handle.net/10217/191379
Chicago Manual of Style (16th Edition):
Mitra, Saptashwa. “Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets.” 2018. Masters Thesis, Colorado State University. Accessed January 17, 2021.
http://hdl.handle.net/10217/191379.
MLA Handbook (7th Edition):
Mitra, Saptashwa. “Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets.” 2018. Web. 17 Jan 2021.
Vancouver:
Mitra S. Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets. [Internet] [Masters thesis]. Colorado State University; 2018. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10217/191379.
Council of Science Editors:
Mitra S. Adaptive spatiotemporal data integration using distributed query relaxation over heterogeneous observational datasets. [Masters Thesis]. Colorado State University; 2018. Available from: http://hdl.handle.net/10217/191379

University of Illinois – Urbana-Champaign
23.
Zhao, Bo.
Truth finding in databases.
Degree: PhD, 0112, 2013, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/42470
► In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major…
(more)
▼ In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this thesis, we propose probabilistic models that can automatically infer true records and source quality without any supervision, on both categorical data and numerical data. We further develop a new entity matching framework that considers source quality based on truth-finding models.
On categorical data, in contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem on categorical data.
In practice, numerical data is not only ubiquitous but also of high value, e.g., price, weather, census, polls and economic statistics. Quality issues on numerical data can be even more common and severe than on categorical data due to its characteristics. Therefore, in this thesis we propose a new truth-finding method specially designed for handling numerical data. Based on Bayesian probabilistic models, our method can leverage the characteristics of numerical data in a principled way when modeling the dependencies among source quality, truth, and claimed values. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches in both effectiveness and efficiency.
We further observe that modeling source quality not only can help decide the truth but also can help match entities across different sources. Therefore, as a natural next step, we integrate truth finding with entity matching so that we can infer matching of entities, true attributes of entities, and source quality in a joint fashion. This is the first entity matching approach that involves modeling source quality and truth finding. Experiments show that our approach outperforms state-of-the-art baselines.
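A minimal fixed-point version of truth finding with source quality (a generic scheme in the spirit of the abstract, not the Bayesian models the thesis actually proposes) alternates between scoring claimed values and re-scoring sources:

    def truth_finding(claims, iterations=20):
        """claims: {source: {entity: claimed_value}}. Returns (truth, trust)."""
        trust = {s: 0.8 for s in claims}           # initial trust in every source
        conf = {}
        for _ in range(iterations):
            conf = {}                              # confidence of a value = total trust of its asserters
            for s, assertions in claims.items():
                for entity, value in assertions.items():
                    conf.setdefault(entity, {}).setdefault(value, 0.0)
                    conf[entity][value] += trust[s]
            for entity, scores in conf.items():    # normalise per entity
                z = sum(scores.values()) or 1.0
                for value in scores:
                    scores[value] /= z
            for s, assertions in claims.items():   # a source's trust = mean confidence of its claims
                trust[s] = sum(conf[e][v] for e, v in assertions.items()) / max(len(assertions), 1)
        truth = {e: max(scores, key=scores.get) for e, scores in conf.items()}
        return truth, trust

    claims = {"siteA": {"height_m": 828}, "siteB": {"height_m": 828}, "siteC": {"height_m": 830}}
    print(truth_finding(claims))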
Advisors/Committee Members: Han, Jiawei (advisor), Han, Jiawei (Committee Chair), Zhai, ChengXiang (committee member), Roth, Dan (committee member), Yu, Philip S. (committee member).
Subjects/Keywords: data integration; truth finding; data fusion; data quality; entity matching; data mining; probabilistic graphical models
APA (6th Edition):
Zhao, B. (2013). Truth finding in databases. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/42470
Chicago Manual of Style (16th Edition):
Zhao, Bo. “Truth finding in databases.” 2013. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed January 17, 2021.
http://hdl.handle.net/2142/42470.
MLA Handbook (7th Edition):
Zhao, Bo. “Truth finding in databases.” 2013. Web. 17 Jan 2021.
Vancouver:
Zhao B. Truth finding in databases. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2013. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/2142/42470.
Council of Science Editors:
Zhao B. Truth finding in databases. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2013. Available from: http://hdl.handle.net/2142/42470

Tulane University
24.
Qu, Zhe.
High-dimensional statistical data integration.
Degree: 2019, Tulane University
URL: https://digitallibrary.tulane.edu/islandora/object/tulane:106916
► Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A representative model for the integrative analysis of…
(more)
▼ Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A representative model for the integrative analysis of multiple data types is to decompose each data matrix into a low-rank common-source matrix generated by latent factors shared across all data types, a low-rank distinctive-source matrix corresponding to each data type, and an additive noise matrix. We propose a novel decomposition method, called decomposition-based generalized canonical correlation analysis, which appropriately defines those matrices by imposing a desirable orthogonality constraint on distinctive latent factors that aims to sufficiently capture the common latent factors. To further delineate the common and distinctive patterns between two data types, we propose another new decomposition method, called common and distinctive pattern analysis. This method takes into account the common and distinctive information between the coefficient matrices of the common latent factors. We develop consistent estimation approaches for both proposed decompositions under high-dimensional settings and demonstrate their finite-sample performance via extensive simulations. We illustrate the superiority of the proposed methods over the state of the art with real-world data examples obtained from The Cancer Genome Atlas and the Human Connectome Project.
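The decomposition described in the abstract has the generic form below (the notation is ours, chosen for illustration): each data matrix splits into a joint part driven by shared latent factors, an individual part, and noise,

    X_k = \underbrace{U B_k^{\top}}_{\text{common source}} + \underbrace{U_k W_k^{\top}}_{\text{distinctive source}} + E_k,
    \qquad k = 1, \dots, K, \qquad U^{\top} U_k = 0,

where U collects the latent factors shared across all K data types, U_k the factors specific to type k, and the orthogonality constraint is the device the abstract mentions for keeping the distinctive factors from absorbing common variation.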
Advisors/Committee Members: Hyman, James (Thesis advisor), School of Science & Engineering Mathematics (Degree granting institution).
Subjects/Keywords: High-dimensional data analysis; Data integration; Canonical correlation analysis
APA (6th Edition):
Qu, Z. (2019). High-dimensional statistical data integration. (Thesis). Tulane University. Retrieved from https://digitallibrary.tulane.edu/islandora/object/tulane:106916
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Qu, Zhe. “High-dimensional statistical data integration.” 2019. Thesis, Tulane University. Accessed January 17, 2021.
https://digitallibrary.tulane.edu/islandora/object/tulane:106916.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Qu, Zhe. “High-dimensional statistical data integration.” 2019. Web. 17 Jan 2021.
Vancouver:
Qu Z. High-dimensional statistical data integration. [Internet] [Thesis]. Tulane University; 2019. [cited 2021 Jan 17].
Available from: https://digitallibrary.tulane.edu/islandora/object/tulane:106916.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Qu Z. High-dimensional statistical data integration. [Thesis]. Tulane University; 2019. Available from: https://digitallibrary.tulane.edu/islandora/object/tulane:106916
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
25.
Tomazela, Bruno.
MPPI: um modelo de procedência para subsidiar processos de integração.
Degree: Mestrado, Ciências de Computação e Matemática Computacional, 2010, University of São Paulo
URL: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15042010-143510/
► Data provenance is the set of metadata that makes it possible to identify the sources and the transformation processes applied to the data, from their creation…
(more)
▼ Data provenance is the set of metadata that makes it possible to identify the sources and the transformation processes applied to the data, from their creation to their current state. There are several motivations for incorporating provenance into the integration process, such as assessing the quality of data from heterogeneous sources, auditing the data, attributing authorship to data owners, and reproducing integration decisions. This dissertation proposes MPPI, a provenance model to support integration processes. The model focuses on systems in which the data sources can be updated only by their owners, so the integration cannot rectify data conflicts directly in those sources. The main requirement of MPPI is that it support the handling of all integration decisions made in previous processes, so that these decisions can be reapplied automatically in subsequent integration processes. The MPPI model has four characteristics. The first is the mapping of data provenance into copy, edit, insert and remove operations, and the storage of these operations in an operation repository. The second is the handling of overlapping operations, through the proposed blind, restrict, undo and redo policies. The third is the identification of anomalies that arise because autonomous data sources may change their data between integration processes, and the proposal of four types of operation validation against these anomalies: full validation, source validation, target validation, or none. The fourth is the reapplication of operations, through the proposed VRS (Validate and Reapply in Separate) and VRT (Validate and Reapply in Tandem) methods and the safe reordering of the repository, which guarantee that all integration decisions taken by the user in previous integration processes are resolved automatically, and in the same way, in subsequent integration processes. The MPPI model was validated through performance tests that investigated the handling of overlapping operations, the VRT method and safe reordering, taking the other characteristics of the model as a basis. The results showed that the proposed policies for handling overlapping operations are feasible to implement in real integration systems. The results also showed that the VRT method provides significant performance gains over re-collection when the goal is to re-establish the results of integration processes that have already been executed at least once; the average performance gain of the VRT method was at least 93%. Moreover, the tests showed that reordering the operations before reapplication can further improve the performance of the VRT method.
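A toy version of the operation repository and automatic reapplication described above might look as follows; the names and the replay rule are illustrative, and the thesis's VRS/VRT methods additionally validate each operation against source changes before reapplying it:

    from dataclasses import dataclass

    @dataclass
    class Operation:
        kind: str              # "copy", "edit", "insert" or "remove"
        key: str               # identifier of the affected record
        value: object = None

    class OperationRepository:
        """Stores the integration decisions of one run so they can be replayed
        automatically, and in the same way, on later integration runs."""
        def __init__(self):
            self.log = []

        def record(self, operation):
            self.log.append(operation)

        def reapply(self, target):
            for op in self.log:
                if op.kind in ("copy", "insert", "edit"):
                    target[op.key] = op.value
                elif op.kind == "remove":
                    target.pop(op.key, None)
            return target

    repo = OperationRepository()
    repo.record(Operation("copy", "customer:17", {"name": "ACME"}))
    repo.record(Operation("remove", "customer:23"))
    print(repo.reapply({"customer:23": {"name": "stale"}}))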
Advisors/Committee Members: Ciferri, Cristina Dutra de Aguiar.
Subjects/Keywords: Data integration; Data provenance; Integração de dados; Procedência dos dados
APA (6th Edition):
Tomazela, B. (2010). MPPI: um modelo de procedência para subsidiar processos de integração. (Masters Thesis). University of São Paulo. Retrieved from http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15042010-143510/ ;
Chicago Manual of Style (16th Edition):
Tomazela, Bruno. “MPPI: um modelo de procedência para subsidiar processos de integração.” 2010. Masters Thesis, University of São Paulo. Accessed January 17, 2021.
http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15042010-143510/ ;.
MLA Handbook (7th Edition):
Tomazela, Bruno. “MPPI: um modelo de procedência para subsidiar processos de integração.” 2010. Web. 17 Jan 2021.
Vancouver:
Tomazela B. MPPI: um modelo de procedência para subsidiar processos de integração. [Internet] [Masters thesis]. University of São Paulo; 2010. [cited 2021 Jan 17].
Available from: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15042010-143510/ ;.
Council of Science Editors:
Tomazela B. MPPI: um modelo de procedência para subsidiar processos de integração. [Masters Thesis]. University of São Paulo; 2010. Available from: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-15042010-143510/ ;

Universidade Nova
26.
Grade, Nuno Daniel Gouveia de Sousa.
Data queries over heterogeneous sources.
Degree: 2013, Universidade Nova
URL: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/10053
► Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
Enterprises typically have their data spread over many software systems, such as custom made applications,…
(more)
▼ Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
Enterprises typically have their data spread over many software systems, such as custom-made applications, CRM systems like SalesForce, CMS systems, or ERP systems like SAP. In this setting, it is often desirable to integrate information from many data sources to accomplish some business goal in an application. Data may be stored locally or in the cloud in a wide variety of ways, demanding explicit transformation processes to be defined, which is why it is hard for developers to integrate it. Moreover, the amount of external data can be large, and the difference in efficiency between a smart and a naive way of retrieving and filtering data from different locations can be great. Hence, it is clear that developers would benefit greatly from language abstractions to help them build queries over heterogeneous data sources and from an optimization process that avoids large and unnecessary data transfers during the execution of queries.
This project was developed at OutSystems and aims at extending a real product, which makes it even more challenging. We followed a generic approach that can be implemented in any framework, not only focused on the product of OutSystems.
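The kind of optimization hinted at above, avoiding large and unnecessary transfers, boils down to pushing filters to the remote source instead of fetching everything; a minimal sketch, with hypothetical fetch functions standing in for calls to systems such as SalesForce or SAP:

    def join_naive(local_orders, fetch_all_remote_customers):
        """Fetches the whole remote table, then filters locally (large transfer)."""
        wanted_ids = {order["customer_id"] for order in local_orders}
        remote = fetch_all_remote_customers()
        return [row for row in remote if row["id"] in wanted_ids]

    def join_pushed_down(local_orders, fetch_remote_customers_by_ids):
        """Ships only the needed keys to the remote source, which filters there
        and returns just the matching rows (small transfer)."""
        wanted_ids = sorted({order["customer_id"] for order in local_orders})
        return fetch_remote_customers_by_ids(wanted_ids)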
Advisors/Committee Members: Seco, João, Ferrão, Lúcio.
Subjects/Keywords: Data integration; Web services; SalesForce; SAP; Query optimization; Remote data sources
APA (6th Edition):
Grade, N. D. G. d. S. (2013). Data queries over heterogeneous sources. (Thesis). Universidade Nova. Retrieved from http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/10053
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Grade, Nuno Daniel Gouveia de Sousa. “Data queries over heterogeneous sources.” 2013. Thesis, Universidade Nova. Accessed January 17, 2021.
http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/10053.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Grade, Nuno Daniel Gouveia de Sousa. “Data queries over heterogeneous sources.” 2013. Web. 17 Jan 2021.
Vancouver:
Grade NDGdS. Data queries over heterogeneous sources. [Internet] [Thesis]. Universidade Nova; 2013. [cited 2021 Jan 17].
Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/10053.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Grade NDGdS. Data queries over heterogeneous sources. [Thesis]. Universidade Nova; 2013. Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/10053
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Ottawa
27.
Mireku Kwakye, Michael.
A Practical Approach to Merging Multidimensional Data Models.
Degree: 2011, University of Ottawa
URL: http://hdl.handle.net/10393/20457
► Schema merging is the process of incorporating data models into an integrated, consistent schema from which query solutions satisfying all incorporated models can be derived.…
(more)
▼ Schema merging is the process of incorporating data models into an integrated, consistent schema from which query solutions satisfying all incorporated models can be derived. The efficiency of such a process is reliant on the effective semantic representation of the chosen data models, as well as the mapping relationships between the elements of the source data models.
Consider a scenario where, as a result of company mergers or acquisitions, a number of related, but possibly disparate, data marts need to be integrated into a global data warehouse. The ability to retrieve data across these disparate, but related, data marts poses an important challenge. Intuitively, forming an all-inclusive data warehouse includes the tedious tasks of identifying related fact and dimension table attributes, as well as the design of a schema merge algorithm for the integration. Additionally, the evaluation of the combined set of correct answers to queries, likely to be independently posed to such data marts, becomes difficult to achieve.
Model management refers to a high-level, abstract programming language designed to efficiently manipulate schemas and mappings. Particularly, model management operations such as match, compose mappings, apply functions and merge, offer a way to handle the above-mentioned data integration problem within the domain of data warehousing.
In this research, we introduce a methodology for the integration of star schema source data marts into a single consolidated data warehouse based on model management. In our methodology, we discuss the development of three (3) main streamlined steps to facilitate the generation of a global data warehouse. That is, we adopt techniques for deriving attribute correspondences, and for schema mapping discovery. Finally, we formulate and design a merge algorithm, based on multidimensional star schemas; which is primarily the core contribution of this research. Our approach focuses on delivering a polynomial time solution needed for the expected volume of data and its associated large-scale query processing.
The experimental evaluation shows that an integrated schema, alongside instance data, can be derived based on the type of mappings adopted in the mapping discovery step. The adoption of Global-And-Local-As-View (GLAV) mapping models delivered a maximally-contained or exact representation of all fact and dimensional instance data tuples needed in query processing on the integrated data warehouse. Additionally, different forms of conflicts, such as semantic conflicts for related or unrelated dimension entities, and descriptive conflicts for differing attribute data types, were encountered and resolved in the developed solution. Finally, this research has highlighted some critical and inherent issues regarding functional dependencies in mapping models, integrity constraints at the source data marts, and multi-valued dimension attributes. These issues were encountered during the integration of the source data marts, as it has been the case of evaluating the queries processed…
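One small ingredient of such a merge, unifying two dimension tables once attribute correspondences have been derived, can be sketched as follows (a toy illustration under our own naming, not the thesis's merge algorithm itself):

    def merge_dimension_tables(dim_a, dim_b, correspondences):
        """dim_a, dim_b: lists of row dicts from two source data marts.
        correspondences: {attribute_in_b: attribute_in_a} derived beforehand.
        Rows from dim_b are renamed onto dim_a's vocabulary and appended."""
        merged = [dict(row) for row in dim_a]
        for row in dim_b:
            merged.append({correspondences.get(col, col): val for col, val in row.items()})
        return merged

    store_a = [{"store_id": 1, "city": "Ottawa"}]
    store_b = [{"shop_no": 9, "town": "Toronto"}]
    print(merge_dimension_tables(store_a, store_b, {"shop_no": "store_id", "town": "city"}))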
Subjects/Keywords: Schema Merging; Data Integration; Model Management; Data Warehousing
APA (6th Edition):
Mireku Kwakye, M. (2011). A Practical Approach to Merging Multidimensional Data Models. (Thesis). University of Ottawa. Retrieved from http://hdl.handle.net/10393/20457
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Mireku Kwakye, Michael. “A Practical Approach to Merging Multidimensional Data Models.” 2011. Thesis, University of Ottawa. Accessed January 17, 2021.
http://hdl.handle.net/10393/20457.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Mireku Kwakye, Michael. “A Practical Approach to Merging Multidimensional Data Models.” 2011. Web. 17 Jan 2021.
Vancouver:
Mireku Kwakye M. A Practical Approach to Merging Multidimensional Data Models. [Internet] [Thesis]. University of Ottawa; 2011. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10393/20457.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Mireku Kwakye M. A Practical Approach to Merging Multidimensional Data Models. [Thesis]. University of Ottawa; 2011. Available from: http://hdl.handle.net/10393/20457
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Ottawa
28.
Rahman, Md. Anisur.
Tabular Representation of Schema Mappings: Semantics and Algorithms.
Degree: 2011, University of Ottawa
URL: http://hdl.handle.net/10393/20032
► Our thesis investigates a mechanism for representing schema mapping by tabular forms and checking utility of the new representation. Schema mapping is a high-level specification…
(more)
▼ Our thesis investigates a mechanism for representing schema mappings in tabular form and checks the utility of the new representation.
Schema mapping is a high-level specification that describes the relationship between two database schemas. Schema mappings constitute essential building blocks of data integration, data exchange and peer-to-peer data sharing systems. Global-and-local-as-view (GLAV) is one of the approaches for specifying schema mappings. Tableaux are used for expressing queries and functional dependencies on a single database in a tabular form. In our thesis, we first introduce a tabular representation of GLAV mappings. We find that this tabular representation helps to solve many mapping-related algorithmic and semantic problems. For example, a well-known problem is to find the minimal instance of the target schema for a given instance of the source schema and a set of mappings between the source and the target schema. Second, we show that our proposed tabular mapping can be used as an operator on an instance of the source schema to produce an instance of the target schema which is 'minimal' and 'most general' in nature. There exists a tableaux-based mechanism for finding equivalence of two queries. Third, we extend that mechanism for deducing equivalence between two schema mappings using their corresponding tabular representations. Sometimes, there exist redundant conjuncts in a schema mapping, which make data exchange, data integration and data sharing operations more time-consuming. Fourth, we present an algorithm that utilizes the tabular representations for reducing the number of constraints in schema mappings. At present, either schema-level mappings or data-level mappings are used for data sharing purposes. Fifth, we introduce and give the semantics of bi-level mappings, which combine schema-level and data-level mappings. We also show that bi-level mappings are more effective for data sharing systems. Finally, we implemented our algorithms and developed a software prototype to evaluate our proposed strategies.
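To make the 'minimal and most general' target instance concrete, here is a toy chase-like application of one tabular mapping rule, where target attributes with no source counterpart receive labelled nulls; the rule format is our own illustration, not the thesis's notation:

    def apply_mapping(source_instance, rule):
        """source_instance: {relation_name: [row dicts]}.
        rule: {"from": relation, "to": relation, "project": {target_attr: source_attr or None}}.
        Unmapped target attributes get fresh labelled nulls, keeping the result most general."""
        produced, null_counter = [], 0
        for row in source_instance[rule["from"]]:
            target_row = {}
            for t_attr, s_attr in rule["project"].items():
                if s_attr is None:
                    null_counter += 1
                    target_row[t_attr] = f"_N{null_counter}"   # labelled null
                else:
                    target_row[t_attr] = row[s_attr]
            produced.append((rule["to"], target_row))
        return produced

    src = {"Employee": [{"name": "Ana", "dept": "R&D"}]}
    rule = {"from": "Employee", "to": "Worker", "project": {"who": "name", "office": None}}
    print(apply_mapping(src, rule))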
Subjects/Keywords: Schema Mapping; Tableaux; Data Integration; Data Exchange; Optimization
APA (6th Edition):
Rahman, M. A. (2011). Tabular Representation of Schema Mappings: Semantics and Algorithms. (Thesis). University of Ottawa. Retrieved from http://hdl.handle.net/10393/20032
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Rahman, Md Anisur. “Tabular Representation of Schema Mappings: Semantics and Algorithms.” 2011. Thesis, University of Ottawa. Accessed January 17, 2021.
http://hdl.handle.net/10393/20032.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Rahman, Md Anisur. “Tabular Representation of Schema Mappings: Semantics and Algorithms.” 2011. Web. 17 Jan 2021.
Vancouver:
Rahman MA. Tabular Representation of Schema Mappings: Semantics and Algorithms. [Internet] [Thesis]. University of Ottawa; 2011. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10393/20032.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Rahman MA. Tabular Representation of Schema Mappings: Semantics and Algorithms. [Thesis]. University of Ottawa; 2011. Available from: http://hdl.handle.net/10393/20032
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Chicago
29.
Liu, Cong.
Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis.
Degree: 2017, University of Illinois – Chicago
URL: http://hdl.handle.net/10027/22136
► High-throughput technology, such as microarray and next generation sequencing has accelerated the identification of uncovered biomarkers and developing of novel diagnosis approach in precision medicine.…
(more)
▼ High-throughput technology, such as microarray and next-generation sequencing, has accelerated the identification of previously undiscovered biomarkers and the development of novel diagnostic approaches in precision medicine. Meanwhile, with the ability to measure tons of biomarkers simultaneously in a single experiment, collecting enough biological samples has become the bottleneck of data accumulation. Feature selection is a common strategy to tackle this ‘small n and large p’ scenario. Most current feature selection methods are purely based on statistical theory. However, based on my experience analyzing high-throughput data in various projects, I believe biological knowledge could play an important role in feature selection. Therefore, in this dissertation, I present computational investigations of biological-knowledge-integrated feature selection methods for dealing with high-dimensional omics data.
Firstly, I present two bioinformatics practices of analyzing high-throughput data in biomedical research, including characterization of the H3K27ac profile across different PM2.5 exposures and investigation of batch stability in iPSC technology. Inspired by these research practices, I then design three biomedical-knowledge-integrated feature selection methods for high-dimensional omics data analysis. (1) To integrate domain knowledge, I develop SKI, in which two ranks are generated before feature selection: one based on marginal correlation from the omics data in hand, and another based on external knowledge provided by domain experts, literature or databases. By combining the two ranks into a new rank, biomarkers are prescreened, and a further feature selection approach such as LASSO is performed. In a simulation study, I show that SKI outperforms other methods without knowledge integration. I then apply SKI to a gene expression dataset to predict drug response in different cell lines. A higher prediction accuracy is achieved by the SKI method than by a regular LASSO-based method. (2) To integrate multi-omics data, such as methylation and copy number variants, for survival data analysis, I develop two methods, SKI-Cox and wLASSO-Cox. Cox regression is a common model for survival data analysis. SKI-Cox prescreens genes based on different levels of omics data and further selects genes in a transcriptome-based Cox regression model. wLASSO-Cox uses the marginal utilities derived from a Cox regression model on the other omics data as penalty factors in a penalized Cox regression on mRNA expression. By simulation, I show that the two methods select more true variables when analyzing omics-based survival data, and better performance is achieved in predicting overall survival time in glioblastoma and lung adenocarcinoma patients using TCGA datasets. (3) To integrate pathway or gene set information, my colleagues and I develop a redundancy-removable pathway (RRP) based feature selection method for binary and multi-class classification problems. Both strategies in (1) and (2) have the limitation of considering the genes…
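The SKI prescreening step, combining a data-driven rank with an external-knowledge rank before a LASSO fit, can be sketched roughly as below; the simple averaging rule and the cutoff are our assumptions, since the abstract only states that the two ranks are combined into a new rank:

    def ski_prescreen(marginal_rank, prior_rank, keep=200, weight=0.5):
        """marginal_rank, prior_rank: {gene: rank position, lower is better}.
        Genes missing from the prior get the worst possible prior rank."""
        worst = len(marginal_rank)
        combined = {
            gene: weight * marginal_rank[gene] + (1 - weight) * prior_rank.get(gene, worst)
            for gene in marginal_rank
        }
        return sorted(combined, key=combined.get)[:keep]

    # The surviving genes would then be passed to a penalized model such as LASSO.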
Advisors/Committee Members: Lu, Hui (advisor), Dai, Yang (committee member), Yang, Jie (committee member), Zhang, Kunpeng (committee member), Zhang, Wei (committee member), Lu, Hui (chair).
Subjects/Keywords: feature selection; high-throughput omics data; data integration
APA (6th Edition):
Liu, C. (2017). Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis. (Thesis). University of Illinois – Chicago. Retrieved from http://hdl.handle.net/10027/22136
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Liu, Cong. “Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis.” 2017. Thesis, University of Illinois – Chicago. Accessed January 17, 2021.
http://hdl.handle.net/10027/22136.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Liu, Cong. “Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis.” 2017. Web. 17 Jan 2021.
Vancouver:
Liu C. Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis. [Internet] [Thesis]. University of Illinois – Chicago; 2017. [cited 2021 Jan 17].
Available from: http://hdl.handle.net/10027/22136.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Liu C. Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis. [Thesis]. University of Illinois – Chicago; 2017. Available from: http://hdl.handle.net/10027/22136
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Iowa State University
30.
Tsai, Hsine-jen.
A spatial mediator model for integrating heterogeneous spatial data.
Degree: 2011, Iowa State University
URL: https://lib.dr.iastate.edu/etd/10285
► The complexity and richness of geospatial data create specific problems in heterogeneous data integration. To deal with this type of data integration, we propose a…
(more)
▼ The complexity and richness of geospatial data create specific problems in heterogeneous data integration. To deal with this type of data integration, we propose a spatial mediator embedded in a large distributed mobile environment (GeoGrid). The spatial mediator takes a user request from a field application and uses the request to select the appropriate data sources, construct subqueries for the selected sources, define the process of combining the subquery results, and develop an integration script that controls the integration process in order to respond to the request. The spatial mediator uses ontologies both to support searching for geographic locations based on symbolic terms and to provide a term-based index to spatial data sources based on the relational model. In our approach, application designers need to know only a minimal amount about the queries required to supply users with the needed data. The key part of this research has been the development of a spatial mediator that can dynamically respond to requests within the GeoGrid environment for geographic maps and related relational spatial data.
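The mediation pipeline described in this abstract (select sources, build subqueries, plan how to combine the results, emit an integration script) can be sketched conceptually as follows. Every class, method, ontology entry, and source name here is a hypothetical placeholder, not the thesis's actual GeoGrid design.

# Conceptual sketch of an ontology-driven spatial mediator pipeline (illustration only).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SpatialSource:
    name: str
    terms: List[str]          # ontology terms this source can answer about
    query_template: str       # how to phrase a subquery against this source

class SpatialMediator:
    def __init__(self, sources: List[SpatialSource], ontology: Dict[str, List[str]]):
        self.sources = sources
        self.ontology = ontology  # symbolic term -> related/narrower terms

    def select_sources(self, request_terms: List[str]) -> List[SpatialSource]:
        # Expand the request through the ontology, then keep sources that
        # index at least one of the expanded terms.
        expanded = set(request_terms)
        for t in request_terms:
            expanded.update(self.ontology.get(t, []))
        return [s for s in self.sources if expanded & set(s.terms)]

    def build_subqueries(self, sources: List[SpatialSource], region: str) -> Dict[str, str]:
        # One subquery per selected source, phrased with that source's own template.
        return {s.name: s.query_template.format(region=region) for s in sources}

    def integration_script(self, subqueries: Dict[str, str]) -> List[str]:
        # Ordered steps a runtime would execute to combine the subquery results.
        steps = [f"RUN {name}: {q}" for name, q in subqueries.items()]
        steps.append("JOIN results on shared spatial keys")
        steps.append("RENDER map layer for the requesting field application")
        return steps

# Usage sketch:
# mediator = SpatialMediator(sources=[...], ontology={"river": ["stream", "waterway"]})
# plan = mediator.integration_script(
#     mediator.build_subqueries(mediator.select_sources(["river"]), region="Ames, IA"))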
Subjects/Keywords: data integration; geospatial data; ontology; spatial mediation; Computer Sciences
APA (6th Edition):
Tsai, H. (2011). A spatial mediator model for integrating heterogeneous spatial data. (Thesis). Iowa State University. Retrieved from https://lib.dr.iastate.edu/etd/10285
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Tsai, Hsine-jen. “A spatial mediator model for integrating heterogeneous spatial data.” 2011. Thesis, Iowa State University. Accessed January 17, 2021.
https://lib.dr.iastate.edu/etd/10285.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Tsai, Hsine-jen. “A spatial mediator model for integrating heterogeneous spatial data.” 2011. Web. 17 Jan 2021.
Vancouver:
Tsai H. A spatial mediator model for integrating heterogeneous spatial data. [Internet] [Thesis]. Iowa State University; 2011. [cited 2021 Jan 17].
Available from: https://lib.dr.iastate.edu/etd/10285.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Tsai H. A spatial mediator model for integrating heterogeneous spatial data. [Thesis]. Iowa State University; 2011. Available from: https://lib.dr.iastate.edu/etd/10285
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
◁ [1] [2] [3] [4] [5] … [21] ▶