You searched for subject:(big data)
Showing records 1 – 30 of 1751 total matches.

San Jose State University
1.
Desai, Khushali Yashodhar.
Big Data Quality Modeling And Validation.
Degree: MS, Computer Engineering, 2018, San Jose State University
URL: https://doi.org/10.31979/etd.c68w-98uf ; https://scholarworks.sjsu.edu/etd_theses/4898
The chief purpose of this study is to characterize various big data quality models and to validate each with an example. As the volume of data increases at an exponential rate in the era of the broadband Internet, the success of a product or decision largely depends on selecting the highest-quality raw material, or data, to be used in production. However, working with data of high volume, fast velocity, and varied format can be fraught with problems. Software industries therefore need a quality check, especially for data generated by software or sensors. This study explores various big data quality parameters and their definitions and proposes a quality model for each parameter. Using water-quality data for San Francisco Bay from the U.S. Geological Survey (USGS), an example is given for each of the proposed big data quality models. To calculate composite data quality, prevalent methods such as Monte Carlo simulation and neural networks were used. This thesis proposes eight big data quality parameters in total; six of the eight models were implemented as a final-year project by a group of Master's students at SJSU. A case study is carried out using linear regression analysis, and all the big data quality parameters are validated with positive results.
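A minimal sketch of the kind of Monte Carlo composite-quality calculation the abstract mentions. Everything here is illustrative: the parameter names, scores, and uncertainty are invented stand-ins, not the thesis's actual eight parameters or their models.

```python
import random

# Hypothetical per-parameter quality scores on a 0..1 scale,
# each treated as uncertain with an assumed standard deviation.
PARAM_SCORES = {"accuracy": 0.92, "completeness": 0.85, "timeliness": 0.78}
UNCERTAINTY = 0.05

def composite_quality(trials=10_000, seed=42):
    """Monte Carlo estimate of a composite quality score: perturb each
    parameter score, clip to [0, 1], average, and repeat."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [min(1.0, max(0.0, s + rng.gauss(0, UNCERTAINTY)))
                  for s in PARAM_SCORES.values()]
        total += sum(sample) / len(sample)
    return total / trials

score = composite_quality()
```

Neural-network aggregation, the other method named in the abstract, would replace the plain average with a learned combination of the parameter scores.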
Subjects/Keywords: Big Data; Big Data Quality
APA (6th Edition):
Desai, K. Y. (2018). Big Data Quality Modeling And Validation. (Masters Thesis). San Jose State University. Retrieved from https://doi.org/10.31979/etd.c68w-98uf ; https://scholarworks.sjsu.edu/etd_theses/4898
2.
Zgraggen, Emanuel Albert Errol.
Towards Accessible Data Analysis.
Degree: Department of Computer Science, 2018, Brown University
URL: https://repository.library.brown.edu/studio/item/bdr:792684/
In today's world data is ubiquitous. Increasingly large and complex datasets are gathered across many domains. Data analysis - making sense of all this data - is exploratory by nature, demanding rapid iterations, and all but the simplest analysis tasks require humans in the loop to effectively steer the process. Current tools that support this process are built for an elite set of individuals: highly trained analysts or data scientists with strong mathematics and computer science skills. This, however, presents a bottleneck. Qualified data scientists are scarce and expensive, which often makes it unfeasible to inform decisions with data. How do we empower data enthusiasts, stakeholders, or subject-matter experts who are not statisticians or programmers to directly tease out insights from data? This thesis presents work towards making data analysis more accessible. We invent a set of user experiences with approachable visual metaphors whose building blocks are directly manipulable and incrementally composable to support common data analysis tasks at a pace that matches the thought process of a human. First, we develop a system for back-of-the-envelope calculations that revolves around handwriting recognition - all data is represented as digital ink - and gestural commands. Second, we introduce a novel pen & touch system for data exploration and analysis based on four core interaction concepts. The combination and interplay of those concepts supports a wide range of common analytical tasks. The interface allows for incremental and piecewise query specification, where intermediate visualizations serve both as feedback and as interactive handles to adjust query parameters. Third, we present a visual query interface for event-sequence data. This touch-based interface exposes the full expressive power of regular expressions in an approachable way and interleaves query specification with result visualizations. Fourth, we present the results of an experiment analyzing how progressive visualizations affect exploratory analysis. Based on these results, which suggest that progressive visualizations are a viable way to achieve scalability in data exploration systems, we develop a system entirely based on progressive computation that allows users to interactively build complex analytics workflows. Finally, we discuss and experimentally show that using visual analysis tools might inflate false discovery rates among user-extracted insights, and we suggest ways of ameliorating this problem.
Advisors/Committee Members: van Dam, Andries (Advisor), Kraska, Tim (Reader), Drucker, Steven M. (Reader).
Subjects/Keywords: Big data
APA (6th Edition):
Zgraggen, E. A. E. (2018). Towards Accessible Data Analysis. (Thesis). Brown University. Retrieved from https://repository.library.brown.edu/studio/item/bdr:792684/

Wake Forest University
3.
Bowling, Roy Nathaniel.
Big Data and the Rhetorical Narrative.
Degree: 2014, Wake Forest University
URL: http://hdl.handle.net/10339/47448
This project illustrates a tactic by which the constructs of narrative inquiry from a humanist perspective, in particular the rhetorical narrative tradition, can migrate into a larger methodology while simultaneously recognizing and training agents in narrative visualization via up-to-date computational tools. The Reddit platform, in particular, served as a suitable illustration for a multifaceted approach to novel methods in narrative inquiry, owing to its freely accessible online storytelling and the substantial collection of unstructured data it offers. This permitted an effective exploration by means of analyzing mediated narratives while concurrently using computational methods to assemble, filter, and interpret "Reddit narratives." The project progresses in two parts. First, I offer a model for contemporary rhetorical narrative analysis that embraces social media as a viable source of user-generated narrative data. The second half of the project illustrates a data-analysis template that employs a rhetorical lens for the creation of narrative maps. Collectively, this project proposes a model for continued rhetorical narrative inquiry that intersects traditional qualitative analysis with the contemporary deployment of textual analytic software.
Subjects/Keywords: Big Data
APA (6th Edition):
Bowling, R. N. (2014). Big Data and the Rhetorical Narrative. (Thesis). Wake Forest University. Retrieved from http://hdl.handle.net/10339/47448

California State Polytechnic University – Pomona
4.
Singh, Darvesh Pari.
Big Data - Performance Analysis Of Hadoop.
Degree: MS, Computer Science, 2017, California State Polytechnic University – Pomona
URL: http://hdl.handle.net/10211.3/189329
Clustering data streams is an important branch of data stream mining. Due to the dynamic nature and large size of data streams, traditional data mining algorithms cannot meet the requirements of online assessment of appropriate parameter values and cluster numbers. Data stream techniques therefore focus on scanning the data set once and incrementally maintaining memory-efficient data structures that are smaller than the entire data set. A new text-feature mining method is proposed, an extension of the Gaussian mixture model based on a genetic algorithm. The method performs probability-density-based clustering of the data stream, requiring only the newly arrived data rather than the entire history. The GMMGA algorithm determines the parameters and the number of Gaussian component clusters through split and merge operations of a genetic algorithm on random Gaussian components. The algorithm increases the robustness and accuracy of the estimated number of clusters while also saving memory and run time. Big data analysis requires data mining and machine learning technology. This is a novel method for big data analysis that is fast, scalable, and highly precise. It runs k-means for a fixed number of iterations to overcome the uncertainty in the number of iterations, without losing the accuracy of the proposed work. We therefore propose a method to find the cluster centroids of linked documents using data and links, as well as neighborhood Gaussian clustering algorithms with different characteristics. The work was completed in a simulated Hadoop environment.
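The fixed-iteration k-means idea in the abstract can be sketched with plain Lloyd iterations on synthetic data. This is a generic illustration only, not the thesis's GMMGA method or its Hadoop implementation.

```python
import numpy as np

def kmeans_fixed(X, k, iters=10, seed=0):
    """Lloyd's k-means run for a fixed number of iterations rather
    than until convergence, as the abstract suggests."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center (skip a cluster if it emptied)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, labels = kmeans_fixed(X, k=2)
```

Capping the iteration count bounds the run time per batch, which is the property the abstract highlights for stream settings.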
Advisors/Committee Members: Young, Gilbert (advisor), Ji, Hao (committee member).
Subjects/Keywords: big data
APA (6th Edition):
Singh, D. P. (2017). Big Data - Performance Analysis Of Hadoop. (Masters Thesis). California State Polytechnic University – Pomona. Retrieved from http://hdl.handle.net/10211.3/189329

University of KwaZulu-Natal
5.
Vela Vela, Junior.
The employees’ perception on the adoption of big data analytics by selected medical aid organisations in Durban.
Degree: 2017, University of KwaZulu-Natal
URL: http://hdl.handle.net/10413/15171
The increase in the amount of data available in today's world has prompted different industries to find ways to extract value from it. Big data analytics is the term used to describe the analysis of such enormous amounts of data. Practitioners and researchers are therefore trying to understand the adoption of this new technology by companies, governments, and universities. Big data analytics has been used by some medical aid companies to improve the quality of the schemes and products provided to clients by collecting and analysing accurate data. However, the rate of acceptance and use of big data analytics by medical aid organisations in South Africa is still unknown. In this dissertation, we discuss employees' perceptions of the adoption of big data analytics by medical aid organisations in Durban. The benefits and challenges of big data analytics in medical aid organisations are also discussed. A conceptual framework was developed to structure the problem being investigated. To this end, five perceived factors that might influence employees' perceptions of the adoption of big data analytics were examined: perceived performance expectancy, perceived price value, perceived social influence, perceived facilitating conditions, and perceived characteristics of innovation. A survey was used as the research strategy, and an exploratory design was chosen; the dissertation therefore draws no conclusive outcomes. Results show that employees generally have a positive perception of the adoption of big data analytics. Constructs such as perceived performance expectancy, perceived price value, and perceived characteristics of innovation proved to influence employees' attitudes towards adoption.
Advisors/Committee Members: Subramaniam, Prabhakar Rontala. (advisor).
Subjects/Keywords: Big data analysis.; Technology.; Big data.; Adoption.
APA (6th Edition):
Vela Vela, J. (2017). The employees’ perception on the adoption of big data analytics by selected medical aid organisations in Durban. (Thesis). University of KwaZulu-Natal. Retrieved from http://hdl.handle.net/10413/15171

Delft University of Technology
6.
Gloudemans, T.W. (author).
Aircraft Performance Parameter Estimation using Global ADS-B and Open Data.
Degree: 2016, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:6b210945-7004-430e-884e-72b27a7f5acc
To enable low-cost, open-source ATM simulations, Delft University of Technology is developing the open-source ATM simulator Bluesky. A method was developed to identify aircraft performance parameters using ADS-B and other open sources of data. The goal is to determine the operational flight envelope and obtain estimates for the lift and drag coefficients. The method streams global ADS-B data from Flightradar24. By making assumptions about wind and flight strategies, estimates of aircraft parameters can be obtained. The nature of these assumptions limits the aircraft types analyzed to commercial aircraft only. The method measures the operational flight envelope and estimates the weight, lift, and drag coefficients for multiple phases of flight. The operational flight envelope was then compared to open data; the estimates showed values similar to the open data, indicating that the operational flight envelope can be estimated using the method. The drag polar was compared to BADA, which showed a consistent underestimation of the drag polar.
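The drag-polar estimate the abstract compares against BADA can be illustrated with a least-squares fit of the parabolic polar CD = CD0 + k * CL^2. The sample points below are synthetic and noise-free, invented for illustration; they are not ADS-B-derived values from the thesis.

```python
import numpy as np

# Synthetic (CL, CD) samples generated from an assumed polar
# with CD0 = 0.02 and k = 0.045.
CL = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
CD = 0.02 + 0.045 * CL**2

# CD is linear in CL**2, so a degree-1 polyfit recovers k and CD0.
k_hat, cd0_hat = np.polyfit(CL**2, CD, 1)
```

With noisy ADS-B-derived samples the same fit yields estimates of CD0 and k whose quality depends on the wind and flight-strategy assumptions the abstract describes.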
Aerospace Engineering
Control and Simulation
Advisors/Committee Members: Hoekstra, J.M. (mentor), Ellerbroek, J. (mentor), Sun, J. (mentor).
Subjects/Keywords: ADSB; Big Data
APA (6th Edition):
Gloudemans, T. W. (2016). Aircraft Performance Parameter Estimation using Global ADS-B and Open Data. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:6b210945-7004-430e-884e-72b27a7f5acc

Universidade do Minho
7.
Torres, Hugo Miguel Oliveira.
Benchmarking de tecnologias de Big Data aplicadas à saúde-medicina.
Degree: 2017, Universidade do Minho
URL: http://hdl.handle.net/1822/54930
The technological advances of recent decades have led to an increase in the volume and variety of the data generated. When stored, processed, and analysed, these data can provide new knowledge and greater business insight, which can help organisations gain an advantage over their competitors. Big Data has been shown to be associated with increased efficiency and effectiveness in several areas. Although many studies have sought to prove the value of Big Data in healthcare/medicine, few practical advances have been made. This work aims to facilitate the adoption of Big Data technologies in medicine and in healthcare organisations. The potential of, and challenges to, Big Data adoption are discussed by benchmarking several Big Data technologies that have been used in, or designed for, healthcare. This dissertation analyses existing Big Data technologies applied to healthcare: a Big Data solution developed for the INTCare project, a Hadoop-based solution proposed for the Maharaja Yeshwatrao hospital in India, and a solution that uses Apache Spark, all three built on open-source technologies. The proprietary solutions analysed include IBM PureData Solution For Healthcare Analytics, used at Seattle's Children's Hospital, and Cisco Connected Health Solutions and Services. The document opens with a brief description of the project context, the motivation, and the objective of this work. The State of the Art follows, explaining the various topics related to what was done and studied. The objectives and methodological approaches used, and how they were applied in developing this dissertation, are then presented, followed by the various tools that make up the solutions found. The next section presents the experiments that compare the solutions chosen for benchmarking. Finally, the results are discussed and suggestions are offered that could reduce the risk of adopting Big Data technologies in healthcare.
Advisors/Committee Members: Santos, Manuel (advisor), Portela, Filipe (advisor).
Subjects/Keywords: Big data; Big data technologies; Big data in healthcare; Benchmarking
APA (6th Edition):
Torres, H. M. O. (2017). Benchmarking de tecnologias de Big Data aplicadas à saúde-medicina. (Masters Thesis). Universidade do Minho. Retrieved from http://hdl.handle.net/1822/54930

Delft University of Technology
8.
Verheij, B.A. (author).
The process of big data solution adoption.
Degree: 2013, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:369986f0-dd24-4855-aa85-8f69a3325191
This research concerned the process of big data solution adoption and the main issues that firms experience in this process. Using eight cases within the Dutch telecommunication and energy utility sectors, a conceptual model of issues in the process of big data solution adoption was constructed. Twelve main issues were identified, which were discussed and led to a number of research implications.
Management of Technology
Technology, Strategy & Entrepreneurship
Technology, Policy and Management
Advisors/Committee Members: Rook, L. (mentor), Van den Berg, J. (mentor), Van Beers, C. (mentor).
Subjects/Keywords: big data; big data solution; NoSQL; big data solution adoption
APA (6th Edition):
Verheij, B. A. (2013). The process of big data solution adoption. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:369986f0-dd24-4855-aa85-8f69a3325191

University of Saskatchewan
9.
Turland, Madeline G 1993-.
Farmers’ willingness to participate in a big data sharing program: A study of Saskatchewan grain farmers.
Degree: 2018, University of Saskatchewan
URL: http://hdl.handle.net/10388/11053
Big data in crop agriculture is information collected by sophisticated machinery at the farm level, as well as externally generated data, such as field satellite imagery. Although some of this data is useful to individual farmers, much of it has little value to the farmer who collects it. The true value of big data is captured when it is aggregated over many farms, allowing researchers to find underlying bio-physical and economic relationships. We conduct a hypothetical choice experiment to analyze farmers' willingness to share data by asking farmers in Saskatchewan whether they would participate in a big data sharing program. The choice tasks varied the type of organization that operated the big data program and included financial and non-financial incentives. Heteroscedastic and random-effects probit models are estimated using the data from the survey. The results are consistent across models and show that farmers are most willing to share their data with university researchers, followed by crop input suppliers or grower associations, and then financial institutions or equipment manufacturers. Farmers are least willing to share their data with government. Farmers are more willing to share data in the presence of a financial incentive or a non-financial incentive such as comparative benchmark statistics or prescription maps generated from the submitted data. Checks for robustness and heterogeneity indicate no self-selection bias into the survey, and no heterogeneity in the results for financial incentive and farm revenue. A latent class logit model indicates that the farmer population may be heterogeneous in its willingness to participate in a big data sharing program, but homogeneous in its ordering of preferences for organization, financial incentive, and non-financial incentive. In addition, demographic variables are not related to class membership.
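A bare-bones probit fit of the kind underlying the study can be sketched by maximum likelihood on synthetic choice data. The single financial-incentive dummy and its coefficients below are invented for illustration; the study's heteroscedastic, random-effects, and latent class specifications are considerably richer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic survey: does a farmer agree to share data (y = 1)
# when a financial incentive is offered (x = 1)? True latent model:
# y* = -0.3 + 0.8*x + e, with standard normal error e.
rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 2, size=n)
y = (-0.3 + 0.8 * x + rng.normal(size=n) > 0).astype(float)

X = np.column_stack([np.ones(n), x])

def neg_loglik(beta):
    """Negative probit log-likelihood, clipped for numerical safety."""
    p = norm.cdf(X @ beta).clip(1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
intercept, incentive_effect = fit.x
```

A positive estimated `incentive_effect` mirrors the study's finding that financial incentives raise willingness to share.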
Advisors/Committee Members: Slade, Peter, Micheels, Eric, Smyth, Stuart, Gray, Richard, Skolrud, Tristan, McDonald, Jill.
Subjects/Keywords: big data; data sharing
APA (6th Edition):
Turland, M. G. (2018). Farmers’ willingness to participate in a big data sharing program: A study of Saskatchewan grain farmers. (Thesis). University of Saskatchewan. Retrieved from http://hdl.handle.net/10388/11053

Montana State University
10.
Ganesan Pillai, Karthik.
Mining spatiotemporal co-occurrence patterns from massive data sets with evolving regions.
Degree: PhD, College of Engineering, 2014, Montana State University
URL: https://scholarworks.montana.edu/xmlui/handle/1/9422
Due to the current rates of data acquisition, the growth of data volumes in nearly all domains of our lives is reaching historic proportions [5], [6], [7]. Spatiotemporal data mining has emerged in recent decades with the main goal of developing data-driven mechanisms for understanding the spatiotemporal characteristics and patterns occurring in massive repositories of data. This work focuses on discovering spatiotemporal co-occurrence patterns (STCOPs) from large data sets with evolving regions. Spatiotemporal co-occurrence patterns represent the subset of event types that occur together in both space and time. Major limitations of existing spatiotemporal data mining models and techniques include the following. First, they do not take into account continuously evolving spatiotemporal events that have polygon-like representations. Second, they do not investigate and provide sufficient interest measures for STCOP discovery purposes. Third, computationally and storage-efficient algorithms to discover STCOPs are missing. These limitations represent important hurdles when analyzing massive spatiotemporal data sets in several application domains that generate big data, including solar physics, which is an application of our interdisciplinary research. In this work, we address these limitations by i) introducing the problem of mining STCOPs from data sets with extended (region-based) spatial representations that evolve over time, ii) developing a set of novel interest measures, and iii) providing a novel framework to model STCOPs. We also present and investigate three novel approaches to STCOP mining. We follow this investigation by applying our algorithm to perform a novel data-driven discovery of STCOPs from solar physics data.
Advisors/Committee Members: Chairperson, Graduate Committee: John Paxton; Rafal A. Angryk (co-chair) (advisor).
Subjects/Keywords: Data mining; Big data
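As a toy illustration of what a spatiotemporal co-occurrence measure looks like (the event types, bounding-box representation, and ratio below are simplifications invented here, not the dissertation's actual interest measures):

```python
from itertools import combinations  # noqa: F401  (handy when scanning all type pairs)

def overlaps(a, b):
    """Do two axis-aligned bounding boxes (xmin, ymin, xmax, ymax) intersect?"""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def stcop_ratio(events, type_a, type_b):
    """Toy spatiotemporal co-occurrence measure: the fraction of time steps
    in which some instance of type_a spatially overlaps some instance of
    type_b, out of the steps where either type is present."""
    by_time = {}
    for etype, t, bbox in events:
        by_time.setdefault(t, []).append((etype, bbox))
    present, cooccur = 0, 0
    for insts in by_time.values():
        a_boxes = [b for e, b in insts if e == type_a]
        b_boxes = [b for e, b in insts if e == type_b]
        if a_boxes or b_boxes:
            present += 1
        if any(overlaps(a, b) for a in a_boxes for b in b_boxes):
            cooccur += 1
    return cooccur / present if present else 0.0

events = [
    ("flare", 1, (0, 0, 2, 2)), ("sigmoid", 1, (1, 1, 3, 3)),  # overlap at t=1
    ("flare", 2, (0, 0, 1, 1)), ("sigmoid", 2, (5, 5, 6, 6)),  # disjoint at t=2
]
print(stcop_ratio(events, "flare", "sigmoid"))  # 0.5
```

An evolving region would simply contribute a different bbox at each time step.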
APA (6th Edition):
Ganesan Pillai, K. (2014). Mining spatiotemporal co-occurrence patterns from massive data sets with evolving regions. (Doctoral Dissertation). Montana State University. Retrieved from https://scholarworks.montana.edu/xmlui/handle/1/9422

University of Hawaii – Manoa
11.
Kang, Qiuling.
Sentiment analysis of big social data with Apache Hadoop.
Degree: 2015, University of Hawaii – Manoa
URL: http://hdl.handle.net/10125/101227
M.S. University of Hawaii at Manoa 2014.
Twitter is a microblog service and a very popular communication mechanism. Its users express their interests, favorites, and sentiments towards the topics and issues they encounter in daily life, making Twitter an important online platform for expressing opinions, which are a key factor influencing behavior. Sentiment analysis of Twitter data is therefore useful to both individuals and organizations in making decisions. However, because of the huge amount of data Twitter generates every day, storing and processing it has become a challenge. In this study, we present a method to collect Twitter data sets and to store and analyze them on the Hadoop platform. The experimental results show that the presented method performs efficiently.
Subjects/Keywords: Twitter data sets; big data
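The thesis's Hadoop pipeline is not reproduced here, but the map/reduce split behind lexicon-based tweet sentiment counting can be sketched in plain Python (the lexicon and tweets are made up):

```python
from collections import defaultdict

POSITIVE = {"love", "great", "good"}   # toy lexicon, purely illustrative
NEGATIVE = {"hate", "bad", "awful"}

def mapper(tweet):
    """Emit (sentiment, 1) pairs for each opinion word, as a Hadoop
    streaming mapper would for one line of input."""
    for word in tweet.lower().split():
        if word in POSITIVE:
            yield ("positive", 1)
        elif word in NEGATIVE:
            yield ("negative", 1)

def reducer(pairs):
    """Sum the counts per key, as the reduce phase would after shuffling."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

tweets = ["I love this phone, great battery", "awful service, I hate waiting"]
pairs = [p for t in tweets for p in mapper(t)]
print(reducer(pairs))  # {'positive': 2, 'negative': 2}
```

In Hadoop these two functions would run on different nodes, with the framework grouping mapper output by key in between.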
APA (6th Edition):
Kang, Q. (2015). Sentiment analysis of big social data with Apache Hadoop. (Thesis). University of Hawaii – Manoa. Retrieved from http://hdl.handle.net/10125/101227

Rutgers University
12.
Patel, Jimit.
Real time big data mining.
Degree: MS, Computer Science, 2015, Rutgers University
URL: https://rucore.libraries.rutgers.edu/rutgers-lib/49077/
This thesis presents a parallel implementation of data streaming algorithms for multiple streams. Thousands of data streams are generated in industries such as finance, health, internet, and telecommunication. The main problem is to analyze all these streams in real time to find correlation between streams, standard deviation, moving average, and similar statistics. Efficient algorithms exist for analyzing multiple streams; however, the performance of such a system can still be improved through parallel implementation. This thesis specifically focuses on: 1) the design and implementation of a parallel system for multiple streams to find the Discrete Fourier Transform (DFT), Most Correlated Pair, Singular Value Decomposition, Standard Deviation, Moving Average, and Aggregated Average; 2) a performance analysis of a multithreaded application versus a single-threaded application; and 3) visualization of archived data.
Advisors/Committee Members: Palis, Michael A (chair).
Subjects/Keywords: Big data; Data mining
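One of the listed statistics, a sliding-window moving average with standard deviation, can be maintained incrementally in O(1) per arriving value by carrying running sums. A minimal single-stream, single-threaded sketch (the thesis's parallel multi-stream system is not shown):

```python
from collections import deque
import math

class WindowStats:
    """Sliding-window moving average and (population) standard deviation,
    updated in O(1) per arriving value via running sums."""
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def push(self, x):
        self.window.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old

    def mean(self):
        return self.total / len(self.window)

    def stddev(self):
        n = len(self.window)
        var = max(self.total_sq / n - self.mean() ** 2, 0.0)
        return math.sqrt(var)

s = WindowStats(size=3)
for x in [1.0, 2.0, 3.0, 4.0]:
    s.push(x)
print(s.mean())  # 3.0 (window is [2, 3, 4])
```

A parallel version would run one such accumulator per stream, sharded across worker threads.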
APA (6th Edition):
Patel, J. (2015). Real time big data mining. (Masters Thesis). Rutgers University. Retrieved from https://rucore.libraries.rutgers.edu/rutgers-lib/49077/

Tampere University
13.
Syrjärinne, Paula.
Urban Traffic Analysis with Bus Location Data
.
Degree: 2016, Tampere University
URL: https://trepo.tuni.fi/handle/10024/98613
This thesis presents the use and analysis of data collected from public transport buses in the Tampere region. The data has been analyzed with several different algorithms and from many different perspectives. Some of the analyses measure the service level of public transport, some provide useful additional information for passengers, and some focus on observing the general flow of traffic.
The thesis begins with background information and earlier research on the topic. Different traffic-related sensor networks are reviewed, with a particular focus on probe-vehicle networks; throughout the thesis, the Tampere bus data is treated as data collected from a moving probe-vehicle network. The literature on probe-vehicle network analysis is presented with the studies grouped by source data into articles dealing with taxi data, bus data, and mobile-device data, each of which involves different research problems.
With taxi data, missing observation points are the most common problem, whereas modeling private car traffic on the basis of data collected from buses is a typical question in studies using bus data. With data collected from mobile devices, it usually must first be determined whether the device is in a moving vehicle at all.
The Tampere bus data is presented in detail. This data is of comparatively good quality: its update rate is high, every observation carries unique identifiers, and plentiful observations are available across the entire public transport network. As in any data collected from a real source, however, this data set also contains problems such as inconsistencies, errors, and noise. The expected magnitudes of these errors are reviewed in the presentation of the data, as is the preprocessing pipeline in which the data is both cleaned of errors and reshaped into a form more convenient for statistical analysis.
The experimental part of the thesis first examines the use of the data for measuring the performance of public transport. Frequently occurring time-place-line sets are mined from the data, revealing where, when, and on which lines buses are regularly late. In addition, route runs are segmented by location and by events (such as stop visits or waiting at traffic lights) in order to find causes for the delays.
The experiments from the passengers' point of view include, among other things, data-driven stop timetables that adapt over time according to actual arrival times. In addition to the arrival time, passengers are given an estimate of its uncertainty.
To analyze general traffic flow, the concept of street segment profiles is introduced. A profile gives, for each stop-to-stop segment, the bounds of normal travel time at each time of day. The profiles can be used to classify segments, for example by the effects of the morning and afternoon rush hours, and they form a basis for real-time anomaly monitoring.…
Subjects/Keywords: data-analyysi; big data; älyliikenne; data analysis; big data; ITS
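The street-segment profiles described above can be approximated with empirical quantiles: for each (segment, hour) pair, take the historical travel times and use, say, the 10th and 90th percentiles as the bounds of "normal". The field names and thresholds below are assumptions for illustration, not the thesis's actual scheme:

```python
from collections import defaultdict

def build_profiles(observations, low_q=0.1, high_q=0.9):
    """Per (segment, hour) normal travel-time bounds from historical bus
    observations, via empirical quantiles."""
    by_key = defaultdict(list)
    for seg, hour, seconds in observations:
        by_key[(seg, hour)].append(seconds)
    profiles = {}
    for key, times in by_key.items():
        times.sort()
        lo = times[int(low_q * (len(times) - 1))]
        hi = times[int(high_q * (len(times) - 1))]
        profiles[key] = (lo, hi)
    return profiles

def is_anomalous(profiles, seg, hour, seconds):
    """Flag a travel time outside the segment's normal bounds for that hour."""
    lo, hi = profiles[(seg, hour)]
    return seconds < lo or seconds > hi

# Hypothetical 8 a.m. travel times (seconds) for one stop-to-stop segment.
obs = [("A-B", 8, t) for t in [60, 62, 58, 61, 59, 63, 60, 120, 57, 61, 62]]
profiles = build_profiles(obs)
print(is_anomalous(profiles, "A-B", 8, 200))  # True
```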
APA (6th Edition):
Syrjärinne, P. (2016). Urban Traffic Analysis with Bus Location Data. (Doctoral Dissertation). Tampere University. Retrieved from https://trepo.tuni.fi/handle/10024/98613

Indiana University
14.
Suriarachchi, Isuru.
Big provenance stream processing for data-intensive computations
.
Degree: 2018, Indiana University
URL: http://hdl.handle.net/2022/22579
Industry, academia, and research alike are grappling with the opportunities that Big Data brings in the ability to analyze data from numerous sources for insight, decision making, and predictive forecasts. The analysis workflows for dealing with such volumes of data are said to be large-scale data-intensive computations (DICs). Data-intensive computation frameworks, also known as Big Data processing frameworks, carry out both online and offline processing. Big Data analysis workflows frequently consist of multiple steps: data cleaning, joining data from different sources, and applying processing algorithms. Critically, today the steps of a given workflow may be performed with different processing frameworks simultaneously, complicating the lifecycle of the data products that go through the workflow. This is particularly the case in emerging Big Data management solutions like Data Lakes, in which data from multiple sources are stored in a shared storage solution and analyzed for different purposes at different points of time. In such an environment, accessibility and traceability of data products are known to be hard to achieve. Data provenance, or data lineage, leads to a good solution for this problem, as it provides the derivation history of a data product and helps in monitoring, debugging, and reproducing computations. Our initial research produced a provenance-based reference architecture and a prototype implementation to achieve better traceability and management. Experiments show that the size of fine-grained provenance collected from data-intensive computations can be several times larger than the original data itself, creating a Big Data problem referred to in the literature as "Big Provenance". Storing and managing Big Provenance for later analysis is not feasible for some data-intensive applications due to high resource consumption. In addition, not all provenance is equally valuable, and it can often be summarized without loss of critical information. In this thesis, I apply stream processing techniques to analyze streams of provenance captured from data-intensive computations. The specific contributions are several. First, a provenance model which includes formal definitions for provenance stream, forward provenance, and backward provenance in the context of data-intensive computations. Second, a stateful, one-pass, parallel stream processing algorithm to summarize a full provenance stream on-the-fly by preserving backward provenance and forward provenance. The algorithm is resilient to provenance events arriving out of order. Multiple provenance stream partitioning strategies (horizontal, vertical, and random) for provenance emerging from data-intensive computations are also presented. A provenance stream processing architecture is developed to apply the proposed parallel streaming algorithm on a stream of provenance arriving through a distributed log store. The solution is evaluated using the Apache Kafka log store, the Apache Flink stream processing system, and the Komadu provenance capture service. Provenance…
Advisors/Committee Members: Plale, Beth (advisor).
Subjects/Keywords: Big Data; Big Provenance; Stream Processing
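The backward-provenance idea, mapping each derived product to the set of root inputs it descends from while discarding intermediate lineage, can be sketched as a one-pass fold over provenance edges. This toy version assumes edges arrive in derivation order; the thesis's algorithm additionally handles parallelism and out-of-order arrival:

```python
def backward_provenance(edges):
    """One-pass summary of backward provenance. Each edge (src, dst) means
    dst was derived from src. For every product we keep only the set of
    root inputs it ultimately descends from, not the full derivation graph."""
    roots = {}  # product id -> frozenset of root input ids
    for src, dst in edges:
        # A src never seen as a dst is itself a root input.
        src_roots = roots.get(src, frozenset([src]))
        roots[dst] = roots.get(dst, frozenset()) | src_roots
    return roots

edges = [("raw1", "clean1"), ("raw2", "clean1"), ("clean1", "report")]
print(sorted(backward_provenance(edges)["report"]))  # ['raw1', 'raw2']
```

Intermediate products such as "clean1" can be evicted once no later edge references them, which is what keeps the summary far smaller than the raw provenance stream.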
APA (6th Edition):
Suriarachchi, I. (2018). Big provenance stream processing for data-intensive computations. (Thesis). Indiana University. Retrieved from http://hdl.handle.net/2022/22579

University of Victoria
15.
Chrimes, Dillon.
Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system.
Degree: School of Health Information Science, 2016, University of Victoria
URL: http://hdl.handle.net/1828/7645
Background: Big data analytics (BDA) is important for reducing healthcare costs, but there are many challenges. The study objective was to establish a high-performance, interactive BDA platform for a hospital system.
Methods: A Hadoop/MapReduce framework formed the BDA platform, with HBase (a NoSQL database) using hospital-specific metadata and file ingestion. Query performance was tested with Apache tools in Hadoop's ecosystem.
Results: At the optimized iteration, Hadoop distributed file system (HDFS) ingestion required three seconds, but HBase required four to twelve hours to complete the Reducer of MapReduce. HBase bulkloads took a week for one billion records (10 TB) and over two months for three billion (30 TB). Simple and complex queries returned in about two seconds for one and three billion records, respectively.
Interpretations: The HBase BDA platform distributed by Hadoop performed successfully at large volumes representing the Province's entire data. Inconsistencies of MapReduce limited operational efficiency. The importance of Hadoop/MapReduce for health informatics is further discussed.
Advisors/Committee Members: Kuo, Alex (Mu - Hsing) (supervisor).
Subjects/Keywords: Big Data; Big Data Analytics; Big Data Tools; Big Data Visualizations; Hadoop Ecosystem; Health Big Data; Hospital Systems; Interactive Big Data; Patient Data; Simulations
APA (6th Edition):
Chrimes, D. (2016). Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system. (Masters Thesis). University of Victoria. Retrieved from http://hdl.handle.net/1828/7645
16.
Cao, Xiang.
Efficient Data Management and Processing in Big Data Applications.
Degree: PhD, Computer Science, 2017, University of Minnesota
URL: http://hdl.handle.net/11299/188863
In today's Big Data applications, huge amounts of data are being generated. With this rapid growth, data management and processing become essential, and it is important to design efficient approaches for both. In this thesis, data management and processing are investigated for Big Data applications. Key-value stores (KVS) are widely used in many Big Data applications because of their flexible and efficient performance. Recently, a new Ethernet-accessed disk drive for key-value pairs called the "Kinetic Drive" was developed by Seagate. It can reduce management complexity, especially in large-scale deployments, but it is important to manage key-value pairs and store them on Kinetic Drives in an organized way. In this thesis, we present data allocation schemes for a large-scale key-value store system built on Kinetic Drives. We investigate key indexing schemes, allocate data on drives accordingly, and propose efficient approaches to migrate data among drives. It is also necessary to manage huge numbers of key-value pairs while providing attribute search for users. We therefore design a large-scale searchable key-value store system based on Kinetic Drives, investigate an indexing scheme to map data to the drives, and propose a key generation approach that reflects metadata of the actual data and supports users' attribute search requests. Separately, MapReduce has become a very popular framework for processing data in many applications, and data shuffling usually accounts for a large portion of the running time of MapReduce jobs. In recent years, scale-up computing architectures for MapReduce jobs have been developed: with a multi-processor, multi-core design connected via NUMAlink and large shared memories, the NUMA architecture provides powerful scale-up computing capability. In this thesis, we focus on optimizing the data shuffling phase of the MapReduce framework on a NUMA machine. We concentrate on the varying bandwidth capacities of NUMAlink(s) among different memory locations to fully utilize the network, investigate the NUMAlink topology, and propose a topology-aware reducer placement algorithm to speed up the data shuffling phase. We then extend our approach to a larger computing environment with multiple NUMA machines.
Subjects/Keywords: Big Data; Data Management; Data Processing
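The "index keys and allocate data to drives accordingly" idea can be illustrated with consistent hashing, a generic allocation scheme (not necessarily the one proposed in the thesis): each drive owns many points on a hash ring, and a key is stored on the drive that owns the next point on the ring, so adding a drive moves only a fraction of the keys.

```python
import hashlib
from bisect import bisect_right

class DriveRing:
    """Consistent-hash allocation of keys to key-value drives.
    Each drive is hashed to `vnodes` positions on a ring; a key goes to
    the drive owning the first ring position at or after the key's hash."""
    def __init__(self, drives, vnodes=64):
        self.ring = sorted(
            (self._h(f"{d}#{i}"), d) for d in drives for i in range(vnodes)
        )
        self._positions = [h for h, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def locate(self, key):
        i = bisect_right(self._positions, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = DriveRing(["drive-0", "drive-1", "drive-2"])
print(ring.locate("patient:42"))
```

A migration scheme then only has to move the keys whose owning ring segment changed.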
APA (6th Edition):
Cao, X. (2017). Efficient Data Management and Processing in Big Data Applications. (Doctoral Dissertation). University of Minnesota. Retrieved from http://hdl.handle.net/11299/188863

Georgia Tech
17.
Lee, Kisung.
Scalable big data systems: Architectures and optimizations.
Degree: PhD, Computer Science, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/55520
Big data analytics has become not just a popular buzzword but also a strategic direction in information technology for many enterprises and government organizations. Even though many new computing and storage systems have been developed for big data analytics, scalable big data processing has become more and more challenging as a result of the huge and rapidly growing size of real-world data. Dedicated to the development of architectures and optimization techniques for scaling big data processing systems, especially in the era of cloud computing, this dissertation makes three unique contributions. First, it introduces a suite of graph partitioning algorithms that can run much faster than existing data distribution methods and inherently scale to the growth of big data. The main idea of these approaches is to partition a big graph by preserving the core computational data structure as much as possible to maximize intra-server computation and minimize inter-server communication. In addition, it proposes a distributed iterative graph computation framework that effectively utilizes secondary storage to maximize access locality and speed up distributed iterative graph computations. The framework not only considerably reduces memory requirements for iterative graph algorithms but also significantly improves their performance. Last but not least, it establishes a suite of optimization techniques for scalable spatial data processing along three orthogonal dimensions: (i) scalable processing of spatial alarms for mobile users traveling on road networks, (ii) scalable location tagging for improving the quality of Twitter data analytics and prediction accuracy, and (iii) lightweight spatial indexing for enhancing the performance of big spatial data queries.
Advisors/Committee Members: Liu, Ling (advisor), Omiecinski, Ed (committee member), Pu, Calton (committee member), Schwan, Karsten (committee member), Ramaswamy, Lakshmish (committee member).
Subjects/Keywords: Big data; Graph data; Spatial data
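A minimal stand-in for the "maximize intra-server computation" objective is greedy streaming vertex partitioning: place each vertex in the partition holding most of its already-placed neighbours, breaking ties toward the lightest partition. This is a simplification invented here for illustration, not the dissertation's algorithms:

```python
def greedy_partition(edges, num_parts):
    """Greedy streaming vertex partitioning: each vertex goes to the part
    where it has the most already-placed neighbours (maximizing intra-part
    edges), with ties broken toward the least-loaded part (balance)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    assignment, loads = {}, [0] * num_parts
    for v in neighbours:  # stream vertices in first-seen order
        scores = [0] * num_parts
        for n in neighbours[v]:
            if n in assignment:
                scores[assignment[n]] += 1
        best = max(range(num_parts), key=lambda p: (scores[p], -loads[p]))
        assignment[v] = best
        loads[best] += 1
    return assignment

# Two disconnected components should land in different parts.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
part = greedy_partition(edges, 2)
print(part["a"] == part["b"] == part["c"], part["d"] == part["e"])  # True True
```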
APA (6th Edition):
Lee, K. (2015). Scalable big data systems: Architectures and optimizations. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/55520

University of California – Riverside
18.
Jacobs, Steven.
A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform.
Degree: Computer Science, 2018, University of California – Riverside
URL: http://www.escholarship.org/uc/item/47g680h1
Virtually all of today's Big Data systems are passive in nature, responding to queries posted by their users. Instead, this thesis aims to shift Big Data platforms from passive to active. A Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To this end, this thesis presents a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., Publish/Subscribe, Streaming Engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. The platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an "active toolkit". The toolkit contains features to rapidly ingest semistructured data, share execution pipelines among users, manage scaled user data subscriptions, and actively monitor the state of the data to produce individualized information for each user. This thesis describes the features and designs of the current BAD system and demonstrates its ability to scale without sacrificing query capability or individualization. One part of the BAD platform, the Data Feed, relies on storage mechanisms that allow for fast ingestion, namely the Log-Structured Merge-Tree (LSM-Tree). As such, this thesis also presents work on a formal evaluation and performance comparison of theoretical and existing LSM merge policies for fast ingestion.
Subjects/Keywords: Computer science; Active Data; Big Active Data; Big Data
APA (6th Edition):
Jacobs, S. (2018). A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform. (Thesis). University of California – Riverside. Retrieved from http://www.escholarship.org/uc/item/47g680h1
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Jacobs, Steven. “A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform.” 2018. Thesis, University of California – Riverside. Accessed March 01, 2021.
http://www.escholarship.org/uc/item/47g680h1.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Jacobs, Steven. “A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform.” 2018. Web. 01 Mar 2021.
Vancouver:
Jacobs S. A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform. [Internet] [Thesis]. University of California – Riverside; 2018. [cited 2021 Mar 01].
Available from: http://www.escholarship.org/uc/item/47g680h1.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Jacobs S. A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform. [Thesis]. University of California – Riverside; 2018. Available from: http://www.escholarship.org/uc/item/47g680h1
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
19.
Fize, Jacques.
Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension.
Degree: Docteur es, Informatique, 2019, Montpellier
URL: http://www.theses.fr/2019MONTS099
► With the rise of Big Data, the processing of the Volume, Velocity (growth and evolution), and Variety of data concentrates the efforts…
(more)
▼ With the rise of Big Data, the processing of the Volume, Velocity (growth and evolution), and Variety of data concentrates the efforts of the various communities seeking to exploit these new resources. These resources have become so important that they are considered the new "black gold". In recent years, volume and velocity have become well-mastered aspects of data, whereas variety remains a major challenge. This thesis presents two contributions in the field of heterogeneous data matching, with a focus on the spatial dimension. The first contribution rests on a two-step process for matching heterogeneous textual data: georepresentation and geomatching. In the first phase, we propose to represent the spatial dimension of each document in a corpus through a dedicated structure, the Spatial Textual Representation (STR). This graph-based representation is composed of the spatial entities identified in the document and the spatial relations between them. To identify a document's spatial entities and their spatial relations, we propose a dedicated resource named Geodict. The second phase, geomatching, consists of measuring the similarity between the generated representations (STRs). Building on the graph nature of the STR, different graph-matching algorithms were studied. To assess the relevance of a match, we propose a set of six criteria based on a definition of spatial similarity between two documents.
The second contribution rests on the thematic dimension of textual data and its participation in the spatial matching process. We propose to identify the themes that appear in the same contextual window as certain spatial entities, the objective being to induce some of the implicit spatial similarities between documents. To this end, we propose to extend the STR structure with two concepts: the thematic entity and the thematic relation. A thematic entity represents a concept specific to a particular domain (agronomy, medicine), represented by the various spellings found in a terminological resource, here a vocabulary. A thematic relation links a spatial entity to a thematic entity when the two appear in the same contextual window. The chosen vocabularies, and the new form of the STR integrating the thematic dimension, are evaluated in terms of their coverage of the studied corpora and their contribution to the spatial matching process.
Advisors/Committee Members: Roche, Mathieu (thesis director), Teisseire, Maguelonne (thesis director).
Subjects/Keywords: Science des données; Big Data; Biodiversité; Data Science; Big Data; Biodiversity
APA (6th Edition):
Fize, J. (2019). Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension. (Doctoral Dissertation). Montpellier. Retrieved from http://www.theses.fr/2019MONTS099
Chicago Manual of Style (16th Edition):
Fize, Jacques. “Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension.” 2019. Doctoral Dissertation, Montpellier. Accessed March 01, 2021.
http://www.theses.fr/2019MONTS099.
MLA Handbook (7th Edition):
Fize, Jacques. “Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension.” 2019. Web. 01 Mar 2021.
Vancouver:
Fize J. Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension. [Internet] [Doctoral dissertation]. Montpellier; 2019. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2019MONTS099.
Council of Science Editors:
Fize J. Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale : Matching between heterogeneous textual data based on spatial dimension. [Doctoral Dissertation]. Montpellier; 2019. Available from: http://www.theses.fr/2019MONTS099

Jönköping University
20.
Rystadius, Gustaf; Monell, David.
The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation.
Degree: Jönköping International Business School, 2020, Jönköping University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959
► Background: The implementation of Big Data Analytics (BDA) has drastically increased within several sectors such as retailing. Due to its rapidly altering environment, companies have…
(more)
▼ Background: The implementation of Big Data Analytics (BDA) has drastically increased within several sectors, such as retailing. Due to this rapidly changing environment, companies have to adapt and modify their business strategies and models accordingly. The concepts of ambidexterity and agility are said to act as mediators of these changes in relation to a company's capabilities within BDA. Problem: Research within the respective fields of dynamic mediators and BDAC has been conducted, but the investigation of the specific traits of these mediators, their interconnection, and their impact on BDAC is scant. Scholars find this surprising and call for further empirical investigation. Purpose: This paper sought to empirically investigate which specific traits of ambidexterity and agility emerged within the case company Åhlen's BDA operation, and how these traits are interconnected. It further studied how these traits and their interplay impact the firm's talent and managerial BDAC. Method: A qualitative case study of the retail firm Åhlen's was conducted with three participants central to the firm's BDA operation. Semi-structured interviews were conducted with questions derived from a conceptual framework based upon reviewed literature and pilot interviews. The data was then analyzed and matched to the literature using a thematic analysis approach. Results: Five ambidextrous traits and three agile traits were found within Åhlen's BDA operation. Analysis of these traits showed a clear positive impact on Åhlen's BDAC when properly interconnected. Further, it was found that in the absence of such interplay the dynamic mediators did not have as positive an impact, and occasionally even had disruptive effects on the firm's BDAC. Hence, it was concluded that a proper connection between the mediators had to be present in order to successfully impact and enhance the capabilities.
Subjects/Keywords: Big Data Analytics; Big Data Analytics Capabilities; Ambidexterity; Agility; Big Data; Business Administration; Företagsekonomi
APA (6th Edition):
Rystadius, Gustaf; Monell, D. (2020). The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation. (Thesis). Jönköping University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Rystadius, Gustaf; Monell, David. “The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation.” 2020. Thesis, Jönköping University. Accessed March 01, 2021.
http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Rystadius, Gustaf; Monell, David. “The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation.” 2020. Web. 01 Mar 2021.
Vancouver:
Rystadius, Gustaf; Monell D. The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation. [Internet] [Thesis]. Jönköping University; 2020. [cited 2021 Mar 01].
Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Rystadius, Gustaf; Monell D. The dynamic management revolution of Big Data : A case study of Åhlen’s Big Data Analytics operation. [Thesis]. Jönköping University; 2020. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-48959
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Brunel University
21.
Lakoju, Mike.
A strategic approach of value identification for a big data project.
Degree: PhD, 2017, Brunel University
URL: http://bura.brunel.ac.uk/handle/2438/15837
;
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764867
► The disruptive nature of innovations and technological advancements present potentially huge benefits, however, it is critical to take caution because they also come with challenges.…
(more)
▼ The disruptive nature of innovations and technological advancements presents potentially huge benefits; however, it is critical to take caution, because they also come with challenges. This author holds fast to the school of thought which suggests that every organisation or society should properly evaluate innovations and their attendant challenges from a strategic perspective before adopting them, or else it could be blindsided by the after-effects. Big Data is one such innovation, currently trending within industry and academia. Organisations instinctively and constantly seek new ways to stay ahead of the competition, and it is for this reason that some incoherencies exist in the field of big data. While on the one hand some organisations rush into implementing Big Data projects, in possibly equal measure many other organisations remain sceptical and uncertain of the benefits of "Big Data" in general and are also concerned about the implementation costs. This has created a strong focus on the area of Big Data implementation. The literature reveals a good number of challenges around Big Data project implementations; for example, most Big Data projects are either abandoned or do not hit their expected targets. Unfortunately, most IS literature has focused on implementation methodologies that are primarily concerned with the data, resources, Big Data infrastructures, algorithms, etc. Rather than leaving this incoherent space as it is, this research seeks to close it and open opportunities to harness and expand knowledge. Consequently, the research takes a slightly different standpoint by approaching Big Data implementation from a strategic perspective. The author emphasises that focus should shift from going straight into implementing Big Data projects to first implementing a Big Data strategy for the organisation.
Before implementation, this strategy step will create the value proposition and identify deliverables to justify the project. To this end, the researcher combines an alignment theory with digital business strategy theory to create a Big Data Strategy Framework that organisations can use to align their business strategy with a Big Data project. The framework was tested in two case studies, and the study resulted in the generation of strategic Big Data goals for both. The framework aided each organisation in identifying the potential value that could be obtained from its Big Data project; these strategic Big Data goals can now be implemented in Big Data projects.
Subjects/Keywords: 005.7; Big data strategy; Savi-bigd; Big data; Digital business strategy; Big data framework
APA (6th Edition):
Lakoju, M. (2017). A strategic approach of value identification for a big data project. (Doctoral Dissertation). Brunel University. Retrieved from http://bura.brunel.ac.uk/handle/2438/15837 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764867
Chicago Manual of Style (16th Edition):
Lakoju, Mike. “A strategic approach of value identification for a big data project.” 2017. Doctoral Dissertation, Brunel University. Accessed March 01, 2021.
http://bura.brunel.ac.uk/handle/2438/15837 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764867.
MLA Handbook (7th Edition):
Lakoju, Mike. “A strategic approach of value identification for a big data project.” 2017. Web. 01 Mar 2021.
Vancouver:
Lakoju M. A strategic approach of value identification for a big data project. [Internet] [Doctoral dissertation]. Brunel University; 2017. [cited 2021 Mar 01].
Available from: http://bura.brunel.ac.uk/handle/2438/15837 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764867.
Council of Science Editors:
Lakoju M. A strategic approach of value identification for a big data project. [Doctoral Dissertation]. Brunel University; 2017. Available from: http://bura.brunel.ac.uk/handle/2438/15837 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764867

Universiteit Utrecht
22.
Franzke, A.S.
Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?.
Degree: 2016, Universiteit Utrecht
URL: http://dspace.library.uu.nl:8080/handle/1874/336098
► This paper elaborates on the question how the ethicist can address the demands for ethical expertise in governments occurring through big data practices. In the…
(more)
▼ This paper elaborates on the question of how the ethicist can address the demands for ethical expertise that arise in governments through big data practices. In the field of big data, information about known ethical risks is needed. Providing that information is closer to the role of the ethical consultant, but discussion and open reflection are also needed. Thus, a hybrid of ethical consultant and ethical facilitator will be the most beneficial way to foster a culture of engagement among those working with big data. Such an approach is needed to keep reflection on how to use this technology ongoing. To avoid fruitless discussion, three steps are presented: the ethicist should first gain some insight into the context, then distil the most relevant issues, address them, and end with a clear recommendation on how to proceed. The ethicist should be both open to discussion and solution-oriented.
Advisors/Committee Members: Anderson, Joel.
Subjects/Keywords: big data; ethic; e-government
APA (6th Edition):
Franzke, A. S. (2016). Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?. (Masters Thesis). Universiteit Utrecht. Retrieved from http://dspace.library.uu.nl:8080/handle/1874/336098
Chicago Manual of Style (16th Edition):
Franzke, A S. “Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?.” 2016. Masters Thesis, Universiteit Utrecht. Accessed March 01, 2021.
http://dspace.library.uu.nl:8080/handle/1874/336098.
MLA Handbook (7th Edition):
Franzke, A S. “Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?.” 2016. Web. 01 Mar 2021.
Vancouver:
Franzke AS. Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?. [Internet] [Masters thesis]. Universiteit Utrecht; 2016. [cited 2021 Mar 01].
Available from: http://dspace.library.uu.nl:8080/handle/1874/336098.
Council of Science Editors:
Franzke AS. Big Data Ethicist - What will the role of the ethicist be in advising governments in the field of big data?. [Masters Thesis]. Universiteit Utrecht; 2016. Available from: http://dspace.library.uu.nl:8080/handle/1874/336098

Penn State University
23.
Iyer, Karthik Thyagarajan.
Computational complexity of data mining algorithms used in fraud detection.
Degree: 2015, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/26437
► According to estimates by certain government agencies, 10% of the total medical expenditure is lost to healthcare fraud. Similarly the credit card industry loses billions…
(more)
▼ According to estimates by certain government agencies, 10% of total medical expenditure is lost to healthcare fraud. Similarly, the credit card industry loses billions of dollars every year to fraudulent transactions. The datasets used to identify these fraudulent transactions have millions of rows, and each transaction is defined by 15-40 attributes. It is not possible for a human to sift through these massive datasets and find the fraudulent transactions. Hence, credit card companies and insurance companies use data mining algorithms to identify fraudulent transactions. These data mining algorithms need to identify the fraudulent transactions efficiently, and at the same time they need to process the dataset as quickly as possible. The time taken by an algorithm to execute is a function of its computational complexity. In this thesis, the theoretical running-time complexities of the various data mining algorithms used in fraud detection are compared, i.e. the complexity is expressed as a function of the number of instances in the database. These algorithms were then run on statistical tools like Weka and R, and on comparing their performance to the theoretical computational efficiency it was found that all algorithms agree with their big-O complexity. Support vector machines and decision trees performed better than their big-O complexity would suggest.
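The comparison described here, measured runtime against theoretical big-O, can be sketched with a toy experiment (illustrative only; the thesis's own Weka/R setup is not reproduced): time a known O(n log n) operation at doubling input sizes and compare the observed growth ratio with the predicted one.

```python
# Compare empirical runtime growth against a theoretical big-O prediction.
# Python's built-in sort is O(n log n), so doubling n should scale runtime
# by roughly (2n * log(2n)) / (n * log n).
import math
import random
import time

def time_sort(n, trials=3):
    """Best-of-trials wall-clock time to sort n random floats."""
    best = float("inf")
    for _ in range(trials):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        sorted(data)
        best = min(best, time.perf_counter() - start)
    return best

sizes = [50_000, 100_000, 200_000]
times = [time_sort(n) for n in sizes]
for n_small, n_big, t_small, t_big in zip(sizes, sizes[1:], times, times[1:]):
    # Predicted ratio if runtime is proportional to n log n.
    predicted = (n_big * math.log(n_big)) / (n_small * math.log(n_small))
    print(f"n {n_small}->{n_big}: measured x{t_big / t_small:.2f}, "
          f"O(n log n) predicts x{predicted:.2f}")
```

If the measured ratios track the predicted ones as n grows, the algorithm "agrees with" its big-O complexity in the sense used above; caching and constant factors make small-n measurements noisy, which is why ratios rather than absolute times are compared.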
Advisors/Committee Members: Vittaldas V Prabhu, Thesis Advisor/Co-Advisor.
Subjects/Keywords: Data mining; Complexity; Big O
APA (6th Edition):
Iyer, K. T. (2015). Computational complexity of data mining algorithms used in fraud detection. (Thesis). Penn State University. Retrieved from https://submit-etda.libraries.psu.edu/catalog/26437
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Iyer, Karthik Thyagarajan. “Computational complexity of data mining algorithms used in fraud detection.” 2015. Thesis, Penn State University. Accessed March 01, 2021.
https://submit-etda.libraries.psu.edu/catalog/26437.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Iyer, Karthik Thyagarajan. “Computational complexity of data mining algorithms used in fraud detection.” 2015. Web. 01 Mar 2021.
Vancouver:
Iyer KT. Computational complexity of data mining algorithms used in fraud detection. [Internet] [Thesis]. Penn State University; 2015. [cited 2021 Mar 01].
Available from: https://submit-etda.libraries.psu.edu/catalog/26437.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Iyer KT. Computational complexity of data mining algorithms used in fraud detection. [Thesis]. Penn State University; 2015. Available from: https://submit-etda.libraries.psu.edu/catalog/26437
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Queens University
24.
Eftekhari, Azadeh.
Binary: a Framework For Big Data Integration For Ad-Hoc Querying.
Degree: Computing, 2016, Queens University
URL: http://hdl.handle.net/1974/14234
► Traditional relational database systems are not practical for big data workloads that require scalable architectures for efficient data storage, manipulation, and analysis. Apache Hadoop, one…
(more)
▼ Traditional relational database systems are not practical for big data workloads that require scalable architectures for efficient data storage, manipulation, and analysis. Apache Hadoop, one of these big data frameworks, provides distributed storage and processing as well as a central repository for different types of data from different sources. Data integration from various sources is often required before performing analytics. Apache Hive on Hadoop is widely used for this purpose, as well as for data summarization and analysis. It has features such as a SQL-like query language, a Metastore to hold metadata, and file formats that support access from various frameworks on Hadoop and beyond. For comprehensive analysis and decision-making, however, a hybrid system is required to integrate Hadoop with traditional relational database management systems in order to access the valuable data stored in relational databases. Current hybrid systems are either expensive proprietary products or require a system to be developed by the user, which demands programming knowledge. In addition, these approaches are not sufficiently flexible to be applied to other frameworks.
In this thesis, we propose a framework called BINARY (A framework for Big data INtegration for Ad-hoc queRYing). BINARY is a hybrid Software-as-a-Service that provides a web interface supported by a back-end infrastructure for ad-hoc querying, accessing, visualizing, and joining data from different data sources, including relational database management systems and Apache Hive. Our framework uses the scalable Hive and HDFS big data storage systems and supports different data sources via back-end resource adapters. A front-end web interface enables the use of HiveQL to query the data sources.
The framework is extendable and allows other storage engines (e.g., HBase) and analytics engines (e.g., R) to be added as needed. We used a REST software architecture to enable loose coupling between the engines and the user-interface programs, facilitating independent updates without affecting the data infrastructure. Our approach is validated with a proof-of-concept prototype implemented on the OpenStack cloud system.
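The resource-adapter idea behind a hybrid system like this can be sketched as follows (a hypothetical illustration, not BINARY's actual API — the `ListAdapter` and `hash_join` names are invented): each source is wrapped in an adapter exposing a uniform row iterator, and an ad-hoc hash join runs across two adapters standing in for Hive and an RDBMS.

```python
# Sketch of the resource-adapter pattern: heterogeneous sources behind a
# uniform rows() interface, joined ad hoc. Names and shapes are
# hypothetical; real adapters would wrap Hive/JDBC connections.

class ListAdapter:
    """Uniform adapter over any source that can yield dict-shaped rows."""
    def __init__(self, rows):
        self._rows = rows

    def rows(self):
        yield from self._rows

def hash_join(left, right, key):
    # Build a hash index on the right side, then probe with the left side.
    index = {}
    for row in right.rows():
        index.setdefault(row[key], []).append(row)
    for row in left.rows():
        for match in index.get(row[key], []):
            yield {**row, **match}   # merge the two sides' columns

# Stand-ins for a Hive table and an RDBMS table.
hive_logs = ListAdapter([{"user_id": 1, "clicks": 40},
                         {"user_id": 2, "clicks": 7}])
rdbms_users = ListAdapter([{"user_id": 1, "name": "ada"},
                           {"user_id": 2, "name": "grace"}])
joined = list(hash_join(hive_logs, rdbms_users, "user_id"))
```

The point of the pattern is that the join logic never sees where a row came from; adding a new storage engine means writing one more adapter, which mirrors the extensibility claim above.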
Subjects/Keywords: Ad-Hoc Query; Big Data
APA (6th Edition):
Eftekhari, A. (2016). Binary: a Framework For Big Data Integration For Ad-Hoc Querying. (Thesis). Queens University. Retrieved from http://hdl.handle.net/1974/14234
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Eftekhari, Azadeh. “Binary: a Framework For Big Data Integration For Ad-Hoc Querying.” 2016. Thesis, Queens University. Accessed March 01, 2021.
http://hdl.handle.net/1974/14234.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Eftekhari, Azadeh. “Binary: a Framework For Big Data Integration For Ad-Hoc Querying.” 2016. Web. 01 Mar 2021.
Vancouver:
Eftekhari A. Binary: a Framework For Big Data Integration For Ad-Hoc Querying. [Internet] [Thesis]. Queens University; 2016. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/1974/14234.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Eftekhari A. Binary: a Framework For Big Data Integration For Ad-Hoc Querying. [Thesis]. Queens University; 2016. Available from: http://hdl.handle.net/1974/14234
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
25.
Castillo, Vanesa.
Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017.
Degree: MBA, Business Administration, 2017, Universidad Torcuato di Tella
URL: https://repositorio.utdt.edu/handle/utdt/11140
► The main finding is that the use of analytics applied to the human resources function in innovative companies in Argentina is still incipient.…
(more)
▼ The main finding is that the use of analytics applied to the human resources function in innovative companies in Argentina is still incipient: it is not a highly prevalent practice, and it has not reached the maturity needed to begin bearing fruit. More than half of the organizations, rather than developing predictive or prescriptive analytics, are still trying to work out how to handle large amounts of data and to access even the most basic reports. This means that most continue to base their talent-related decisions on the intuition or hunches of managers or Human Resources professionals.
Advisors/Committee Members: De Simone, Paola (advisor).
Subjects/Keywords: Human resources; Big data
APA (6th Edition):
Castillo, V. (2017). Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017. (Thesis). Universidad Torcuato di Tella. Retrieved from https://repositorio.utdt.edu/handle/utdt/11140
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Castillo, Vanesa. “Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017.” 2017. Thesis, Universidad Torcuato di Tella. Accessed March 01, 2021.
https://repositorio.utdt.edu/handle/utdt/11140.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Castillo, Vanesa. “Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017.” 2017. Web. 01 Mar 2021.
Vancouver:
Castillo V. Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017. [Internet] [Thesis]. Universidad Torcuato di Tella; 2017. [cited 2021 Mar 01].
Available from: https://repositorio.utdt.edu/handle/utdt/11140.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Castillo V. Tendencias, retos y oportunidades en la aplicación de analytics en recursos humanos en compañías innovadoras en Argentina 2016-2017. [Thesis]. Universidad Torcuato di Tella; 2017. Available from: https://repositorio.utdt.edu/handle/utdt/11140
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Tampere University
26.
Hassi, Sakari.
Big Datan visualisoinnin kokemus virtuaalitodellisuudessa.
Degree: 2018, Tampere University
URL: https://trepo.tuni.fi/handle/10024/104000
► The thesis sought to find out whether virtual reality would be a suitable environment for visualizing Big Data, i.e. whether a more experiential environment would enhance the understanding of datasets classified as Big Data. In connection with the research question, the thesis set out to…
(more)
▼ The thesis sought to find out whether virtual reality would be a suitable environment for visualizing Big Data, i.e. whether a more experiential environment would enhance the understanding of datasets classified as Big Data. In connection with the research question, the thesis set out to examine how the user experience of data visualization differs between virtual reality and a desktop environment, and how users experience data visualization in virtual reality.
To answer these questions, the thesis began with a background survey of the concept of Big Data and of earlier virtual-reality-based Big Data visualization systems. The reported properties of the earlier visualization systems were mirrored against the concept of Big Data, and it was observed that earlier solutions have met the requirements implied by the concept poorly and did not provide a basis for the visualizations implemented in this thesis. In the implementation phase, three visualization suites were created, with separate demos for virtual reality and for a desktop environment. The demos aimed to follow the defining characteristics of Big Data, without fully achieving all of them; given the thesis's limited resources, the greatest of the challenges posed by Big Data were making use of a sufficiently large amount of data and finding databases suitable for use in accordance with Big Data's defining characteristics. A test plan was created for the resulting test systems, and user testing with 10 participants was carried out to examine the experience of data visualization between equivalent virtual reality and desktop implementations. In the user studies, many users experienced the virtual reality visualizations as a more holistic experience, and the environment enabled better concentration on the content of the visualization. However, participants felt that virtual reality visualizations should be built to exploit the possibilities virtual reality offers, so that using the different environment feels meaningful. In addition, the suitability of the interaction techniques used in virtual reality and the fluency of using the system stood out as notable factors.
Subjects/Keywords: Big Data;
virtual reality;
visualization;
user experience
APA (6th Edition):
Hassi, S. (2018). Big Datan visualisoinnin kokemus virtuaalitodellisuudessa. (Masters Thesis). Tampere University. Retrieved from https://trepo.tuni.fi/handle/10024/104000
Chicago Manual of Style (16th Edition):
Hassi, Sakari. “Big Datan visualisoinnin kokemus virtuaalitodellisuudessa.” 2018. Masters Thesis, Tampere University. Accessed March 01, 2021. https://trepo.tuni.fi/handle/10024/104000.
MLA Handbook (7th Edition):
Hassi, Sakari. “Big Datan visualisoinnin kokemus virtuaalitodellisuudessa.” 2018. Web. 01 Mar 2021.
Vancouver:
Hassi S. Big Datan visualisoinnin kokemus virtuaalitodellisuudessa. [Internet] [Masters thesis]. Tampere University; 2018. [cited 2021 Mar 01]. Available from: https://trepo.tuni.fi/handle/10024/104000.
Council of Science Editors:
Hassi S. Big Datan visualisoinnin kokemus virtuaalitodellisuudessa. [Masters Thesis]. Tampere University; 2018. Available from: https://trepo.tuni.fi/handle/10024/104000

University of Houston
27.
Nguyen, Hung Khanh.
Big Data Optimization for Distributed Resource Management in Smart Grid.
Degree: PhD, Electrical Engineering, 2017, University of Houston
URL: http://hdl.handle.net/10657/1878
▼ Electric power grids are experiencing increasing adoption of distributed energy resources, which can bring large economic and environmental benefits. However, large-scale penetration of distributed energy resources will make both operations and long-term planning increasingly complex, because their output is far more variable than that of traditional centralized sources. This variability creates formidable challenges for grid operators in ensuring system security and reliability. In addition, traditional optimization algorithms are no longer applicable to such integrated and complex systems, in which economic efficiency, grid reliability, and privacy must be satisfied simultaneously. Therefore, an innovative optimization framework is critical for tackling the emerging challenges posed by the large-scale, independent decision-making nature of the distributed resource management problem in the future power system.
In this dissertation, we focus on applying big data optimization methods to the distributed resource management problem in the smart grid, to improve the reliability and security of the distribution system. First, we propose an incentive mechanism design that motivates microgrids to participate in the system's peak ramp minimization problem, mitigating the ramping effect caused by high penetration of distributed renewable generation. We propose distributed algorithms that reach the optimal operating point and allow microgrids to execute their computations in either a synchronous or an asynchronous fashion. Second, we formulate a large-scale optimization problem for microgrid optimal scheduling and load curtailment. We propose a decomposition algorithm and implement it with parallel computation on a computer cluster using the Hadoop MapReduce software framework. Third, we study a decentralized reactive power compensation model that reduces power losses and improves the voltage profile of distribution networks. Finally, we consider big data optimization methods for the resource allocation problem in wireless network virtualization, to prevent traffic disruption under physical network failures.
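The synchronous, price-coordinated scheme this abstract alludes to can be illustrated with a classical dual-decomposition sketch. This is not the dissertation's actual algorithm; the quadratic costs, demand value, step size, and iteration count below are invented for illustration.

```python
# Hedged sketch of a synchronous, price-based (dual-decomposition) distributed
# allocation: a coordinator broadcasts a price, each agent solves a small local
# problem, and the price moves toward balancing total supply with demand.

def allocate(costs, demand, step=0.05, iters=2000):
    """Split `demand` across agents with local cost a_i * x_i**2 via dual ascent."""
    lam = 0.0                      # shared "price" broadcast each round
    x = [0.0] * len(costs)
    for _ in range(iters):
        # each agent minimizes a_i * x**2 - lam * x locally: x = lam / (2 * a_i)
        x = [lam / (2 * a) for a in costs]
        # coordinator nudges the price toward supply-demand balance
        lam += step * (demand - sum(x))
    return x, lam

shares, price = allocate([1.0, 2.0, 4.0], 7.0)  # cheaper agents supply more
```

At convergence every agent sees the same marginal price, so cheaper generators take proportionally larger shares, which is the kind of property incentive mechanisms of this sort typically rely on.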
Advisors/Committee Members: Han, Zhu (advisor), Rajashekara, Kaushik (committee member), Pan, Miao (committee member), Khodaei, Amin (committee member), Mohsenian-Rad, Hamed (committee member).
Subjects/Keywords: Big data; Smart grids; Optimization
APA (6th Edition):
Nguyen, H. K. (2017). Big Data Optimization for Distributed Resource Management in Smart Grid. (Doctoral Dissertation). University of Houston. Retrieved from http://hdl.handle.net/10657/1878
Chicago Manual of Style (16th Edition):
Nguyen, Hung Khanh. “Big Data Optimization for Distributed Resource Management in Smart Grid.” 2017. Doctoral Dissertation, University of Houston. Accessed March 01, 2021. http://hdl.handle.net/10657/1878.
MLA Handbook (7th Edition):
Nguyen, Hung Khanh. “Big Data Optimization for Distributed Resource Management in Smart Grid.” 2017. Web. 01 Mar 2021.
Vancouver:
Nguyen HK. Big Data Optimization for Distributed Resource Management in Smart Grid. [Internet] [Doctoral dissertation]. University of Houston; 2017. [cited 2021 Mar 01]. Available from: http://hdl.handle.net/10657/1878.
Council of Science Editors:
Nguyen HK. Big Data Optimization for Distributed Resource Management in Smart Grid. [Doctoral Dissertation]. University of Houston; 2017. Available from: http://hdl.handle.net/10657/1878

University of Manitoba
28.
Chen, Yixuan.
Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression.
Degree: Biosystems Engineering, 2014, University of Manitoba
URL: http://hdl.handle.net/1993/23434
▼ This research focuses on providing a fast and space-efficient compression method for answering information queries on spectroscopic data. Our primary hypothesis was that a conversion from decimal data to a character/integer space could be done in a manner that enables the use of succinct structures and provides good compression. The compression algorithm is motivated by the need to handle queries on spectroscopic data that approaches the limits of main computer memory.
The primary hypothesis is supported in that the new compression method saves 79.20% - 94.07% of computer space on average. The average of the maximum error rates is also acceptable, at 0.05% - 1.36% depending on the subject from which the data was collected. Additionally, compression rate and entropy are negatively correlated, while compression rate and maximum error are positively correlated after a natural-logarithm transformation of the maximum error rates. The effects of different types of data sources on compression rate were studied as well: fungus datasets achieved the highest compression rates, while mouse brain datasets obtained the lowest among the four types of data sources. Finally, the effect of the studied compression algorithm and method on integrating spectral bands was investigated. Spectral integration for determining lipid, CH2, and dense-core plaque obtained good image quality, and the errors can be considered inconsequential except in the case of determining creatine deposits; although creatine deposits are still recognizable in the reconstructed image, the image quality was reduced.
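The decimal-to-integer conversion at the heart of the hypothesis can be sketched with plain 8-bit linear quantization. This is an illustrative stand-in, not the thesis's succinct-structure method; the synthetic "spectrum" and the numbers it yields are invented examples, not results from the thesis.

```python
# Map decimal readings into a small integer alphabet (0..255), so that each
# 8-byte double becomes a 1-byte code, at the cost of bounded reconstruction
# error (at most half a quantization step).
import struct

def quantize(values, levels=256):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (levels - 1) or 1.0        # guard against constant input
    codes = [round((v - lo) / scale) for v in values]  # ints in 0..levels-1
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

spectrum = [0.1 * i + 0.003 * (i % 7) for i in range(1000)]   # fake single band
codes, lo, scale = quantize(spectrum)
recon = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(spectrum, recon))    # <= scale / 2
saving = 1 - len(bytes(codes)) / len(struct.pack(f"{len(spectrum)}d", *spectrum))
```

One byte per sample instead of eight gives an 87.5% space saving before any entropy coding, which is in the same ballpark as the 79-94% figures the abstract reports for the full method.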
Advisors/Committee Members: Morrison, Jason (Biosystems Engineering) (supervisor), Paliwal, Jitendra (Biosystems Engineering), Leung, Carson Kai-Sang (Computer Science) (examining committee).
Subjects/Keywords: Image Compression; Big Data
APA (6th Edition):
Chen, Y. (2014). Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression. (Masters Thesis). University of Manitoba. Retrieved from http://hdl.handle.net/1993/23434
Chicago Manual of Style (16th Edition):
Chen, Yixuan. “Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression.” 2014. Masters Thesis, University of Manitoba. Accessed March 01, 2021. http://hdl.handle.net/1993/23434.
MLA Handbook (7th Edition):
Chen, Yixuan. “Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression.” 2014. Web. 01 Mar 2021.
Vancouver:
Chen Y. Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression. [Internet] [Masters thesis]. University of Manitoba; 2014. [cited 2021 Mar 01]. Available from: http://hdl.handle.net/1993/23434.
Council of Science Editors:
Chen Y. Approaching “Big Data” in Biological Research Imaging Spectroscopy with Novel Compression. [Masters Thesis]. University of Manitoba; 2014. Available from: http://hdl.handle.net/1993/23434
29.
Lindström, Frej.
Impact analysis of characteristics in product development : Change in product property with respect to component generations.
Degree: Mathematics and Mathematical Statistics, 2017, Umeå University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136911
▼ Scania has developed a unique modular product system which is an important success factor, creates flexibility, and lies at the heart of their business model. R&D use product and vehicle product properties to describe the product key factors. These product properties are used both during the development of new features and products, and by the project office to estimate the total contribution of a project. Scania wants to develop a new method to understand, track, and compare projects' effects over time and to predict future vehicle improvements. In this thesis, we investigate how to quantify the impact on vehicle product properties and predict component improvements, based on data sources that have not been utilized for these purposes before. The ultimate objective is to increase the understanding of the development process of heavy vehicles, and the aim of this project was to provide statistical methods that can be used for investigative and predictive purposes. First, with analysis of variance we statistically verified and quantified differences in a product property between comparable vehicle populations with respect to component generations. Then, Random Forest and Artificial Neural Networks were implemented to predict the future effect on a product property with respect to component improvements. We could see a difference of approximately 10% between the comparable components of interest, which was more than the expected difference. The expectations are based on performance measurements from a test environment. The implemented Random Forest model was not able to predict future effect based on these performance measures. Artificial Neural Networks were able to capture structures from the test environment, and their predictive performance and reliability were, under the given circumstances, relatively good.
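The first analysis step described above, ANOVA between component generations, can be sketched with a minimal one-way F statistic. The synthetic populations and the roughly 10% gap are invented for illustration and are not Scania data; the thesis's Random Forest and neural network models need more machinery than fits here.

```python
# One-way ANOVA: compare a product property across component generations by
# the ratio of between-group to within-group variance (the F statistic).
import random
import statistics

def f_oneway(groups):
    """Classical one-way ANOVA F statistic over a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

random.seed(1)
gen_a = [100 + random.gauss(0, 2) for _ in range(50)]  # older generation
gen_b = [110 + random.gauss(0, 2) for _ in range(50)]  # newer, ~10% higher
F = f_oneway([gen_a, gen_b])  # large F: the generations differ significantly
```

A large F relative to the F distribution's critical value is what "statistically verified and quantified differences" amounts to in this setting.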
Subjects/Keywords: Statistics; big data; Mathematics; Matematik
APA (6th Edition):
Lindström, F. (2017). Impact analysis of characteristics in product development : Change in product property with respect to component generations. (Thesis). Umeå University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136911
Note: this citation may be lacking information needed for this citation format: not specified whether Masters Thesis or Doctoral Dissertation.
Chicago Manual of Style (16th Edition):
Lindström, Frej. “Impact analysis of characteristics in product development : Change in product property with respect to component generations.” 2017. Thesis, Umeå University. Accessed March 01, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136911.
MLA Handbook (7th Edition):
Lindström, Frej. “Impact analysis of characteristics in product development : Change in product property with respect to component generations.” 2017. Web. 01 Mar 2021.
Vancouver:
Lindström F. Impact analysis of characteristics in product development : Change in product property with respect to component generations. [Internet] [Thesis]. Umeå University; 2017. [cited 2021 Mar 01]. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136911.
Council of Science Editors:
Lindström F. Impact analysis of characteristics in product development : Change in product property with respect to component generations. [Thesis]. Umeå University; 2017. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136911

Hong Kong University of Science and Technology
30.
Wang, Lu.
Techniques and applications of random sampling on massive data.
Degree: 2015, Hong Kong University of Science and Technology
URL: http://repository.ust.hk/ir/Record/1783.1-78835
;
https://doi.org/10.14711/thesis-b1514598
;
http://repository.ust.hk/ir/bitstream/1783.1-78835/1/th_redirect.html
▼ Living in the era of big data, we often need to process and analyze data sets that have never been so large or fast-growing. Random sampling has thus received much attention as an effective tool for turning big data “small”. It allows us to significantly reduce the size of the input while maintaining the main features of the original data set that we need, and it makes it easy to trade off between computational complexity and accuracy of the result by tweaking the sample size. Although random sampling is a classical problem with a long history, it has lately received renewed attention, motivated by new applications as well as new constraints in the big data era. This thesis presents several new techniques and applications of random sampling: (1) a new randomized streaming algorithm for finding approximate quantiles in a data stream, which achieves the smallest space complexity of all such algorithms; (2) an augmented B-tree index that, for any given range query, returns a sampling-based summary containing the quantiles and heavy hitters of all tuples in the query range; (3) a sample-augmented R-tree that, given any range query, returns random samples from the query range in an online fashion. Apart from the description and analysis of each proposed algorithm, experimental results are provided, confirming the advantages of the new algorithms. Finally, we showcase a system for large-scale spatio-temporal data analysis built on the developed techniques.
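The thesis's quantile, B-tree, and R-tree algorithms are not reproduced in this listing, but the baseline they build on, one-pass uniform sampling from a stream of unknown length, is classical reservoir sampling (Algorithm R). A minimal sketch, with an assumed fixed seed for repeatability:

```python
# Algorithm R: maintain a uniform random sample of k items from a stream of
# unknown length, using O(k) memory and a single pass.
import random

def reservoir_sample(stream, k, seed=0):
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randrange(i + 1)   # item i survives with probability k/(i+1)
            if j < k:
                sample[j] = item       # evict a uniformly chosen resident
    return sample

picks = reservoir_sample(range(1_000_000), 100)  # one pass, O(k) memory
```

Shrinking or growing k is exactly the complexity-versus-accuracy dial the abstract describes: a bigger reservoir costs more memory but summarizes the stream more faithfully.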
Subjects/Keywords: Big data; Sampling (Statistics)
APA (6th Edition):
Wang, L. (2015). Techniques and applications of random sampling on massive data. (Thesis). Hong Kong University of Science and Technology. Retrieved from http://repository.ust.hk/ir/Record/1783.1-78835 ; https://doi.org/10.14711/thesis-b1514598 ; http://repository.ust.hk/ir/bitstream/1783.1-78835/1/th_redirect.html
Note: this citation may be lacking information needed for this citation format: not specified whether Masters Thesis or Doctoral Dissertation.
Chicago Manual of Style (16th Edition):
Wang, Lu. “Techniques and applications of random sampling on massive data.” 2015. Thesis, Hong Kong University of Science and Technology. Accessed March 01, 2021. http://repository.ust.hk/ir/Record/1783.1-78835 ; https://doi.org/10.14711/thesis-b1514598 ; http://repository.ust.hk/ir/bitstream/1783.1-78835/1/th_redirect.html.
MLA Handbook (7th Edition):
Wang, Lu. “Techniques and applications of random sampling on massive data.” 2015. Web. 01 Mar 2021.
Vancouver:
Wang L. Techniques and applications of random sampling on massive data. [Internet] [Thesis]. Hong Kong University of Science and Technology; 2015. [cited 2021 Mar 01]. Available from: http://repository.ust.hk/ir/Record/1783.1-78835 ; https://doi.org/10.14711/thesis-b1514598 ; http://repository.ust.hk/ir/bitstream/1783.1-78835/1/th_redirect.html.
Council of Science Editors:
Wang L. Techniques and applications of random sampling on massive data. [Thesis]. Hong Kong University of Science and Technology; 2015. Available from: http://repository.ust.hk/ir/Record/1783.1-78835 ; https://doi.org/10.14711/thesis-b1514598 ; http://repository.ust.hk/ir/bitstream/1783.1-78835/1/th_redirect.html