You searched for subject:(Big Data Computing). Showing records 1 – 30 of 135 total matches.

University of Technology, Sydney
1.
Zhang, X.
Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing.
Degree: 2014, University of Technology, Sydney
URL: http://hdl.handle.net/10453/30324
Big data and cloud computing are two disruptive trends that offer numerous opportunities to the IT industry and research communities while also posing significant challenges. The massive increase in computing power and data storage capacity provisioned by the cloud, together with advances in big data mining and analytics, has expanded the scope of information available to businesses, government, and individuals by orders of magnitude. A major obstacle to the adoption of cloud computing in sectors such as health and business for big data analysis is the privacy risk associated with releasing data sets to third parties in the cloud. Data sets in these sectors often contain privacy-sensitive personal data, e.g., electronic health records and financial transaction records, yet they can offer significant economic and social benefits if analysed or mined by organizations such as disease research centres. Although some privacy issues are not new, the situation is aggravated by features of cloud computing such as ubiquitous access and multi-tenancy, and by the three V properties of big data: Volume, Velocity and Variety. Achieving privacy-preserving big data publishing in cloud computing therefore remains a significant challenge. A widely adopted technique for privacy-preserving data publishing with semantic correctness guarantees is to anonymise data via generalisation, and a range of anonymisation approaches have been proposed. However, most existing approaches are either inherently sequential or distributed without directly optimising scalability, rendering them unsuitable for data-intensive applications and inapplicable to state-of-the-art parallel and distributed paradigms like MapReduce.
In this thesis, we investigate the problem of big data anonymisation for privacy preservation from the perspectives of scalability and cost-effectiveness. The cloud computing advantages of on-demand resource provisioning, rapid elasticity and pay-as-you-go pricing are exploited to address the problem, aiming at high scalability and cost-effectiveness. Specifically, we examine three major phases in the lifecycle of privacy-preserving data publishing or sharing in cloud environments: data anonymisation, anonymous data update, and anonymous data management. Accordingly, a scalable and cost-effective privacy-preserving framework is proposed to provide a holistic conceptual foundation for privacy preservation over big data and to enable users to realise the full potential of the scalability, elasticity, and cost-effectiveness of the cloud. We develop a corresponding prototype system consisting of a series of solutions to the scalability issues in the three phases, built on MapReduce, currently the de facto standard paradigm for big data processing, for the sake of high scalability, cost-effectiveness and compatibility with other big data mining and analytical tools. Extensive experiments on real-world data…
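As a rough illustration of the generalisation-based anonymisation the abstract describes, the following sketch expresses one k-anonymisation pass as map and reduce functions over toy records. The records, the generalisation hierarchy, and the k value are invented for illustration; the thesis's MapReduce algorithms are substantially more elaborate.

    # Minimal sketch: generalisation-based k-anonymisation as map/reduce
    # functions. Records, hierarchy, and K are hypothetical examples.
    from itertools import groupby

    K = 2  # anonymity threshold (invented)

    def generalise_age(age):
        """Climb one level of a generalisation hierarchy: exact age -> decade band."""
        lo = (age // 10) * 10
        return f"{lo}-{lo + 9}"

    def map_phase(record):
        """Emit (generalised quasi-identifier, sensitive value)."""
        age, zipcode, diagnosis = record
        return (generalise_age(age), zipcode[:3] + "**"), diagnosis

    def reduce_phase(key, values):
        """Release a group only if it satisfies k-anonymity; suppress otherwise."""
        values = list(values)
        return [(key, v) for v in values] if len(values) >= K else []

    records = [(34, "20740", "flu"), (37, "20742", "cold"), (52, "11201", "asthma")]
    pairs = sorted((map_phase(r) for r in records), key=lambda kv: kv[0])
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        for out in reduce_phase(key, (v for _, v in group)):
            print(out)  # the (52, ...) record forms a group of 1 and is suppressed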
Subjects/Keywords: Cloud computing; Big data; Privacy
APA (6th Edition):
Zhang, X. (2014). Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing. (Thesis). University of Technology, Sydney. Retrieved from http://hdl.handle.net/10453/30324
Note: this citation may be lacking information needed for this citation format: Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Zhang, X. “Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing.” 2014. Thesis, University of Technology, Sydney. Accessed December 08, 2019. http://hdl.handle.net/10453/30324.
MLA Handbook (7th Edition):
Zhang, X. “Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing.” 2014. Web. 08 Dec 2019.
Vancouver:
Zhang X. Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing. [Internet] [Thesis]. University of Technology, Sydney; 2014. [cited 2019 Dec 08]. Available from: http://hdl.handle.net/10453/30324.
Council of Science Editors:
Zhang X. Toward scalable and cost-effective privacy-preserving big data publishing in cloud computing. [Thesis]. University of Technology, Sydney; 2014. Available from: http://hdl.handle.net/10453/30324

Australian National University
2.
Wang, Meisong.
Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications.
Degree: 2016, Australian National University
URL: http://hdl.handle.net/1885/107262
“Big Data”, best characterized by its three features, namely “Variety”, “Volume” and “Velocity”, is revolutionizing nearly every aspect of our lives, ranging from enterprises to consumers, from science to government. A fourth characteristic, namely “Value”, is delivered via the use of smart data analytics over Big Data. One such Big Data analytics application considered in this thesis is Topic Detection and Tracking (TDT). The characteristics of Big Data bring with them unprecedented challenges: data too large for traditional devices to process and store (Volume), arriving too fast for traditional methods to scale (Velocity), and heterogeneous data (Variety). In recent times, cloud computing has emerged as a practical and technical solution for processing big data. However, when deploying Big Data analytics applications such as TDT in the cloud (called cloud-based TDT), the challenge is to cost-effectively orchestrate and provision cloud resources to meet performance Service Level Agreements (SLAs). Although limited work exists on performance modeling of cloud-based TDT applications, none of these methods can be directly applied to guarantee the performance SLA of cloud-based TDT applications. For instance, the current literature lacks a systematic, reliable and accurate methodology to measure, predict and ultimately guarantee the performance of TDT applications. Furthermore, existing performance models fail to consider the end-to-end complexity of TDT applications and focus only on individual processing components (e.g., MapReduce).
To tackle this challenge, in this thesis we develop a layered performance model of cloud-based TDT applications that takes into account Big Data characteristics, the data and event flow across myriad cloud software and hardware resources, and diverse SLA considerations. In particular, we propose and develop models that capture in detail, and with great accuracy, the factors playing a pivotal role in the performance of cloud-based TDT applications, identify the ways in which these factors affect performance, and determine the dependencies between the factors. Further, we have developed models to predict the performance of cloud-based TDT applications under the uncertainty imposed by Big Data characteristics. The model developed in this thesis is intended to be generic, allowing its application to other cloud-based data analytics applications. We have demonstrated the feasibility, efficiency, validity and prediction accuracy of the proposed models via experimental evaluations using a real-world flu detection use case on the Apache Hadoop MapReduce, HDFS and Mahout frameworks.
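The layered idea can be pictured with a toy model that composes per-layer cost estimates into an end-to-end prediction. The layer names and cost formulas below are invented stand-ins, not the thesis's models, which are far richer and empirically validated.

    # Toy "layered" latency model: end-to-end time is composed from
    # per-layer estimates instead of one black-box number. All rates
    # and constants are hypothetical.
    def ingestion_time(docs, rate_docs_per_s=5000):
        return docs / rate_docs_per_s

    def processing_time(docs, mappers=8, cost_per_doc_s=0.002):
        return docs * cost_per_doc_s / mappers   # idealised MapReduce scaling

    def storage_time(bytes_out, bandwidth_bps=100e6):
        return bytes_out / bandwidth_bps

    def end_to_end(docs, bytes_out):
        # Layers run as a sequential pipeline in this simplification.
        return ingestion_time(docs) + processing_time(docs) + storage_time(bytes_out)

    print(f"predicted: {end_to_end(1_000_000, 2e9):.1f} s")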
Subjects/Keywords: Cloud Computing; IoT; Big Data
APA (6th Edition):
Wang, M. (2016). Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications. (Thesis). Australian National University. Retrieved from http://hdl.handle.net/1885/107262
Note: this citation may be lacking information needed for this citation format: Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Wang, Meisong. “Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications.” 2016. Thesis, Australian National University. Accessed December 08, 2019. http://hdl.handle.net/1885/107262.
MLA Handbook (7th Edition):
Wang, Meisong. “Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications.” 2016. Web. 08 Dec 2019.
Vancouver:
Wang M. Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications. [Internet] [Thesis]. Australian National University; 2016. [cited 2019 Dec 08]. Available from: http://hdl.handle.net/1885/107262.
Council of Science Editors:
Wang M. Layered performance modelling and evaluation for cloud topic detection and tracking based big data applications. [Thesis]. Australian National University; 2016. Available from: http://hdl.handle.net/1885/107262

University of North Carolina – Greensboro
3.
Whitworth, Jeffrey N.
Applying hybrid cloud systems to solve challenges posed by the big data problem.
Degree: 2013, University of North Carolina – Greensboro
URL: http://libres.uncg.edu/ir/listing.aspx?styp=ti&id=15603
The problem of Big Data poses challenges to traditional compute systems used for Machine Learning (ML) techniques that extract, analyze and visualize important information. New and creative solutions for processing data must be explored in order to overcome the hurdles imposed by Big Data as the amount of data generated grows. These solutions include introducing hybrid cloud systems to aid in the storage and processing of data. However, this introduces additional problems relating to data security, as data travels outside localized systems to rely on public storage and processing resources. Current research has relied primarily on data classification as a mechanism to address the security concerns of data traversing external resources. This technique can be limited, as it assumes data is accurately classified and that an appropriate amount of data is cleared for external use. Leveraging a flexible key store for data encryption can help overcome these possible limitations by treating all data the same and mitigating risk depending on the public provider. This is shown by introducing a Data Key Store (DKS) and a public cloud storage offering into a Big Data analytics network topology. Findings show that introducing the Data Key Store into a Big Data analytics network topology successfully allows the topology to be extended to handle the large amounts of data associated with Big Data while preserving appropriate data security. Introducing a public cloud storage solution also provides additional benefits to the Big Data network topology by introducing intentional time delay into data processing, making efficient use of system resources when data ebbs occur, and extending traditional data storage resiliency techniques to Big Data storage.
Keywords: Big Data, Cloud, Intrusion Detection, Storage
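A minimal sketch of the DKS idea follows, assuming per-object symmetric keys held locally while only ciphertext goes to the public cloud. The DataKeyStore class and its API are hypothetical, not the thesis's implementation, and the example uses the third-party cryptography package.

    # Sketch of "treat all data the same": every object gets its own key;
    # keys stay in the trusted store, ciphertext may leave for public cloud.
    from cryptography.fernet import Fernet

    class DataKeyStore:
        def __init__(self):
            self._keys = {}          # object id -> symmetric key (kept private)

        def encrypt_for_upload(self, obj_id: str, plaintext: bytes) -> bytes:
            key = Fernet.generate_key()
            self._keys[obj_id] = key
            return Fernet(key).encrypt(plaintext)    # safe to hand to public cloud

        def decrypt_after_download(self, obj_id: str, ciphertext: bytes) -> bytes:
            return Fernet(self._keys[obj_id]).decrypt(ciphertext)

    dks = DataKeyStore()
    blob = dks.encrypt_for_upload("flows/2013-01.csv", b"srcip,dstip,bytes,...")
    assert dks.decrypt_after_download("flows/2013-01.csv", blob).startswith(b"srcip")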
Advisors/Committee Members: Shanmugathasan Suthaharan (advisor).
Subjects/Keywords: Big data; Cloud computing – Security measures
APA (6th Edition):
Whitworth, J. N. (2013). Applying hybrid cloud systems to solve challenges posed by the big data problem. (Masters Thesis). University of North Carolina – Greensboro. Retrieved from http://libres.uncg.edu/ir/listing.aspx?styp=ti&id=15603
Chicago Manual of Style (16th Edition):
Whitworth, Jeffrey N. “Applying hybrid cloud systems to solve challenges posed by the big data problem.” 2013. Masters Thesis, University of North Carolina – Greensboro. Accessed December 08, 2019. http://libres.uncg.edu/ir/listing.aspx?styp=ti&id=15603.
MLA Handbook (7th Edition):
Whitworth, Jeffrey N. “Applying hybrid cloud systems to solve challenges posed by the big data problem.” 2013. Web. 08 Dec 2019.
Vancouver:
Whitworth JN. Applying hybrid cloud systems to solve challenges posed by the big data problem. [Internet] [Masters thesis]. University of North Carolina – Greensboro; 2013. [cited 2019 Dec 08]. Available from: http://libres.uncg.edu/ir/listing.aspx?styp=ti&id=15603.
Council of Science Editors:
Whitworth JN. Applying hybrid cloud systems to solve challenges posed by the big data problem. [Masters Thesis]. University of North Carolina – Greensboro; 2013. Available from: http://libres.uncg.edu/ir/listing.aspx?styp=ti&id=15603

Vanderbilt University
4.
Tapdiya, Ashish.
Large Scale Data Management for Enterprise Workloads.
Degree: PhD, Computer Science, 2018, Vanderbilt University
URL: http://etd.library.vanderbilt.edu//available/etd-03262018-133402/
The continual proliferation of mobile devices, social media platforms, gaming consoles, etc., combined with the ever-increasing online user population, has resulted in a data deluge. Traditional data management solutions are inadequate to meet the storage and processing challenges posed by this large and complex data. Although several large-scale data management solutions have been proposed recently, data architects still face several challenges in scaling the proposed systems for enterprise workloads. To this end, we propose novel mechanisms to enable scalable data management for each enterprise workload class.
For online transaction processing workloads, we have developed mechanisms for scalable transaction processing in relational and distributed databases. For relational databases, we present a simple mechanism to enable robust performance profiling of cloud-hosted databases. For distributed databases, we present the design of the Synergy system, which leverages materialized views and lightweight concurrency control on top of a NoSQL database to provide scalable data management with familiar relational conventions and more robust query expressiveness.
For online analytical processing workloads, we empirically evaluate SQL-on-Hadoop and SQL-on-Object-Storage systems and illustrate their performance characteristics. For SQL-on-Hadoop systems, we demonstrate the size-up behavior, scale-up behavior, optimizer attributes, execution engine efficiency and impact of file formats. For SQL-on-Object-Storage systems, we implement and evaluate the performance impact of block range indexes.
The knowledge gained from this thesis will enable data architects to address the scaling challenges posed by enterprise workloads.
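The materialized-view idea attributed to Synergy can be pictured with a toy in-memory example: a derived index is maintained on every base-table write so that a join-style query becomes a single key lookup. The schema and update rule are invented; Synergy itself adds a lightweight concurrency-control layer on top of a NoSQL store.

    # Toy materialized view: maintained on every base write so a
    # query-by-customer becomes one lookup instead of a scan.
    orders = {}                       # order_id -> (customer_id, amount)
    orders_by_customer = {}           # materialized view: customer_id -> [order_id]

    def insert_order(order_id, customer_id, amount):
        orders[order_id] = (customer_id, amount)
        # View maintained in the same logical write; Synergy guards this
        # step with lightweight concurrency control.
        orders_by_customer.setdefault(customer_id, []).append(order_id)

    insert_order("o1", "c42", 99.0)
    insert_order("o2", "c42", 15.5)
    # "SELECT * FROM orders WHERE customer_id = 'c42'" as one lookup:
    print([orders[o] for o in orders_by_customer["c42"]])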
Advisors/Committee Members: Dr. Daniel Fabbri (chair), Dr. Bradley Malin (committee member), Dr. William French (committee member), Dr. Jules White (committee member), Dr. Yuan Xue (committee member).
Subjects/Keywords: Cloud Computing; Big Data; Databases; Performance Evaluation
APA (6th Edition):
Tapdiya, A. (2018). Large Scale Data Management for Enterprise Workloads. (Doctoral Dissertation). Vanderbilt University. Retrieved from http://etd.library.vanderbilt.edu//available/etd-03262018-133402/
Chicago Manual of Style (16th Edition):
Tapdiya, Ashish. “Large Scale Data Management for Enterprise Workloads.” 2018. Doctoral Dissertation, Vanderbilt University. Accessed December 08, 2019. http://etd.library.vanderbilt.edu//available/etd-03262018-133402/.
MLA Handbook (7th Edition):
Tapdiya, Ashish. “Large Scale Data Management for Enterprise Workloads.” 2018. Web. 08 Dec 2019.
Vancouver:
Tapdiya A. Large Scale Data Management for Enterprise Workloads. [Internet] [Doctoral dissertation]. Vanderbilt University; 2018. [cited 2019 Dec 08]. Available from: http://etd.library.vanderbilt.edu//available/etd-03262018-133402/.
Council of Science Editors:
Tapdiya A. Large Scale Data Management for Enterprise Workloads. [Doctoral Dissertation]. Vanderbilt University; 2018. Available from: http://etd.library.vanderbilt.edu//available/etd-03262018-133402/

University of Bridgeport
5.
Alshammari, Hamoud H.
Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow.
Degree: 2016, University of Bridgeport
URL: https://scholarworks.bridgeport.edu/xmlui/handle/123456789/1660
Cloud Computing provides different services to users for processing data. One of the main concepts in Cloud Computing is Big Data and Big Data analysis. Big Data is complex, unstructured, or very large-scale data. Hadoop is a tool, or an environment, used to process Big Data in parallel. The idea behind Hadoop is that, rather than sending data to servers for processing, Hadoop divides a job into small tasks and sends the tasks to the servers that hold the data; these servers process the tasks and send the results back to the master node. Hadoop has some limitations that could be addressed to achieve higher performance when executing jobs. These limitations mostly concern data locality in the cluster, job and task scheduling, CPU execution time, and resource allocation in Hadoop. Data locality and efficient resource allocation remain a challenge in the cloud computing MapReduce platform. We propose an enhanced Hadoop architecture that reduces the computation cost associated with Big Data analysis. At the same time, the proposed architecture addresses the issue of resource allocation in native Hadoop and provides an efficient distributed clustering approach for dedicated cloud computing environments. The enhanced Hadoop architecture leverages the NameNode's ability to assign jobs to the TaskTrackers (DataNodes) within the cluster. By adding controlling features to the NameNode, it can intelligently direct and assign tasks to the DataNodes that contain the required data. Our focus is on extracting features and building a metadata table that carries information about the existence and location of the data blocks in the cluster. This enables the NameNode to direct jobs to specific DataNodes without going through the whole data set in the cluster. It should be noted that the newly built lookup table is an addition to the metadata table that already exists in native Hadoop. Our work targets real text in text data sets that may be human-readable, such as books, or not, such as DNA data sets. To test the performance of the proposed architecture, we perform DNA sequence matching and alignment on various short genome sequences. Compared with native Hadoop, the proposed architecture reduced CPU time, the number of read operations, input data size, and several other factors.
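A minimal sketch of the lookup-table idea, assuming a feature index on the NameNode that maps extracted features (e.g., DNA k-mers) to the DataNodes holding matching blocks; node names and features are invented examples, not the thesis's data structures.

    # Feature index kept alongside the normal block map: which DataNodes
    # hold blocks relevant to a query, so a job is routed only to those.
    feature_index = {
        # k-mer (DNA sets) or term (books) -> DataNodes whose blocks contain it
        "GATTACA": {"dn3", "dn7"},
        "TTAGGG":  {"dn1", "dn3"},
    }

    def target_datanodes(query_features):
        """Union of candidate nodes; avoids scanning the whole cluster."""
        nodes = set()
        for f in query_features:
            nodes |= feature_index.get(f, set())
        return nodes

    print(target_datanodes(["GATTACA", "TTAGGG"]))   # -> {'dn1', 'dn3', 'dn7'}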
Subjects/Keywords: Big data; Cloud computing; Hadoop; MapReduce
APA (6th Edition):
Alshammari, H. H. (2016). Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow. (Thesis). University of Bridgeport. Retrieved from https://scholarworks.bridgeport.edu/xmlui/handle/123456789/1660
Note: this citation may be lacking information needed for this citation format: Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Alshammari, Hamoud H. “Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow.” 2016. Thesis, University of Bridgeport. Accessed December 08, 2019. https://scholarworks.bridgeport.edu/xmlui/handle/123456789/1660.
MLA Handbook (7th Edition):
Alshammari, Hamoud H. “Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow.” 2016. Web. 08 Dec 2019.
Vancouver:
Alshammari HH. Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow. [Internet] [Thesis]. University of Bridgeport; 2016. [cited 2019 Dec 08]. Available from: https://scholarworks.bridgeport.edu/xmlui/handle/123456789/1660.
Council of Science Editors:
Alshammari HH. Improving Hadoop Performance by Using Metadata of Related Jobs in Text Datasets Via Enhancing MapReduce Workflow. [Thesis]. University of Bridgeport; 2016. Available from: https://scholarworks.bridgeport.edu/xmlui/handle/123456789/1660

Purdue University
6.
Kambatla, Karthik Shashank.
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks.
Degree: PhD, Computer Science, 2016, Purdue University
URL: https://docs.lib.purdue.edu/open_access_dissertations/1379
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources: web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.
Advisors/Committee Members: Ananth Y Grama, Dongyan Xu, Sonia Fahmy, Mathias Payer, Aniket Kate.
Subjects/Keywords: big data; distributed computing; distributed systems
APA (6th Edition):
Kambatla, K. S. (2016). Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks. (Doctoral Dissertation). Purdue University. Retrieved from https://docs.lib.purdue.edu/open_access_dissertations/1379
Chicago Manual of Style (16th Edition):
Kambatla, Karthik Shashank. “Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks.” 2016. Doctoral Dissertation, Purdue University. Accessed December 08, 2019. https://docs.lib.purdue.edu/open_access_dissertations/1379.
MLA Handbook (7th Edition):
Kambatla, Karthik Shashank. “Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks.” 2016. Web. 08 Dec 2019.
Vancouver:
Kambatla KS. Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks. [Internet] [Doctoral dissertation]. Purdue University; 2016. [cited 2019 Dec 08]. Available from: https://docs.lib.purdue.edu/open_access_dissertations/1379.
Council of Science Editors:
Kambatla KS. Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks. [Doctoral Dissertation]. Purdue University; 2016. Available from: https://docs.lib.purdue.edu/open_access_dissertations/1379
7.
Zareian, Saeed.
Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments.
Degree: MA, Information Systems and Technology, 2016, York University
URL: http://hdl.handle.net/10315/32145
The applications deployed in modern data centers are highly diverse in terms of architecture and performance needs. It is a challenge to provide consistent services to all applications in a shared environment. This thesis proposes a generic analytical engine that can optimize the use of cloud-based resources according to service needs in an autonomic manner. The proposed system is capable of ingesting large amounts of data generated by various monitoring services within data centers. Then, by transforming that data into actionable knowledge, the system can make the necessary decisions to maintain a desired level of quality of service. The contributions of this work are the following: first, we define a scalable architecture to collect the metrics and store the data; second, we design and implement a process for building prediction models that characterize application performance using data mining and statistical techniques; lastly, we evaluate the accuracy of the prediction models.
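One way to picture "transforming monitoring data into prediction models" is a least-squares fit of latency against request rate, then predicting ahead of an SLA breach. The metric names and sample numbers are invented; the thesis applies richer data-mining and statistical techniques.

    # Fit response time as a linear function of request rate from
    # collected monitoring samples, then predict. All numbers invented.
    samples = [(100, 0.8), (200, 1.1), (400, 1.9), (800, 3.6)]  # (req/s, latency s)

    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    slope = (sum((x - mx) * (y - my) for x, y in samples)
             / sum((x - mx) ** 2 for x, _ in samples))
    intercept = my - slope * mx

    def predict_latency(req_per_s):
        return intercept + slope * req_per_s

    # Autonomic rule of thumb: scale out before predicted latency breaks the SLA.
    print(f"predicted latency @ 600 req/s: {predict_latency(600):.2f} s")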
Advisors/Committee Members: Litoiu, Marin (advisor).
Subjects/Keywords: Computer science; Cloud Computing; Big Data
APA (6th Edition):
Zareian, S. (2016). Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments. (Masters Thesis). York University. Retrieved from http://hdl.handle.net/10315/32145
Chicago Manual of Style (16th Edition):
Zareian, Saeed. “Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments.” 2016. Masters Thesis, York University. Accessed December 08, 2019. http://hdl.handle.net/10315/32145.
MLA Handbook (7th Edition):
Zareian, Saeed. “Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments.” 2016. Web. 08 Dec 2019.
Vancouver:
Zareian S. Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments. [Internet] [Masters thesis]. York University; 2016. [cited 2019 Dec 08]. Available from: http://hdl.handle.net/10315/32145.
Council of Science Editors:
Zareian S. Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments. [Masters Thesis]. York University; 2016. Available from: http://hdl.handle.net/10315/32145

University of New South Wales
8.
Wu, Dongyao.
Big Data Processing on Arbitrarily Distributed Dataset.
Degree: Computer Science & Engineering, 2017, University of New South Wales
URL: http://handle.unsw.edu.au/1959.4/57957 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:45191/SOURCE02?view=true
Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications, and they significantly reduce the complexity of doing so. However, in reality, many real-world scenarios require pipelining and integration of multiple big data jobs. As big data pipelines and applications become more and more complicated, it is almost impossible to manually optimize the performance of each component, not to mention the whole pipeline/application. At the same time, there are also increasing requirements to facilitate interaction, composition and integration of big data analytics applications in continuously evolving, integrating and delivering scenarios. In addition, with the emergence and development of cloud computing, mobile computing and the Internet of Things, data are increasingly collected and stored in highly distributed infrastructures (e.g. across data centres, clusters, racks and nodes).
To deal with the challenges above and fill the gap in existing big data processing frameworks, we present the Hierarchically Distributed Data Matrix (HDM), along with a system implementation, to support the writing and execution of composable and integrable big data applications. HDM is a light-weight, functional and strongly-typed meta-data abstraction which contains complete information (such as data format, locations, dependencies and functions between input and output) to support parallel execution of data-driven applications. Exploiting the functional nature of HDM enables deployed HDM applications to be natively integrable and reusable by other programs and applications. In addition, by analysing the execution graph and functional semantics of HDMs, multiple automated optimizations are provided to improve the execution performance of HDM data flows. Moreover, by extending the kernel of HDM, we propose a multi-cluster solution which enables HDM to support large-scale data analytics in multi-cluster scenarios. Drawing on the comprehensive information maintained by HDM graphs, the runtime execution engine of HDM is also able to provide provenance and history management for submitted applications. We conduct comprehensive experiments to evaluate our solution against the current state-of-the-art big data processing framework, Apache Spark.
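A stripped-down, single-process rendition of the HDM idea: each node records its input dependencies and the function between input and output, so pipelines stay composable and analysable before anything runs. The field names and the eager compute method are illustrative only; HDM itself is distributed, lazy, and optimised over the execution graph.

    # Toy HDM-like node: dependencies + function recorded as meta-data.
    from dataclasses import dataclass, field
    from typing import Any, Callable, List

    @dataclass
    class HDM:
        func: Callable[[Any], Any]
        deps: List["HDM"] = field(default_factory=list)
        location: str = "local"            # in HDM: data-centre/cluster/node path

        def map(self, f: Callable[[Any], Any]) -> "HDM":
            # Composition builds a new node; nothing runs yet, so an
            # optimizer could fuse or reorder the graph first.
            return HDM(func=f, deps=[self])

        def compute(self, data: Any) -> Any:
            for dep in self.deps:
                data = dep.compute(data)
            return self.func(data)

    source = HDM(func=lambda lines: [l.lower() for l in lines])
    pipeline = source.map(lambda lines: [l for l in lines if "error" in l])
    print(pipeline.compute(["OK boot", "ERROR disk", "error net"]))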
Advisors/Committee Members: Zhu, Liming, Computer Science & Engineering, Faculty of Engineering, UNSW, Sakr, Sherif, Computer Science & Engineering, Faculty of Engineering, UNSW, Paik, Helen, Computer Science & Engineering, Faculty of Engineering, UNSW, Liu, Anna, Amazon, Australia.
Subjects/Keywords: Distributed Systems; Big Data; Data Analytics; Data Provenance; Cloud Computing
APA (6th Edition):
Wu, D. (2017). Big Data Processing on Arbitrarily Distributed Dataset. (Doctoral Dissertation). University of New South Wales. Retrieved from http://handle.unsw.edu.au/1959.4/57957 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:45191/SOURCE02?view=true
Chicago Manual of Style (16th Edition):
Wu, Dongyao. “Big Data Processing on Arbitrarily Distributed Dataset.” 2017. Doctoral Dissertation, University of New South Wales. Accessed December 08, 2019. http://handle.unsw.edu.au/1959.4/57957 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:45191/SOURCE02?view=true.
MLA Handbook (7th Edition):
Wu, Dongyao. “Big Data Processing on Arbitrarily Distributed Dataset.” 2017. Web. 08 Dec 2019.
Vancouver:
Wu D. Big Data Processing on Arbitrarily Distributed Dataset. [Internet] [Doctoral dissertation]. University of New South Wales; 2017. [cited 2019 Dec 08]. Available from: http://handle.unsw.edu.au/1959.4/57957 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:45191/SOURCE02?view=true.
Council of Science Editors:
Wu D. Big Data Processing on Arbitrarily Distributed Dataset. [Doctoral Dissertation]. University of New South Wales; 2017. Available from: http://handle.unsw.edu.au/1959.4/57957 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:45191/SOURCE02?view=true

NSYSU
9.
Guo, Bao-Ren.
Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment.
Degree: Master, Electrical Engineering, 2016, NSYSU
URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721116-203822
With the age of Big Data upon us, the three defining characteristics of Big Data, Volume, Variety and Velocity, confront Cloud Computing with new challenges. In response to the demands of Big Data analytics, using distributed computing clusters to process vast amounts of data is a megatrend. In this paper, at the outset, we discuss the performance of the distributed computing clusters provided by current cloud computing platforms. We found that for Message Passing Interface (MPI) clusters, which are used in scientific computing (e.g., astronomy, atmospheric science, and physical spectra), cloud computing platforms provide few relevant integration services, so MPI clusters make inefficient use of cloud computing resources and are unable to exert their high computing performance. We therefore propose an MPI cluster architecture that makes efficient use of cloud computing resources. To break the limitations of constructing traditional MPI cluster computing environments, the MPI cluster uses a socket-interface TCP/IP server and client to build the communication system serving as the message-passing channel, and simplifies the operating system's computing-resource management mechanism through a Group Manager. Separated in this way, the MPI cluster can flexibly grasp operating-system computing resources and work well on cloud computing platforms whose computing resources are virtualized and elastic. To let the MPI cluster dispatch computing resources on the cloud platform more easily, we adopt Kernel Distributed Computing Management (KDCM), proposed by Chiu and Huang, whose ability to unify the management and allocation of computing resources gives the MPI cluster a path to flexible dispatch of computing resources. Because the cluster runs as a Linux kernel driver loaded on KDCM, we name it the MPI Kernel Cluster (MPIKC). As MPIKC and KDCM fit tightly together, they can dispatch computing resources efficiently, exert high computing performance, and provide an operating-system-level cloud computing environment on the cloud computing platform. Finally, we verify the correctness of running MPIKC on KDCM and use MPIKC for distributed computing under high load; in the results for Sum of Absolute Differences and k-means clustering, each computing unit used adds a 0.5–1x performance improvement. We show that MPIKC fits well with cloud computing platforms and exerts the high performance that is the advantage of a distributed computing cluster.
Keywords: Cloud Computing, Distributed Computing Cluster, Big Data, MPI, Kernel Driver
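The socket-based message-passing channel the abstract mentions can be pictured with a bare TCP server/client pair standing in for two ranks. The port and framing are invented, and a user-space Python sketch cannot reproduce the kernel-driver setting of MPIKC.

    # Two "ranks" exchanging one message over a TCP channel.
    import socket, threading

    ready = threading.Event()

    def rank0_server(port=5555):
        # "Rank 0" listens on the channel and acknowledges one message.
        with socket.socket() as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("127.0.0.1", port))
            srv.listen(1)
            ready.set()                      # channel is up; peer may connect
            conn, _ = srv.accept()
            with conn:
                msg = conn.recv(1024)        # receive from rank 1
                conn.sendall(b"ack:" + msg)  # reply over the same channel

    t = threading.Thread(target=rank0_server, daemon=True)
    t.start()
    ready.wait()

    with socket.create_connection(("127.0.0.1", 5555)) as cli:   # "rank 1"
        cli.sendall(b"partial_sum=42")
        print(cli.recv(1024))                # b'ack:partial_sum=42'
    t.join()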
Advisors/Committee Members: Kai-Ming Yang (chair), Jih-Ching Chiu (committee member), Shiann-Rong Kuang (chair), Tong-Yu Hsieh (chair).
Subjects/Keywords: MPI; Big Data; Kernel Driver; Distributed Computing Cluster; Cloud Computing
APA (6th Edition):
Guo, B. (2016). Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment. (Thesis). NSYSU. Retrieved from http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721116-203822
Note: this citation may be lacking information needed for this citation format: Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Guo, Bao-Ren. “Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment.” 2016. Thesis, NSYSU. Accessed December 08, 2019. http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721116-203822.
MLA Handbook (7th Edition):
Guo, Bao-Ren. “Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment.” 2016. Web. 08 Dec 2019.
Vancouver:
Guo B. Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment. [Internet] [Thesis]. NSYSU; 2016. [cited 2019 Dec 08]. Available from: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721116-203822.
Council of Science Editors:
Guo B. Implementation of MPI Cloud Computing Platform Build upon System Kernel Environment. [Thesis]. NSYSU; 2016. Available from: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721116-203822

The Ohio State University
10.
Bicer, Tekin.
Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments.
Degree: PhD, Computer Science and Engineering, 2014, The Ohio State University
URL: http://rave.ohiolink.edu/etdc/view?acc_num=osu1397749544
Scientific applications, simulations and instruments generate massive amounts of data. This data does not only contribute to already existing scientific areas; it also leads to new sciences. However, management of this large-scale data and its analysis are both challenging processes. In this context, we require tools, methods and technologies such as reduction-based processing structures, cloud computing and storage, and efficient parallel compression methods.
In this dissertation, we first focus on parallel and scalable processing of data stored in S3, a cloud storage resource, using compute instances in Amazon Web Services (AWS). We develop MATE-EC2, which allows specification of data processing using a variant of the MapReduce paradigm. We show various optimizations, including data organization, job scheduling, and data retrieval strategies, that can be leveraged based on the performance characteristics of cloud storage resources. Furthermore, we investigate the efficiency of our middleware in both homogeneous and heterogeneous environments. Next, we improve our middleware so that users can perform transparent processing on data that is distributed among local and cloud resources. With this work, we maximize the utilization of geographically distributed resources. We evaluate our system's overhead, scalability, and performance with varying data distributions.
The users of data-intensive applications have different requirements in hybrid cloud settings. Two of the most important ones are the execution time of the application and the resulting cost on the cloud. Our third contribution is a time and cost model for data-intensive applications that run on hybrid cloud environments. The proposed model lets our middleware adapt to performance changes and dynamically allocate necessary resources from its environments, so that applications can meet user-specified constraints. Fourth, we investigate compression approaches for scientific datasets and build a compression system. The proposed system focuses on the implementation and application of domain-specific compression algorithms. We port our compression system into the aforementioned middleware and implement different compression algorithms. Our framework enables our middleware to maximize the bandwidth utilization of data-intensive applications while minimizing storage requirements.
Although compression can help us minimize the input and output overhead of data-intensive applications, utilizing compression during parallel operations is not trivial. Specifically, the inability to determine compressed data chunk sizes in advance complicates parallel write operations. In our final work, we develop different methods for enabling compression during parallel input and output operations. We then port the proposed methods into PnetCDF, a widely used scientific data management library, and show how transparent compression can be supported during parallel output operations. The proposed system lets an existing parallel simulation program start outputting and storing data in a…
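The S3 access pattern MATE-EC2 optimises, chunked retrieval followed by a local reduction, looks roughly like the boto3 sketch below. The bucket, key, and chunk size are placeholders; MATE-EC2 additionally tunes chunk size, threading, and scheduling against the measured behaviour of the cloud store.

    # Chunked S3 retrieval via HTTP Range requests, then a reduction.
    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY, CHUNK = "example-bucket", "dataset.bin", 8 * 1024 * 1024  # placeholders

    def ranged_chunks(size):
        """Yield the object in CHUNK-sized pieces via ranged GETs."""
        for start in range(0, size, CHUNK):
            end = min(start + CHUNK, size) - 1
            part = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
            yield part["Body"].read()

    size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
    total = sum(chunk.count(b"\n") for chunk in ranged_chunks(size))  # reduction: line count
    print(total)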
Advisors/Committee Members: Agrawal, Gagan (Advisor).
Subjects/Keywords: Computer Science; Data-Intensive Computing; Map-Reduce; Cloud Computing; Big Data; Scientific Data Management; Compression
APA (6th Edition):
Bicer, T. (2014). Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments. (Doctoral Dissertation). The Ohio State University. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=osu1397749544
Chicago Manual of Style (16th Edition):
Bicer, Tekin. “Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments.” 2014. Doctoral Dissertation, The Ohio State University. Accessed December 08, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397749544.
MLA Handbook (7th Edition):
Bicer, Tekin. “Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments.” 2014. Web. 08 Dec 2019.
Vancouver:
Bicer T. Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments. [Internet] [Doctoral dissertation]. The Ohio State University; 2014. [cited 2019 Dec 08]. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1397749544.
Council of Science Editors:
Bicer T. Supporting Data-Intensive Scientific Computing on Bandwidth and Space Constrained Environments. [Doctoral Dissertation]. The Ohio State University; 2014. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1397749544

Temple University
11.
Huang, Xueli.
Achieving Data Privacy and Security in Cloud.
Degree: PhD, 2016, Temple University
URL: http://digital.library.temple.edu/u?/p245801coll10,372805
Computer and Information Science
Growing concerns over the privacy of data stored in the public cloud have restrained the widespread adoption of cloud computing. The traditional method to protect data privacy is to encrypt data before it is sent to the public cloud, but this approach always introduces heavy computation, especially for image and video data, which involve a much greater amount of data than text. Another way is to take advantage of a hybrid cloud by separating the sensitive data from the non-sensitive data and storing them in a trusted private cloud and an un-trusted public cloud respectively. But if we adopt this method directly, all the images and videos containing sensitive data have to be stored in the private cloud, which makes the method meaningless. Moreover, the emergence of the Software-Defined Networking (SDN) paradigm, which decouples the control logic from the closed and proprietary implementations of traditional network devices, enables researchers and practitioners to design new, innovative network functions and protocols in a much easier, more flexible, and more powerful way. The data plane asks the control plane to update flow rules when it receives new network packets it does not know how to handle, and the control plane then dynamically deploys and configures flow rules according to the data plane's requests, allowing the whole network to be managed and controlled efficiently. However, this reactive control model could be exploited by hackers launching Distributed Denial-of-Service (DDoS) attacks that send large numbers of new requests from the data plane to the control plane.
For image data, we divide the image into pieces of equal size to speed up the encryption process, and propose two kinds of method to cut the relationships between the edges. One is to add random noise to each piece; the other is to design a one-to-one mapping function for each piece that maps each pixel value to a different one, which cuts off the relationships between pixels as well as the edges. Our mapping function takes a random parameter as input so that each piece can randomly choose a different mapping. Finally, we shuffle the pieces with another random parameter, which makes recovering the shuffled image an NP-complete problem.
For video data, we propose two different methods, one for intra frames (I-frames) and one for inter frames (P-frames), based on their different characteristics. A hybrid selective video encryption scheme for H.264/AVC, based on the Advanced Encryption Standard (AES) and the video data themselves, is proposed for I-frames. For each P-slice of a P-frame, we abstract only a small part of it into the private cloud, based on the characteristics of the intra prediction mode, which efficiently prevents the P-frame from being decoded.
For a cloud running with SDN, we propose a framework to keep the controller away from DDoS attacks. We first predict the number of new requests for each switch periodically based on its previous information, and the new requests will be sent…
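A toy numpy rendition of the image scheme just described: split the image into equal pieces, apply a per-piece bijective pixel-value mapping chosen by a random parameter, then shuffle the pieces. The image size, piece size, and mapping family are simplified stand-ins for the thesis's scheme.

    # Piece-wise pixel-value permutation plus piece shuffling.
    import numpy as np

    rng = np.random.default_rng(seed=7)               # the "random parameter"
    img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

    def encrypt(img, piece=4):
        pieces = [img[r:r+piece, c:c+piece].copy()
                  for r in range(0, img.shape[0], piece)
                  for c in range(0, img.shape[1], piece)]
        maps = []
        for p in pieces:
            perm = rng.permutation(256).astype(np.uint8)  # bijective value map
            maps.append(perm)
            p[:] = perm[p]                             # cut pixel/edge correlations
        order = rng.permutation(len(pieces))           # shuffle piece positions
        return [pieces[i] for i in order], maps, order # maps+order = decryption key

    cipher_pieces, maps, order = encrypt(img)
    print(cipher_pieces[0])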
Advisors/Committee Members: Du, Xiaojiang; Ling, Haibin; Guo, Yuhong; Won, Chang-Hee.
Subjects/Keywords: Computer science;
APA (6th Edition):
Huang, X. (2016). Achieving Data Privacy and Security in Cloud. (Doctoral Dissertation). Temple University. Retrieved from http://digital.library.temple.edu/u?/p245801coll10,372805
Chicago Manual of Style (16th Edition):
Huang, Xueli. “Achieving Data Privacy and Security in Cloud.” 2016. Doctoral Dissertation, Temple University. Accessed December 08, 2019. http://digital.library.temple.edu/u?/p245801coll10,372805.
MLA Handbook (7th Edition):
Huang, Xueli. “Achieving Data Privacy and Security in Cloud.” 2016. Web. 08 Dec 2019.
Vancouver:
Huang X. Achieving Data Privacy and Security in Cloud. [Internet] [Doctoral dissertation]. Temple University; 2016. [cited 2019 Dec 08]. Available from: http://digital.library.temple.edu/u?/p245801coll10,372805.
Council of Science Editors:
Huang X. Achieving Data Privacy and Security in Cloud. [Doctoral Dissertation]. Temple University; 2016. Available from: http://digital.library.temple.edu/u?/p245801coll10,372805

Georgia Tech
12.
Zhou, Yang.
Innovative mining, processing, and application of big graphs.
Degree: PhD, Computer Science, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/59173
With continued advances in science and technology, big graph (or network) data, such as the World Wide Web, social networks, academic collaboration networks, transportation networks, telecommunication networks, biological networks, and electrical networks, have grown at an astonishing rate in terms of volume, variety, and velocity. Analyzing such big graph data has huge potential to reveal hidden insights and promote innovation in business, science, and engineering domains. However, there exist a number of challenging bottlenecks in developing advanced graph analytics tools in the Big Data era. This dissertation research focuses on bridging graph mining and graph processing techniques to alleviate such bottlenecks in terms of both effectiveness and efficiency. It has made original contributions to exploring, understanding, and learning big graph data in graph mining, processing and application. First, we have developed a suite of novel graph mining algorithms to analyze real-world heterogeneous information networks. Our algorithmic approaches enable new ways to dive into the correlation structure of big graphs and derive new insights about how heterogeneous entities interact with one another and influence the effectiveness and efficiency of graph clustering, graph classification and graph ranking. Second, we have developed a scalable graph parallel processing framework by exploring parallel processing optimizations at both the access tier and the computation tier. We have designed a suite of hierarchically composable graph parallel abstractions to enable large-scale graphs to be processed efficiently for iterative graph computation applications. Our approach enables hardware-resource-aware graph partitioning such that parallel graph processing workloads can be well balanced in the presence of highly irregular graph structures and the mismatch of graph access and computation workloads. Third, but not least, we have developed innovative domain-specific graph analytics frameworks to understand the hidden patterns in enterprise storage systems and to derive interesting correlations among various enterprise web services. These novel graph algorithms and frameworks provide broader and deeper insights for a better understanding of tradeoffs in enterprise system design and implementation.
Advisors/Committee Members: Liu, Ling (advisor), Lofstead, Jay (committee member), Navathe, Shamkant (committee member), Pu, Calton (committee member), Ramaswamy, Lakshmish (committee member).
Subjects/Keywords: Big data; Data mining; Parallel and distributed computing; Machine learning; Databases
APA (6th Edition):
Zhou, Y. (2017). Innovative mining, processing, and application of big graphs. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/59173
Chicago Manual of Style (16th Edition):
Zhou, Yang. “Innovative mining, processing, and application of big graphs.” 2017. Doctoral Dissertation, Georgia Tech. Accessed December 08, 2019. http://hdl.handle.net/1853/59173.
MLA Handbook (7th Edition):
Zhou, Yang. “Innovative mining, processing, and application of big graphs.” 2017. Web. 08 Dec 2019.
Vancouver:
Zhou Y. Innovative mining, processing, and application of big graphs. [Internet] [Doctoral dissertation]. Georgia Tech; 2017. [cited 2019 Dec 08]. Available from: http://hdl.handle.net/1853/59173.
Council of Science Editors:
Zhou Y. Innovative mining, processing, and application of big graphs. [Doctoral Dissertation]. Georgia Tech; 2017. Available from: http://hdl.handle.net/1853/59173

Rhodes University
13.
Sweeney, Michael John.
A framework for scoring and tagging NetFlow data.
Degree: Faculty of Science, Computer Science, 2019, Rhodes University
URL: http://hdl.handle.net/10962/65022
With the increase in link speeds and the growth of the Internet, the volume of NetFlow data generated has increased significantly over time, and processing these volumes has become a challenge, more specifically a Big Data challenge. With the advent of technologies and architectures designed to handle Big Data volumes, researchers have investigated their application to the processing of NetFlow data. This work builds on prior work, in which a scoring methodology was proposed for identifying anomalies in NetFlow, by proposing and implementing a system that allows for automatic, real-time scoring through the adoption of Big Data stream processing architectures. The first part of the research looks at the means of event detection using the scoring approach, implemented as a number of individual, standalone components, each responsible for detecting and scoring a single type of flow trait. The second part is the implementation of these scoring components in a framework, named Themis, capable of handling high volumes of data with low-latency processing times. This was tackled using tools, technologies and architectural elements from the world of Big Data stream processing. The performance of the framework on the stream processing architecture demonstrated good flow throughput at low processing latencies on a single low-end host. The successful demonstration of the framework on a single host opens the way to leveraging the scaling capabilities afforded by the architectures and technologies used. This gives weight to the possibility of using this framework for real-time threat detection using NetFlow data from larger networked environments.
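One standalone scoring component in the style the abstract describes might look like the sketch below: it watches a stream of flow records and scores a single trait (here, an unusually high byte count). The thresholds, weights, and record shape are invented; Themis composes many such components on a stream-processing architecture.

    # A single-trait scorer over a stream of flow records.
    from dataclasses import dataclass

    @dataclass
    class Flow:
        src: str
        dst: str
        nbytes: int

    class HighVolumeScorer:
        """Scores one trait only; other traits get their own components."""
        def __init__(self, threshold=10_000_000, weight=5):
            self.threshold, self.weight = threshold, weight

        def score(self, flow: Flow) -> int:
            return self.weight if flow.nbytes > self.threshold else 0

    def score_stream(flows, components):
        for f in flows:
            yield f, sum(c.score(f) for c in components)   # per-flow anomaly score

    flows = [Flow("10.0.0.1", "8.8.8.8", 1_200), Flow("10.0.0.2", "1.2.3.4", 50_000_000)]
    for f, s in score_stream(flows, [HighVolumeScorer()]):
        print(f.src, "->", f.dst, "score", s)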
Subjects/Keywords: Big data; Electronic data processing; High performance computing
APA (6th Edition):
Sweeney, M. J. (2019). A framework for scoring and tagging NetFlow data. (Thesis). Rhodes University. Retrieved from http://hdl.handle.net/10962/65022
Note: this citation may be lacking information needed for this citation format: Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Sweeney, Michael John. “A framework for scoring and tagging NetFlow data.” 2019. Thesis, Rhodes University. Accessed December 08, 2019. http://hdl.handle.net/10962/65022.
MLA Handbook (7th Edition):
Sweeney, Michael John. “A framework for scoring and tagging NetFlow data.” 2019. Web. 08 Dec 2019.
Vancouver:
Sweeney MJ. A framework for scoring and tagging NetFlow data. [Internet] [Thesis]. Rhodes University; 2019. [cited 2019 Dec 08]. Available from: http://hdl.handle.net/10962/65022.
Council of Science Editors:
Sweeney MJ. A framework for scoring and tagging NetFlow data. [Thesis]. Rhodes University; 2019. Available from: http://hdl.handle.net/10962/65022

Georgia State University
14.
Casturi, Narasimharao V.
Enterprise Data Mining & Machine Learning Framework on Cloud Computing for Investment Platforms.
Degree: PhD, Computer Science, 2019, Georgia State University
URL: https://scholarworks.gsu.edu/cs_diss/150
▼ Machine Learning and Data Mining are two key components of decision-making systems that can quickly provide valuable insights into huge data sets. Turning raw data into meaningful information, and converting that into actionable tasks, makes organizations profitable and able to withstand intense competition. The past decade saw an increase in Data Mining algorithms and tools for financial market analysis, consumer products, manufacturing, the insurance industry, social networks, scientific discoveries, and warehousing. With the vast amount of data available for analysis, traditional tools and techniques are outdated for data analysis and decision support. Organizations are investing considerable amounts of resources in Data Mining frameworks in order to emerge as market leaders. Machine Learning is a natural evolution of Data Mining: existing Machine Learning techniques rely heavily on underlying Data Mining techniques, in which pattern recognition is an essential component. Building an efficient Data Mining framework is expensive and usually culminates in a multi-year project; organizations pay a heavy price for any delay or for an inefficient Data Mining foundation. In this research, we propose to build a cost-effective and efficient Data Mining (DM) and Machine Learning (ML) framework in a cloud computing environment to overcome the inherent limitations of existing design methodologies. The elasticity of the cloud architecture removes hardware constraints on businesses. Our research focuses on refining and enhancing current Data Mining frameworks to build an enterprise data mining and machine learning framework. Our initial studies and techniques produced very promising results, reducing the existing build time considerably. Our technique of dividing the DM and ML framework into several individual components (five sub-components) that can be reused at several phases of the final enterprise build is efficient and saves operational costs. Effective aggregation using selective cuboids and parallel computation using Azure Cloud Services are a few of the many techniques proposed in this research. Our research produced a nimble, scalable, and portable architecture for enterprise-wide implementation of DM and ML frameworks.
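As a rough sketch of what "effective aggregation using selective cuboids" can mean (dimension and measure names are invented; only the group-bys a workload needs are materialised, instead of the full cube):

```python
# Materialise only selected cuboids of the data cube, not the full lattice.
from collections import defaultdict

rows = [
    {"sector": "tech", "region": "US", "year": 2018, "value": 120.0},
    {"sector": "tech", "region": "EU", "year": 2018, "value": 80.0},
    {"sector": "energy", "region": "US", "year": 2019, "value": 50.0},
]

def aggregate_cuboid(rows, dims):
    """Materialise one cuboid: sum of `value` grouped by `dims`."""
    cube = defaultdict(float)
    for r in rows:
        key = tuple(r[d] for d in dims)
        cube[key] += r["value"]
    return dict(cube)

# Only the cuboids the reports actually query, not all 2^3 group-bys.
selected = [("sector",), ("sector", "year"), ("region", "year")]
cuboids = {dims: aggregate_cuboid(rows, dims) for dims in selected}
print(cuboids[("sector", "year")])
# {('tech', 2018): 200.0, ('energy', 2019): 50.0}
```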
Advisors/Committee Members: Rajshekhar Sunderraman, Anu Bourgeois, Yanqing Zhang, Sheldon Schiffer.
Subjects/Keywords: FinTech; Big Data; Data Mining; Machine Learning; Cloud Computing; Enterprise Architecture

Brunel University
15.
Suthakar, Uthayanath.
A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure.
Degree: PhD, 2017, Brunel University
URL: http://bura.brunel.ac.uk/handle/2438/15788
;
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.764857
► Monitoring data-intensive scientific infrastructures in real time, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and…
(more)
▼ Monitoring data-intensive scientific infrastructures in real time, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and velocity of events that are produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address this Big Data problem; in this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for the traditional architecture, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, data-intensive (parallel) processing, fault tolerance (through data replication), and support for low-latency computation. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid and proved effective, especially for computationally and data-intensive use cases. This thesis presents an efficient strategy for collecting and storing large volumes of data for computation: by moving the transformation logic out of the data pipeline and into the analytics layers, the architecture and overall process are simplified; processing time is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformations can be performed when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch and streaming layers with minimal code duplication in order to support scalability, low latency, and fault tolerance. Several models were evaluated: a pure streaming layer, a pure batch layer, and the combination of both. Experimental results demonstrate that the OLA performed better than both the traditional architecture and the standard Lambda Architecture. The OLA was further enhanced with an intelligence layer for predicting data access patterns; this layer actively adapts and updates the model built by the batch layer, eliminating re-training time while providing a high level of accuracy using Deep Learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous architecture for monitoring data-intensive scientific infrastructures that can benefit from Big Data technologies and approaches.
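A minimal sketch of the Lambda-style query path described above, with invented event names (a periodically recomputed batch view merged with an incrementally updated real-time view, so raw master data stays untouched):

```python
# Serving-layer merge of batch and streaming views, Lambda-style.
from collections import Counter

raw_events = ["job_ok", "job_fail", "job_ok"]   # immutable master data
batch_view = Counter(raw_events)                # recomputed periodically

stream_view = Counter()                         # updated per incoming event
def on_event(event):                            # streaming layer
    stream_view[event] += 1

def query(key):
    """Serving layer: merge batch and streaming views at query time."""
    return batch_view[key] + stream_view[key]

on_event("job_fail")
print(query("job_fail"))  # 2 -> one from batch history, one from the stream
```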
Subjects/Keywords: Big data; Data science; Distributed system; Lambda Architecture; Parallel computing

New Jersey Institute of Technology
16.
Shu, Tong.
Performance optimization and energy efficiency of big-data computing workflows.
Degree: PhD, Computer Science, 2017, New Jersey Institute of Technology
URL: https://digitalcommons.njit.edu/dissertations/41
► Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or…
(more)
▼ Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data-intensive workflows comprised of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and financial cost if executed in clouds, yet it remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an experiment-based validation of a performance model in which the total workload of a moldable job increases with its degree of parallelism. Furthermore, it conducts rigorous research on workflow execution dynamics in resource-sharing environments and explores the interactions between workflow mapping and task scheduling on various computing platforms. A workflow optimization architecture is developed to seamlessly integrate three interrelated technical components: resource allocation, job mapping, and task scheduling.
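As a worked illustration of the granularity trade-off, consider a toy model in which, per the abstract, a moldable job's total workload grows with its degree of parallelism (the overhead coefficient and cost rate below are invented, not the dissertation's model):

```python
# Toy moldable-job model: time falls with parallelism, cost rises with it.
def workload(w0, p, alpha=0.05):
    return w0 * (1 + alpha * (p - 1))   # total work inflates with parallelism

def completion_time(w0, p):
    return workload(w0, p) / p          # inflated work shared across p tasks

def cloud_cost(w0, p, rate=0.10):
    return workload(w0, p) * rate       # pay per unit of work executed

for p in (1, 2, 4, 8, 16):
    print(p, round(completion_time(100, p), 1), round(cloud_cost(100, p), 2))
# The best granularity balances completion time against energy/cost.
```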
Advisors/Committee Members: Chase Qishi Wu, Guiling Wang, Roberto Rojas-Cessa.
Subjects/Keywords: Big data; Scientific workflow; Green computing; Cloud computing; Parallel computing; Map reduce; Computer Sciences

Clemson University
17.
Xuan, Pengfei.
Accelerating Big Data Analytics on Traditional High-Performance Computing Systems Using Two-Level Storage.
Degree: PhD, School of Computing, 2016, Clemson University
URL: https://tigerprints.clemson.edu/all_dissertations/2318
► High-performance Computing (HPC) clusters, which consist of a large number of compute nodes, have traditionally been widely employed in industry and academia to run…
(more)
▼ High-Performance Computing (HPC) clusters, which consist of a large number of compute nodes, have traditionally been widely employed in industry and academia to run diverse compute-intensive applications. In recent years, the revolution in data-driven science has produced large volumes of data, often terabytes or petabytes in size, and has caused data-intensive applications to grow exponentially. Data-intensive computing presents new challenges to HPC clusters due to its different workload characteristics and optimization objectives. One of those challenges is how to efficiently integrate software frameworks developed for big data analytics, such as Hadoop and Spark, with traditional HPC systems to support both data-intensive and compute-intensive workloads.
To address this challenge, we first present a novel two-level storage system, TLS, that integrates a distributed in-memory storage system with a parallel file system: the former provides memory-speed I/O performance, while the latter provides consistent storage with large capacity. We model and compare its I/O throughput to the Hadoop Distributed File System (HDFS) and OrangeFS (formerly PVFS2). We further build a prototype of TLS with Alluxio (formerly Tachyon) and OrangeFS, and evaluate its performance using MapReduce benchmarks. Both analyses and experiments on real systems show that the proposed storage architecture delivers higher aggregate I/O throughput than HDFS and OrangeFS while retaining weak scalability on both reads and writes.
However, statically configured in-memory storage may leave inadequate space for compute-intensive jobs, or miss the opportunity to use more of the available space for data-intensive applications. We therefore develop a dynamic memory controller, DynIMS, which infers the memory demands of compute tasks in real time and employs a feedback-based control model to dynamically adjust the capacity of the in-memory storage system. DynIMS can quickly release in-memory storage capacity for compute-intensive workloads, and maximize that capacity for data-intensive applications once other compute workloads have finished. We test DynIMS using mixed HPCC and Spark workloads on a production HPC cluster. Experimental results show that DynIMS can achieve up to 5× performance improvement compared to systems with static memory allocation.
We expect the work in this dissertation to help further accelerate the adoption of big data frameworks for solving data-intensive problems on traditional HPC systems, and to gear up the converged computing infrastructure for both academia and industry.
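A minimal sketch of a DynIMS-style feedback step (the proportional rule and all parameters are assumptions for illustration, not the controller from the dissertation):

```python
# One feedback-control step over in-memory store capacity: shrink the store
# when compute tasks need memory, grow it back when memory is free.
def control_step(cache_gb, free_gb, target_free_gb=8.0, gain=0.5,
                 min_gb=4.0, max_gb=64.0):
    error = free_gb - target_free_gb   # >0: spare memory, <0: memory pressure
    cache_gb += gain * error           # release or reclaim capacity
    return max(min_gb, min(max_gb, cache_gb))

cache = 32.0
for free in (2.0, 1.0, 6.0, 20.0):     # observed free memory over time
    cache = control_step(cache, free)
    print(round(cache, 1))             # drops under pressure, recovers after
```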
Advisors/Committee Members: Feng Luo, Pradip K Srimani, Rong Ge, Jim Martin.
Subjects/Keywords: Big Data Analytics; Data-intensive Computing; HPC; In-memory Computing; Parallel File System

KTH
18.
Moré, Andre.
HopsWorks : A project-based access control model for Hadoop.
Degree: Information and Communication Technology (ICT), 2015, KTH
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175742
► The growth in global data-gathering capacity is producing a vast amount of data, which is growing at an ever-increasing rate.…
(more)
▼ The growth in global data-gathering capacity is producing a vast amount of data, which is growing at an ever-increasing rate. Properly analyzed, this data can represent a great opportunity for businesses, but processing it is a resource-intensive task. Sharing data can increase efficiency through reusability, but legal and ethical questions arise when data is shared. The purpose of this thesis is to gain an in-depth understanding of the different access control methods that can be used to facilitate sharing, and to choose one to implement on a platform that lets users analyze, share, and collaborate on datasets. The resulting platform uses project-based access control at the API level and fine-grained role-based access control on the file system, giving the data owner full control over the shared data.
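A minimal sketch of the two-level model described above, with invented projects, roles, and permission rules:

```python
# Two checks: project membership at the API level, then a per-project
# role against the requested file operation.
PROJECTS = {"genomics": {"alice": "data_owner", "bob": "researcher"}}
ALLOWED = {"data_owner": {"read", "write", "share"}, "researcher": {"read"}}

def api_access(user, project):
    """API level: only project members get in at all."""
    return user in PROJECTS.get(project, {})

def fs_access(user, project, op):
    """File-system level: fine-grained, role-based check."""
    role = PROJECTS.get(project, {}).get(user)
    return role is not None and op in ALLOWED[role]

print(api_access("bob", "genomics"), fs_access("bob", "genomics", "write"))
# True False -> bob is a member but cannot write the owner's dataset
```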
Subjects/Keywords: Hops; HopsWorks; Hadoop; DataSets; Big Data; Distributed Computing
19.
Song, Ge.
Méthodes parallèles pour le traitement des flux de données continus : Parallel and continuous join processing for data stream.
Degree: Docteur es, Informatique, 2016, Paris Saclay
URL: http://www.theses.fr/2016SACLC059
► We live in a world where a vast amount of data is continuously generated. For example, every time we search on Google, every time…
(more)
▼ We live in a world where a vast amount of data is being continuously generated, and it arrives in a variety of ways: every time we search on Google, purchase something on Amazon, click 'like' on Facebook, upload an image on Instagram, or a sensor is activated, new data is generated. Data is more than simple numerical information; it now comes in a variety of forms. Isolated data, however, is valueless; it is only when this huge amount of data is connected that new insights can be extracted. At the same time, data is time-sensitive: the most accurate and effective way of describing it is as a data stream, and if the latest data is not processed promptly, the opportunity to obtain the most useful results is missed. A parallel and distributed system for processing large amounts of streaming data in real time therefore has significant research value and good application prospects. This thesis focuses on parallel and continuous data stream joins. We divide this problem into two categories: data-driven parallel and continuous joins, and query-driven parallel and continuous joins.
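One standard building block for such continuous joins is the symmetric hash join, sketched below (a simplification; the thesis's data-driven and query-driven operators are more elaborate):

```python
# Symmetric hash join over two streams: each arriving tuple is inserted
# into its own side's index and probed against the other side's index.
from collections import defaultdict

left_index, right_index = defaultdict(list), defaultdict(list)

def on_tuple(side, key, value):
    """Insert into own index, probe the other side, emit matches."""
    own, other = (left_index, right_index) if side == "L" else (right_index, left_index)
    own[key].append(value)
    return [(value, match) if side == "L" else (match, value)
            for match in other[key]]

print(on_tuple("L", "user42", "search:big data"))  # [] -> nothing to join yet
print(on_tuple("R", "user42", "click:ad17"))
# [('search:big data', 'click:ad17')]
```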
Advisors/Committee Members: Magoulès, Frédéric (thesis director), Huet, Fabrice (thesis director).
Subjects/Keywords: Big Data; Data Stream; Parallel Computing; Data Mining
20.
Ribot, Stephane.
Adoption of Big Data And Cloud Computing Technologies for Large Scale Mobile Traffic Analysis : L’adoption des technologies Big Data et Cloud Computing dans le cadre de l’analyse des données de trafic mobile.
Degree: Docteur es, Science de Gestion, 2016, Lyon
URL: http://www.theses.fr/2016LYSE3049
► The emergence of Big Data and cloud computing technologies, in response to the constantly growing complexity and diversity of data, poses a new…
(more)
▼ A new economic paradigm is emerging as enterprises generate and manage increasing amounts of data and look to technologies such as cloud computing and Big Data to improve data-driven decision making and, ultimately, performance. Mobile service providers are an example of firms seeking to monetize the mobile data they collect. This thesis examines this challenge, which combines the explosion in the amount of data to analyse with the constant emergence, and adoption, of new technologies, and addresses the research question: "To what extent do cloud computing and Big Data technologies contribute to the tasks carried out by data scientists?" Following a hypothetico-deductive approach grounded in classical adoption theories, the hypotheses and conceptual model are inspired by Goodhue's task-technology fit (TTF) model; the proposed factors include Big Data and cloud computing, the task, the technology, the individual, TTF, utilization, and realized impacts. Seven hypotheses are examined, specifically addressing the weaknesses of previous models. A survey was conducted among 169 researchers contributing to mobile data analysis, and a quantitative analysis was performed to demonstrate the validity of the measures and the relevance of the proposed theoretical model, using partial least squares (PLS) structural equation modeling (SEM) to establish the correlations between constructs; the results reflect positive relationships of individual, technology, and task factors on TTF for mobile data analysis. This research makes two contributions: the development of a new TTF construct, a task-Big Data/cloud computing technology fit model, and the validation of that construct in the model of fit between Big Data/cloud computing technologies and mobile data analysis.
Advisors/Committee Members: Lebraty, Jean Fabrice (thesis director), Boulanger, Danielle (thesis director).
Subjects/Keywords: Big Data; Cloud Computing; Adoption; Task-Technology Fit; Data value chain
21.
Ahlcrona, Felix.
Sakernas Internet : En studie om vehicular fog computing påverkan i trafiken.
Degree: Informatics, 2018, University of Skövde
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15713
► Future vehicles will be very different from today's vehicles. Much of the change will come from the IoT. The world will be highly…
(more)
▼ Future vehicles will be very different from today's. Much of the change will come from the IoT: the world will be highly connected, and sensors will be able to produce data that most of us did not even know existed. More data, however, also means more problems. Enormous amounts of data will be generated and distributed by future IoT devices, and this data needs to be analyzed and stored efficiently using Big Data principles. Fog computing is a development of cloud technology that has been suggested as a solution to many of the problems IoT suffers from. Are traditional storage and analysis tools sufficient for the huge volume of data that will be produced, or are new technologies needed to support this development? This study attempts to answer the question: "What problems and opportunities does the development of fog computing in passenger cars present for consumers?" The question is answered through a systematic literature review, whose objective is to identify and interpret previous literature and research; the material was analyzed using open coding to sort and categorize the data. The results show that technologies such as IoT, Big Data, and fog computing are deeply intertwined. Future vehicles will contain many IoT devices producing enormous amounts of data, and fog computing will be an effective, low-latency way to handle the data volumes from these devices. The opportunities lie in new applications and systems that help improve traffic safety, the environment, and information about the vehicle's condition. Several risks and problems need to be solved before a full-scale version can be deployed, including data authentication, user privacy, and determining which mobility model is most efficient.
Subjects/Keywords: IoT; big data; fog computing; vehicular fog computing; connected vehicles; Information Systems
22.
Rabah, Mazouzi.
Approches collaboratives pour la classification des données complexes : Collaborative approaches for complex data classification.
Degree: Docteur es, Informatique, 2016, Paris 8
URL: http://www.theses.fr/2016PA080079
► This thesis focuses on collaborative classification in the context of complex data, in particular within the Big Data setting; we drew on…
(more)
▼ This thesis focuses on collaborative classification in the context of complex data, in particular Big Data. We draw on several computational paradigms to propose new approaches that exploit large-scale, high-performance computing technologies. In this setting we build massive classifier ensembles, in the sense that the number of elementary classifiers making up the multiple-classifier system can be very high. In this case, conventional methods of interaction between classifiers are no longer valid, and we propose new forms of interaction that are not constrained to take all classifiers' predictions into account when building an overall prediction. Accordingly, we faced two problems: the first is the ability of our approaches to scale; the second is the diversity that must be created and maintained within the system to ensure its performance. We therefore studied the distribution of classifiers in a cloud computing environment; such a multiple-classifier system can be massive, and its properties are those of a complex system. Regarding data diversity, we proposed an approach that enriches training data by generating synthetic data from analytical models describing part of the phenomenon under study, so that the mixture of data reinforces the classifiers' learning. The experiments carried out showed great potential for substantially improving classification results.
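A toy sketch of the "massive ensemble with partial interaction" idea (stub classifiers and a random-subset vote stand in for the thesis's richer interaction schemes):

```python
# A global prediction that consults only a random subset of a very large
# ensemble, instead of every member.
import random
random.seed(0)

def make_stub(bias):
    # Stand-in "classifier": a biased coin in place of a trained model.
    return lambda x: 1 if random.random() < bias else 0

ensemble = [make_stub(0.7) for _ in range(10_000)]   # massive ensemble

def predict(x, sample_size=50):
    """Majority vote over a small random subset, not all 10,000 members."""
    votes = [clf(x) for clf in random.sample(ensemble, sample_size)]
    return int(sum(votes) > len(votes) / 2)

print(predict("some complex record"))
```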
Advisors/Committee Members: Akdag, Herman (thesis director).
Subjects/Keywords: Classification; Classifier ensembles; Big data; Cloud computing; Diversity

UCLA
23.
Tetali, Sai Deep.
Program Analyses for Cloud Computations.
Degree: Computer Science, 2015, UCLA
URL: http://www.escholarship.org/uc/item/0nh2k2c7
► Cloud computing has become an essential part of our computing infrastructure. In this model, data and programs are hosted in (often third-party) data centers that…
(more)
▼ Cloud computing has become an essential part of our computing infrastructure. In this model, data and programs are hosted in (often third-party) data centers that provide APIs for data access and for running large-scale computations. It is used by almost all major internet service companies and is increasingly being considered by other organizations to host their data and run analytics. However, several challenges lie in the way of its full-scale adoption, chief among them security, performance, and correctness. Security is important because both client data and computations must be sent to third-party data centers. Performance is important because cloud computing involves many development iterations, each running on large-scale data. Correctness is critical because cloud frameworks are complex distributed systems serving billions of users every day. In this dissertation, I argue that program analysis techniques can help address these key challenges of cloud computing. I describe three projects that illustrate different aspects of the solution space: MrCrypt is a system that uses static analysis to guarantee data confidentiality in cloud computations by using homomorphic encryption schemes. Vega is a library that significantly improves incremental performance by rewriting modified workflows to use previously computed results. Kuai is a distributed, enumerative model checker that verifies correctness in Software-Defined Networks, the networking layer used by many data centers.
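A tiny sketch of the reuse idea behind Vega as described here (the content-keyed cache is an assumed illustration, not Vega's actual mechanism):

```python
# Key each workflow stage's output by a hash of the stage name and its
# input; a modified workflow reuses cached results for unchanged stages.
import hashlib, json

CACHE = {}

def run_stage(name, fn, data):
    key = hashlib.sha256(json.dumps([name, data]).encode()).hexdigest()
    if key not in CACHE:                 # only recompute on a cache miss
        CACHE[key] = fn(data)
    return CACHE[key]

clean = lambda xs: [x.strip().lower() for x in xs]
count = lambda xs: {x: xs.count(x) for x in xs}

logs = [" GET ", "get", " POST "]
run_stage("clean", clean, logs)          # computed once
print(run_stage("count", count, run_stage("clean", clean, logs)))
# the second "clean" call is served from the cache
```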
Subjects/Keywords: Computer science; Big Data; Cloud Computing; Programming Analysis; Programming Languages

Universidade Nova
24.
Domingos, João Nuno Silva Tabar.
On the cloud deployment of a session abstraction for service/data aggregation.
Degree: 2013, Universidade Nova
URL: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/9923
► Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática).
The global cyber-infrastructure comprises a growing number of resources, spanning several abstraction layers. These…
(more)
▼ Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática).
The global cyber-infrastructure comprises a growing number of resources, spanning several abstraction layers. These resources, which can include wireless sensor devices or mobile networks, share common requirements such as richer interconnection capabilities and increasing data consumption demands. Additionally, the service model is now widely adopted, supporting the development and execution of distributed applications. In this context, new challenges are emerging around the "big data" topic. These challenges include service access optimizations, such as data-access context sharing, more efficient data filtering/aggregation mechanisms, and adaptable service access models that can respond to context changes. Service access characteristics can be aggregated to capture specific interaction models. Moreover, ubiquitous service access is a growing requirement, particularly for mobile clients such as tablets and smartphones.
The Session concept aggregates service access characteristics, creating specific interaction models that can then be reused in similar contexts. Existing Session abstraction implementations also allow dynamic reconfiguration of these interaction models, so that a model can adapt to context changes based on service, client, or underlying communication medium variables. Cloud computing, in turn, provides ubiquitous access along with large-scale data persistence and processing services.
This thesis proposes a Session abstraction implementation, deployed on a Cloud platform in the form of a middleware. The middleware captures rich, dynamic interaction models between users with similar interests and provides a generic mechanism for interacting with datasources based on multiple protocols. Such an abstraction contextualizes service/user interactions and can be reused by other users in similar contexts. The Session implementation also supports data persistence by saving all data in transit in a Cloud-based repository.
The middleware delivers richer datasource-access interaction models and dynamic reconfiguration, and allows the integration of heterogeneous datasources. The solution also provides ubiquitous access, allowing client connections from standard Web browsers or Android-based mobile devices.
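A minimal sketch of such a Session abstraction, with invented protocol names and fields (a session aggregates access characteristics and can be reconfigured at runtime):

```python
# A Session bundles datasource-access characteristics into a reusable,
# dynamically reconfigurable interaction model.
class Session:
    def __init__(self, protocol, filter_fn, period_s):
        self.protocol = protocol     # how the datasource is reached
        self.filter_fn = filter_fn   # data filtering/aggregation step
        self.period_s = period_s     # polling interaction model

    def reconfigure(self, **changes):
        """Adapt the interaction model to context changes at runtime."""
        for field, value in changes.items():
            setattr(self, field, value)

    def fetch(self, records):
        return [r for r in records if self.filter_fn(r)]

s = Session("http", lambda r: r["temp"] > 30, period_s=60)
s.reconfigure(period_s=5)            # e.g. client moved to a fast network
print(s.fetch([{"temp": 25}, {"temp": 31}]))
```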
Advisors/Committee Members: Gomes, Maria Cecília, Paulino, Hervé.
Subjects/Keywords: Big data; Cloud computing; Sessions; Dynamic reconfigurations; Mobile platforms

Virginia Tech
25.
Uliana, David Christopher.
FPGA-Based Accelerator Development for Non-Engineers.
Degree: MS, Electrical and Computer Engineering, 2014, Virginia Tech
URL: http://hdl.handle.net/10919/78091
► In today's world of big-data computing, access to massive, complex data sets has reached an unprecedented level, and the task of intelligently processing such data…
(more)
▼ In today's world of big-data computing, access to massive, complex data sets has reached an unprecedented level, and the task of intelligently processing such data into useful information has become a growing concern to the high-performance computing community. However, domain experts, who are the brains behind this processing, typically lack the skills required to build FPGA-based hardware accelerators ideal for their applications, as traditional development flows targeting such hardware require digital design expertise. This work proposes a usable, end-to-end accelerator development methodology that attempts to bridge this gap between domain experts and the vast computational capacity of FPGA-based heterogeneous platforms. To accomplish this, two development flows were assembled, both targeting the Convey Hybrid-Core HC-1 heterogeneous platform and utilizing existing graphical design environments for design entry. Furthermore, incremental implementation techniques were applied to one of the flows to accelerate bitstream compilation, improving design productivity. The efficacy of these flows in extending FPGA-based acceleration to non-engineers in the life sciences was informally tested at two separate instances of an NSF-funded summer workshop, organized and hosted by the Virginia Bioinformatics Institute at Virginia Tech. In both workshops, groups of four or five non-engineer participants made significant modifications to a bare-bones Smith-Waterman accelerator, extending functionality and improving performance.
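For context, the recurrence such a Smith-Waterman accelerator implements can be sketched in plain software (linear gap penalty; a reference model, not the workshop's hardware design):

```python
# Smith-Waterman local alignment score with a simple linear gap penalty.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # scoring matrix, zero borders
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])       # best local alignment anywhere
    return best

print(smith_waterman("GATTACA", "GCATGCU"))
```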
Advisors/Committee Members: Athanas, Peter M. (committeechair), Feng, Wu-Chun (committee member), Kepa, Krzysztof (committee member), Martin, Thomas (committee member), Zhang, Liqing (committee member).
Subjects/Keywords: Big-data; HPC; FPGA; Heterogeneous Computing; Life Sciences

Universidad Nacional de La Plata
26.
De Luca, Julián.
Incidencia de idiomas populares en la lengua española con Big Data: análisis masivo de datos mediante Amazon Elastic MapReduce y Google N-grams.
Degree: 2016, Universidad Nacional de La Plata
URL: http://hdl.handle.net/10915/59489
► This thesis analyzes, designs, and implements a solution to the problem of detecting neologisms and loanwords in the Spanish language. To this…
(more)
▼ This thesis analyzes, designs, and implements a solution to the problem of detecting neologisms and loanwords in the Spanish language. To this end, we review previous research and the problems it has raised, and we contribute a cloud computing system.
Our solution centers on the use of massive data analysis technologies to process large corpora efficiently. We give a general overview of existing technologies before introducing specific tools. We arrive at a simple yet effective and scalable solution built on tools provided by Amazon Web Services, using the public corpus called Google Books Ngrams.
Finally, we open up the possibility of applying other tools, and the same methodology, to other kinds of studies. We demonstrate a contribution to the computational linguistics community and will continue working on this topic.
Facultad de Informática
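A minimal MapReduce-style sketch of the kind of pass such a system runs over the Google Books Ngrams corpus (the record layout follows the public dataset: ngram, year, match_count, volume_count, tab-separated; the candidate word list is invented):

```python
# Map: emit ((word, year), count) for candidate loanwords.
# Reduce: sum counts per (word, year) to track frequency over time.
from collections import defaultdict

LINES = [
    "software\t1995\t120\t40",
    "software\t1996\t180\t55",
    "mouse\t1996\t90\t30",
]
CANDIDATES = {"software"}              # suspected loanwords in Spanish text

def mapper(line):
    ngram, year, matches, _ = line.split("\t")
    if ngram.lower() in CANDIDATES:
        yield (ngram.lower(), int(year)), int(matches)

def reduce_all(lines):
    totals = defaultdict(int)
    for line in lines:
        for key, count in mapper(line):
            totals[key] += count       # reducer: sum counts per (word, year)
    return dict(totals)

print(reduce_all(LINES))  # {('software', 1995): 120, ('software', 1996): 180}
```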
Advisors/Committee Members: Hasperué, Waldo, Chichizola, Franco.
Subjects/Keywords: Computer Science (Ciencias Informáticas); big data; cloud computing; neologisms; loanwords

University of Notre Dame
27.
Olivia Choudhury.
Expediting Analysis and Improving Fidelity of Big Data Genomics.
Degree: PhD, Computer Science and Engineering, 2017, University of Notre Dame
URL: https://curate.nd.edu/show/zp38w953f1h
► Genomics, or the study of genome-derived data, has had widespread impact in applications including medicine, forensic science, human evolution, environmental science, and social science.…
(more)
▼ Genomics, or the study of genome-derived data, has had widespread impact in applications including medicine, forensic science, human evolution, environmental science, and social science. The plummeting cost of genome sequencing in the last decade has spurred an exponential growth of genomic data. The rate of data generation from these sequencing techniques has outpaced computing throughput, as predicted by Moore's Law, causing a major bottleneck in the rate of data processing and analysis. Emerging genome data is also characterized by missing and erroneous values that reduce data fidelity and limit its applicability for downstream analysis. This forms the basis of the following research questions: (i) Can we design frameworks that expedite data analysis and enable efficient utilization of computational resources? (ii) Can we develop accurate and efficient algorithms to improve data fidelity in genomic applications? We address the first problem by developing a parallel data analysis framework that accelerates large-scale comparative genomics applications. We identify that optimal data partitioning and caching significantly improve the performance of such a framework. We further construct a predictive model to estimate runtime configurations that facilitate optimal utilization of cloud and cluster-based resources while executing data-intensive applications. The fidelity of genomic data derived from next-generation sequencing techniques impacts downstream applications like genome-wide association study (GWAS) and genome assembly. For imputation of missing genotype data, we design an accurate, fast, and lightweight algorithm for both model (with a reference genotype panel) and non-model (without a reference genotype panel) organisms. To correct erroneous long reads generated by emerging sequencing techniques, we formulate a hybrid correction algorithm that determines a correction policy based on an optimal combination of base quality and similarity of aligned short reads. We extend the core algorithm by proposing an iterative learning paradigm that further improves its performance. Our proposed data analysis framework is accessible to the scientific community and has been used to study the genomes of important plant species and malaria vector mosquitoes. The predictive models exhibit high accuracy in determining optimal parameters of operation on commercial cloud services like Amazon EC2 and Microsoft Azure. Finally, the imputation and error correction algorithms outperform state-of-the-art alternatives when tested on real data sets of plants, malarial mosquitoes, and humans. Hence, in this thesis, we present novel solutions to expedite data-parallel genomic applications while optimizing cloud and cluster-based resource utilization. We also design novel, accurate, and efficient algorithms to impute missing data and correct erroneous data in emerging genomic applications.
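As an illustration of the kind of policy the abstract describes (not the dissertation's actual algorithm), a toy hybrid corrector might trust high-quality long-read bases and otherwise defer to a strong consensus of aligned short reads; the thresholds and voting rule below are assumptions.

```python
"""Toy sketch of a hybrid correction policy for one long-read position:
keep the long-read base when its quality is high, otherwise defer to a
consensus of aligned short-read bases. Thresholds are illustrative."""
from collections import Counter

def correct_base(long_base, long_qual, short_bases,
                 qual_trust=30, consensus_frac=0.8):
    # High-quality long-read bases are trusted as-is.
    if long_qual >= qual_trust:
        return long_base
    if not short_bases:
        return long_base  # nothing aligned: no evidence to change
    base, votes = Counter(short_bases).most_common(1)[0]
    # Overwrite only when the aligned short reads agree strongly.
    if votes / len(short_bases) >= consensus_frac:
        return base
    return long_base

if __name__ == "__main__":
    print(correct_base("A", 12, ["G", "G", "G", "G", "T"]))  # -> "G"
    print(correct_base("A", 40, ["G", "G", "G"]))            # -> "A"
```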
Advisors/Committee Members: Scott Emrich, Research Director.
Subjects/Keywords: Big Data; Genomics; Cloud Computing; Bioinformatics; Computational Biology; Machine Learning

University of Western Ontario
28.
Nascimento de Aguiar, Rafael Felipe.
Spatiotemporal Forecasting At Scale.
Degree: 2019, University of Western Ontario
URL: https://ir.lib.uwo.ca/etd/6316
► Spatiotemporal forecasting can be described as predicting the future value of a variable given when and where it will happen. This type of forecasting task…
(more)
▼ Spatiotemporal forecasting can be described as predicting the future value of a variable given when and where it will happen. This type of forecasting task has the potential to help many institutions and businesses answer questions such as how many people will visit a given hospital in the next hour. Answers to such questions can have significant socioeconomic impact, providing privacy-friendly short-term forecasts about geolocated events, which in turn help entities plan and operate more efficiently. These seemingly simple questions, however, present complex challenges for forecasting systems. With more GPS-enabled devices connected every year, from smartphones to wearables to IoT devices, the volume of collected spatiotemporal data has exploded, following the Big Data trend. This thesis proposes a forecasting framework that employs distributed computing to scale its internal components and cope with this high data volume. It also designs discretization components that allow flexibility in framing the forecasting questions. Furthermore, it devises a Geographically Global Model (GGM) backed by an ensemble of Stochastic Gradient Boosted Trees, a collection of Geographically Local Models (GLMs) backed by ARIMA models, and a non-linear blending of the two as part of its multistage machine learning pipeline, boosting performance and stability. The merit of the proposed research is evaluated in three experiments, each comprising millions of records: forecasting hourly taxi demand in the city of New York, forecasting daily crime density in the city of Chicago, and forecasting hourly visits to places of interest across Canada. The experimental results show that the proposed Spatiotemporal Forecasting Framework produces stable forecasts across the three domains while outperforming the naive baseline by at least 49.8% with respect to the SMAPE residuals.
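For readers unfamiliar with the metric behind that 49.8% figure, SMAPE (symmetric mean absolute percentage error) and the relative improvement over a naive last-value baseline can be computed as in the following sketch; the function names are ours, not the framework's.

```python
"""Small sketch: SMAPE and relative improvement over a naive
last-value baseline, as used to compare forecasters."""

def smape(actual, forecast):
    # Symmetric mean absolute percentage error, in percent.
    terms = [
        0.0 if a == f == 0 else 2 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
    ]
    return 100 * sum(terms) / len(terms)

def improvement_over_naive(actual, forecast):
    # Naive baseline: predict the previous observed value.
    naive = actual[:-1]
    model = smape(actual[1:], forecast[1:])
    base = smape(actual[1:], naive)
    return 100 * (base - model) / base  # percent reduction in SMAPE

if __name__ == "__main__":
    y = [10, 12, 15, 14, 18, 21]
    yhat = [10, 12.5, 14, 14.5, 17, 20]
    print(f"model SMAPE: {smape(y, yhat):.1f}%")
    print(f"improvement vs naive: {improvement_over_naive(y, yhat):.1f}%")
```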
Subjects/Keywords: Spatiotemporal; Forecasting; Big Data; Distributed Computing; Ensemble Learning; Software Engineering

University of Sydney
29.
Khelghatdoust, Mansour.
Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters.
Degree: 2018, University of Sydney
URL: http://hdl.handle.net/2123/19703
► Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growth of popularity in cloud services causes…
(more)
▼ Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growing popularity of cloud services has given rise to a new spectrum of services with sophisticated workload and resource management requirements, and data centers keep growing through the addition of various types of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources execute data-intensive applications with continuously fluctuating workloads that need specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making; such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques that address these challenges in large-scale data centers. We introduce a distributed resource management framework which consolidates virtual machines onto as few servers as possible to reduce the energy consumption of the data center and hence the cost to cloud providers. The framework can characterize the workload of virtual machines and thus efficiently handle the trade-off between energy consumption and customers' Service Level Agreements (SLAs). The algorithm is highly scalable, requires low maintenance cost under dynamic workloads, and seeks to minimize virtual machine migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big Data analytics frameworks. This algorithm efficiently addresses the problem of job heterogeneity in workloads, which has appeared as the level of parallelism in jobs has increased, and it significantly reduces average job completion times in comparison with the state of the art while remaining massively scalable. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm.
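The abstract does not spell out the probe-based scheduler, but schedulers in this family (e.g., Sparrow-style batch sampling or power-of-two-choices) share a simple core, sketched below under the assumption of d random probes per task; this is a generic illustration, not the thesis's protocol.

```python
"""Generic sketch of probe-based scheduling: each task probes d random
workers and joins the shortest queue. An illustration of the family the
abstract names, not the thesis's own algorithm."""
import random

def schedule(tasks, queue_lengths, d=2, rng=random):
    """Assign each task to the least-loaded of d randomly probed workers."""
    placement = []
    for task in tasks:
        probes = rng.sample(range(len(queue_lengths)), d)
        worker = min(probes, key=lambda w: queue_lengths[w])
        queue_lengths[worker] += 1  # task enqueued at the chosen worker
        placement.append((task, worker))
    return placement

if __name__ == "__main__":
    random.seed(0)
    queues = [0] * 8  # eight idle workers
    for task, worker in schedule(range(20), queues):
        print(f"task {task} -> worker {worker}")
    print("final queue lengths:", queues)
```

Probing a constant number of workers per task keeps the scheduler decentralized: no node needs a global view of the cluster, which is what makes this family attractive at data-center scale.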
Subjects/Keywords: Resource Management; Scalability; Big data; Cloud Computing; Load Balancing; Scheduling; Consolidation

Purdue University
30.
Mohamed, Ahmed.
A Cost-Effective Cloud-Based System for Analyzing Big Real-Time Visual Data From Thousands of Network Cameras.
Degree: PhD, Electrical and Computer Engineering, 2016, Purdue University
URL: https://docs.lib.purdue.edu/open_access_dissertations/1370
► Thousands of network cameras stream public real-time visual data from different environments, such as streets, shopping malls, and natural scenes. The big visual data from…
(more)
▼ Thousands of network cameras stream public real-time visual data from different environments, such as streets, shopping malls, and natural scenes. The big visual data from these cameras can be useful for many applications, but analyzing this data presents many challenges, such as (i) retrieving data from heterogeneous cameras (e.g., different brands and data formats), (ii) providing a software environment for users to simultaneously analyze the large amounts of data from the cameras, and (iii) allocating and managing a significant amount of computing resources. This dissertation presents a web-based system designed to address these challenges. The system enables users to execute analysis programs on the data from more than 120,000 cameras. It handles the heterogeneity of the cameras and provides an Application Programming Interface (API) that requires only slight changes to existing analysis programs that read data from files. The system includes a resource manager that allocates cloud resources to meet the analysis requirements. Cloud vendors offer different instance types with different capabilities and hourly costs; the manager reduces the overall cost of the allocated instances while meeting the performance requirements. The resource manager monitors the allocated instances, allocating more instances if needed and deallocating existing instances to reduce cost where possible. Its decisions are based on many factors, such as the analysis programs, frame rates, cameras, and instance types.
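The cost-minimizing allocation can be pictured with a toy greedy allocator over a made-up instance catalogue; the instance names, capacities (analyzable frames per second), and prices below are assumptions for illustration, not measurements from the dissertation.

```python
"""Illustrative sketch of the resource-manager idea: pick a cheap mix of
cloud instances whose combined throughput (frames/s) covers the demand.
Instance names, capacities, and prices are hypothetical."""

# (hourly_cost_usd, frames_per_second) -- hypothetical catalogue
CATALOGUE = {
    "small":  (0.10, 40),
    "medium": (0.20, 90),
    "large":  (0.40, 200),
}

def allocate(required_fps):
    """Greedy: add the instance type with the best fps-per-dollar until
    demand is met, then shrink the last instance if a cheaper type fits."""
    plan, capacity = [], 0
    best = max(CATALOGUE, key=lambda k: CATALOGUE[k][1] / CATALOGUE[k][0])
    while capacity < required_fps:
        plan.append(best)
        capacity += CATALOGUE[best][1]
    # Try to swap the final instance for the cheapest type that still fits.
    for name, (cost, fps) in sorted(CATALOGUE.items(), key=lambda kv: kv[1][0]):
        if capacity - CATALOGUE[plan[-1]][1] + fps >= required_fps:
            capacity += fps - CATALOGUE[plan[-1]][1]
            plan[-1] = name
            break
    total_cost = sum(CATALOGUE[n][0] for n in plan)
    return plan, total_cost

if __name__ == "__main__":
    plan, cost = allocate(500)
    print(plan, f"${cost:.2f}/hour")
```

A production manager would re-run such a computation continuously as frame rates and camera counts change, adding or releasing instances accordingly.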
Advisors/Committee Members: Yung-Hsiang Lu, Dongyan Xu, Edward J Delp, Wei Tsang Ooi.
Subjects/Keywords: Big Data; Cloud Computing; Computer Vision; Network Cameras; Resource Management