You searched for subject:(parallelization).
Showing records 1 – 30 of 303 total matches.

NSYSU
1.
Yang, Yu-Wei.
Design and Implementation of C Programming Language Extension for Parallel GPU Computing.
Degree: Master, Computer Science and Engineering, 2010, NSYSU
URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0727110-153900
NVIDIA introduced CUDA (Compute Unified Device Architecture), a technique for executing general-purpose programs on the GPU, in 2006. The CUDA programming model lets a group of identical instructions execute across many threads simultaneously, giving parallel programs a significant reduction in execution time. Although CUDA provides a series of C-like APIs (Application Programming Interfaces) so that programmers can use the language easily, becoming proficient in CUDA development still takes considerable effort. In this thesis, we propose a tool that automatically translates C programs into corresponding CUDA programs, which effectively reduces program development time.
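To make the kind of mapping such a translator performs concrete, here is a toy source-to-source sketch in Python. It is my own illustration, not the thesis's tool: the loop pattern, kernel name, and hard-coded parameter list are all assumptions.

```python
import re

# Toy source-to-source translator: rewrites the canonical pattern
# "for (int i = 0; i < N; i++) BODY;" into a CUDA kernel plus launch code.
# Real translators work on an AST with dependence checks; this regex-level
# demo also hard-codes the kernel parameter list (float *a, float *b).
LOOP_RE = re.compile(
    r"for\s*\(int\s+(\w+)\s*=\s*0;\s*\1\s*<\s*(\w+);\s*\1\+\+\)\s*(.*;)")

def c_loop_to_cuda(c_code):
    m = LOOP_RE.search(c_code)
    if not m:
        return c_code  # nothing we know how to translate
    idx, bound, body = m.groups()
    kernel = (
        f"__global__ void auto_kernel(float *a, float *b, int {bound}) {{\n"
        f"  int {idx} = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"  if ({idx} < {bound}) {body}\n"
        "}")
    launch = f"auto_kernel<<<({bound} + 255) / 256, 256>>>(a, b, {bound});"
    return kernel + "\n// call site:\n" + launch

print(c_loop_to_cuda("for (int i = 0; i < n; i++) a[i] = a[i] + b[i];"))
```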
Advisors/Committee Members: Wei-Kuang Lai (chair), Chun-I Fan (chair), Ying-Chih Lin (chair), Chun-Hung Lin (committee member).
Subjects/Keywords: CUDA; Multi-Thread; Parallelization; GPU

University of Rochester
2.
Bai, Tongxin (1977 - ).
Program parallelization through safe dependence hints and all-context dependence analysis.
Degree: PhD, 2012, University of Rochester
URL: http://hdl.handle.net/1802/19229
Speculative parallelization divides a sequential program into possibly parallel tasks and permits these tasks to run in parallel if and only if they show no dependences with each other. The parallelization is safe in that a speculative execution always produces the same output as the sequential execution. Most previous systems allow speculation to succeed only if program tasks are completely independent, i.e. embarrassingly parallel. The goal of this dissertation is to extend safe parallelization in the presence of dependences and, in particular, to identify and support tasks with partial or conditional parallelism. The dissertation makes two main contributions. The first is safe dependence hints, an interface for a user to express partial parallelism so speculative tasks can communicate and synchronize with each other. The interface extends Cytron's post-wait and recent OpenMP ordering primitives and makes them safe and safely composable. Dependence hints are based on channel communication; a unique feature is channel chaining to express conditional dependences. The second is parallelization support. The thesis describes STAPLE, a system for finding safe task parallelism by first analyzing a program using a compiler and then analyzing its executions using a profiler. The STAPLE compiler collects profiles for all program constructs, including loops and functions, in a single run. After profiling, STAPLE ranks program constructs by their potential for improving whole-program performance. STAPLE analysis proceeds at two levels. The first analyzes potential parallelism assuming complete data privatization to remove false dependences. It considers both loop and function tasks but assumes no reordering of statements within a loop or function. Often the parallelism can be enhanced by reordering dependent operations. The second-level analysis identifies opportunities for such reordering and computes the increase in parallelism. It combines the (context) tree-based dependence profile and the code-based dependence graph to analyze and emulate the effect of parallelism-enhancing code transformations. Dependence analysis is costly: it must track all accesses to all data so as not to miss a single dependence. Previous loop profilers analyzed one loop at a time and ignored dependences outside the loop. In task profiling, STAPLE has to consider all dependences in a complete execution, including the effect of abnormal control flow such as exceptions, which complicates context tracking. The compiler support is built using GCC, and a set of optimizations is devised to reduce the cost. The resulting tool is robust and efficient enough to evaluate all SPEC CPU2006 integer benchmarks (on training inputs), whose source code may have thousands of nested loops and recursive functions (as in GCC itself) and whose unmodified run time can be over 4 minutes.
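A minimal sketch of the post/wait idea behind dependence hints, in Python threads. The Channel class and task structure are my own toy encoding, not the STAPLE interface: independent work runs concurrently, and only the dependent statement is ordered through the channel.

```python
import threading

# Toy post/wait-style dependence hint (my own encoding, not STAPLE's
# interface). Iterations run as concurrent tasks; only the statement that
# carries the cross-iteration dependence is ordered through the channel.
class Channel:
    """One event per iteration index; post(i) releases wait(i)."""
    def __init__(self, n):
        self._ready = [threading.Event() for _ in range(n)]
    def post(self, i):
        self._ready[i].set()
    def wait(self, i):
        self._ready[i].wait()

N = 8
acc = [0] * (N + 1)
chan = Channel(N + 1)
chan.post(0)  # iteration 0 has no predecessor

def task(i, data):
    local = data[i] * data[i]    # independent work, runs in parallel
    chan.wait(i)                 # dependence hint: wait for acc[i]
    acc[i + 1] = acc[i] + local  # the only ordered statement
    chan.post(i + 1)             # release the next iteration

data = list(range(N))
threads = [threading.Thread(target=task, args=(i, data)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(acc[N])  # 140, same as the sequential prefix sum of squares
```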
Subjects/Keywords: Dependence; Parallelization; Profiling; Speculation

Penn State University
3.
Parimalarangan, Sindhuja.
Fast Parallel Triad Census and Triangle Listing on Shared-Memory Platforms.
Degree: 2016, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/28930
Triad census and triangle counting are essential graph analysis measures used in areas such as social network analysis and systems biology. The triad census is a graph analytic used for comparative network analysis and for characterizing local structure in directed networks. For large sparse graphs, an algorithm by Batagelj and Mrvar is considered the state of the art for computing the triad census. In this work, we present a new parallel algorithm for triad census. Our algorithm takes advantage of a specific ordering of graph vertex identifiers to reduce the operation count. We also develop several new variants for exact triangle counting and triangle listing in large, sparse, undirected graphs. Further, we implement a parallel sampling-based algorithm for approximate triangle counting. We show that our parallel triangle-counting variants outperform other recently developed triangle-counting methods on current Intel multicore and manycore processors. We also achieve good strong scaling for both triad census and triangle counting on these platforms.
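A minimal sketch of the vertex-ordering trick the abstract alludes to, in Python. This is the classic degree-based edge orientation, assumed here for illustration rather than taken from the thesis: each triangle is discovered exactly once.

```python
from itertools import combinations

# Degree-based edge orientation for triangle counting (illustrative, not
# the thesis's algorithm). Directing each edge from the lower-ranked to the
# higher-ranked endpoint guarantees every triangle is counted exactly once.
def triangles(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    rank = lambda x: (deg[x], x)          # total order: degree, then id
    out = {v: set() for v in deg}
    for u, v in edges:
        a, b = (u, v) if rank(u) < rank(v) else (v, u)
        out[a].add(b)
    count = 0
    for u in out:
        for v, w in combinations(out[u], 2):
            if w in out[v] or v in out[w]:  # closing edge, either direction
                count += 1
    return count

print(triangles([(0,1), (0,2), (0,3), (1,2), (1,3), (2,3)]))  # 4-clique: 4
```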
Advisors/Committee Members: Kamesh Madduri, Thesis Advisor/Co-Advisor.
Subjects/Keywords: triad census; triangle counting; parallelization

Colorado State University
4.
Krieger, Christopher D.
Generalized full sparse tiling of loop chains.
Degree: PhD, Computer Science, 2013, Colorado State University
URL: http://hdl.handle.net/10217/80954
Computer and computational scientists are tackling increasingly large and complex problems and are seeking ways of improving the performance of their codes. The key issue faced is how to reach an effective balance between parallelism and locality. In trying to reach this balance, a problem commonly encountered is that of ascertaining the data dependences. Approaches that rely on automatic extraction of data dependences are frequently stymied by complications such as interprocedural and alias analysis. Placing the dependence analysis burden upon the programmer creates a significant barrier to adoption. In this work, we present a new programming abstraction, the loop chain, that specifies a series of loops and the data they access. Given this abstraction, a compiler, inspector, or runtime optimizer can avoid the computationally expensive process of formally determining data dependences, yet still determine beneficial and legal data and iteration reorderings. One optimization method that has previously been applied to irregular scientific codes is full sparse tiling. Full sparse tiling has been used to improve the performance of a handful of scientific codes, but in each case the technique had to be applied from scratch by an expert after careful manual analysis of the possible data dependence patterns. The full sparse tiling approach was extended and generalized as part of this work to apply to any code represented by the loop chain abstraction. Using only the abstraction, the generalized algorithm can produce a new data and iteration ordering as well as a parallel execution schedule. Insight into tuning a generalized full sparse tiled application was gained through a study of the different factors influencing tile count. This work lays the foundation for an efficient autotuning approach to optimizing tile count.
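A toy illustration of the inspector side of the loop chain idea (names and the two-loop setup are mine, not the dissertation's algorithm): from the declared per-iteration access sets alone, with no compiler dependence analysis, an inspector can derive the cross-loop ordering constraints that a tiler or scheduler would then use.

```python
# Toy loop-chain inspector (names and setup are mine, not the dissertation's
# algorithm). Each loop is a list of per-iteration (reads, writes) sets; the
# inspector derives cross-loop ordering constraints from those declared
# access sets alone, with no compiler dependence analysis.
def inspect(loop0, loop1):
    deps = []  # (loop0 iteration, loop1 iteration) must-precede pairs
    for j, (reads1, _) in enumerate(loop1):
        for i, (_, writes0) in enumerate(loop0):
            if reads1 & writes0:   # loop-1 iteration j reads what i wrote
                deps.append((i, j))
    return deps

# Two loops over 4 cells; the second reads each cell and its left neighbor.
loop0 = [({c}, {c}) for c in range(4)]                 # writes cell c
loop1 = [({c, max(c - 1, 0)}, {c}) for c in range(4)]  # reads c-1 and c
print(inspect(loop0, loop1))
# [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
```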
Advisors/Committee Members: Strout, Michelle Mills (advisor), Böhm, Wim (committee member), Rajopadhye, Sanjay (committee member), Mueller, Jennifer (committee member).
Subjects/Keywords: compilers; run-time optimization; parallelization

University of Michigan
5.
Mehrara, Mojtaba.
Compiler and Runtime Techniques for Automatic Parallelization of Sequential Applications.
Degree: PhD, Computer Science & Engineering, 2011, University of Michigan
URL: http://hdl.handle.net/2027.42/86499
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. This dissertation tackles many challenges faced in automatic parallelization of sequential applications, including general-purpose applications written in C/C++ and client-side web applications written in JavaScript, with the goal of achieving speedup on commodity multicore systems. First, a complete parallelizing compiler system for C/C++ is introduced. This system successfully identifies parallelization opportunities in programs and transforms the code to a parallel version. A matching runtime system, STMlite, is proposed to monitor the parallelized program's behavior and fix any misspeculation that might happen. We show that this system can generate and execute parallel programs that are up to 2.2x faster than their sequential counterparts when executed on an 8-core commodity system. The second piece of work focuses on a similar problem in a very different application domain: JavaScript programs running in the client's web browser. This dissertation is the first research work that proposes dynamic and automatic parallelization of JavaScript applications. The nature of the JavaScript language and its target execution environments imposes a completely different set of challenges that we intend to solve. We first propose the ParaScript parallelizing engine, which identifies and speculatively parallelizes potentially parallel code segments while the code is running in the browser. A low-cost and highly customized speculation approach verifies the execution of the parallelized client-side code and rolls back in case of any misspeculation. Dynamic parallelization using ParaScript yields an average of 2.18x speedup over sequential JavaScript code on an 8-core commodity system. In addition, we introduce ParaGuard, a technique which executes the runtime checks required by trace-based JavaScript compilers in parallel with the main execution. ParaGuard improves performance by 15% by using an additional core on multicore systems.
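A schematic of the speculative execute/validate/commit cycle such a runtime performs (my own simplification, not STMlite's actual protocol): chunks run against snapshots while logging read and write sets, and a chunk whose reads overlap an earlier chunk's committed writes is squashed and re-executed.

```python
# Schematic speculative execution with software conflict detection (my own
# simplification, not STMlite's protocol). Chunks execute against snapshots
# while logging read/write sets; they commit in order, and a chunk whose
# reads overlap an earlier chunk's committed writes is squashed and re-run.
def run_speculative(chunks, state):
    logs = []
    for body in chunks:              # imagine these running concurrently
        reads, writes = set(), set()
        snapshot = dict(state)       # may be stale by commit time
        body(snapshot, reads, writes)
        logs.append((body, snapshot, reads, writes))
    committed_writes = set()
    for body, snapshot, reads, writes in logs:   # in-order commit
        if reads & committed_writes:             # conflict: stale reads
            snapshot, reads, writes = dict(state), set(), set()
            body(snapshot, reads, writes)        # squash and re-execute
        for k in writes:
            state[k] = snapshot[k]
        committed_writes |= writes
    return state

def make_chunk(i):
    def body(mem, reads, writes):
        reads.add("x"); writes.add("x")
        mem["x"] += i
    return body

print(run_speculative([make_chunk(i) for i in range(1, 5)], {"x": 0}))
# {'x': 10}: every chunk after the first conflicts and is re-executed
```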
Advisors/Committee Members: Mahlke, Scott (committee member), Austin, Todd M. (committee member), Dick, Robert (committee member), Harris, Tim (committee member), Narayanasamy, Satish (committee member).
Subjects/Keywords: Speculative Parallelization; Software Transactional Memory; JavaScript Parallelization; Multicore Compilation; Dynamic Parallelization; Loop Level Parallelism; Computer Science; Engineering
6.
Garcia, Ted.
Analysis of big data technologies and methods: query large web public rdf datasets on Amazon cloud using Hadoop and open source parsers.
Degree: MS, Computer Science, 2013, California State University – Northridge
URL: http://hdl.handle.net/10211.2/3208
Querying large datasets has become easier with Big Data technologies such as Hadoop's MapReduce. Large public datasets are becoming more available and can be found on the Amazon Web Services (AWS) Cloud. In particular, Web Data Commons (Web Data Commons, 2012) has extracted and posted RDF quads from the Common Crawl corpus (Common Crawl, 2012) found on AWS, which comprises over five billion web pages of the Internet. Technologies and methods are in their infancy when attempting to process and query these large web RDF datasets. For example, within the last couple of years, AWS and Elastic MapReduce (EMR) have provided processing of large files with parallelization and a distributed file system. RDF technologies and methods have existed for some time, and the tools are available commercially and open source. RDF parsers and databases are being used successfully with moderately sized datasets. However, the use and analysis of RDF tools against large datasets, especially in a distributed environment, is relatively new.
In order to assist the Big Data developer, this work explores several open source parsing tools and how they perform in Hadoop on the Amazon Cloud. Apache Any23 (Apache Any23, 2012), Apache Jena RIOT/ARQ (Apache Jena, 2013), and SemanticWeb.com's NxParser (NX Parser, 2012) are open source parsers that can process the RDF quads contained in the Web Data Commons files. In order to achieve the highest performance, it is essential to work with large datasets without preprocessing or importing them into a database. Therefore, parsing and querying are done on the raw Web Data Commons files. Since the parsers do not all have query support, they are analyzed with extract and parse functionality only. This work includes challenges and lessons learned from using these parsing tools in the Hadoop and Amazon Cloud environments and suggests future research areas.
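A toy map/reduce pass over RDF N-Quads lines, illustrating the workload shape (my own sketch; a real job would run one of the parsers named above under Hadoop rather than a hand-rolled regex):

```python
import re
from collections import Counter

# Toy map/reduce over RDF N-Quads (illustrative; a real EMR job would run
# Any23, Jena RIOT, or NxParser per input split rather than this regex).
QUAD_RE = re.compile(r'(\S+)\s+(\S+)\s+(.+?)\s+(\S+)\s*\.\s*$')

def map_quads(lines):
    for line in lines:
        m = QUAD_RE.match(line)
        if m:                        # skip malformed lines, as parsers must
            subj, pred, obj, graph = m.groups()
            yield pred, 1

def reduce_counts(pairs):
    counts = Counter()
    for pred, n in pairs:
        counts[pred] += n
    return counts

sample = [
    '<http://a.example/s> <http://xmlns.com/foaf/0.1/name> "Ann" <http://g> .',
    '<http://a.example/s> <http://xmlns.com/foaf/0.1/knows> <http://a.example/t> <http://g> .',
]
print(reduce_counts(map_quads(sample)))
```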
Advisors/Committee Members: Wang, Taehyung (advisor), Schwartz, Diane L. (committee member).
Subjects/Keywords: parallelization; Dissertations, Academic – CSUN – Computer Science.

Jawaharlal Nehru University
7.
Varshneya, Renu.
Parallelization of Hierarchical Censored Production Rules.
Degree: Computer Science, 1995, Jawaharlal Nehru University
URL: http://shodhganga.inflibnet.ac.in/handle/10603/14261
A standard production rule is expressed in the form IF <condition> THEN <action>. Production systems are widely used in Artificial Intelligence for modeling intelligent behavior and building expert systems. However, standard production systems have a rigid structure, as they cannot handle incomplete and imprecise knowledge, which makes them less flexible for adaptation. To capture uncertain and imprecise knowledge about the real world, Michalski and Winston introduced the concept of Variable Precision Logic and suggested Censored Production Rules (CPRs) as an underlying representational and computational mechanism enabling logic-based systems to exhibit variable precision, in which certainty varies while specificity remains constant. A CPR is a production rule augmented with exception conditions, with the representation IF <condition> THEN <action> UNLESS <censor>, where <censor> is an exception to the rule. A CPR is unable to capture the taxonomic structure inherent in knowledge about the real world. Bharadwaj and Jain have extended the concept of CPRs by introducing two new operators, GENERALITY and SPECIFICITY, to represent more general and more specific information, calling the result Hierarchical Censored Production Rules (HCPRs). HCPRs can be made to exhibit variable precision in reasoning such that both the certainty of belief in a conclusion and its specificity may be controlled by the reasoning process. The general form of an HCPR is:
IF B [b1, b2, ..., bn]          { preconditions }
THEN A                          { decision / action }
UNLESS C [c1, c2, ..., cn]      { censor conditions }
GENERALITY G                    { general information }
SPECIFICITY S [s1, s2, ..., sn] { specific information }
HCPR systems, which support various symbolic and genetic-based machine learning, have been found very useful in developing knowledge-based systems with learning capabilities; they are capable of adjusting the certainty of inferences to conform to time and other resource constraints. Such systems have numerous applications in situations…
Bibliography p. 128; tables and figures given.
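A minimal interpreter for the rule form shown above (my own toy encoding; the thesis's parallel evaluation strategy is not reproduced): the action fires when the preconditions hold unless a censor is established, and skipping the censor check trades certainty for speed, the variable-precision idea.

```python
from dataclasses import dataclass, field

# Toy HCPR evaluator (an illustrative encoding of the rule form above; the
# thesis's parallel evaluation strategy is not reproduced here).
@dataclass
class HCPR:
    preconditions: list                               # B [b1 ... bn]
    action: str                                       # A
    censors: list = field(default_factory=list)       # C [c1 ... cn]
    generality: str = None                            # G
    specificity: list = field(default_factory=list)   # S [s1 ... sn]

    def fire(self, facts, check_censors=True):
        """Return (conclusion, certainty). Skipping the censor check trades
        certainty for speed: the variable-precision idea."""
        if not all(b in facts for b in self.preconditions):
            return None, 0.0
        if check_censors and any(c in facts for c in self.censors):
            return None, 0.0     # an exception is established
        return self.action, (1.0 if check_censors else 0.7)

rule = HCPR(preconditions=["bird(x)"], action="flies(x)",
            censors=["penguin(x)"], generality="animal(x)",
            specificity=["sparrow(x)", "eagle(x)"])
print(rule.fire({"bird(x)"}))                                # ('flies(x)', 1.0)
print(rule.fire({"bird(x)", "penguin(x)"}))                  # (None, 0.0)
print(rule.fire({"bird(x)", "penguin(x)"}, check_censors=False))  # ('flies(x)', 0.7)
```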
Advisors/Committee Members: Bharadwaj, K K.
Subjects/Keywords: Computer Science; System Science; Parallelization; Hierarchical; censored

Texas A&M University
8.
Liang, Da.
Design and Implementation of Parallel Computing Models for Solar Radiation Simulation.
Degree: MS, Computer Engineering, 2015, Texas A&M University
URL: http://hdl.handle.net/1969.1/156418
In order to simulate geographical phenomena, many complex, high-precision models have been developed by scientists. However, common hardware and implementations of those computational models are often not capable of processing large amounts of data, and the time performance can be unacceptable. Nowadays, the growth in the speed of modern graphics processing units is remarkable, and the flops-per-dollar ratio provided by GPUs is also growing very fast, which has made large-scale GPU clusters popular in the scientific computing community. However, GPU programming and cluster software deployment and development are associated with a number of challenges.
In this thesis, the geoscience model developed by I. D. Dobreva and M. P. Bishop and proposed in A Spatial Temporal, Topographic and Spectral GIS based Solar Radiation Model (SRM) is analyzed. I built a heterogeneous cluster and developed its software framework, which provides powerful computation services for complex geographic models. Time performance and computation accuracy are analyzed, and issues and challenges such as GPU programming and job balancing and scheduling are addressed.
The SRM application running on this framework can process data fast enough to give researchers rendered images as feedback in a short time, improving performance by hundreds of times compared to the current performance on our available hardware; the speedup can easily be scaled by adding new machines.
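A bare-bones dynamic work queue of the kind such a framework needs for job balancing (my own schematic, not the thesis's cluster software): each worker pulls the next tile when free, so faster workers naturally take more tiles.

```python
import queue, threading, time

# Schematic dynamic load balancing across heterogeneous workers (my own toy,
# not the thesis's framework). Each worker pulls the next tile when free, so
# a fast GPU-class worker simply ends up processing more tiles.
tiles = queue.Queue()
for t in range(12):
    tiles.put(t)

done = {"gpu": [], "cpu": []}

def worker(name, cost):
    while True:
        try:
            t = tiles.get_nowait()
        except queue.Empty:
            return
        time.sleep(cost)            # stand-in for simulating one tile
        done[name].append(t)

threads = [threading.Thread(target=worker, args=("gpu", 0.01)),
           threading.Thread(target=worker, args=("cpu", 0.04))]
for th in threads: th.start()
for th in threads: th.join()
print({k: len(v) for k, v in done.items()})   # e.g. {'gpu': 9, 'cpu': 3}
```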
Advisors/Committee Members: Liu, Jyh-Charn (advisor), Sarin, Vivek (committee member), Bishop, Michael P. (committee member).
Subjects/Keywords: solar radiation; simulation; parallelization; GPU; heterogeneous cluster

Penn State University
9.
Chatterjee, Bodhisatwa.
Optimization of Modern Compilers using Approximate Computing.
Degree: 2020, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/17657bxc583
Dependence tests in modern compilers are conservative and inexact in nature: if the non-existence of a dependence cannot be proved, a dependence is always assumed. This conservative strategy forces compilers that rely on these tests to follow sequential execution order wherever a dependence "may be" present, for the sake of preserving program semantics. In this work, we focus on the non-conventional idea of relaxing these "maybe-dependencies" and incorporate the consequences as an approximate-computing approach for optimizing compilers. Parallelizing code fragments that are maybe-dependent may result in abnormal program behavior (segmentation faults or errors), in superior performance from increased parallelism, or in both. Regardless, we reason that there is ample chance of obtaining superior performance in certain applications. First, we present an LLVM-based framework that captures maybe-dependencies in applications. We enumerate their causes and potential ways to avoid them. Then we provide a mechanism for systematically relaxing them and discuss its implications for program semantics and application performance. We present the results of our experiments on the GAP Benchmark Suite.
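One way to make the relax-and-observe idea concrete (entirely my own schematic harness, not the thesis's LLVM framework): try the relaxed variant, and fall back to the conservative sequential version if it crashes or disagrees with a sequential reference on a spot-check.

```python
# Schematic harness for relaxing a maybe-dependence (my own illustration,
# not the thesis's LLVM framework): run the relaxed variant, but fall back
# to the conservative sequential version if it crashes or fails a spot-check.
def sequential(xs):
    out = list(xs)
    for i in range(1, len(out)):
        out[i] += out[i - 1]          # true loop-carried dependence
    return out

def relaxed(xs):
    # As if the compiler dropped the maybe-dependence: each element is
    # computed independently. Wrong for this loop; the harness must catch it.
    return [x + (xs[i - 1] if i else 0) for i, x in enumerate(xs)]

def run_with_fallback(fn_relaxed, fn_safe, xs):
    try:
        candidate = fn_relaxed(xs)
    except Exception:                 # the "abnormal behavior" case
        return fn_safe(xs)
    sample = xs[: min(4, len(xs))]    # cheap semantic spot-check
    return candidate if fn_relaxed(sample) == fn_safe(sample) else fn_safe(xs)

print(run_with_fallback(relaxed, sequential, [1, 2, 3, 4]))  # [1, 3, 6, 10]
```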
Advisors/Committee Members: Mahmut Taylan Kandemir, Thesis Advisor/Co-Advisor, Chitaranjan Das, Program Head/Chair, Kamesh Madduri, Committee Member.
Subjects/Keywords: Compiler Optimization; Approximate Computing; Dependencies; Parallelization

University of Toronto
10.
Huang, Diego.
Programmer-assisted Automatic Parallelization.
Degree: 2011, University of Toronto
URL: http://hdl.handle.net/1807/30628
Parallel software is now required to exploit the abundance of threads and processors in modern multicore computers. Unfortunately, manual parallelization is too time-consuming and error-prone for all but the most advanced programmers. While automatic parallelization promises threaded software with little programmer effort, current auto-parallelizers are easily thwarted by pointers and other forms of ambiguity in the code. In this dissertation we profile the loops in SPEC CPU2006, categorize the loops in terms of available parallelism, and focus on promising loops that are not parallelized by IBM's XL C/C++ V10 auto-parallelizer. For those loops we propose methods of improved interaction between the programmer and compiler that can facilitate their parallelization. In particular, we (i) suggest methods for the compiler to better identify the parallelization blockers to the programmer; (ii) suggest methods for the programmer to provide guarantees to the compiler that overcome these parallelization blockers; and (iii) evaluate the resulting impact on performance.
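The flavor of a programmer guarantee, sketched in Python (a toy of my own; real compilers accept such promises through pragmas or restrict-qualified pointers, not decorators): the annotation asserts that iterations are independent, and the runtime uses it to choose a parallel schedule it could not prove safe on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy programmer guarantee (illustrative only; real compilers take such
# promises via pragmas or restrict-qualified pointers, not decorators).
def independent_iterations(fn):
    fn.independent = True             # the programmer's promise
    return fn

def auto_parallel_map(fn, xs):
    if getattr(fn, "independent", False):
        with ThreadPoolExecutor() as pool:    # guarantee given: go parallel
            return list(pool.map(fn, xs))
    return [fn(x) for x in xs]                # otherwise stay sequential

@independent_iterations
def body(x):
    return x * x                      # no shared state, so the promise holds

print(auto_parallel_map(body, range(6)))      # [0, 1, 4, 9, 16, 25]
```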
Advisors/Committee Members: Steffan, J. Gregory, Electrical and Computer Engineering.
Subjects/Keywords: compiler; automatic parallelization; programmer guarantees; 0984; 0537

Brigham Young University
11.
McNabb, Andrew W.
Parallel Particle Swarm Optimization and Large Swarms.
Degree: MS, 2011, Brigham Young University
URL: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=3479&context=etd
Optimization is the search for the maximum or minimum of a given objective function. Particle Swarm Optimization (PSO) is a simple and effective evolutionary algorithm, but it may take hours or days to optimize difficult objective functions which are deceptive or expensive. Deceptive functions may be highly multimodal and multidimensional, and PSO requires extensive exploration to avoid being trapped in local optima. Expensive functions, whose computational complexity may arise from dependence on detailed simulations or large datasets, take a long time to evaluate. For deceptive or expensive objective functions, PSO must be parallelized to use multiprocessor systems and clusters efficiently. This thesis investigates the implications of parallelizing PSO and, in particular, the details of parallelization and the effects of large swarms. PSO can be expressed naturally in Google's MapReduce framework to develop a simple and robust parallel implementation that automatically includes communication, load balancing, and fault tolerance. This flexible implementation makes it easy to apply modifications to the algorithm, such as those that improve optimization of difficult objective functions and improve parallel performance. Results show that larger swarms help with both of these goals, but they are most effective if arranged into sparse topologies with lower overhead from communication. Additionally, PSO must be modified to use communication more efficiently in a large sparse swarm for objective functions where information ideally flows quickly through a large swarm. Swarm size is usually fixed at a modest number around 50, but particularly in a parallel computational environment, much larger swarms are much more effective for deceptive objective functions. Likewise, swarms much smaller than 50 are more effective for expensive but less deceptive functions. In general, swarm size should be carefully chosen using all available information about the objective function and computational environment.
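A compact sketch of PSO with the fitness evaluation mapped over particles in parallel (my own minimal version; the thesis's MapReduce formulation also parallelizes the update step and inherits load balancing and fault tolerance from the framework):

```python
import random
from concurrent.futures import ProcessPoolExecutor

# Minimal parallel PSO (my own simplification of the map/reduce structure,
# not the thesis's implementation). The expensive objective is "mapped"
# over particles in parallel; the position/velocity update is cheap.
def sphere(x):                        # stand-in for an expensive objective
    return sum(v * v for v in x)

def pso(dim=4, n=16, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest, pbest_f = [p[:] for p in pos], [float("inf")] * n
    gbest, gbest_f = pos[0][:], float("inf")
    with ProcessPoolExecutor() as pool:
        for _ in range(iters):
            fits = list(pool.map(sphere, pos))    # parallel "map" step
            for i, f in enumerate(fits):          # serial "reduce" step
                if f < pbest_f[i]:
                    pbest_f[i], pbest[i] = f, pos[i][:]
                if f < gbest_f:
                    gbest_f, gbest = f, pos[i][:]
            for i in range(n):                    # standard PSO update
                for d in range(dim):
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                                 + c2 * rng.random() * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
    return gbest_f

if __name__ == "__main__":
    print(pso())    # decreases toward 0 for the sphere objective
```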
Subjects/Keywords: particle swarm optimization; parallelization; Computer Sciences
12.
Edler Von Koch, Tobias Joseph Kastulus.
Automated detection of structured coarse-grained parallelism in sequential legacy applications.
Degree: PhD, 2014, University of Edinburgh
URL: http://hdl.handle.net/1842/9976
The efficient execution of sequential legacy applications on modern, parallel computer architectures is one of today's most pressing problems. Automatic parallelization has been investigated as a potential solution for several decades, but its success generally remains restricted to small niches of regular, array-based applications. This thesis investigates two techniques that have the potential to overcome these limitations. Beginning at the lowest level of abstraction, the binary executable, it presents a study of the limits of Dynamic Binary Parallelization (DBP), a recently proposed technique that takes advantage of an underlying multicore host to transparently parallelize a sequential binary executable. While still in its infancy, DBP has received broad interest within the research community. This thesis seeks to gain an understanding of the factors contributing to the limits of DBP and the costs and overheads of its implementation. An extensive evaluation using a parameterizable DBP system targeting a CMP with lightweight architectural TLS support is presented. The results show that there is room for a significant reduction of up to 54% in the number of instructions on the critical paths of legacy SPEC CPU2006 benchmarks, but that it is much harder to translate these savings into actual performance improvements, with a realistic hardware-supported implementation achieving a speedup of 1.09 on average. While automatically parallelizing compilers have traditionally focused on data parallelism, additional parallelism exists in a plethora of other shapes such as task farms, divide & conquer, map/reduce and many more. These algorithmic skeletons, i.e. high-level abstractions for commonly used patterns of parallel computation, differ substantially from data-parallel loops. Unfortunately, algorithmic skeletons are largely informal programming abstractions and lack a formal characterization in terms of established compiler concepts. This thesis develops compiler-friendly characterizations of popular algorithmic skeletons using a novel notion of commutativity based on liveness. A hybrid static/dynamic analysis framework for the context-sensitive detection of skeletons in legacy code, which overcomes limitations of static analysis by complementing it with profiling information, is described. A proof-of-concept implementation of this framework in the LLVM compiler infrastructure is evaluated against SPEC CPU2006 benchmarks for the detection of a typical skeleton. The results illustrate that skeletons are often context-sensitive in nature. Like the two approaches presented in this thesis, many dynamic parallelization techniques exploit the fact that some statically detected data and control flow dependences do not manifest themselves in every possible program execution (may-dependences) but occur only infrequently, e.g. for some corner cases, or not at all for any legal program input. While the effectiveness of dynamic parallelization techniques critically depends on the absence of such dependences, not much is known…
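A crude dynamic probe for the commutativity idea (my own toy; the dissertation's notion is liveness-based and context-sensitive, and profiling can only fail to observe a dependence, not prove its absence): run a loop's iterations in permuted orders and compare the live-out state.

```python
import random

# Crude dynamic commutativity probe (my own toy; the dissertation's notion
# is liveness-based and context-sensitive). If the live-out state matches
# under permuted iteration orders on profiled inputs, the loop is a
# candidate for a task-farm or reduction skeleton. This observes, it does
# not prove: an unseen input could still expose an order dependence.
def probe_commutativity(step, n_iters, init_state, trials=5, seed=1):
    def run(order):
        state = dict(init_state)
        for i in order:
            step(state, i)
        return state
    reference = run(range(n_iters))
    rng = random.Random(seed)
    for _ in range(trials):
        order = list(range(n_iters))
        rng.shuffle(order)
        if run(order) != reference:
            return False              # an order dependence manifested
    return True

def reduction_step(state, i):
    state["sum"] = state.get("sum", 0) + i * i      # commutative update

def ordered_step(state, i):
    state["seq"] = state.get("seq", ()) + (i,)      # order-sensitive update

print(probe_commutativity(reduction_step, 8, {}))   # True
print(probe_commutativity(ordered_step, 8, {}))     # False
```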
Subjects/Keywords: 005.2; compilers; automatic parallelization; multicore; skeletons; programming
13.
Park, Eun Hyun.
Shared memory parallelization for large scale 3D polyhedral particle simulations.
Degree: PhD, Civil Engineering, 2020, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/108001
Granular materials such as sands, gravels, railroad ballast, and rock are inherently highly heterogeneous and anisotropic. While they are among the most widely used materials in industry, their complex behaviors remain not fully understood. Particle-based numerical methods were introduced to account for complex particle interactions, yet they are computationally demanding. Significant algorithmic developments have been made to enhance computational performance; nevertheless, simulations with realistic particle shapes are still computationally expensive due to their complex geometry.
In this study, novel parallel algorithms for polyhedral particle simulations were developed and implemented to reduce the computational cost. The parallelization study showed that the code achieved approximately 30 times speed-up with 48 cores on a Linux machine. With this parallelized particle-based code, three engineering applications were conducted: a large-scale granular flow simulation, full-scale ballasted track simulations, and a parametric study of the angle of repose.
The code successfully captured the runout distances of dry granular flow, and this novel approach extended the feasible simulation size to 52 million 3D polyhedral particles.
The ballast simulations employed particle sizes and shapes similar to the physical ballast, as well as the full-scale geometry of the physical setup, and successfully reproduced the displacement and vibration of ties observed in the experiment.
The angle-of-repose simulations investigated the effects of input parameters on microscopic particle interactions by measuring the angle of repose, and demonstrated the ability to capture self-organized criticality, related to natural complex systems, by showing a distribution of sliding mass that followed a power law.
The parallelized particle-based simulation extends the limits of application size by reducing computational cost. The parallelized code is successfully exploited for the study of granular material behaviors, and the large-scale particle-based simulation contributes to our understanding of the complex behaviors of granular materials.
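The shape of a shared-memory parallel force loop (schematic only; the thesis's polyhedral contact algorithms are far more involved): workers accumulate into private buffers and a reduction merges them, the standard way to avoid write conflicts on shared force arrays.

```python
from concurrent.futures import ThreadPoolExecutor

# Schematic shared-memory parallel force accumulation (my own toy; the
# thesis's polyhedral contact algorithms are far more involved). Each worker
# writes into its own private buffer, and a reduction merges the buffers,
# avoiding write conflicts on the shared force array.
def spring_force(xi, xj):
    return xj - xi                     # toy pairwise "force"

def forces_parallel(x, pairs, n_workers=4):
    def work(chunk):
        buf = [0.0] * len(x)           # private per-worker buffer
        for i, j in chunk:
            f = spring_force(x[i], x[j])
            buf[i] += f
            buf[j] -= f                # Newton's third law
        return buf
    chunks = [pairs[k::n_workers] for k in range(n_workers)]
    total = [0.0] * len(x)
    with ThreadPoolExecutor(n_workers) as pool:
        for buf in pool.map(work, chunks):
            for i, v in enumerate(buf):    # reduction over private buffers
                total[i] += v
    return total

x = [0.0, 1.0, 3.0, 6.0]
pairs = [(0, 1), (1, 2), (2, 3)]
print(forces_parallel(x, pairs))       # [1.0, 1.0, 1.0, -3.0]
```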
Advisors/Committee Members: Hashash, Youssef M.A. (advisor), Hashash, Youssef M.A. (Committee Chair), Tutumluer, Erol (committee member), Ghaboussi, Jamshid (committee member), Olson, Scott M (committee member), Kindratenko, Volodymyr (committee member).
Subjects/Keywords: Discrete Element Method; Parallelization

Princeton University
14.
Oh, Taewook.
Automatic Exploitation of Input Parallelism.
Degree: PhD, 2015, Princeton University
URL: http://arks.princeton.edu/ark:/88435/dsp018336h4299
Parallelism may reside in the input of a program rather than the program itself. A script interpreter, for example, is hard to parallelize because its dynamic behavior is unpredictable until an input script is given. Once the interpreter is combined with the script, the resulting program becomes predictable, and even parallelizable if the input script contains parallelism. Despite recent progress in automatic parallelization research, however, existing techniques cannot take advantage of the parallelism within program inputs, even when the inputs remain fixed across multiple executions of the program.
This dissertation shows that the automatic exploitation of parallelism within fixed program inputs can be achieved by coupling program specialization with automatic parallelization techniques. Program specialization marries a program with the values that remain invariant across the program execution, including fixed inputs, and creates a program that is highly optimized for the invariants. The proposed technique exploits program specialization as an enabling transformation for automatic parallelization; through specialization, the parallelism within the fixed program inputs can be materialized within the specialized program.
First, this dissertation presents Invariant-induced Pattern-based Loop Specialization (IPLS). IPLS folds the parallelism within the program invariants into the specialized program, thereby creating a more complete and predictable program that is easier to parallelize. Second, this dissertation applies automatic speculative parallelization techniques to specialized programs to exploit parallelism in inputs. As existing techniques fail to extract parallelism from complex programs such as IPLS-specialized programs, context-sensitive speculation and an optimized design of the speculation run-time system are proposed to improve applicability and minimize the execution overhead of the parallelized program.
A prototype of the proposed technique is evaluated against two widely used open-source script interpreters. Experimental results demonstrate the effectiveness of the proposed techniques.
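A tiny illustration of why specializing an interpreter against a fixed script helps (my own toy partial evaluator, not IPLS): once the script is fixed, the dispatch loop can be folded away, leaving a plain element-wise map that an automatic parallelizer can analyze.

```python
# Toy interpreter specialization (my own illustration, not IPLS). With the
# script fixed, the dispatch loop is folded away by generating straight-line
# code, leaving a pure element-wise map that a parallelizer can analyze.
def interpret(script, data):
    """General interpreter: dispatches on every op, for every element."""
    out = []
    for x in data:
        for op, arg in script:        # dynamic dispatch hides the structure
            x = x + arg if op == "add" else x * arg
        out.append(x)
    return out

def specialize(script):
    """Partial evaluation: fold the fixed script into generated code."""
    body = "x"
    for op, arg in script:
        body = f"({body} + {arg})" if op == "add" else f"({body} * {arg})"
    return eval(f"lambda data: [{body} for x in data]")   # fine for a toy

script = [("add", 1), ("mul", 3)]
data = [0, 1, 2, 3]
print(interpret(script, data))        # [3, 6, 9, 12]
print(specialize(script)(data))       # [3, 6, 9, 12], same semantics
```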
Advisors/Committee Members: August, David I (advisor).
Subjects/Keywords: Automatic Parallelization; Input Parallelism; Program Specialization
15.
Jimborean, Alexandra.
Adapting the polytope model for dynamic and speculative parallelization : Adaptation du modèle polyhédrique à la parallélisation dynamique et spéculatice.
Degree: Docteur es, Informatique, 2012, Université de Strasbourg
URL: http://www.theses.fr/2012STRAD020
In this thesis, we present the design and implementation of a Thread-Level Speculation (TLS) framework called VMAD, for "Virtual Machine for Advanced Dynamic analysis and transformation", whose main feature is the ability to speculatively parallelize a sequential loop nest in various ways by reordering its iterations. The transformation to apply is selected at run time with the goals of minimizing the number of rollbacks and maximizing performance. We perform code transformations by applying the polyhedral model, which we adapted to speculative parallelization at run time: a code skeleton is built ahead of time and patched by our runtime system according to profiling information collected on samples of the execution. Adaptability is ensured by considering chunks of code of different sizes that are executed successively, each parallelized differently, or run sequentially, depending on the observed behavior of the memory accesses. We show on several benchmarks that our framework yields good performance on codes which could not be handled efficiently by previously proposed TLS systems.
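As a rough illustration of the chunked speculate-validate-rollback cycle described above (a Python sketch, not the VMAD runtime; the dependence test is simplified to a write-conflict check on the slice, closer to an inspector/executor scheme than true misprediction rollback):

def speculative_slices(a, idx, f, n, slice_size=64):
    # Process iterations j = 0..n-1 of "a[idx(j)] = f(j)" slice by slice.
    i = 0
    while i < n:
        hi = min(i + slice_size, n)
        targets = [idx(j) for j in range(i, hi)]
        if len(set(targets)) == len(targets):
            # Speculation holds: all writes in the slice are independent, so
            # the iterations could run in parallel; commit their results.
            updates = {t: f(j) for t, j in zip(targets, range(i, hi))}
            for t, v in updates.items():
                a[t] = v
        else:
            # Dependence detected: replay this slice sequentially.
            for j in range(i, hi):
                a[idx(j)] = f(j)
        i = hi
    return a

# e.g. a loop whose write targets depend on the input data; each bin records
# the last iteration that wrote it
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(speculative_slices([0] * 10, lambda j: data[j], lambda j: j, len(data), 4))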
Advisors/Committee Members: Clauss, Philippe (thesis director).
Subjects/Keywords: Programmation parallèle; Speculative parallelization; Runtime system; Compiler; Polyhedral model; Dynamic optimizations; Loops; Partial parallelism; LLVM; Automatic parallelization; 005
16.
Sarvestani, Amin Shafiee.
Automated Recognition of Algorithmic Patterns in DSP Programs.
Degree: The Institute of Technology, 2011, Linköping University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-73934
We introduce an extensible knowledge-based tool for idiom (pattern) recognition in DSP (digital signal processing) programs. Our tool utilizes functionality provided by the Cetus compiler infrastructure for detecting certain computation patterns that frequently occur in DSP code. We focus on recognizing patterns for for-loops and statements in their bodies, as these often are the performance-critical constructs in DSP applications for which replacement by highly optimized, target-specific parallel algorithms will be most profitable. For better structuring and efficiency of pattern recognition, we classify patterns by different levels of complexity such that patterns in higher levels are defined in terms of lower-level patterns. The tool works statically on the intermediate representation (IR). It traverses the abstract syntax tree IR in post-order and applies bottom-up pattern matching, at each IR node utilizing information about the patterns already matched for its children or siblings. For better extensibility and abstraction, most of the structural part of the recognition rules is specified in XML form to separate the tool implementation from the pattern specifications. Information about detected patterns will later be used for optimized code generation by local algorithm replacement, e.g. for the low-power high-throughput multicore DSP architecture ePUMA.
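The bottom-up, post-order matching scheme described above can be illustrated with a small generic sketch (Python; the node kinds and rule set are invented for the example, and the real tool specifies its rules in XML over the Cetus IR):

class Node:
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.patterns = set()            # pattern names matched at this node

# (pattern name, node kind, required pattern sets per child position)
RULES = [
    ("ACC",  "var",    []),                            # toy accumulator rule
    ("MULT", "binop*", []),
    ("MAC",  "binop+", [{"ACC", "MAC"}, {"MULT"}]),    # acc + a*b
]

def match(node):
    for child in node.children:          # post-order: children first
        match(child)
    for name, kind, child_reqs in RULES:
        if node.kind != kind:
            continue
        if child_reqs and len(child_reqs) != len(node.children):
            continue
        if all(req & c.patterns for req, c in zip(child_reqs, node.children)):
            node.patterns.add(name)      # higher-level pattern built from
    return node                          # patterns matched on the children

# acc + a * b  ->  recognized as a multiply-accumulate idiom
tree = match(Node("binop+", [Node("var"),
                             Node("binop*", [Node("var"), Node("var")])]))
print(tree.patterns)                     # -> {'MAC'}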
Subjects/Keywords: Automatic Parallelization; Algorithmic Pattern Recognition; Cetus; DSP; DSP Code Parallelization; Compiler Frameworks; Computer Sciences; Datavetenskap (datalogi)

Universidade do Rio Grande do Sul
17.
Silveira, Jaime Kirch da.
Parallel SAT solvers and their application in automatic parallelization.
Degree: 2014, Universidade do Rio Grande do Sul
URL: http://hdl.handle.net/10183/95373
Since the slowdown in processor frequency scaling, a new trend has emerged to let software take advantage of faster hardware: parallelization. Unlike raising processor frequencies, however, exploiting parallelization requires a different kind of programming, parallel programming, which is usually harder than ordinary sequential programming. In this context, automatic parallelization has arisen, allowing software to take advantage of parallelism without the need for parallel programming. We present two proposals: SAT-PaDdlinG and RePaSAT. SAT-PaDdlinG is a parallel DPLL SAT solver on GPU, which allows RePaSAT to use this environment. RePaSAT is our proposal of a parallel machine that uses the SAT problem to automatically parallelize sequential code. Because GPUs provide a cheap, massively parallel environment, SAT-PaDdlinG aims to supply this massive parallelism at low cost to RePaSAT, as well as to any other tool or problem that uses SAT solvers.
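For readers unfamiliar with the algorithm the GPU solver parallelizes, a textbook sequential DPLL procedure looks roughly like the following minimal Python sketch (without unit propagation, and without the thesis's GPU decomposition):

# Formulas are lists of clauses; a clause is a list of nonzero ints,
# negative for negated literals.

def dpll(clauses, assignment=()):
    clauses = simplify(clauses, assignment)
    if clauses is None:                    # some clause became empty: conflict
        return None
    if not clauses:                        # every clause satisfied
        return set(assignment)
    lit = clauses[0][0]                    # branch on the first open literal
    for choice in (lit, -lit):
        model = dpll(clauses, assignment + (choice,))
        if model is not None:
            return model
    return None

def simplify(clauses, assignment):
    assigned = set(assignment)
    out = []
    for clause in clauses:
        if any(l in assigned for l in clause):
            continue                       # clause already satisfied
        reduced = [l for l in clause if -l not in assigned]
        if not reduced:
            return None                    # empty clause: dead branch
        out.append(reduced)
    return out

# e.g. (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]))    # -> {1, 2, 3}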
Advisors/Committee Members: Carro, Luigi.
Subjects/Keywords: Microeletrônica; Parallel SAT solver; Processadores; Automatic parallelization; RePaSAT; SAT-PaDdlinG

Universidade do Rio Grande do Sul
18.
Esposito, Adelano.
Programação paralela e sequencial aplicada à otimização de estruturas metálicas com o algoritmo PSO.
Degree: 2012, Universidade do Rio Grande do Sul
URL: http://hdl.handle.net/10183/79825
Amongst heuristic algorithms, PSO (Particle Swarm Optimization) is one of the most explored. PSO is a metaheuristic based on a population of individuals, in which solution candidates evolve by simulating a simplified model of social adaptation. The method has become popular; however, the large number of objective-function evaluations limits its application to large-scale engineering problems. On the other hand, the algorithm can easily be parallelized, which makes parallel computing an attractive alternative. In this work, two serial versions of the particle swarm algorithm and their parallel extensions are developed. The parallel algorithms, built on available MATLAB® functionality, use the master-slave paradigm and multiple populations, differing from each other in the way the particle swarm is updated (flocking or pseudo-flocking) as well as in the communication between processors (synchronous or asynchronous). The proposed models were applied to the optimization of classical structural engineering benchmark problems from the literature, and the results are compared using the usual metrics for algorithm evaluation. The results show that parallel computing enabled an improvement in the performance of the asynchronous algorithm. Good processing-time savings are also recorded for both parallel extensions, although only the synchronous parallel algorithm, unlike the asynchronous version, showed steadily increasing computational performance as more processors were used.
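A minimal sketch of the synchronous master-slave scheme described above (in Python rather than the MATLAB® used in the thesis; the constants and the objective function are placeholders): the master updates the swarm while fitness evaluations, the expensive part for structural models, are farmed out to worker processes.

import numpy as np
from multiprocessing import Pool

def sphere(x):                           # stand-in objective; a structural
    return float(np.sum(x * x))          # model evaluation would go here

def pso(n_particles=20, dim=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.full(n_particles, np.inf)
    with Pool() as pool:                 # slaves: synchronous evaluation
        for _ in range(iters):
            f = np.array(pool.map(sphere, list(x)))
            improved = f < pbest_f
            pbest[improved], pbest_f[improved] = x[improved], f[improved]
            g = pbest[np.argmin(pbest_f)]          # global best
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            x = x + v
    return g, float(pbest_f.min())

if __name__ == "__main__":
    print(pso())

The asynchronous variant compared in the thesis differs in that the master does not wait for the whole swarm before updating; the barrier after pool.map above is exactly what it removes.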
Advisors/Committee Members: Miguel, Letícia Fleck Fadel.
Subjects/Keywords: Metaheuristic; Estruturas (Engenharia); Sequential; Processamento paralelo; Otimização matemática; Parallelization; Structural optimization

Universidade do Rio Grande do Sul
19.
Caixeta, Rafael Moniz.
Simulação geoestatística utilizando múltiplos passeios aleatórios.
Degree: 2015, Universidade do Rio Grande do Sul
URL: http://hdl.handle.net/10183/143317
Geostatistical simulation comprises a variety of techniques which allow the generation of multiple scenarios reproducing the spatial continuity and the histogram of the phenomenon of interest (grades, for instance). These methods can be used in decision making, giving access to the uncertainty of response functions (which depend on the simulated inputs), commonly through a non-linear relationship (net present value, ore geometallurgical recovery, ...). However, one of their limitations is that simulation can take considerable processing time on large deposits or large grids. The motivation of this dissertation therefore focuses on investigating an alternative to accelerate the simulation process. The option chosen is the development and adaptation of Multiple Random Walk Simulation, a new algorithm for building geostatistical simulations. It combines kriging with the simulation of independent random walks in order to generate simulated scenarios faster than traditional simulation algorithms. This dissertation presents details of the method and important new contributions developed to improve its performance and the quality of its results. Dedicated software was also developed to make the technique simple, practical, and fast to use in any situation (2D or 3D). Case studies were carried out to check the validity of the simulations; they showed good reproduction of histograms and variograms, along with a considerable speed gain, reaching an acceleration of up to 5.65x (compared with Turning Bands Simulation) in the simulation of a 3D iron deposit, a performance that can improve further as more sample data become available.
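The kriging half of the method can be sketched generically. The Python fragment below uses the standard ordinary-kriging formulation with an assumed exponential covariance model (not the thesis implementation, and without the random-walk component): the weights come from a covariance system with a Lagrange multiplier enforcing that they sum to one.

import numpy as np

def cov(h, sill=1.0, rng=10.0):
    return sill * np.exp(-3.0 * h / rng)         # exponential covariance model

def ordinary_kriging(samples_xy, samples_z, target_xy):
    n = len(samples_xy)
    d = np.linalg.norm(samples_xy[:, None, :] - samples_xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))                  # ones row/column: unbiasedness
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(np.linalg.norm(samples_xy - target_xy, axis=1))
    w = np.linalg.solve(A, b)                    # weights + Lagrange multiplier
    return float(w[:n] @ samples_z)

xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
z = np.array([1.0, 3.0, 2.0])
print(ordinary_kriging(xy, z, np.array([2.0, 2.0])))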
Advisors/Committee Members: Costa, Joao Felipe Coimbra Leite.
Subjects/Keywords: Simulação geoestatística; Geostatistics; Krigagem; Conditional simulation; Mineração; Mining; Parallelization

University of Alberta
20.
Brandon, Tyler.
Parallel-Node Low-Density Parity-Check Convolutional Code Encoder and Decoder Architectures.
Degree: PhD, Department of Electrical and Computer Engineering, 2010, University of Alberta
URL: https://era.library.ualberta.ca/files/7s75dd88x
We present novel architectures for parallel-node low-density parity-check convolutional code (PN-LDPC-CC) encoders and decoders. Based on a recently introduced implementation-aware class of LDPC-CCs, these encoders and decoders take advantage of increased node-parallelization to simultaneously decrease the energy-per-bit and increase the decoded information throughput. A series of progressively improved encoder and decoder designs are presented and characterized using synthesis results with respect to power, area and throughput. The best of the encoder and decoder designs significantly advance the state-of-the-art in terms of both the energy-per-bit and throughput/area metrics. One of the presented decoders, for an Eb/N0 of 2.5 dB, has a bit-error-rate of 10⁻⁶, occupies 4.5 mm² in a CMOS 90-nm process, and achieves an energy-per-decoded-information-bit of 65 pJ and a decoded information throughput of 4.8 Gbit/s. We implement an earlier non-parallel-node LDPC-CC encoder, decoder and a channel emulator in silicon. We provide readers, via two sets of tables, the ability to look up our decoder hardware metrics, across four different process technologies, for over 1000 variations of our PN-LDPC-CC decoders. By imposing practical decoder implementation constraints on power or area, which in turn drive trade-offs in code size versus the number of decoder processors, we compare the code BER performance. An extensive comparison to known LDPC-BC/CC decoder implementations is provided.
Subjects/Keywords: LDPC; Architecture; throughput; Encoder; VLSI; Decoder; parallelization; Convolutional; energy-per-bit

University of Georgia
21.
Medikonduru, Harini.
The two dimensional coupled nonlinear Schrödinger equation.
Degree: 2014, University of Georgia
URL: http://hdl.handle.net/10724/28048
The coupled nonlinear Schrödinger equation is of tremendous importance in both theory and applications. The coupled nonlinear Schrödinger equation (CNLS) is the vectorial version of the nonlinear Schrödinger equation (NLS), the main governing equation in the area of optical solitons. In this thesis, we perform numerical simulations of the two-dimensional CNLS equation using various numerical methods, such as split-step Fourier methods and finite difference methods. We implement parallel versions of these methods on the zcluster multiprocessor system and compare their run times to determine whether parallelization yields a good speedup, thus enhancing performance.
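As an illustration of the split-step idea, here is a minimal Python sketch for the 1D NLS i u_t + u_xx + 2|u|² u = 0 (a simplification of the 2D coupled system studied in the thesis): the dispersive step is advanced exactly in Fourier space and the nonlinear step pointwise in physical space.

import numpy as np

def split_step_nls(u, dx, dt, steps):
    n = u.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)       # spectral wavenumbers
    linear = np.exp(-1j * k**2 * dt)              # exact step for i u_t + u_xx = 0
    for _ in range(steps):
        u = np.fft.ifft(linear * np.fft.fft(u))   # dispersive half of the split
        u = u * np.exp(2j * np.abs(u)**2 * dt)    # nonlinear half (pointwise,
    return u                                      # |u| is constant during it)

x = np.linspace(-20, 20, 512, endpoint=False)
u0 = 1.0 / np.cosh(x)                             # soliton initial profile
u = split_step_nls(u0.astype(complex), x[1] - x[0], 1e-3, 2000)
print(np.max(np.abs(u)))                          # stays near 1 for a soliton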
Subjects/Keywords: Split-step method; CNLS; Parallelization; FFTW; Finite difference Method; MPI

Texas A&M University
22.
Dong, Wei.
Parallel Algorithms for Time and Frequency Domain Circuit Simulation.
Degree: PhD, Computer Engineering, 2010, Texas A&M University
URL: http://hdl.handle.net/1969.1/ETD-TAMU-2009-08-7021
As the most critical form of pre-silicon verification, transistor-level circuit simulation is an indispensable step before committing to an expensive manufacturing process. However, circuit simulation can be computationally expensive, especially for ever-larger transistor circuits with more complex device models, so it is becoming increasingly desirable to accelerate it. At the same time, the emergence of multi-core machines offers a promising platform for circuit simulation besides the known application of distributed-memory clustered computing, providing abundant hardware computing resources. This research addresses the limitations of traditional serial circuit simulation and proposes new techniques for parallel circuit simulation in both the time and frequency domains.

For time-domain simulation, this dissertation presents a parallel transient simulation methodology. The new approach, called WavePipe, exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. WavePipe has two embodiments: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step, the latter performs predictive computing along the forward direction. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach it requires low parallel programming effort; furthermore, it creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solutions.

This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit numerical integration. The new method allows the effective time step to be controlled by the accuracy requirement instead of the stability limitation. It therefore not only leads to noticeable efficiency improvement, but also lends itself to straightforward parallelization due to its explicit nature.

For frequency-domain simulation, this dissertation presents a parallel harmonic balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. The new approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. The proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. As a result, favorable runtime speedups are achieved.
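The forward-pipelining idea can be caricatured on a scalar ODE: solve time point n+1 while a second worker speculatively solves n+2 from an extrapolated guess, keeping the speculative result only when the guess was accurate. The Python toy below is illustrative only, not the WavePipe implementation (and in CPython the threads show control flow rather than real speedup):

from concurrent.futures import ThreadPoolExecutor
import math

def f(x):                        # stand-in circuit equation dx/dt = f(x)
    return -5.0 * x

def be_solve(x_prev, h):         # backward Euler via fixed-point iteration
    x = x_prev
    for _ in range(50):
        x = x_prev + h * f(x)
    return x

def wave_pipe(x0, h, n_steps, tol=1e-2):         # loose tolerance for the toy
    xs = [x0]
    with ThreadPoolExecutor(max_workers=2) as ex:
        while len(xs) <= n_steps:
            guess = xs[-1] + h * f(xs[-1])       # predictor for x(n+1)
            fut1 = ex.submit(be_solve, xs[-1], h)    # solve x(n+1)
            fut2 = ex.submit(be_solve, guess, h)     # speculative x(n+2)
            x1 = fut1.result()
            xs.append(x1)
            if len(xs) <= n_steps:
                if abs(x1 - guess) < tol:        # predictor was good:
                    xs.append(fut2.result())     # keep the speculative point
                else:
                    xs.append(be_solve(x1, h))   # redo from the true x(n+1)
    return xs[:n_steps + 1]

print(wave_pipe(1.0, 0.01, 10)[-1], math.exp(-5.0 * 0.1))  # both near 0.61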
Advisors/Committee Members: Li, Peng (advisor), Khatri, Sunil P. (committee member), Martinez, Jose Silva (committee member), Walker, Duncan M. (committee member).
Subjects/Keywords: circuit simulation; parallelization; multicore

University of Toronto
23.
Wang, Mike Dai.
A Study of the Potential of Intel's Transactional Synchronization Extensions for Hybrid Transactional Memory.
Degree: 2014, University of Toronto
URL: http://hdl.handle.net/1807/68008
Hardware Transactional Memory (TM) attempts to deliver on the promises made with Software Transactional Memory by enabling scalable parallel execution with low programming effort. Intel's TSX is a recent such effort to provide hardware support for Transactional Memory on commodity hardware. This dissertation studies the limitations of TSX and proposes both manual and automated techniques for improving application parallelism. We find that using only TSX may result in high abort rates and poor performance for certain workloads due to hardware resource limitations. Using the Moldyn benchmark, we present a series of manual optimizations on TSX to achieve performance comparable to fine-grained locking. We then translate the identified manual optimizations into automated techniques in the form of a software-hardware hybrid transactional memory library called HyTM. Using a realistic benchmark, SynQuake, we experimentally show that HyTM can significantly increase the fraction of transactions that take advantage of TSX and execute in hardware.
M.A.S.
Advisors/Committee Members: Amza, Cristiana, Electrical and Computer Engineering.
Subjects/Keywords: concurrency; parallelization; scalability; synchronization; transactional memory; Transactional Synchronization Extensions; 0464
24.
Li, Run.
Theory And Application Development Of Electronic Structure Methods Involving Heavy Computation.
Degree: PhD, Chemistry, 2017, University of North Dakota
URL: https://commons.und.edu/theses/2269
The propargyl radical, the most stable isomer of C3H3, is very important in combustion reactions. However, theoretical calculations have never been able to find a strong absorption around 242 nm as seen in experiments. In this study, we calculated the electronic energy levels of the propargyl radical using highly accurate multireference methods, including the multireference configuration interaction singles and doubles method with triples and quadruples treated perturbatively [denoted MRCISD(TQ)], as well as second- and third-order generalized Van Vleck perturbation theories (GVVPT2 and GVVPT3). The calculations indicate that this absorption can be attributed solely to a Franck-Condon-allowed transition from the ground B1 state to the Rydberg-like first A1 excited state. They also show that GVVPT2 with a relatively small active space fails to capture enough of the Rydberg character of this excited state, while it can be recovered by GVVPT3, MRCISD, and MRCISD(TQ).
In order to speed up MRCISD(TQ) calculations, the triple and quadruple (TQ) perturbative corrections, the most time-consuming part of MRCISD(TQ) calculations, were parallelized using the Message Passing Interface (MPI). The MRCISD(TQ) method is organized in the framework of macroconfigurations, which allows the use of incomplete reference spaces and provides an efficient means of screening large numbers of non-interacting configuration state functions (CSFs). Test calculations show that the parallel code achieves close to linear speed-up when the number of CSFs in each macroconfiguration is small; the speed-up suffers when large numbers of CSFs are concentrated in only a few macroconfigurations.
The computer algorithm for second-order generalized van Vleck multireference perturbation theory (GVVPT2) was similarly parallelized using the MPI protocol, organized in the framework of macroconfigurations. The maximum number of CSFs per macroconfiguration is found to have less influence on the MPI speedup and scaling than in the case of MRCISD(TQ).
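The macroconfiguration-level distribution described above can be sketched with mpi4py (an illustrative skeleton with a placeholder kernel, not the GVVPT2/MRCISD(TQ) code); a static round-robin split like this also shows why speed-up suffers when most CSFs sit in a few blocks, since nothing rebalances an oversized block.

# Run with: mpiexec -n 4 python this_script.py
from mpi4py import MPI

def correction_for(block):           # placeholder for the per-block TQ/PT2 work
    return sum(x * x for x in block)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

blocks = [[float(i), float(i + 1)] for i in range(16)]      # macroconfigurations
local = sum(correction_for(b) for b in blocks[rank::size])  # round-robin split
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print("total correction:", total)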
It was previously found that unrestricted local density approximation (LDA) orbitals can be used in place of MCSCF orbitals for GVVPT2. This inspired us to use the more controllable restricted density functional theory (DFT) to provide unbiased orbitals for GVVPT2 calculations. In this study, the relationship between restricted and unrestricted DFT was explored, and restricted DFT results were obtained by utilizing subroutines from the unrestricted DFT calculations. We also found that the DIIS technique drastically sped up the convergence of the restricted DFT calculations.
Plane wave DFT methods are commonly used to efficiently evaluate solid state materials. In this work, the electronic properties of pristine graphene and Zn-phthalocyanine tetrasulfonic acid (Zn-PcS) physisorbed on single-layer graphene were calculated using plane wave DFT. The Perdew-Burke-Ernzerhof functional with dispersion correction (PBE-D2) was used. The densities of states were…
Advisors/Committee Members: Mark R. Hoffmann.
Subjects/Keywords: electronic structure theory; GVVPT2; MPI; MRCISD(TQ); multireference perturbation theory; parallelization
25.
Khairullah, M. (author).
Parallelization of Ensemble Kalman Filter (EnKF) for Oil Reservoirs.
Degree: 2012, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:9328b51a-7eb5-4206-b463-48d99b86aafd
This thesis describes the design and implementation of a parallel algorithm for data assimilation with the ensemble Kalman filter (EnKF) for oil reservoir management. The implemented application works on a large number of observations from time-lapse seismic data, which leads to a long turnaround time for the analysis step, in addition to the time-consuming simulations of the realizations. Given that parallel resources are used for the parallel simulation of the realizations, the analysis step also deserves parallelization. Our experiments show that parallelizing the analysis step in addition to the forecast step also scales well, exploiting the same set of resources with some additional effort.
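For context, a dense stochastic-EnKF analysis step in its standard textbook form looks like the numpy sketch below (not the thesis code). With many seismic observations the matrix products here dominate, which is why the analysis step itself is worth parallelizing, e.g. over blocks of the state vector.

import numpy as np

def enkf_analysis(X, H, d, R, rng):
    n, m = X.shape                          # state dim x ensemble size
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, m).T
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    S = H @ A                               # anomalies in observation space
    P = (S @ S.T) / (m - 1) + R             # innovation covariance
    K = (A @ S.T / (m - 1)) @ np.linalg.inv(P)   # Kalman gain
    return X + K @ (D - H @ X)              # updated (analysis) ensemble

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))              # 100 states, 30 members
H = np.eye(20, 100)                         # observe the first 20 states
d = rng.normal(size=20)
R = 0.1 * np.eye(20)
print(enkf_analysis(X, H, d, R, rng).shape)  # -> (100, 30)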
Computer Simulations for Science and Engineering
Applied mathematics
Electrical Engineering, Mathematics and Computer Science
Advisors/Committee Members: Heemink, A.W. (mentor), Lin, H.X. (mentor).
Subjects/Keywords: Data assimilation; EnKF; Parallelization

University of Wisconsin – Milwaukee
26.
Liu, Ke.
Radial Basis Functions: Biomedical Applications and Parallelization.
Degree: PhD, Engineering, 2016, University of Wisconsin – Milwaukee
URL: https://dc.uwm.edu/etd/1382
A radial basis function (RBF) is a real-valued function whose values depend only on the distances between an interpolation point and a set of user-specified points called centers. RBF interpolation is one of the primary methods to reconstruct functions from multi-dimensional scattered data. Its ability to generalize to arbitrary space dimensions and to provide spectral accuracy has made it particularly popular in different application areas, including but not limited to: finding numerical solutions of partial differential equations (PDEs), image processing, computer vision and graphics, deep learning and neural networks, etc.
The present thesis discusses three applications of RBF interpolation in biomedical engineering: (1) calcium dynamics modeling, in which we numerically solve a set of PDEs by using meshless numerical methods and RBF-based interpolation techniques; (2) image restoration and transformation, where an image is restored from its triangular mesh representation or transformed under translation, rotation, scaling, etc. from its original form; (3) porous structure design, in which RBF interpolation is used to reconstruct a 3D volume containing porous structures from a set of regularly or randomly placed points inside a user-provided surface shape. All three applications have been investigated and their effectiveness supported with numerous experimental results. In particular, we innovatively utilize anisotropic distance metrics to define the distance in RBF interpolation and apply them to the second and third applications, which show significant improvement in preserving image features or capturing connected porous structures over the isotropic distance-based RBF method.
Besides the algorithm designs and their applications in biomedical areas, we also explore several common parallelization techniques (including OpenMP and CUDA-based GPU programming) to accelerate the performance of the present algorithms. In particular, we analyze how parallel programming can help RBF interpolation speed up the meshless PDE solver as well as image processing. While RBF has been widely used in various science and engineering fields, the current thesis is expected to draw more interest from computational scientists and students to this fast-growing area and specifically to applying these techniques to biomedical problems such as the ones investigated in the present work.
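A minimal Gaussian-RBF interpolation sketch follows (the generic method, not the thesis software); the anisotropic-distance idea highlighted above is imitated by an optional per-axis scaling matrix M, which is an assumption made for this example.

import numpy as np

def rbf_fit_eval(centers, values, queries, eps=1.0, M=None):
    if M is None:
        M = np.eye(centers.shape[1])        # isotropic distance by default
    def dist(A, B):
        d = (A[:, None, :] - B[None, :, :]) @ M   # scaled pairwise offsets
        return np.linalg.norm(d, axis=2)
    Phi = np.exp(-(eps * dist(centers, centers)) ** 2)
    w = np.linalg.solve(Phi, values)        # interpolation weights
    return np.exp(-(eps * dist(queries, centers)) ** 2) @ w

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.0, 1.0, 1.0, 2.0])       # f(x, y) = x + y at the centers
q = np.array([[0.5, 0.5]])
print(rbf_fit_eval(pts, vals, q))           # close to 1.0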
Advisors/Committee Members: Zeyun Yu.
Subjects/Keywords: Biomedical Applications; Parallelization; Radial Basis Functions; Computer Sciences

Queensland University of Technology
27.
Craik, Andrew.
A framework for reasoning about inherent parallelism in modern object-oriented languages.
Degree: 2011, Queensland University of Technology
URL: https://eprints.qut.edu.au/40877/
With the emergence of multi-core processors into the mainstream, parallel programming is no longer the specialized domain it once was. There is a growing need for systems that allow programmers to reason more easily about data dependencies and inherent parallelism in general-purpose programs, many of which are written in popular imperative programming languages like Java and C#. In this thesis I present a system for reasoning about the side-effects of evaluation in an abstract and composable manner that is suitable for use by both programmers and automated tools such as compilers. The goal of developing such a system is both to facilitate the automatic exploitation of the inherent parallelism present in imperative programs and to allow programmers to reason about dependencies which may be limiting the parallelism available for exploitation in their applications.

Previous work on languages and type systems for parallel computing has tended to focus on providing the programmer with tools to facilitate the manual parallelization of programs; programmers must decide when and where it is safe to employ parallelism without the assistance of the compiler or other automated tools. None of the existing systems combine abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about inherent parallelism.

In this work I present a system for abstractly reasoning about side-effects and data dependencies in modern, imperative, object-oriented languages using a type and effect system based on ideas from Ownership Types. I have developed sufficient conditions for the safe, automated detection and exploitation of a number of task, data and loop parallelism patterns in terms of ownership relationships. To validate my work, I have applied my ideas to the C# version 3.0 language to produce a language extension called Zal, and I have implemented a compiler for the Zal language as an extension of the GPC# research compiler as a proof of concept of my system. I have used it to parallelize a number of real-world applications to demonstrate the feasibility of my proposed approach. In addition to this empirical validation, I present an argument for the correctness of the type system and language semantics I have proposed, as well as sketches of proofs for the correctness of the sufficient conditions for parallelization.
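The core safety condition, that two computations may run in parallel when their effects on memory regions do not conflict, can be caricatured in a few lines (a toy Python illustration with invented task names; Zal's ownership-based effect system is far richer and is checked statically):

# Each task declares read and write sets over abstract regions; two tasks may
# run in parallel iff neither writes a region the other touches.

def may_parallelize(t1, t2):
    conflicts = (t1["writes"] & (t2["reads"] | t2["writes"]) or
                 t2["writes"] & (t1["reads"] | t1["writes"]))
    return not conflicts

update_header = {"reads": {"doc.header"}, "writes": {"doc.header"}}
render_body   = {"reads": {"doc.body"},   "writes": {"frame.buffer"}}
spell_check   = {"reads": {"doc.body"},   "writes": {"doc.body"}}

print(may_parallelize(update_header, render_body))  # True: disjoint regions
print(may_parallelize(render_body, spell_check))    # False: doc.body clash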
Subjects/Keywords: programming languages; Ownership Types; parallelization; inherent parallelism; conditional parallelism; effect system

Universidade Nova
28.
Murillo, Carlos Andrés Osorio.
Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods.
Degree: 2011, Universidade Nova
URL: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/8294
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
In the last decade the publication of geographic information on the Internet has increased, especially with the emergence of new technologies for sharing information. This information requires online geoprocessing technologies that use new platforms such as Cloud Computing. This thesis evaluates the parallelization of geoprocesses on the Amazon Web Services (AWS) cloud platform, through OGC Web Processing Services (WPS) using the 52North WPS framework. The evaluation is performed using a new implementation of a geostatistical library in Java with parallelization capabilities, and the geoprocessing is tested by incrementing the number of micro instances on the cloud through GridGain technology. The geostatistical library produces interpolated values similar to those of ArcGIS: no differences were found for the Inverse Distance Weighting (IDW) and Radial Basis Functions (RBF) methods, and differences of 0.01% in the Root Mean Square (RMS) error were found for the Ordinary and Universal Kriging methods. The parallelization experiments demonstrate that the duration of the interpolation decreases as the number of nodes increases; the behavior depends on the size of the input dataset and the number of pixels to be interpolated. The maximum reduction in time was found with the largest configuration used in the research (1,000,000 pixels and a dataset of 10,000 points): the execution time decreased by 83% when working with 10 nodes for the Ordinary Kriging and IDW methods. However, the differences in duration between 5 nodes and 10 nodes were not statistically significant; the reductions with 5 nodes were 72% and 71% for the Ordinary Kriging and IDW methods respectively. Finally, the experiments show that geoprocessing on Cloud Computing is feasible through the WPS interface, that the performance of geostatistical methods deployed through WPS services can be improved by parallelization, and that parallelization on the cloud is viable using a grid configuration. The evaluation also showed that parallelizing geoprocesses on the cloud for academic purposes is inexpensive using the Amazon AWS platform.
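As a generic illustration of the kind of work being distributed (not the 52North/GridGain implementation), the sketch below computes an IDW surface with the pixel grid split across local worker processes standing in for cloud nodes:

import numpy as np
from multiprocessing import Pool

SAMPLES = np.random.default_rng(0).uniform(0, 100, (200, 3))  # x, y, value

def idw_chunk(pixels, power=2.0):
    out = np.empty(len(pixels))
    for i, p in enumerate(pixels):
        d = np.hypot(SAMPLES[:, 0] - p[0], SAMPLES[:, 1] - p[1])
        w = 1.0 / np.maximum(d, 1e-12) ** power      # inverse-distance weights
        out[i] = np.sum(w * SAMPLES[:, 2]) / np.sum(w)
    return out

if __name__ == "__main__":
    gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
    pixels = np.column_stack([gx.ravel(), gy.ravel()])
    with Pool(4) as pool:                            # 4 local "nodes"
        parts = pool.map(idw_chunk, np.array_split(pixels, 4))
    grid = np.concatenate(parts).reshape(gx.shape)
    print(grid.shape)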
Advisors/Committee Members: Guijarro, Joaquín Huerta, Remke, Albert, Painho, Marco.
Subjects/Keywords: Web Processing Services; Parallelization Algorithms; Interpolation; Geostatistics; Cloud Computing
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Murillo, C. A. O. (2011). Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods. (Thesis). Universidade Nova. Retrieved from http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/8294
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Murillo, Carlos Andrés Osorio. “Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods.” 2011. Thesis, Universidade Nova. Accessed March 06, 2021.
http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/8294.
MLA Handbook (7th Edition):
Murillo, Carlos Andrés Osorio. “Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods.” 2011. Web. 06 Mar 2021.
Vancouver:
Murillo CAO. Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods. [Internet] [Thesis]. Universidade Nova; 2011. [cited 2021 Mar 06].
Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/8294.
Council of Science Editors:
Murillo CAO. Parallelization of web processing services on cloud computing: A case study of Geostatistical Methods. [Thesis]. Universidade Nova; 2011. Available from: http://www.rcaap.pt/detail.jsp?id=oai:run.unl.pt:10362/8294
29.
Neugebauer, Olaf.
Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism.
Degree: 2018, Technische Universität Dortmund
URL: http://dx.doi.org/10.17877/DE290R-18927
▼ The quest for more performance of applications and systems has become more challenging in recent years. Especially in the cyber-physical and mobile domains, performance requirements have increased significantly, and applications previously found in the high-performance domain now emerge in resource-constrained settings. Modern heterogeneous high-performance MPSoCs, which combine general-purpose processors with specialized accelerators ranging from GPUs to machine-learning chips, provide a solid foundation to satisfy this demand. At the other end of the performance spectrum, the demand for small, energy-efficient systems driven by modern IoT applications has grown vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. With PA4RES, this thesis provides a holistic, semi-automatic approach to parallelizing and implementing applications for such platforms efficiently.
Our solution supports the developer in finding good trade-offs for the requirements imposed by modern applications and systems. With PICO, we propose a comprehensive approach to expressing parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically, then optimizes the data synchronization with a genetic algorithm. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and the flexibility offered by the channel implementation, with respect to execution time, energy consumption and memory footprint. PICO's communication-optimization phase generated a speedup of almost 2, or an energy improvement of 30%, for certain benchmarks.
The PAMONO sensor approach enables fast detection of biological viruses using optical methods; with sophisticated virus-detection software, real-time virus detection running on stationary computers has been achieved. Within this thesis, we derive a soft-real-time-capable virus detection running on a high-performance embedded system commonly found in today's smartphones. This was accomplished with a smart design-space exploration (DSE) algorithm that optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings while satisfying the soft real-time requirements; accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus-detection solution.
The growing demand for processing power can no longer be satisfied by well-known approaches such as higher clock frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality in exchange for improvements in other objectives. Especially for a safe integration of approximation into existing applications or during the…
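As a hedged illustration of the evolutionary search the abstract describes, the Java sketch below runs a toy elitist genetic algorithm over per-channel buffer capacities against an invented weighted time-plus-memory cost. It shows the technique in miniature; PICO's actual encoding, objectives and operators are not given in this abstract and are not reflected here.

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Random;

    /** Toy genetic algorithm over channel-capacity vectors. Purely
     *  illustrative of the kind of multi-objective search PICO's
     *  communication-optimization phase performs; the cost model,
     *  weights and all parameters are invented, not PICO's. */
    public class ChannelGaSketch {
        static final Random RNG = new Random(42);
        static final int GENES = 4;              // capacities of 4 channels
        static final int POP = 20, GENERATIONS = 50;

        /** Invented cost: small buffers hurt execution time, large ones
         *  hurt memory footprint; a weighted sum stands in for the
         *  time/energy/memory trade-off named in the abstract. */
        static double cost(int[] caps) {
            double time = 0, mem = 0;
            for (int c : caps) { time += 100.0 / c; mem += c; }
            return time + 0.5 * mem;
        }

        /** Mutate one gene by a small random step, keeping capacity >= 1. */
        static int[] mutate(int[] parent) {
            int[] child = parent.clone();
            int g = RNG.nextInt(GENES);
            child[g] = Math.max(1, child[g] + RNG.nextInt(9) - 4);
            return child;
        }

        public static void main(String[] args) {
            int[][] pop = new int[POP][GENES];
            for (int[] ind : pop)
                for (int g = 0; g < GENES; g++) ind[g] = 1 + RNG.nextInt(32);

            for (int gen = 0; gen < GENERATIONS; gen++) {
                // Elitist selection: keep the cheaper half, refill the rest
                // with mutated copies of the survivors.
                Arrays.sort(pop, Comparator.comparingDouble(ChannelGaSketch::cost));
                for (int i = POP / 2; i < POP; i++) pop[i] = mutate(pop[i - POP / 2]);
            }
            Arrays.sort(pop, Comparator.comparingDouble(ChannelGaSketch::cost));
            System.out.println("best config " + Arrays.toString(pop[0])
                    + ", cost " + cost(pop[0]));
        }
    }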
Advisors/Committee Members: Marwedel, Peter (advisor), Müller, Heinrich (referee).
Subjects/Keywords: Parallelization; MPSoC; Cyber-physical system; Resource-constrained; 004
APA (6th Edition):
Neugebauer, O. (2018). Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism. (Doctoral Dissertation). Technische Universität Dortmund. Retrieved from http://dx.doi.org/10.17877/DE290R-18927
Chicago Manual of Style (16th Edition):
Neugebauer, Olaf. “Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism.” 2018. Doctoral Dissertation, Technische Universität Dortmund. Accessed March 06, 2021.
http://dx.doi.org/10.17877/DE290R-18927.
MLA Handbook (7th Edition):
Neugebauer, Olaf. “Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism.” 2018. Web. 06 Mar 2021.
Vancouver:
Neugebauer O. Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism. [Internet] [Doctoral dissertation]. Technische Universität Dortmund; 2018. [cited 2021 Mar 06].
Available from: http://dx.doi.org/10.17877/DE290R-18927.
Council of Science Editors:
Neugebauer O. Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism. [Doctoral Dissertation]. Technische Universität Dortmund; 2018. Available from: http://dx.doi.org/10.17877/DE290R-18927
30.
Wieczorek, Calvin L.
3D terrain visualization and CPU parallelization of particle swarm optimization.
Degree: 2018, IUPUI
URL: http://hdl.handle.net/1805/15952
▼ Indiana University-Purdue University Indianapolis (IUPUI)
Particle Swarm Optimization (PSO) is a bio-inspired optimization technique used to approximately solve the non-deterministic polynomial (NP) problem of allocating assets in 3D space, frequency, antenna azimuth [1], and elevation orientation [1]. This research uses Qt Data Visualization to display the PSO solutions, assets and transmitters in 3D space, building on the work done in [2]. Elevation and imagery data were extracted from ArcGIS (a geographic information system (GIS) database) and overlaid so that the 3D visualization displays proper topographical data. The 3D environment range was improved and is now dynamic, giving the user appropriate coordinates based on the ArcGIS latitude and longitude ranges. The second part of the research improves the runtime performance of the PSO by using OpenMP with CPU threading to parallelize the evaluation of the PSO by particle. With 4 CPU threads, this implementation improves the performance of the PSO by 42%–51% compared to running the PSO without multithreading. These contributions allow the PSO project to simulate its use in the Electronic Warfare (EW) space more realistically, with the CPU multithreading implementation providing further performance improvements.
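The abstract's parallelization step, evaluating the swarm particle by particle on 4 CPU threads, is implemented in the thesis with OpenMP; the following Java sketch mirrors only the pattern, distributing per-particle fitness evaluation over a 4-thread pool. The sphere objective and the swarm dimensions are invented stand-ins.

    import java.util.concurrent.ForkJoinPool;
    import java.util.stream.IntStream;

    /** Sketch of per-particle parallel fitness evaluation. The thesis
     *  uses OpenMP in C/C++; this Java version only mirrors the idea.
     *  The objective function and swarm size are invented. */
    public class PsoEvalSketch {
        /** Stand-in objective: sphere function (minimise sum of squares). */
        static double fitness(double[] position) {
            double s = 0;
            for (double x : position) s += x * x;
            return s;
        }

        public static void main(String[] args) throws Exception {
            int particles = 1000, dims = 3;
            double[][] swarm = new double[particles][dims];
            for (double[] p : swarm)
                for (int d = 0; d < dims; d++) p[d] = Math.random() * 10 - 5;

            double[] fit = new double[particles];
            // 4 worker threads, matching the 4-thread setup in the abstract.
            ForkJoinPool pool = new ForkJoinPool(4);
            pool.submit(() ->
                IntStream.range(0, particles).parallel()
                         .forEach(i -> fit[i] = fitness(swarm[i]))
            ).get();
            pool.shutdown();

            double best = Double.MAX_VALUE;
            for (double f : fit) best = Math.min(best, f);
            System.out.println("best fitness this iteration: " + best);
        }
    }

Each particle writes only its own slot of the fitness array, so the loop parallelizes without locks; that independence is what makes the per-particle decomposition attractive in the first place.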
Advisors/Committee Members: Christopher, Lauren, King, Brian, Lee, John.
Subjects/Keywords: pso; thread; visualization; parallelization
APA (6th Edition):
Wieczorek, C. L. (2018). 3d terrain visualization and CPU parallelization of particle swarm optimization. (Thesis). IUPUI. Retrieved from http://hdl.handle.net/1805/15952
Chicago Manual of Style (16th Edition):
Wieczorek, Calvin L. “3d terrain visualization and CPU parallelization of particle swarm optimization.” 2018. Thesis, IUPUI. Accessed March 06, 2021.
http://hdl.handle.net/1805/15952.
MLA Handbook (7th Edition):
Wieczorek, Calvin L. “3d terrain visualization and CPU parallelization of particle swarm optimization.” 2018. Web. 06 Mar 2021.
Vancouver:
Wieczorek CL. 3d terrain visualization and CPU parallelization of particle swarm optimization. [Internet] [Thesis]. IUPUI; 2018. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1805/15952.
Council of Science Editors:
Wieczorek CL. 3d terrain visualization and CPU parallelization of particle swarm optimization. [Thesis]. IUPUI; 2018. Available from: http://hdl.handle.net/1805/15952
◁ [1] [2] [3] [4] [5] … [11] ▶