You searched for subject:(Parallel Computing).
Showing records 1 – 30 of 1051 total matches.
1.
Zulian, Patrick.
Geometry-aware finite element framework for multi-physics
simulations: an algorithmic and software-centric
perspective.
Degree: 2017, Università della Svizzera italiana
URL: http://doc.rero.ch/record/304964
▼ In finite element simulations, the handling of geometrical objects and their discrete representation is a critical aspect in both serial and parallel scientific software environments. Developing codes for such environments demands great effort and many invested man-hours. In this thesis we approach these issues on three fronts. First, stable and efficient techniques for the transfer of discrete fields between non-matching volume or surface meshes are an essential ingredient for the discretization and numerical solution of coupled multi-physics and multi-scale problems. In particular, L2-projections allow for the transfer of discrete fields between unstructured meshes, both in the volume and on the surface. We present an algorithm for parallelizing the assembly of the L2-transfer operator for unstructured meshes which are arbitrarily distributed among different processes. The algorithm requires no a priori information on the geometrical relationship between the different meshes. Second, the geometric representation is often a limiting factor which imposes a trade-off between how accurately the shape is described and which methods can be employed for solving a system of differential equations. Parametric finite elements and bijective mappings between polygons or polyhedra allow us to flexibly construct finite element discretizations with arbitrary resolutions without sacrificing the accuracy of the shape description. Such flexibility allows employing state-of-the-art techniques, such as geometric multigrid methods, on meshes of almost any shape. Third, the way numerical techniques are represented in software libraries and approached from a development perspective affects both the usability and maintainability of such libraries. Completely separating the intent of high-level routines from the actual implementation and technologies allows for portable and maintainable performance. We provide an overview of current trends in the development of scientific software and showcase our open-source library utopia.
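The abstract's central object, the L2-projection of a field between non-matching meshes, can be illustrated in one dimension. The sketch below is not from the thesis or the utopia library: it is a minimal serial illustration with invented names, assembling the target mass matrix M and the source-target coupling matrix B by simple quadrature and solving M u = B v for piecewise-linear (P1) fields.

```python
import numpy as np

def hat(x, nodes, i):
    """P1 hat basis function of node i on a sorted 1D mesh, sampled at x."""
    phi = np.zeros_like(x)
    if i > 0:
        m = (x >= nodes[i - 1]) & (x <= nodes[i])
        phi[m] = (x[m] - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
    if i < len(nodes) - 1:
        m = (x >= nodes[i]) & (x <= nodes[i + 1])
        phi[m] = (nodes[i + 1] - x[m]) / (nodes[i + 1] - nodes[i])
    return phi

def l2_transfer(src_nodes, src_vals, dst_nodes, nq=2000):
    """L2-project a P1 field from one 1D mesh onto a non-matching one:
    assemble mass matrix M and coupling matrix B, then solve M u = B v."""
    a = max(src_nodes[0], dst_nodes[0])
    b = min(src_nodes[-1], dst_nodes[-1])
    x = np.linspace(a, b, nq)
    w = (b - a) / nq                     # crude uniform quadrature weight
    Phi_d = np.array([hat(x, dst_nodes, i) for i in range(len(dst_nodes))])
    Phi_s = np.array([hat(x, src_nodes, j) for j in range(len(src_nodes))])
    M = Phi_d @ Phi_d.T * w              # target mass matrix
    B = Phi_d @ Phi_s.T * w              # source-target coupling matrix
    return np.linalg.solve(M, B @ src_vals)
```

Because M and B are assembled with the same quadrature, linear fields survive the transfer exactly; the thesis's actual contribution, parallel assembly when the two meshes are arbitrarily distributed across processes, is precisely what this serial sketch omits.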
Advisors/Committee Members: Rolf (Dir.).
Subjects/Keywords: Parallel computing
APA (6th Edition):
Zulian, P. (2017). Geometry–aware finite element framework for multi–physics
simulations: an algorithmic and software-centric
perspective. (Thesis). Università della Svizzera italiana. Retrieved from http://doc.rero.ch/record/304964
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Zulian, Patrick. “Geometry–aware finite element framework for multi–physics
simulations: an algorithmic and software-centric
perspective.” 2017. Thesis, Università della Svizzera italiana. Accessed March 04, 2021.
http://doc.rero.ch/record/304964.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Zulian, Patrick. “Geometry–aware finite element framework for multi–physics
simulations: an algorithmic and software-centric
perspective.” 2017. Web. 04 Mar 2021.
Vancouver:
Zulian P. Geometry–aware finite element framework for multi–physics
simulations: an algorithmic and software-centric
perspective. [Internet] [Thesis]. Università della Svizzera italiana; 2017. [cited 2021 Mar 04].
Available from: http://doc.rero.ch/record/304964.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Zulian P. Geometry–aware finite element framework for multi–physics
simulations: an algorithmic and software-centric
perspective. [Thesis]. Università della Svizzera italiana; 2017. Available from: http://doc.rero.ch/record/304964
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

McMaster University
2.
Bouman, Tanya.
Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography.
Degree: MSc, 2021, McMaster University
URL: http://hdl.handle.net/11375/26191
▼ Commercial applications, including Blockchain, require large numbers of cryptographic signing and verification operations, increasingly using Elliptic Curve Cryptography. This uses a group operation (called point addition) on the set of points of an elliptic curve over a prime field. Scalar multiplication is the repeated addition of a fixed point, P, on the curve. Along with the point at infinity, which serves as the identity of addition and the zero of scalar multiplication, this forms a vector space over the prime field. Scalar multiplication can be accelerated by decomposing the number of additions into nibbles or other digits and using a precomputed table of values P, 2P, 3P, ... This is called a windowed method. To avoid side-channel attacks, implementations must ensure that the time and power used do not depend on the scalar. Avoiding conditional execution ensures constant-time and constant-power execution.
This thesis presents a theoretical reduction in latency for the windowed method by introducing parallelism. Using three cores can achieve an improvement of 42% in the latency versus a single-threaded computation.
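The windowed method described above is easy to sketch for a generic additive group. The code below is an illustrative sketch with invented names, not the thesis implementation: plain integer addition stands in for elliptic-curve point addition, and the sketch makes no attempt at the constant-time guarantees the abstract discusses.

```python
def windowed_scalar_mult(k, P, add, zero, w=4):
    """Windowed scalar multiplication: precompute 1P..(2^w - 1)P once,
    then consume k in w-bit digits from most to least significant."""
    # Precomputed table: table[d] = d*P (table[0] is the group identity).
    table = [zero]
    for _ in range(2**w - 1):
        table.append(add(table[-1], P))
    # Split k into w-bit digits, least significant first.
    digits = []
    while k:
        digits.append(k & (2**w - 1))
        k >>= w
    acc = zero
    for d in reversed(digits):
        for _ in range(w):            # acc <- 2^w * acc via w doublings
            acc = add(acc, acc)
        acc = add(acc, table[d])      # one table lookup per digit
    return acc

# Demo group: integers under addition (a stand-in for curve points).
print(windowed_scalar_mult(1234, 7, lambda a, b: a + b, 0))  # 8638
```

The thesis's parallel variant splits this digit loop across cores; the table of small multiples is shared, which is what makes the precomputation pay off.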
Thesis
Master of Science (MSc)
Advisors/Committee Members: Anand, Christopher, Kahl, Wolfram, Computing and Software.
Subjects/Keywords: cryptography; parallel computing
APA (6th Edition):
Bouman, T. (2021). Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography. (Masters Thesis). McMaster University. Retrieved from http://hdl.handle.net/11375/26191
Chicago Manual of Style (16th Edition):
Bouman, Tanya. “Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography.” 2021. Masters Thesis, McMaster University. Accessed March 04, 2021.
http://hdl.handle.net/11375/26191.
MLA Handbook (7th Edition):
Bouman, Tanya. “Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography.” 2021. Web. 04 Mar 2021.
Vancouver:
Bouman T. Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography. [Internet] [Masters thesis]. McMaster University; 2021. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/11375/26191.
Council of Science Editors:
Bouman T. Parallel Windowed Method for Scalar Multiplication in Elliptic Curve Cryptography. [Masters Thesis]. McMaster University; 2021. Available from: http://hdl.handle.net/11375/26191
3.
Hamsapriya T.
Certain investigations on parallel Computing for
engineering Applications;.
Degree: 2014, Anna University
URL: http://shodhganga.inflibnet.ac.in/handle/10603/29893
▼ The need for faster computers is driven by the demands of both computation-intensive applications in science and engineering and data-intensive applications in commerce. The main motivation for parallelizing any sequential algorithm is the desire to reduce its total execution time. Parallel computing provides a way to reduce the time taken to execute a task by dividing it between multiple computers that work simultaneously to complete the job. As the cost of dedicated parallel machines is high, computing on a cluster of workstations is a viable alternative for a wide range of engineering applications. The thesis aims at finding cost-effective parallel implementations of algorithms on a cluster to solve complex applications. The first part of the thesis addresses the issues that arise in parallel implementations. The major issues of concern are load balancing, a widely used technique to balance the workload among multiple processors, and fault tolerance, the ability to tolerate failures. The proposed load-balancing approach is based on a central scheduler with hybrid genetic algorithms and genetic local search methods. A replication approach has been proposed to improve the performance and fault tolerance of a multiprocessor system.
Appendix pp. 215–217, references pp. 218–228.
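The genetic-algorithm scheduling idea in the abstract can be sketched in miniature. The code below is only a guess at the general technique, not the thesis's hybrid scheduler (it omits the genetic local search and the central-scheduler machinery); every name and parameter is invented here. It evolves task-to-processor assignments to minimize the makespan.

```python
import random

def ga_load_balance(task_costs, n_procs, pop=30, gens=200, seed=1):
    """Toy GA scheduler: a chromosome maps each task to a processor;
    fitness is the makespan (maximum per-processor load), minimized."""
    rng = random.Random(seed)
    n = len(task_costs)

    def makespan(assign):
        loads = [0.0] * n_procs
        for t, p in enumerate(assign):
            loads[p] += task_costs[t]
        return max(loads)

    popu = [[rng.randrange(n_procs) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=makespan)
        elite = popu[: pop // 2]          # elitist selection
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:        # mutation: reassign one task
                child[rng.randrange(n)] = rng.randrange(n_procs)
            children.append(child)
        popu = elite + children
    best = min(popu, key=makespan)
    return best, makespan(best)
```

For ten tasks with total cost 30 on two processors, the optimum makespan is 15, and this toy GA reliably gets there or very close.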
Advisors/Committee Members: Sumathi S.
Subjects/Keywords: Dataintensive applications; Parallel computing
APA (6th Edition):
T, H. (2014). Certain investigations on parallel Computing for
engineering Applications;. (Thesis). Anna University. Retrieved from http://shodhganga.inflibnet.ac.in/handle/10603/29893
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
T, Hamsapriya. “Certain investigations on parallel Computing for
engineering Applications;.” 2014. Thesis, Anna University. Accessed March 04, 2021.
http://shodhganga.inflibnet.ac.in/handle/10603/29893.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
T, Hamsapriya. “Certain investigations on parallel Computing for
engineering Applications;.” 2014. Web. 04 Mar 2021.
Vancouver:
T H. Certain investigations on parallel Computing for
engineering Applications;. [Internet] [Thesis]. Anna University; 2014. [cited 2021 Mar 04].
Available from: http://shodhganga.inflibnet.ac.in/handle/10603/29893.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
T H. Certain investigations on parallel Computing for
engineering Applications;. [Thesis]. Anna University; 2014. Available from: http://shodhganga.inflibnet.ac.in/handle/10603/29893
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Texas A&M University
4.
Hoxha, Dielli.
Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework.
Degree: MS, Computer Engineering, 2016, Texas A&M University
URL: http://hdl.handle.net/1969.1/157157
▼ Applications of matrices are found in most scientific fields, such as physics, computer graphics, and numerical analysis. The high applicability of matrix algorithms and representations makes them an important component in any parallel programming language; therefore, matrix frameworks are a continuous research effort in high performance computing. This work focuses on a generic matrix framework in the STAPL library. First, we extend the STAPL library by adding a sparse matrix container. Second, we implement SUMMA, the parallel matrix-multiplication algorithm, for fine-grained computations. Then, we implement parallel matrix-matrix algorithms for the sparse matrix container. Finally, we conduct experimental studies for each of the components we have implemented and discuss the findings. Experiments are conducted on a Cray XE6m cluster. The experimental studies consist of multiple matrix and data inputs that showcase and stress the matrix models implemented. We find that the sparse matrix container outperforms its dense counterpart on sparse inputs, and vice versa. Both containers, and the matrix SUMMA implementation, show scalability up to 512 cores.
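SUMMA proceeds in rounds: in each round, one column panel of A and the matching row panel of B are broadcast across the process grid, and every process accumulates a local outer-product (rank-k) update of its block of C. The serial NumPy sketch below only simulates those rounds to show the arithmetic structure; it is unrelated to STAPL's actual containers, and the round count is an arbitrary assumption.

```python
import numpy as np

def summa(A, B, rounds=4):
    """Serial simulation of SUMMA: per round, 'broadcast' a column panel
    of A and the matching row panel of B, then accumulate the rank-k
    outer-product update, as each process on the grid would locally."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for ks in np.array_split(np.arange(k), rounds):
        if ks.size == 0:
            continue
        C += A[:, ks] @ B[ks, :]   # local update from the broadcast panels
    return C
```

Since the updates over the inner dimension commute and sum to the full product, the simulated result matches A @ B exactly; on a real grid, the same loop runs on every process against its own sub-blocks.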
Advisors/Committee Members: Amato, Nancy M. (advisor), Rauchwerger, Lawrence (advisor), Ragusa, Jean (committee member).
Subjects/Keywords: Parallel Computing; Sparse Matrix
APA (6th Edition):
Hoxha, D. (2016). Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework. (Masters Thesis). Texas A&M University. Retrieved from http://hdl.handle.net/1969.1/157157
Chicago Manual of Style (16th Edition):
Hoxha, Dielli. “Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework.” 2016. Masters Thesis, Texas A&M University. Accessed March 04, 2021.
http://hdl.handle.net/1969.1/157157.
MLA Handbook (7th Edition):
Hoxha, Dielli. “Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework.” 2016. Web. 04 Mar 2021.
Vancouver:
Hoxha D. Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework. [Internet] [Masters thesis]. Texas A&M University; 2016. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1969.1/157157.
Council of Science Editors:
Hoxha D. Sparse Matrices and Summa Matrix Multiplication Algorithm in STAPL Matrix Framework. [Masters Thesis]. Texas A&M University; 2016. Available from: http://hdl.handle.net/1969.1/157157

University of Illinois – Urbana-Champaign
5.
Manikandan, Gowthami Jayashri.
Comprehensive evaluation of error correction methods for high-throughput sequencing data.
Degree: MS, Electrical & Computer Engr, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/97787
▼ The advent of DNA and RNA sequencing has significantly revolutionized the study of genomics and molecular biology. The development of high-throughput sequencing technologies has brought about a quicker and cheaper way to sequence genomes. Different technologies use different underlying methods for sequencing and are prone to different error rates. Though many tools exist for error correction in high-throughput sequencing data, no standard technology-independent method is available yet to evaluate the accuracy and effectiveness of these error correction tools. In order to supply a standard way to evaluate error correction methods for DNA and RNA sequencing, this thesis presents a Software Package for Error Correction Tool Assessment on nuCLEic acid sequences (SPECTACLE). SPECTACLE can evaluate corrected DNA and RNA reads from many underlying sequencing technologies and differentiate heterozygous alleles from sequencing errors. The work provides key insights into the factors that underscore the challenges in error correction, by compiling high-throughput sequencing read sets from technologies such as Illumina, PacBio and ONT. The performance of 23 different error correction tools has been analyzed using SPECTACLE and the compiled datasets. This thesis also provides unique and helpful insights into the strengths and weaknesses of various error correction tools and aims to establish a standard platform for evaluating error correction tools in the future.
Advisors/Committee Members: Chen, Deming (advisor).
Subjects/Keywords: Computational genomics; Algorithms; Parallel computing
APA (6th Edition):
Manikandan, G. J. (2017). Comprehensive evaluation of error correction methods for high-throughput sequencing data. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/97787
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Manikandan, Gowthami Jayashri. “Comprehensive evaluation of error correction methods for high-throughput sequencing data.” 2017. Thesis, University of Illinois – Urbana-Champaign. Accessed March 04, 2021.
http://hdl.handle.net/2142/97787.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Manikandan, Gowthami Jayashri. “Comprehensive evaluation of error correction methods for high-throughput sequencing data.” 2017. Web. 04 Mar 2021.
Vancouver:
Manikandan GJ. Comprehensive evaluation of error correction methods for high-throughput sequencing data. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/2142/97787.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Manikandan GJ. Comprehensive evaluation of error correction methods for high-throughput sequencing data. [Thesis]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/97787
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Ontario Institute of Technology
6.
Dubitski, Alexander.
A parallel adaptive method for pseudo-arclength continuation.
Degree: 2011, University of Ontario Institute of Technology
URL: http://hdl.handle.net/10155/196
▼ We parallelize the pseudo-arclength continuation method for solving nonlinear systems of equations. Pseudo-arclength continuation is a predictor-corrector method where the correction step consists of solving a linear system of algebraic equations. Our algorithm parallelizes adaptive step-length selection and inexact prediction. Prior attempts to parallelize pseudo-arclength continuation are typically based on parallelizing the linear solver, which leads to completely solver-dependent software. In contrast, our method is completely independent of the internal solver and therefore applicable to a large domain of problems. Our software is easy to use and does not require the user to have extensive prior experience with High Performance Computing; all the user needs to provide is the implementation of the corrector step. When corrector steps are costly or continuation curves are complicated, we observe up to a sixty percent speedup with moderate numbers of processors. We present results for a synthetic problem and a problem in turbulence.
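The predictor-corrector structure the abstract describes can be sketched for a scalar equation f(x, λ) = 0. The code below is a minimal serial illustration, not the thesis's parallel algorithm (it has fixed step length and no inexact prediction), and all names are invented: the predictor steps along the unit tangent, and the corrector runs Newton on the system augmented with the arclength constraint, which is what lets the curve be traced past folds.

```python
import numpy as np

def pseudo_arclength(f, x0, lam0, ds=0.1, steps=80):
    """Trace the solution curve of f(x, lam) = 0 past folds.
    Predictor: step ds along the unit tangent.
    Corrector: Newton on [f; tangent-dot-step - ds] = 0."""
    eps = 1e-7

    def jac(x, lam):  # finite-difference partials f_x, f_lam
        fx = (f(x + eps, lam) - f(x - eps, lam)) / (2 * eps)
        fl = (f(x, lam + eps) - f(x, lam - eps)) / (2 * eps)
        return fx, fl

    # Initial tangent: normalized null vector of [f_x  f_lam].
    fx, fl = jac(x0, lam0)
    t = np.array([-fl, fx])
    t /= np.linalg.norm(t)
    x, lam = x0, lam0
    branch = [(x, lam)]
    for _ in range(steps):
        xp, lp = x + ds * t[0], lam + ds * t[1]      # predictor
        for _ in range(20):                           # Newton corrector
            fx, fl = jac(xp, lp)
            F = np.array([f(xp, lp),
                          t[0] * (xp - x) + t[1] * (lp - lam) - ds])
            J = np.array([[fx, fl], [t[0], t[1]]])
            d = np.linalg.solve(J, -F)
            xp += d[0]
            lp += d[1]
            if np.linalg.norm(F) < 1e-10:
                break
        # New tangent: direction of the step just taken (keeps orientation).
        t = np.array([xp - x, lp - lam])
        t /= np.linalg.norm(t)
        x, lam = xp, lp
        branch.append((x, lam))
    return branch
```

On the circle x² + λ² = 1 this walks straight through the fold at λ = 1, where naive continuation in λ would fail; the inner `np.linalg.solve` is the linear correction step that solver-based parallelizations target, whereas the thesis parallelizes around it.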
Advisors/Committee Members: Van Veen, Lennaert.
Subjects/Keywords: HPC; Arclength; Parallel; Computing
APA (6th Edition):
Dubitski, A. (2011). A parallel adaptive method for pseudo-arclength continuation. (Thesis). University of Ontario Institute of Technology. Retrieved from http://hdl.handle.net/10155/196
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Dubitski, Alexander. “A parallel adaptive method for pseudo-arclength continuation.” 2011. Thesis, University of Ontario Institute of Technology. Accessed March 04, 2021.
http://hdl.handle.net/10155/196.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Dubitski, Alexander. “A parallel adaptive method for pseudo-arclength continuation.” 2011. Web. 04 Mar 2021.
Vancouver:
Dubitski A. A parallel adaptive method for pseudo-arclength continuation. [Internet] [Thesis]. University of Ontario Institute of Technology; 2011. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/10155/196.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Dubitski A. A parallel adaptive method for pseudo-arclength continuation. [Thesis]. University of Ontario Institute of Technology; 2011. Available from: http://hdl.handle.net/10155/196
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Ontario Institute of Technology
7.
Hazari, Shihab Shahriar.
Design and development of a parallel Proof of Work for permissionless blockchain systems.
Degree: 2019, University of Ontario Institute of Technology
URL: http://hdl.handle.net/10155/1037
▼ Blockchain, which is the underlying technology for the Bitcoin cryptocurrency, is a distributed ledger forming a decentralized consensus in a peer-to-peer network. A large number of current cryptocurrencies use blockchain technology to maintain the network, and the peers use a consensus mechanism called Proof of Work to verify and confirm the transactions. However, transaction processing in this scheme is significantly slower than in traditional digital transaction systems such as credit cards or PayPal. In this thesis, a parallel Proof of Work model is proposed in order to increase the scalability of transaction processing. The goal of this model is to ensure that no two or more miners put the same effort into solving the same block. This model differs from traditional Proof of Work and Bitcoin pool mining in many aspects, such as the responsibilities of the manager, the contribution of active miners, and the reward system. A proof-of-concept prototype of the proposed model has been constructed based on the attributes of Bitcoin. The prototype has been tested in a local as well as in a cloud environment, and the results show the feasibility of the proposed model.
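One simple way to realize "no two miners put the same effort into the same block" is to assign each miner a disjoint residue class of the nonce space. The sketch below is only a guess at that partitioning idea, not the thesis's model (which also defines a manager and a reward system); the header format, the difficulty encoding, and all names are invented here.

```python
import hashlib

def mine_partitioned(header, difficulty, n_miners=4, max_rounds=1_000_000):
    """Partitioned Proof of Work: miner i owns the disjoint nonce class
    {i, i + n, i + 2n, ...}, so no two miners ever hash the same nonce.
    The round-robin loop serially simulates the miners working in parallel;
    difficulty = required count of leading zero hex digits."""
    target = "0" * difficulty
    for r in range(max_rounds):
        for miner in range(n_miners):
            nonce = miner + r * n_miners          # miner's r-th candidate
            digest = hashlib.sha256(f"{header}:{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return miner, nonce, digest       # winning miner and proof
    return None
```

Because the residue classes tile the nonce space, the combined search covers every nonce exactly once, which is the duplicated-effort elimination the abstract is after.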
Advisors/Committee Members: Mahmoud, Qusay.
Subjects/Keywords: Blockchain; Parallel computing; Scalability; Bitcoin
APA (6th Edition):
Hazari, S. S. (2019). Design and development of a parallel Proof of Work for permissionless blockchain systems. (Thesis). University of Ontario Institute of Technology. Retrieved from http://hdl.handle.net/10155/1037
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Hazari, Shihab Shahriar. “Design and development of a parallel Proof of Work for permissionless blockchain systems.” 2019. Thesis, University of Ontario Institute of Technology. Accessed March 04, 2021.
http://hdl.handle.net/10155/1037.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Hazari, Shihab Shahriar. “Design and development of a parallel Proof of Work for permissionless blockchain systems.” 2019. Web. 04 Mar 2021.
Vancouver:
Hazari SS. Design and development of a parallel Proof of Work for permissionless blockchain systems. [Internet] [Thesis]. University of Ontario Institute of Technology; 2019. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/10155/1037.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Hazari SS. Design and development of a parallel Proof of Work for permissionless blockchain systems. [Thesis]. University of Ontario Institute of Technology; 2019. Available from: http://hdl.handle.net/10155/1037
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of New South Wales
8.
Ke, Jing.
Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing.
Degree: Computer Science & Engineering, 2018, University of New South Wales
URL: http://handle.unsw.edu.au/1959.4/59793 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:49745/SOURCE02?view=true
▼ Remote sensing is the acquisition of a physical response from an object without touch or contact, often collected by remote sensors mounted on satellites or aircraft. Such datasets have been used for many decades in a wide range of applications. With advances in sensor technology, earth imaging is now possible at an unprecedented level of detail, and the amount of data acquired by imaging sensors has been growing rapidly in recent years. Many procedures for processing remotely sensed information are characterised by massively parallel data processing, intensive computation and complex processing algorithms. These characteristics make real-time processing of large datasets crucial. The Graphics Processing Unit (GPU) is a typical parallel computing and multicore architecture, designed to perform computations on large amounts of independent data. In recent years, the GPU has become dominant in high-performance computing, and its massive computational capability is well suited to analysing large-scale remotely sensed information. The thesis studies the computing architecture and memory hierarchy of GPU accelerators, and discusses the main performance issues in scientific and parallel computing. Parallel computing frameworks, parallelisation strategies, performance optimisation and acceleration algorithms for remotely sensed image processing are provided. Effective and efficient solutions are provided for dynamic-programming-based NP-hard optimisation problems and pixel-classification-based hyperspectral unmixing procedures. Verification methods are designed to evaluate performance in terms of accuracy and speedup. The proposed methods assume spatial smoothness in remotely sensed images, which distinguishes them from artificial art or natural photographs. The benchmark tests show good performance acceleration on NVIDIA GPU accelerators compared with both sequential and parallel implementations in the literature. An analysis of performance profiling results is presented and can serve as a guide to similar parallelisation strategies on other computing platforms.
Advisors/Committee Members: Sowmya, Arcot, Computer Science & Engineering, Faculty of Engineering, UNSW, Bednarz, Tomasz, Art, Faculty of Art & Design, UNSW.
Subjects/Keywords: Parallel computing; GPGPU; Remote sensing
APA (6th Edition):
Ke, J. (2018). Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing. (Doctoral Dissertation). University of New South Wales. Retrieved from http://handle.unsw.edu.au/1959.4/59793 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:49745/SOURCE02?view=true
Chicago Manual of Style (16th Edition):
Ke, Jing. “Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing.” 2018. Doctoral Dissertation, University of New South Wales. Accessed March 04, 2021.
http://handle.unsw.edu.au/1959.4/59793 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:49745/SOURCE02?view=true.
MLA Handbook (7th Edition):
Ke, Jing. “Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing.” 2018. Web. 04 Mar 2021.
Vancouver:
Ke J. Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing. [Internet] [Doctoral dissertation]. University of New South Wales; 2018. [cited 2021 Mar 04].
Available from: http://handle.unsw.edu.au/1959.4/59793 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:49745/SOURCE02?view=true.
Council of Science Editors:
Ke J. Parallel Computing and Performance Optimisation in Remotely Sensed Image Processing. [Doctoral Dissertation]. University of New South Wales; 2018. Available from: http://handle.unsw.edu.au/1959.4/59793 ; https://unsworks.unsw.edu.au/fapi/datastream/unsworks:49745/SOURCE02?view=true

Rice University
9.
Shi, Jia.
Normal modes, surface-wave and time-harmonic body-wave computational modeling and inverse modeling on unstructured, deformable meshes.
Degree: PhD, Natural Sciences, 2019, Rice University
URL: http://hdl.handle.net/1911/107672
▼ A novel computational framework is established for modeling normal modes, surface waves, and time-harmonic body waves with a combination of several highly parallel algorithms. To deal with complex geological features such as topography and interior discontinuities, both forward and inverse modeling are performed on unstructured, deformable tetrahedral meshes. To study the inverse boundary value problem for time-harmonic elastic waves, namely the recovery of P- and S-wave speeds from vibroseis data or the Neumann-to-Dirichlet map, a procedure for full waveform inversion with iterative regularization is designed. The multi-level iterative regularization is implemented by projecting gradients, after scaling, onto subspaces to avoid over-parametrization, yielding conditional Lipschitz stability. The procedure is illustrated in computational experiments that recover the rough shapes and wave speeds of geological bodies from simple starting models, both near and far from the boundary, that is, the free surface. To study seismic spectra, a Rayleigh-Ritz approach based on a mixed continuous Galerkin finite-element method is developed to compute the normal modes of a planet in the presence of an essential spectrum. The relevant generalized eigenvalue problem is solved by a Lanczos approach combined with polynomial filtering and separation of the essential spectrum. Self-gravitation is treated as an N-body problem, and the relevant gravitational potential is evaluated directly and efficiently using the fast multipole method. In contrast with the standard shift-and-invert and full mode-coupling algorithms, the polynomial filtering technique is ideally suited for solving large-scale three-dimensional interior eigenvalue problems, since it significantly enhances memory and computational efficiency without loss of accuracy. The parallel efficiency and scalability of the proposed approach are demonstrated on several world-class supercomputers. To include the effects of rotation and solve the resulting quadratic eigenvalue problem, the extended Lanczos vectors computed for a non-rotating planet are used as a subspace to significantly reduce the dimension of the original problem. The reduced system can then be solved with a standard eigensolver. Several computational experiments study the effects of three-dimensional fine-scale variations, rotation, and the Coriolis force on the normal modes.
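The contrast drawn above between shift-and-invert and polynomial filtering can be illustrated with a small sketch (not the dissertation's code): a damped Chebyshev expansion of the indicator function of a target interval turns eigenvalues inside the interval into the dominant eigenvalues of p(A), so Lanczos or subspace iteration on p(A) finds interior modes using only matrix-vector products. The sigma-damping choice and all names here are illustrative assumptions.

```python
import numpy as np

def cheb_filter_coeffs(a, b, deg):
    """Chebyshev expansion coefficients of the indicator of [a, b] in [-1, 1],
    with sigma (Lanczos) damping to suppress Gibbs oscillations."""
    k = np.arange(1, deg + 1)
    ta, tb = np.arccos(a), np.arccos(b)          # note ta > tb since a < b
    c = np.empty(deg + 1)
    c[0] = (ta - tb) / np.pi
    c[1:] = 2.0 / (np.pi * k) * (np.sin(k * ta) - np.sin(k * tb))
    c[1:] *= np.sinc(k / (deg + 1))              # sigma-damping factors
    return c

def apply_filter(A, v, c, lmin, lmax):
    """Compute p(A) v via the Chebyshev three-term recurrence, assuming the
    spectrum of A lies in [lmin, lmax]. Only mat-vecs with A are needed."""
    e, h = (lmax + lmin) / 2.0, (lmax - lmin) / 2.0
    Lv = lambda x: (A @ x - e * x) / h           # map spectrum to [-1, 1]
    t0, t1 = v, Lv(v)                            # T_0 v, T_1 v
    out = c[0] * t0 + c[1] * t1
    for ck in c[2:]:
        t0, t1 = t1, 2.0 * Lv(t1) - t0           # T_{k+1} = 2 L T_k - T_{k-1}
        out += ck * t1
    return out
```

With the spectrum mapped to [-1, 1], eigenvalues of A inside the target interval are sent near 1 by the filter while the rest are suppressed toward 0, which is what makes the approach factorization-free and memory-friendly compared to shift-and-invert.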
Advisors/Committee Members: de Hoop, Maarten Valentijn (advisor).
Subjects/Keywords: normal modes; parallel computing
APA (6th Edition):
Shi, J. (2019). Normal modes, surface-wave and time-harmonic body-wave computational modeling and inverse modeling on unstructured, deformable meshes. (Doctoral Dissertation). Rice University. Retrieved from http://hdl.handle.net/1911/107672

Georgia Tech
10.
Schieber, Matthew Cole.
Optimizing computational kernels in quantum chemistry.
Degree: MS, Computational Science and Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/59949
▼ Density fitting is a rank-reduction technique widely used in quantum chemistry to reduce the computational cost of evaluating, transforming, and processing the 4-center electron repulsion integrals (ERIs). By exploiting the resolution-of-the-identity technique, density fitting reduces the 4-center ERIs to a 3-center form. Doing so not only alleviates the high storage cost of the ERIs, but also reduces the computational cost of operations involving them. Even so, these operations can remain the computational bottlenecks that commonly plague quantum chemistry procedures. The goal of this thesis is to investigate optimizations for density-fitted versions of computational kernels used ubiquitously throughout quantum chemistry. First, we detail the spatial sparsity available in the 3-center integrals and its application to various operations, including integral computation, metric contractions, and integral transformations. Next, we investigate sparse memory layouts and their implications for the performance of the integral-transformation kernel. We then analyze two transformation algorithms and how their performance varies with the context in which they are used, and propose two sparse memory layouts together with the resulting performance of Coulomb and exchange evaluations. Since the memory required for these tensors grows rapidly, we frame these discussions in the context of both in-core and disk performance. We implement these methods in the Psi4 electronic structure package and show that the optimal algorithm for a kernel varies depending on whether a disk-based implementation must be used.
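The core density-fitting contraction described above can be sketched in a few lines of NumPy (a toy illustration, not the thesis code): the 4-center ERI tensor (pq|rs) is approximated from a 3-center tensor through a fitting metric. The random tensors and the metric built here are stand-ins; in a real code the 3-center integrals come from an integral engine and the metric is the 2-center Coulomb matrix over the auxiliary basis.

```python
import numpy as np

n, naux = 6, 20                               # orbital / auxiliary dimensions
rng = np.random.default_rng(0)
P3 = rng.standard_normal((naux, n, n))        # stand-in 3-center integrals (P|pq)
P3 = 0.5 * (P3 + P3.transpose(0, 2, 1))       # enforce (P|pq) = (P|qp)

# Positive-definite stand-in metric (real codes use the Coulomb metric (P|Q)):
J = P3.reshape(naux, -1) @ P3.reshape(naux, -1).T + naux * np.eye(naux)

# Fitted tensor b[P,p,q] = sum_Q J^{-1/2}[P,Q] (Q|pq)
w, V = np.linalg.eigh(J)
J_inv_sqrt = (V / np.sqrt(w)) @ V.T
b = np.einsum('PQ,Qpq->Ppq', J_inv_sqrt, P3)

# (pq|rs) ~= sum_P b[P,p,q] b[P,r,s]: 3-index storage, 4-index result on demand
eri_df = np.einsum('Ppq,Prs->pqrs', b, b)
```

The contraction shows why storage drops from O(n^4) to O(naux * n^2): the 4-index quantity never needs to be stored, only reassembled (or, better, consumed directly) from the 3-index tensor.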
Advisors/Committee Members: Sherrill, Charles D. (advisor), Chow, Edmond (committee member), McDaniel, Jesse (committee member).
Subjects/Keywords: Quantum chemistry; Parallel computing; High performance computing
APA (6th Edition):
Schieber, M. C. (2018). Optimizing computational kernels in quantum chemistry. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/59949

Louisiana State University
11.
Yang, Shuangyang.
A Persistent Storage Model for Extreme Computing.
Degree: PhD, Computer Sciences, 2014, Louisiana State University
URL: etd-10312014-133428
;
https://digitalcommons.lsu.edu/gradschool_dissertations/2910
▼ Continuing technological progress has resulted in dramatic growth in the aggregate computational performance of the largest supercomputing systems. Unfortunately, these advances have not translated to the required extent into the accompanying I/O systems, which have improved little in architecture or effective access latency. New classes of algorithms developed for massively parallel applications, which gracefully handle the challenges of asynchrony, heavily multi-threaded distributed codes, and message-driven computation, must be matched by similar advances in I/O methods and algorithms to produce a well-performing and balanced supercomputing system. This dissertation proposes PXFS, a storage model for persistent objects inspired by the ParalleX model of execution that addresses many of these challenges. The PXFS model is designed to be asynchronous in nature to comply with the ParalleX model, and proposes an active TupleSpace concept to hold all kinds of metadata and meta-objects for both storage objects and runtime objects. The active TupleSpace can also register ParalleX actions to be triggered by certain tuple operations. A first implementation of PXFS uses the well-known OrangeFS parallel file system as its back-end via an asynchronous I/O layer, with the TupleSpace component implemented in HPX, an implementation of ParalleX; these details are described along with preliminary performance data. An in-house micro-benchmark is developed to measure the disk I/O throughput of the PXFS asynchronous interface. The results show perfect scalability and a 3x to 20x speedup in I/O throughput compared to the OrangeFS synchronous user interface. Use cases of the TupleSpace component are discussed for real-world applications, including micro-checkpointing.
By utilizing the TupleSpace for I/O in HPX applications, global barriers can be replaced with fine-grained parallelism to overlap more computation with communication and greatly boost performance and efficiency. The dissertation also showcases the distributed directory service in the Orange file system, which processes directory entries in parallel and effectively improves directory metadata operations.
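The asynchronous-interface idea above, letting computation proceed while writes drain in the background instead of blocking at a global barrier, can be sketched with a toy queue-based writer. This is an illustration only: PXFS is built on HPX/ParalleX and OrangeFS, not Python threads, and every name here is invented.

```python
import queue
import threading

class AsyncWriter:
    """Toy asynchronous I/O layer: callers enqueue (offset, bytes) requests
    and continue immediately; a background thread drains the queue to disk."""

    def __init__(self, path):
        self.f = open(path, 'wb')
        self.q = queue.Queue()
        self.t = threading.Thread(target=self._drain, daemon=True)
        self.t.start()

    def _drain(self):
        while True:
            item = self.q.get()
            if item is None:           # sentinel: no more writes
                break
            offset, data = item
            self.f.seek(offset)
            self.f.write(data)
        self.f.flush()

    def write_async(self, offset, data):
        self.q.put((offset, data))     # returns immediately: overlaps compute

    def close(self):
        self.q.put(None)
        self.t.join()
        self.f.close()
```

Compute threads call write_async and continue immediately; the cost of flushing to disk is paid by the drain thread, overlapping with further computation.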
Subjects/Keywords: parallel computing; parallel file system; parallel runtime system
APA (6th Edition):
Yang, S. (2014). A Persistent Storage Model for Extreme Computing. (Doctoral Dissertation). Louisiana State University. Retrieved from etd-10312014-133428 ; https://digitalcommons.lsu.edu/gradschool_dissertations/2910

University of Tasmania
12.
Atkinson, AK.
Tupleware: a distributed tuple space for the development and execution of array-based applications in a cluster computing environment.
Degree: 2010, University of Tasmania
URL: https://eprints.utas.edu.au/9996/1/Alistair_Atkinson_PhD_Thesis.pdf
▼ This thesis describes Tupleware, an implementation of a distributed tuple space which acts as a scalable and efficient cluster middleware for computationally intensive numerical and scientific applications. Tupleware is based on the Linda coordination language (Gelernter 1985), and incorporates additional techniques such as peer-to-peer communications and exploitation of data locality in order to address problems such as scalability and performance, which are commonly encountered by traditional centralised tuple space implementations.
Tupleware is implemented in such a way that, while processing is taking place, all communication between cluster nodes is decentralised in a peer-to-peer fashion. Communication events are initiated by a node requesting a tuple which is located on a remote node, and in order to make tuple retrieval as efficient as possible, a tuple search algorithm is used to minimise the number of communication instances required to retrieve a remote tuple. This algorithm is based on the locality of a remote tuple and the success of previous remote tuple requests. As Tupleware is targeted at numerical applications which generally involve the partitioning and processing of 1-D or 2-D arrays, a remote tuple can generally be determined to be located on one of a small number of nodes which are processing neighbouring partitions of the array.
Furthermore, unlike some other distributed tuple space implementations, Tupleware does not burden the programmer with any additional complexity due to this distribution. At the application level, the Tupleware middleware behaves exactly like a centralised tuple space, and provides much greater flexibility with regard to where components of a system are executed.
The design and implementation of Tupleware are described and placed in the context of other distributed tuple space implementations, along with the specific requirements of the applications that the system caters for. Finally, Tupleware is evaluated using several numerical and scientific applications, which show it to provide a sufficient level of scalability for a broad range of tasks.
The main contribution of this work is the identification of techniques which enable a tuple space to be efficiently and transparently distributed across the nodes of a cluster. Central to this is the use of an algorithm for tuple retrieval which minimises the number of communication instances that occur during system execution. Distribution transparency is ensured by the provision of a simple interface to the underlying system, so that the distributed tuple space appears to the programmer as a single unified resource.
It is hoped that this research in some way furthers the adoption of the tuple space programming model for distributed computing, by enhancing its ability to provide improved performance, scalability, flexibility and simplicity for a range of applications not traditionally suited to tuple space based systems.
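The Linda-style interface Tupleware presents to applications can be sketched with a minimal centralised tuple space. This is a toy illustration: the operation names out/in/rd follow Linda, while Tupleware itself partitions the tuples across cluster nodes behind the same interface.

```python
import threading

class TupleSpace:
    """Minimal centralised Linda-style tuple space: out() deposits a tuple,
    in_() blocks until a tuple matches the template and removes it, rd()
    reads without removing. None in a template acts as a wildcard."""

    def __init__(self):
        self.tuples = []
        self.cond = threading.Condition()

    @staticmethod
    def _match(template, tup):
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def out(self, tup):
        with self.cond:
            self.tuples.append(tup)
            self.cond.notify_all()     # wake any blocked in_/rd callers

    def _take(self, template, remove):
        with self.cond:
            while True:
                for tup in self.tuples:
                    if self._match(template, tup):
                        if remove:
                            self.tuples.remove(tup)
                        return tup
                self.cond.wait()       # block until a new tuple arrives

    def in_(self, template):
        return self._take(template, remove=True)

    def rd(self, template):
        return self._take(template, remove=False)
```

Tupleware keeps this same programming interface but distributes the tuples across nodes and uses the locality-guided search described above to find remote tuples cheaply.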
Subjects/Keywords: Distributed computing; parallel computing; concurrency; high-performance computing; tuple space.
APA (6th Edition):
Atkinson, A. (2010). Tupleware: a distributed tuple space for the development and execution of array-based applications in a cluster computing environment. (Thesis). University of Tasmania. Retrieved from https://eprints.utas.edu.au/9996/1/Alistair_Atkinson_PhD_Thesis.pdf
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Penn State University
13.
Park, Jeonghyung.
Reuse distance models for accelerating scientific computing workloads on multicore processors.
Degree: 2015, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/26411
▼ As the number of cores increases in chip multiprocessor (CMP) or multicore microarchitectures, we often observe performance degradation due to complex memory behavior on such systems. To mitigate such inefficiencies, we develop schemes that can be used to characterize and improve the memory behavior of a multicore node for scientific computing applications that require high performance.
We leverage the fact that such scientific computing applications often comprise code blocks that are repeated, leading to certain periodic properties. We conjecture that their periodic properties, and their observable impacts on cache performance, can be characterized in sufficient detail by simple 'alpha + beta*sine' models. Additionally, starting from such a model of the observable reuse distances, we develop a predictive cache miss model, followed by appropriate extensions for predictive capability in the presence of interference.
We consider the utilization of our reuse distance and cache miss models for accelerating scientific workloads on multicore systems. We use our cache miss model to determine a set of preferred applications to be co-scheduled with a given application to minimize performance degradation from interference. Further, we propose a reuse-distance-reducing ordering that improves the performance of Laplacian mesh smoothing. We reorder mesh vertices based on the initial quality of each node and its neighboring nodes so that we can improve both temporal and spatial locality. The reordering results show that a 38.75% performance improvement in Laplacian mesh smoothing can be obtained by our reuse-distance-reducing ordering when running on a single core, and a 75x speedup is obtained when scaling up to 32 cores.
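For a known period, fitting the 'alpha + beta*sine' characterization above is an ordinary linear least-squares problem, since beta*sin(omega*t + phi) expands to A*sin(omega*t) + B*cos(omega*t). A toy sketch on synthetic data (the trace values and period here are invented for illustration):

```python
import numpy as np

# Synthetic reuse-distance trace with the conjectured periodic structure
rng = np.random.default_rng(1)
t = np.arange(400)
omega = 2 * np.pi / 50          # period assumed known (e.g. from loop length)
trace = 100 + 30 * np.sin(omega * t + 0.7) + rng.normal(0, 2, t.size)

# alpha + beta*sin(omega*t + phi) is linear in (alpha, A, B) after expanding
# beta*sin(omega*t + phi) = A*sin(omega*t) + B*cos(omega*t)
X = np.column_stack([np.ones(t.size), np.sin(omega * t), np.cos(omega * t)])
coef, *_ = np.linalg.lstsq(X, trace, rcond=None)

alpha = coef[0]                       # mean reuse distance
beta = np.hypot(coef[1], coef[2])     # oscillation amplitude
phi = np.arctan2(coef[2], coef[1])    # phase
```

Given the fitted (alpha, beta, phi), a cache-miss model of the kind the abstract describes can be driven analytically instead of from a full trace.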
Advisors/Committee Members: Padma Raghavan, Dissertation Advisor/Co-Advisor, Padma Raghavan, Committee Chair/Co-Chair, Mahmut Taylan Kandemir, Committee Member, Kamesh Madduri, Committee Member, Christopher J Duffy, Committee Member.
Subjects/Keywords: high performance computing; scientific computing; parallel computing; performance optimization
APA (6th Edition):
Park, J. (2015). Reuse distance models for accelerating scientific computing workloads on multicore processors. (Thesis). Penn State University. Retrieved from https://submit-etda.libraries.psu.edu/catalog/26411
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Northeastern University
14.
Sun, Enqiang.
Cross-platform heterogeneous runtime environment.
Degree: PhD, Department of Electrical and Computer Engineering, 2016, Northeastern University
URL: http://hdl.handle.net/2047/D20213163
▼ Heterogeneous platforms are becoming widely adopted thanks to support from new languages and programming models. Among these, OpenCL is an industry standard for parallel programming on heterogeneous devices. With OpenCL, compute-intensive portions of an application can be offloaded to a variety of processing units within a system. OpenCL is one of the first standards to focus on portability, allowing programs to be written once and run unmodified on multiple heterogeneous devices, regardless of vendor.
While OpenCL has been widely adopted, there remains a lack of support for automatic workload balancing and data consistency when multiple devices are present in the system. To address this need, we have designed a cross-platform heterogeneous runtime environment which provides a high-level, unified execution model coupled with an intelligent resource management facility. The main motivation for developing this runtime environment is to provide OpenCL programmers with a convenient programming paradigm to fully utilize all available devices in a system and to incorporate flexible workload balancing schemes without compromising the user's ability to assign tasks according to data affinity. Our work removes much of the cumbersome initialization of the platform; devices and related OpenCL objects are now hidden under the hood.
Equipped with this new runtime environment and its associated programming interface, the programmer can focus on designing the application and worry less about customization to the target platform. Further, the programmer can take advantage of multiple devices using a dynamic workload balancing algorithm to reap the benefits of task-level parallelism.
To demonstrate the value of this cross-platform heterogeneous runtime environment, we have evaluated it on both micro-benchmarks and popular OpenCL benchmark applications. With minimal overhead for managing data objects across devices, the experimental results show scalable performance speedup with an increasing number of computing devices, without any changes to the program source code.
Subjects/Keywords: parallel computing; runtime; Heterogeneous computing; Parallel programming (Computer science); Parallel processing (Electronic computers); Parallel computers; Electronic data processing; Distributed processing
APA (6th Edition):
Sun, E. (2016). Cross-platform heterogeneous runtime environment. (Doctoral Dissertation). Northeastern University. Retrieved from http://hdl.handle.net/2047/D20213163

University of Utah
15.
Hunsaker, Isaac L.
Parallel distributed, reciprocal monte carlo radiation in coupled, large eddy combustion simulations.
Degree: PhD, Chemical Engineering, 2015, University of Utah
URL: http://content.lib.utah.edu/cdm/singleitem/collection/etd3/id/4020/rec/1795
▼ Radiation is the dominant mode of heat transfer in high-temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation-turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal Monte Carlo radiation is coupled with a turbulent large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain, while the radiation model runs in parallel on a recomposed domain; the recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the Message Passing Interface. Verification and validation testing of the new radiation model were favorable. Strong-scaling analyses were performed on the Ember cluster for the CPU radiation model and on the Titan cluster for the GPU radiation model; the model demonstrated strong scaling to over 1,700 and 16,000 processing cores, respectively.
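Monte Carlo ray tracing of the kind coupled here can be illustrated by estimating a radiative quantity with a known answer: the view factor from a differential surface element to a coaxial disk of radius R at height h, which is analytically R^2/(R^2 + h^2). This toy sketch is unrelated to the dissertation's code; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, h = 200_000, 1.0, 1.0
u1, u2 = rng.random(n), rng.random(n)

# Cosine-weighted hemisphere directions (importance-sampling Lambertian
# emission), so the view factor is simply the fraction of rays that hit.
r, phi = np.sqrt(u1), 2 * np.pi * u2
x, y, z = r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)

# Intersect each ray with the plane of the disk and test containment
tt = h / z
hit = (x * tt) ** 2 + (y * tt) ** 2 <= R ** 2
F = hit.mean()                     # analytic value: R^2/(R^2+h^2) = 0.5 here
```

Reciprocal formulations as in the dissertation go further by tracing rays from the point where the result is needed, which reduces variance for localized quantities, but the sampling machinery is the same.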
Subjects/Keywords: combustion; parallel computing; radiation; ray tracing
APA (6th Edition):
Hunsaker, I. L. (2015). Parallel distributed, reciprocal monte carlo radiation in coupled, large eddy combustion simulations. (Doctoral Dissertation). University of Utah. Retrieved from http://content.lib.utah.edu/cdm/singleitem/collection/etd3/id/4020/rec/1795

University of California – Berkeley
16.
Rhoden, Barret.
Operating System Support for Parallel Processes.
Degree: Computer Science, 2014, University of California – Berkeley
URL: http://www.escholarship.org/uc/item/8gt545mj
▼ High-performance, parallel programs want uninterrupted access to physical resources. This characterization is true not only for traditional scientific computing, but also for high-priority data center applications that run on parallel processors. These applications require high, predictable performance and low latency, and they are important enough to warrant engineering effort at all levels of the software stack. Given the recent resurgence of interest in parallel computing as well as the increasing importance of data center applications, what changes can we make to operating system abstractions to support parallel programs? Akaros is a research operating system designed for single-node, large-scale SMP and many-core architectures. The primary feature of Akaros is a new process abstraction called the "Many-Core Process" (MCP) that embodies transparency, application control of physical resources, and performance isolation. The MCP is built on the idea of separating cores from threads: the operating system grants spatially partitioned cores to the MCP, and the application schedules its threads on those cores. Data centers typically have a mix of high-priority applications and background batch jobs, where the demands of the high-priority application can change over time. For this reason, an important part of Akaros is the provisioning, allocation, and preemption of resources, and the MCP must be able to handle having a resource revoked at any moment. In this work, I describe the MCP abstraction and the salient details of Akaros. I discuss how the kernel and user-level libraries work together to give an application control over its physical resources and to adapt to the revocation of cores at any time, even when the code is holding locks. I show an order of magnitude less interference for the MCP compared to Linux, more resilience to the loss of cores for an HPC application, and how a customized user-level scheduler can increase the performance of a simple webserver.
Subjects/Keywords: Computer science; Operating Systems; Parallel Computing
APA (6th Edition):
Rhoden, B. (2014). Operating System Support for Parallel Processes. (Thesis). University of California – Berkeley. Retrieved from http://www.escholarship.org/uc/item/8gt545mj
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Georgia
17.
Agarwal, Abhishek.
Merging parallel simulations.
Degree: 2014, University of Georgia
URL: http://hdl.handle.net/10724/22049
▼ I n earlier work cloning is proposed as a means for efficiently splitting a running simulation midway through its execution into multiple parallel simulations. In simulation cloning, clones usually are able to share computations that occur
early in the simulation, but as their states diverge individual logical processes (LP’s) are replicated as necessary so that their computations proceed independently. Over time the state of the clones (or their constituent LPs) may converge.
Traditionally, these converged LPs would continue to execute identical events. We address this inefficiency by merging of previously cloned LPs. We show that such merging can further increase efficiency beyond that obtained through cloning only. We
discuss our implementation of merging, and illustrate its effectiveness in several example simulation scenarios.
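The clone-merge mechanism described above can be sketched in a few lines. This is a hypothetical illustration only: the `LP` class and `merge_converged` helper are invented for the sketch and are not taken from the thesis.

```python
# Hypothetical sketch of merging re-converged logical processes (LPs)
# across two simulation clones.

class LP:
    """A logical process identified by its role, carrying a state vector."""
    def __init__(self, role, state):
        self.role = role
        self.state = state

def merge_converged(clone_a, clone_b):
    """Return the roles whose LPs have re-converged across two clones.

    Converged LPs would otherwise execute identical events in both clones,
    so they can share a single copy again (the 'merge' of the abstract).
    """
    merged = []
    for role, lp_a in clone_a.items():
        lp_b = clone_b.get(role)
        if lp_b is not None and lp_a.state == lp_b.state:
            merged.append(role)
    return merged

# Two clones that diverged at LP "router" but re-converged at LP "queue".
clone_a = {"queue": LP("queue", [3, 1]), "router": LP("router", [7])}
clone_b = {"queue": LP("queue", [3, 1]), "router": LP("router", [9])}
print(merge_converged(clone_a, clone_b))  # only "queue" has converged
```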
Subjects/Keywords: Parallel and Distributed Computing; Simulations; Cloning; Merging.
18.
Washington, Ian D.
Large-Scale Dynamic Optimization Under Uncertainty using Parallel Computing.
Degree: PhD, 2016, McMaster University
URL: http://hdl.handle.net/11375/19092
▼ This research focuses on the development of a solution strategy for the optimization of
large-scale dynamic systems under uncertainty. Uncertainty resides naturally within the
external forces posed to the system or from within the system itself. For example, in chemical
process systems, external inputs include flow rates, temperatures or compositions; while
internal sources include kinetic or mass transport parameters; and empirical parameters
used within thermodynamic correlations and expressions. The goal in devising a dynamic
optimization approach which explicitly accounts for uncertainty is to do so in a manner
which is computationally tractable and is general enough to handle various types and
sources of uncertainty. The approach developed in this thesis follows a so-called multiperiod
technique whereby the infinite dimensional uncertainty space is discretized at numerous
points (known as periods or scenarios) which creates different possible realizations of the
uncertain parameters. The resulting optimization formulation encompasses an approximated
expected value of a chosen objective functional subject to a dynamic model for all the
generated realizations of the uncertain parameters. The dynamic model can be solved,
using an appropriate numerical method, in an embedded manner for which the solution
is used to construct the optimization formulation constraints; or alternatively the model
could be completely discretized over the temporal domain and posed directly as part of the
optimization formulation.
Our approach in this thesis has mainly focused on the embedded model technique for
dynamic optimization which can either follow a single- or multiple-shooting solution method.
The first contribution of the thesis investigates a combined multiperiod multiple-shooting
dynamic optimization approach for the design of dynamic systems using ordinary differential
equation (ODE) or differential-algebraic equation (DAE) process models. A major aspect
of this approach is the analysis of the parallel solution of the embedded model within the
optimization formulation. As part of this analysis, we further consider the application of
the dynamic optimization approach to several design and operation applications. Another
major contribution of the thesis is the development of a nonlinear programming (NLP) solver
based on an approach that combines sequential quadratic programming (SQP) with an
interior-point method (IPM) for the quadratic programming subproblem. A unique aspect of
the approach is that the inherent structure (and parallelism) of the multiperiod formulation
is exploited at the linear algebra level within the SQP-IPM nonlinear programming algorithm
using an explicit Schur-complement decomposition. Our NLP solution approach is further
assessed using several static and dynamic optimization benchmark examples.
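A minimal sketch of the multiperiod idea described above: the uncertain parameter is discretized into scenarios, the embedded dynamic model is solved once per scenario, and the objective approximates an expected value across scenarios. The toy model dx/dt = -k*x, the forward-Euler solver, and all names and values below are illustrative, not from the thesis.

```python
# Illustrative multiperiod formulation: expected value of a per-scenario
# cost, with the embedded ODE model solved independently per scenario
# (the independence is what makes the formulation parallelizable).

def simulate(k, x0=1.0, dt=0.01, steps=100):
    """Embedded model solve: integrate dx/dt = -k*x by forward Euler."""
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)
    return x

def expected_objective(scenarios, weights):
    """Weighted average of a per-scenario cost (here, the final state)."""
    return sum(w * simulate(k) for k, w in zip(scenarios, weights))

scenarios = [0.5, 1.0, 1.5]   # realizations ("periods") of uncertain k
weights = [0.25, 0.5, 0.25]   # discrete probabilities summing to 1
obj = expected_objective(scenarios, weights)
print(round(obj, 4))
```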
Thesis
Doctor of Philosophy (PhD)
Advisors/Committee Members: Swartz, Christopher L.E., Chemical Engineering.
Subjects/Keywords: Dynamic Optimization; DAE process models; Parallel Computing

Penn State University
19.
Kirmani, Shad.
EXPLOITING GRAPH EMBEDDING FOR PARALLELISM AND PERFORMANCE.
Degree: 2015, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/27325
▼ Problems in very-large-data scientific computing simulations and big-data analytics employ large clusters to scale up performance. Without a good partitioner, and with naive mapping of the partitioned processes to the processors in the network, speedups are difficult to achieve on large HPC systems. The data in such simulations is usually in the form of matrices, which can be abstracted as graphs. We propose that considering the geometry of the application graphs, as well as that of the underlying processor network, is of vital importance when scaling up.
As a first step, the data associated with the problem, abstracted as a graph, needs to be distributed onto the processors in the network. Parallel multilevel partitioners, such as Pt-Scotch and ParMetis, produce good-quality partitions, but their performance scales poorly. Coordinate bisection schemes, such as those in Zoltan, which can be applied only to graphs with coordinates, scale well, but partition quality is often compromised. We seek to address this gap by developing a scalable parallel scheme which imparts coordinates to a graph through a lattice-based multilevel embedding. Partitions are computed with a parallel formulation of a geometric scheme that has been shown to provide provably good cuts on certain classes of graphs. We analyze the parallel complexity of our scheme and observe speed-ups and cut sizes on large graphs. Our results indicate that our method is substantially faster than ParMetis and Pt-Scotch for hundreds to thousands of processors, while producing high-quality cuts.
We then consider the problem of mapping irregular applications to multiprocessor architectures whose interconnect topologies affect the latencies of data movement across processor nodes. The starting point for solutions to this problem concerns suitable weighted-graph representations of an irregular application and a processor topology. Prior work on this problem has demonstrated that graph partitioning approaches can provide high-quality solutions. Additionally, when coordinate information is available for the weighted graph of the application, geometric mapping schemes can also provide high-quality solutions. We develop and present a scheme that we call "embedded sectioning" that directly computes a locality-enhancing embedding of the weighted graph representation, which is then mapped to the processor topology using recursive coordinate bisection. Our scheme is specifically directed at obtaining high-quality mappings for highly irregular applications where the amount of communication can vary greatly. We evaluate the quality of mappings produced by embedded sectioning for mesh-based processor topologies using well-accepted measures including congestion, dilation, and their product, referred to as communication volume. For a test suite of unit-weight graphs mapped to a 32 × 32 mesh of processors, our method improves congestion by 26%, dilation by 52%, and communication volume by 64% relative to the best values of these measures from 9 other schemes. Additionally, we observe that…
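Recursive coordinate bisection, the geometric step mentioned above, repeatedly splits the point set at the median of its widest axis. The sketch below is a minimal 2-D version; the function name and test grid are invented for illustration.

```python
# Minimal sketch of recursive coordinate bisection (RCB): once vertices
# have coordinates, split along the widest axis at the median, recurse.

def rcb(points, nparts):
    """Recursively bisect `points` (list of (x, y)) into `nparts` groups.

    `nparts` must be a power of two for this minimal sketch.
    """
    if nparts == 1:
        return [points]
    # Pick the axis with the largest extent.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    # Median split keeps the two halves balanced.
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return rcb(pts[:mid], nparts // 2) + rcb(pts[mid:], nparts // 2)

grid = [(x, y) for x in range(4) for y in range(4)]
parts = rcb(grid, 4)
print([len(p) for p in parts])  # four balanced parts of 4 points each
```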
Advisors/Committee Members: Padma Raghavan, Dissertation Advisor/Co-Advisor, Padma Raghavan, Committee Chair/Co-Chair, Kamesh Madduri, Committee Member, Mahmut Taylan Kandemir, Committee Member, Christopher J Duffy, Committee Member.
Subjects/Keywords: Parallel computing; graph partitioning; topology; graphs

Texas A&M University
20.
Sakaida, Shohei.
Downhole Temperature Modeling of a Hydraulically Fractured Horizontal Well Using Parallel Computing.
Degree: MS, Petroleum Engineering, 2019, Texas A&M University
URL: http://hdl.handle.net/1969.1/188791
▼ Diagnosing hydraulic fracture performance is essential to evaluate and optimize fracturing treatment designs in horizontal wells. Distributed temperature sensing (DTS) is a valuable tool to monitor downhole conditions and diagnose hydraulic fractures. Although various temperature prediction models have been proposed to interpret the measured temperature data, quantitative interpretation is still challenging. To predict temperature in near-wellbore regions accurately, a forward model is needed that considers both the reservoir and wellbore domains in transient conditions. In addition, the model has to be computationally efficient to implement history matching for field-scale reservoirs. Yoshida et al. (2016) developed a comprehensive thermal and flow model and successfully interpreted DTS temperature data. This numerical model consists of a reservoir model and a wellbore model, which are coupled iteratively through boundary conditions. In each domain, mass, momentum, and energy conservation are solved in transient conditions to obtain profiles of wellbore and sand-face temperature during fracturing treatment, shut-in, and production in a fractured well. This model enables us to interpret DTS temperature quantitatively; however, it is not practical for field applications from the point of view of computational efficiency. This study presents a parallel version of the numerical thermal and flow model. Parallel computing is generally used as an effective way to improve computational speed. The parallel computing interface MPI (Message Passing Interface) is used in this study because of its flexibility. The parallel model allows us to simulate the temperature in field-scale reservoirs efficiently. The improvement is shown as comparisons of computational speed between the original model and the parallel model during the processes of water injection and production.
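A common building block of such MPI parallelizations is a block decomposition of the grid cells among ranks. The sketch below is a stand-alone illustration (no mpi4py; the rank loop is simulated) and is not the thesis code; the function name is invented.

```python
# Block decomposition of `ncells` grid cells over `nranks` MPI ranks:
# each rank owns a contiguous range, with the remainder cells spread
# over the lowest-numbered ranks so loads differ by at most one cell.

def block_range(ncells, nranks, rank):
    """Contiguous cell range [lo, hi) owned by `rank`."""
    base, extra = divmod(ncells, nranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

ncells, nranks = 10, 4
ranges = [block_range(ncells, nranks, r) for r in range(nranks)]
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In a real MPI run each rank would call `block_range` once with its own rank number, then solve the thermal and flow equations only on its cells, exchanging boundary values with neighbouring ranks.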
Advisors/Committee Members: Hill, Alfred (advisor), Zhu, Ding (committee member), Banerjee, Debjyoti (committee member).
Subjects/Keywords: DTS; Downhole Temperature Modeling; Parallel Computing

University of Houston
21.
Chitral, Pooja 1986-.
Accelerator Benchmark Suite Using OpenACC Directives.
Degree: MS, Computer Science, 2014, University of Houston
URL: http://hdl.handle.net/10657/1686
▼ In recent years, GPU computing has become very popular for scientific applications, especially after the release of programming models such as CUDA, OpenCL, and OpenACC. The growing popularity of GPU computation in commercial and scientific fields is attributed to the high computational power of GPU cores. The accelerator benchmark suite using OpenACC 2.0 is a combination of two very popular benchmark sets, the Parboil and NAS Parallel benchmarks. These benchmarks contain a wide range of throughput computing applications, which are useful for studying the performance of computing architectures and compilers. The Parboil benchmark includes applications from different scientific and commercial fields, including image processing, biomolecular simulation, and astronomy. The NAS Parallel benchmark has a set of applications that target different areas of computational fluid dynamics.
The accelerator benchmark suite exploits the computational power of the GPU architecture by using the emerging directives and clauses provided by OpenACC 2.0. This benchmark can act as a reference point for programmers new to GPU computing, reducing the time taken to understand one of the most powerful parallel programming paradigms.
Finally, the goal of the accelerator benchmark is to evaluate the applicability of OpenACC, one of the high-level programming models for accelerators. This benchmark will help provide the OpenACC community with valuable feedback to improve the model further.
Advisors/Committee Members: Chapman, Barbara M. (advisor), Gnawali, Omprakash (committee member), Gurkan, Deniz (committee member).
Subjects/Keywords: GPU computing; OpenACC; Parboil; NAS parallel Benchmarks
22.
-1801-637X.
High-Performance Sparse Fourier Transform on Parallel Architectures.
Degree: PhD, Computer Science, 2016, University of Houston
URL: http://hdl.handle.net/10657/3269
▼ The Fast Fourier Transform (FFT) is one of the most important numerical algorithms, widely used in numerous scientific and engineering computations. With the emergence of big data problems, however, in which the size of the processed data can easily exceed terabytes, it is challenging to acquire, process, and store a sufficient amount of data to compute the FFT in the first place. The recently developed sparse FFT (sFFT) algorithm provides a solution to this problem. The sFFT can compute a compressed Fourier transform by using only a small subset of the input data, thus achieving significant performance improvements.
Modern homogeneous and heterogeneous multicore and manycore architectures are now part of the mainstream computing scene and can offer impressive performance for many applications. The computations that arise in sFFT lend themselves naturally to efficient parallel implementations. In this dissertation, we present efficient parallel implementations of the sFFT algorithm on three state-of-the-art parallel architectures, namely multicore CPUs, GPUs, and a heterogeneous multicore embedded system. While the increase in the number of cores and in memory bandwidth on modern architectures provides an opportunity to improve performance through sophisticated parallel algorithm design, the sFFT is inherently complex, and numerous challenges need to be addressed to deliver optimal performance. In this dissertation, various parallelization and performance optimization techniques are proposed and implemented. Our parallel sFFT is more than 5x and 20x faster than the sequential sFFT on multicore CPUs and GPUs, respectively. Compared to full-size FFT libraries, the parallel sFFT achieves more than 9x speedup on multicore CPUs and 12x speedup on GPUs for a broad range of signal spectra.
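The premise of the sFFT is that only a few of the n Fourier coefficients of many signals are significant, so the full transform computes mostly zeros. That premise can be illustrated with a naive O(n²) DFT from the standard library (illustration only; the actual sFFT avoids computing the full transform, and none of this is the dissertation's code):

```python
# A signal composed of k pure tones has exactly k nonzero DFT
# coefficients -- the "sparse spectrum" the sFFT exploits.
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]

n = 16
# Time-domain signal with exactly two active frequencies, 3 and 5.
sig = [cmath.exp(2j * cmath.pi * 3 * t / n) +
       cmath.exp(2j * cmath.pi * 5 * t / n)
       for t in range(n)]
spectrum = dft(sig)
large = [f for f, c in enumerate(spectrum) if abs(c) > 1e-6]
print(large)  # only the two active frequencies survive: [3, 5]
```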
Advisors/Committee Members: Chapman, Barbara M. (advisor), Gnawali, Omprakash (committee member), Shah, Shishir Kirit (committee member), Shi, Weidong (committee member), May, Elebeoba E. (committee member).
Subjects/Keywords: Sparse FFT; Parallel computing; Compressive Sensing; GPU

Universitat Autònoma de Barcelona
23.
Vera Rodríguez, Gonzalo.
R/parallel Parallel Computing for R in non‐dedicated environments.
Degree: Departament d'Arquitectura de Computadors i Sistemes Operatius, 2010, Universitat Autònoma de Barcelona
URL: http://hdl.handle.net/10803/121248
▼ Traditionally, parallel computing has been associated with special-purpose applications designed to run in complex computing clusters, specifically set up with a software stack of dedicated libraries together with advanced administration tools to manage complex IT infrastructures. These High Performance Computing (HPC) solutions, although the most efficient in terms of performance and scalability, impose technical and practical barriers for most common scientists who, with limited IT knowledge, time, and resources, are unable to embrace classical HPC solutions without considerable effort. Moreover, two important technology advances are increasing the need for parallel computing. For example, in the bioinformatics field, and similarly in other experimental science disciplines, new high-throughput screening devices are generating huge amounts of data within very short times, which requires their analysis in equally short time periods to avoid delaying experimental analysis. Another important technological change involves the design of new processor chips. To increase raw performance, the current strategy is to increase the number of processing units per chip, so parallel applications are required to make use of the new processing capacities. In both cases we find users who may need to update their current sequential applications and computing resources to achieve the increased processing capacities required for their particular needs. Since parallel computing is becoming a natural option for obtaining increased performance, and it is required by new computer systems, solutions adapted for the mainstream should be developed for seamless adoption. To enable the adoption of parallel computing, new methods and technologies are required to remove or mitigate the current barriers and obstacles that prevent many users from evolving their sequential running environments. A particular scenario that especially suffers from these problems, and that is considered as a practical case in this work, consists of bioinformaticians analyzing molecular data with methods written in the R language. In many cases, with long datasets, they have to wait for days or weeks for their data to be processed, or perform the cumbersome task of manually splitting their data, looking for available computers to run these subsets, and collecting back the previously scattered results. Most of these applications written in R are based on parallel loops. A loop is called a parallel loop if there is no data dependency among its iterations; therefore any iteration can be processed in any order, or even simultaneously, so such loops are susceptible to being parallelized. Parallel loops are found in a large number of scientific applications. Previous contributions deal with partial aspects…
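A parallel loop, as defined above, can be dispatched iteration by iteration to a worker pool precisely because no iteration depends on another. The thesis targets R; the sketch below shows the same idea in Python with a standard-library thread pool (the `analyse` function and the data are invented for illustration):

```python
# Independent loop iterations farmed out to a worker pool.
# Because iterations share no data, any scheduling order is valid,
# yet pool.map still returns results in input order.
from concurrent.futures import ThreadPoolExecutor

def analyse(chunk):
    """Stand-in for one independent loop iteration (e.g. one gene's stats)."""
    return sum(chunk) / len(chunk)

dataset = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyse, dataset))
print(results)  # [2.0, 5.0, 8.0, 11.0]
```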
Advisors/Committee Members: Suppi Boldrito, Remo (director).
Subjects/Keywords: Parallel loops; Opportunistic computing; Bioinformatics; Technologies; 004
24.
Lugowski, Adam.
Scalable Graph Algorithms in a High-Level Language Using Primitives Inspired by Linear Algebra.
Degree: 2014, University of California – eScholarship, University of California
URL: http://www.escholarship.org/uc/item/85f079s8
▼ This dissertation advances the state of the art for scalable high-performance graph analytics and data mining using the language of linear algebra. Many graph computations suffer poor scalability due to their irregular nature and low operational intensity. A small but powerful set of linear algebra primitives that specifically target graph and data mining applications can expose sufficient coarse-grained parallelism to scale to thousands of processors.
In this dissertation we advance existing distributed memory approaches in two important ways. First, we observe that data scientists and domain experts know their analysis and mining problems well, but have little HPC experience. We describe a system that presents the user with a clean API in a high-level language and scales from a laptop to a supercomputer with thousands of cores. We utilize a Domain-Specific Embedded Language with Selective Just-In-Time Specialization to ensure a negligible performance impact over the original distributed memory low-level code. The high-level language enables ease of use, rapid prototyping, and additional features such as on-the-fly filtering, runtime-defined objects, and exposure to a large set of third-party visualization packages.
The second important advance is a new sparse matrix data structure and set of algorithms. We note that shared memory machines are dominant both in stand-alone form and as nodes in distributed memory clusters. This thesis offers the design of a new sparse-matrix data structure and set of parallel algorithms, a reusable implementation in shared memory, and a performance evaluation that shows significant speed and memory usage improvements over competing packages. Our method also offers features such as in-memory compression, a low-cost transpose, and chained primitives that do not materialize the entire intermediate result at any one time. We focus on a scalable, generalized, sparse matrix-matrix multiplication algorithm.
This primitive is used extensively in many graph algorithms such as betweenness centrality, graph clustering, graph contraction, and subgraph extraction.
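The sparse matrix-matrix multiplication primitive mentioned above can be sketched with a dictionary-of-dictionaries sparse format; counting triangles via trace(A³)/6 is a classic graph computation expressed through it. This is a minimal illustration, not the dissertation's implementation, and the names are invented.

```python
# SpGEMM over {row: {col: value}} sparse matrices, applied to the
# adjacency matrix of a small graph to count triangles as trace(A^3)/6.

def spgemm(a, b):
    """Multiply sparse matrices stored as {row: {col: value}} dicts."""
    out = {}
    for i, row in a.items():
        acc = {}
        for k, v in row.items():           # nonzeros of row i of A
            for j, w in b.get(k, {}).items():  # nonzeros of row k of B
                acc[j] = acc.get(j, 0) + v * w
        out[i] = acc
    return out

# Adjacency of a triangle 0-1-2 plus a pendant vertex 3 attached to 2.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1}, 3: {2: 1}}
a3 = spgemm(spgemm(adj, adj), adj)
# Each triangle contributes 6 closed walks of length 3 to the trace.
triangles = sum(a3.get(i, {}).get(i, 0) for i in adj) // 6
print(triangles)  # the single triangle 0-1-2
```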
Subjects/Keywords: Computer science; graphs; linear algebra; parallel computing
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Lugowski, A. (2014). Scalable Graph Algorithms in a High-Level Language Using Primitives Inspired by Linear Algebra. (Thesis). University of California – eScholarship, University of California. Retrieved from http://www.escholarship.org/uc/item/85f079s8
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Newcastle upon Tyne
25.
Chen, Xian.
Automatic parallelisation for a class of URE problems.
Degree: PhD, 1995, University of Newcastle upon Tyne
URL: http://theses.ncl.ac.uk/jspui/handle/10443/1974
;
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294493
► This thesis deals with the methodology and software of automatic parallelisation for numerical supercomputing and supercomputers. Basically, we focus on the problem of Uniform Recurrence…
(more)
▼ This thesis deals with the methodology and software of automatic parallelisation for numerical supercomputing and supercomputers. We focus on the problem of Uniform Recurrence Equations (UREs), which arise widely in numerical computations. We propose a complete methodology for the automatic generation of parallel programs for regular array designs. The methodology starts by introducing a set of canonical dependencies that yields a general model of the various URE problems. Based on these canonical dependencies, partitioning and mapping methods are developed, which form the foundation of the universal design process. Using the theoretical results, we propose the structures of parallel programs and ultimately generate parallel codes automatically that run correctly and efficiently on a transputer array. The achievements presented in this thesis represent significant progress in the area of automatic generation of parallel codes and regular (systolic) array design. The methodology is integrated and self-contained, and may be the only practical working package in this area.
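A classic illustration of why UREs parallelise well (a sketch of the general idea, not this thesis's own methodology): in the uniform recurrence below, every point on an anti-diagonal wavefront i + j = t depends only on earlier wavefronts, so all points on one wavefront could be evaluated in parallel.

```python
import numpy as np

def ure_wavefront(n: int) -> np.ndarray:
    """Evaluate the uniform recurrence x[i,j] = x[i-1,j] + x[i,j-1]
    (with boundary x[0,*] = x[*,0] = 1) by sweeping anti-diagonal
    wavefronts. Points with i + j == t are mutually independent, so
    the inner loop could run in parallel; it is sequential here for
    clarity.
    """
    x = np.ones((n, n))
    for t in range(2, 2 * n - 1):          # wavefront index t = i + j
        for i in range(max(1, t - n + 1), min(t, n)):
            j = t - i
            if 1 <= j < n:
                x[i, j] = x[i - 1, j] + x[i, j - 1]
    return x

print(int(ure_wavefront(4)[3, 3]))  # Pascal's rule: binomial(6, 3) = 20
```

Finding such hyperplane schedules automatically, for general dependence vectors, is essentially what the partitioning and mapping methods in the thesis formalise.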
Subjects/Keywords: 005; Parallel computing
APA (6th Edition):
Chen, X. (1995). Automatic parallelisation for a class of URE problems. (Doctoral Dissertation). University of Newcastle upon Tyne. Retrieved from http://theses.ncl.ac.uk/jspui/handle/10443/1974 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294493

Rice University
26.
Ding, Jian.
Software-based Baseband Processing for Massive MIMO.
Degree: MS, Engineering, 2019, Rice University
URL: http://hdl.handle.net/1911/107406
► Large-Scale multiple-input multiple-output (MIMO) is a key technology for improving spectral efficiency. However, it requires massive, real-time computation. All existing solutions are based on dedicated,…
(more)
▼ Large-scale multiple-input multiple-output (MIMO) is a key technology for improving spectral efficiency. However, it requires massive, real-time computation. All existing solutions are based on dedicated, specialized hardware, e.g., FPGAs, that is expensive, inflexible, and difficult to program. This thesis investigates a software-only solution that exploits recent CPU developments: many cores and architectural extensions for fine-grained parallelism. We present a high-performance framework for real-time, large-scale baseband processing on a many-core server. To achieve the high data rate and low latency promised by 5G, the framework utilizes data parallelism and exploits architectural features, including the memory hierarchy and SIMD extensions, to accelerate computation and data movement. We report a prototype on a 36-core server and evaluate its performance.
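As a rough illustration of the kind of linear algebra such a software baseband performs (a hedged sketch; the thesis's actual framework and API are not shown here), zero-forcing detection recovers transmitted symbols from the antenna receive vector with a pseudo-inverse. NumPy's vectorization stands in for the SIMD extensions the thesis exploits:

```python
import numpy as np

rng = np.random.default_rng(0)
ants, users = 8, 4            # base-station antennas, single-antenna users

# Random flat-fading channel and QPSK symbols (illustrative only).
H = (rng.standard_normal((ants, users))
     + 1j * rng.standard_normal((ants, users))) / np.sqrt(2.0)
x = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=users)

y = H @ x                     # noiseless uplink receive vector

# Zero-forcing detection: x_hat = (H^H H)^{-1} H^H y == pinv(H) @ y.
# A real pipeline batches this across OFDM subcarriers to fill SIMD lanes.
x_hat = np.linalg.pinv(H) @ y
print(np.allclose(x_hat, x))  # exact recovery in the noiseless case: True
```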
Advisors/Committee Members: Zhong, Lin (advisor).
Subjects/Keywords: baseband processing; massive MIMO; parallel computing; 5G
APA (6th Edition):
Ding, J. (2019). Software-based Baseband Processing for Massive MIMO. (Masters Thesis). Rice University. Retrieved from http://hdl.handle.net/1911/107406

University of Toronto
27.
Willenberg, Ruediger.
Heterogeneous Runtime Support for Partitioned Global Address Space Programming on FPGAs.
Degree: PhD, 2016, University of Toronto
URL: http://hdl.handle.net/1807/77077
► We are presenting THeGASNet, a framework to provide remote memory communication and synchronization in heterogeneous, distributed systems composed of software components and FPGA components. It…
(more)
▼ We present THeGASNet, a framework providing remote memory communication and synchronization in heterogeneous, distributed systems composed of software components and FPGA components. It is intended as a runtime layer to support higher-level languages and libraries that implement the Partitioned Global Address Space (PGAS) model. PGAS is a shared-memory parallel programming model intended for high-productivity programming of distributed, cluster-like systems. THeGASNet provides a communication abstraction with a common API for both software and hardware components, thereby facilitating easier migration of performance-critical application portions to reconfigurable computing hardware.
To demonstrate the development flow, we have implemented three applications representing common distributed application characteristics, starting with software-only solutions and using the common API to efficiently move selected parts into FPGA hardware. Based on the accumulated experience, we illustrate why PGAS is a good model for programming heterogeneous systems using FPGAs, define minimum infrastructure requirements, and outline a vision for continued exploration of heterogeneous PGAS programming.
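The PGAS idea itself can be sketched minimally (the class and method names below are invented for illustration and are not THeGASNet's API): one logical address space is partitioned across ranks, and put/get operations are routed to the owning partition by address.

```python
# Toy partitioned global address space: one logical array split across
# "ranks" (plain lists standing in for per-node memories). A real
# runtime such as THeGASNet turns remote put/get into network or FPGA
# transactions; here everything lives in one process.

class ToyPGAS:
    def __init__(self, nranks: int, words_per_rank: int):
        self.words_per_rank = words_per_rank
        self.mem = [[0] * words_per_rank for _ in range(nranks)]

    def _locate(self, addr: int):
        # Block distribution: global address -> (owning rank, offset).
        return divmod(addr, self.words_per_rank)

    def put(self, addr: int, value: int) -> None:
        rank, off = self._locate(addr)
        self.mem[rank][off] = value

    def get(self, addr: int) -> int:
        rank, off = self._locate(addr)
        return self.mem[rank][off]

gas = ToyPGAS(nranks=4, words_per_rank=8)
gas.put(0, 11)                        # lands on rank 0, offset 0
gas.put(9, 22)                        # lands on rank 1, offset 1
print(gas.get(9), gas.mem[1][1])      # 22 22
```

The appeal for heterogeneous systems is that software and hardware components see the same put/get abstraction, so moving a component to an FPGA does not change how the rest of the application addresses its data.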
Advisors/Committee Members: Chow, Paul, Electrical and Computer Engineering.
Subjects/Keywords: Accelerators; FPGAs; Heterogeneous; Parallel Computing; PGAS; 0464
APA (6th Edition):
Willenberg, R. (2016). Heterogeneous Runtime Support for Partitioned Global Address Space Programming on FPGAs. (Doctoral Dissertation). University of Toronto. Retrieved from http://hdl.handle.net/1807/77077

University of Southern California
28.
Peng, Liu.
Parallelization framework for scientific application kernels on multi-core/many-core platforms.
Degree: PhD, Computer Science, 2011, University of Southern California
URL: http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/624895/rec/4915
► The advent of multi-core/many-core paradigm has provided unprecedented computing power, and it is of great significance to develop a parallelization framework for various scientific applications…
(more)
▼ The advent of the multi-core/many-core paradigm has provided unprecedented computing power, and it is of great significance to develop a parallelization framework that lets various scientific applications harvest this computing power. However, it is a great challenge to design an efficient parallelization framework that continues to scale on future architectures, owing to the complexity of real-world applications and the variety of multi-core/many-core platforms.
To address this challenge, I propose a hierarchical optimization framework that maps applications to hardware by exploiting multiple levels of parallelization: (1) inter-node parallelism via spatial decomposition; (2) inter-core parallelism via cellular decomposition; and (3) single-instruction multiple-data (SIMD) parallelization. The framework includes application-based SIMD analysis and optimization, which allows application scientists to determine whether their applications are viable for SIMDization and provides various code-transformation techniques to enhance SIMD efficiency, as well as simple recipes for when compiler auto-vectorization fails. I also propose a suite of optimization strategies to achieve ideal on-chip inter-core strong scalability on emerging many-core architectures: (1) a divide-and-conquer algorithm adaptive to local memory; (2) a novel data layout to improve data locality; (3) on-chip locality-aware parallel algorithms to enhance data reuse; and (4) a pipeline algorithm using a data-transfer agent to orchestrate computation and memory operations so as to hide latency to shared memory.
I have applied the framework to three scientific applications, which represent most of the numerical classes in the seven dwarfs (known to cover most high-performance computing applications): (1) stencil computation, specifically the lattice Boltzmann method (LBM) for fluid-flow simulation; (2) molecular dynamics (MD) simulation; and (3) molecular fragment analysis via connected-component detection.
I have achieved high inter-node, inter-core (multithreading), and SIMD efficiency on various computing platforms: (1) for LBM, inter-node parallel efficiency of 0.978 on 131,072 BlueGene/P processors, multithreading efficiency of 0.882 on 6 cores of a Cell BE, and SIMD efficiency of 0.780 using the 4-element vector registers of a Cell BE; (2) for MD simulation, inter-node parallel efficiency of 0.985 on 106,496 BlueGene/L processors, and inter-core multithreading parallel efficiency of 0.99 on the 64-core Godson-T many-core architecture; (3) for molecular fragment analysis, nearly linear inter-node strong scalability up to a 50-million-vertex molecular graph on 32 computing nodes, and over 13-fold inter-core speedup on 16 cores. In addition, a simple performance model based on hierarchical parallelization is derived, which suggests that the optimization scheme is likely to scale well toward exascale. Furthermore, I have analyzed the impact of architectural features on applications' performance and find that certain architectural features are essential for these…
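The SIMD-oriented stencil pattern this abstract describes can be illustrated with a toy 2-D Jacobi sweep (not the thesis's LBM kernel): expressing the update as whole-array slices is the same data-parallel form that a compiler, or hand-written intrinsics, maps onto vector registers.

```python
import numpy as np

def jacobi_step(u: np.ndarray) -> np.ndarray:
    """One Jacobi sweep of a 2-D 5-point stencil, written as whole-
    array slice arithmetic. NumPy vectorizes each slice operation,
    mirroring how SIMD lanes process neighboring cells in lockstep.
    """
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((32, 32))
u[0, :] = 1.0                 # hot top boundary, cold elsewhere
for _ in range(200):
    u = jacobi_step(u)
# Interior values settle strictly between the boundary values 0 and 1.
print(0.0 < u[16, 16] < 1.0)  # True
```

The inter-node and inter-core levels of the hierarchy then amount to splitting this grid into spatial and cellular blocks, with halo exchange between them.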
Advisors/Committee Members: Nakano, Aiichiro (Committee Chair), Prasanna, Viktor K. (Committee Member), Shing, Katherine S. (Committee Member).
Subjects/Keywords: multi/many core; parallel computing; scientific simulation
APA (6th Edition):
Peng, L. (2011). Parallelization framework for scientific application kernels on multi-core/many-core platforms. (Doctoral Dissertation). University of Southern California. Retrieved from http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/624895/rec/4915

Clemson University
29.
Clevenger, Thomas Conrad.
A Parallel Geometric Multigrid Method for Adaptive Finite Elements.
Degree: PhD, Mathematical Sciences, 2019, Clemson University
URL: https://tigerprints.clemson.edu/all_dissertations/2523
► Applications in a variety of scientific disciplines use systems of Partial Differential Equations (PDEs) to model physical phenomena. Numerical solutions to these models are…
(more)
▼ Applications in a variety of scientific disciplines use systems of Partial Differential Equations (PDEs) to model physical phenomena. Numerical solutions to these models are often found using the Finite Element Method (FEM), where the problem is discretized and the solution of a large linear system, containing millions or even billions of unknowns, is required. Often the domain of these solves contains localized features that require very high resolution of the underlying finite element mesh to solve accurately, while a mesh with uniform resolution would require far too much computational time and memory to be feasible on a modern machine. Therefore, techniques like adaptive mesh refinement, where one increases the resolution of the mesh only where necessary, must be used. Even with adaptive mesh refinement, these systems can still contain far more than a million unknowns (large mantle-convection applications like the ones in [90] show simulations with over 600 billion unknowns), and attempting to solve on a single processing unit is infeasible due to the computational time and memory required. For this reason, any application code aimed at solving large problems must be built on a parallel framework, allowing the concurrent use of multiple processing units to solve a single problem, and the code must scale efficiently to large numbers of processing units.
Multigrid methods are currently the only known optimal solvers for linear systems arising from discretizations of elliptic boundary value problems. These methods can be represented as an iterative scheme with contraction number less than one, independent of the resolution of the discretization [24, 54, 25, 103], with optimal complexity in the number of unknowns in the system [29]. Geometric multigrid (GMG) methods, where the hierarchy of spaces is defined by linear systems of finite element discretizations on meshes of decreasing resolution, have been shown to be robust for many different problem formulations, giving mesh-independent convergence for highly adaptive meshes [26, 61, 83, 18], but these methods require specific implementations for each type of equation, boundary condition, mesh, etc., required by the specific application. The implementation in a massively parallel environment is not obvious, and research into this topic is far from exhaustive.
We present an implementation of a massively parallel, adaptive geometric multigrid (GMG) method in the open-source finite element library deal.II [5], and perform extensive tests showing scaling of the V-cycle application on systems with up to 137 billion unknowns, run on up to 65,536 processors, demonstrating the low communication overhead of the proposed algorithms. We then show the flexibility of the GMG by applying the method to four different PDE systems: the Poisson equation, linear elasticity, advection-diffusion, and the Stokes equations. For the Stokes equations, we implement a fully matrix-free, adaptive, GMG-based solver…
Advisors/Committee Members: Timo Heister, Qingshan Chen, Leo Rebholz, Fei Xue.
Subjects/Keywords: Geometric multigrid; Parallel computing; Stokes equations
APA (6th Edition):
Clevenger, T. C. (2019). A Parallel Geometric Multigrid Method for Adaptive Finite Elements. (Doctoral Dissertation). Clemson University. Retrieved from https://tigerprints.clemson.edu/all_dissertations/2523

Georgia Tech
30.
Remley, Kyle E.
Development of methods for high performance computing applications of the deterministic stage of comet calculations.
Degree: PhD, Mechanical Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/58610
► The Coarse Mesh Radiation Transport (COMET) method is a reactor physics method and code that has been used to solve whole core reactor eigenvalue and…
(more)
▼ The Coarse Mesh Radiation Transport (COMET) method is a reactor physics method and code that has been used to solve whole-core reactor eigenvalue and flux-distribution problems. A strength of the method is its formidable accuracy and computational efficiency: COMET solutions are computed to Monte Carlo accuracy on a single processor in a runtime that is several orders of magnitude faster than stochastic calculations. However, with the growing ubiquity of both shared- and distributed-memory parallel machines, and the desire to extend the method to allow coupling to multiphysics and on-the-fly response generation, serial implementations of COMET calculations will become less desirable. It is under this motivation that an implementation for parallel execution of deterministic COMET calculations has been developed. COMET involves inner and outer iterations; the inner iterations involve local calculations that can be carried out independently, making the algorithm amenable to parallelization. However, considerations must be made for decomposing a problem and distributing its data. To allow an efficient parallel implementation of the distributed algorithm, changes to response-data access and sweep order are made, along with considerations for communication between processors. The parallel code is run on several variants of the C5G7 benchmark problem to assess the scalability of the algorithm, and it is found that problems with larger numbers of coarse meshes increase the scalability of the code, which is an encouraging result. The code is further tested on full-core reactor problems, where extremely efficient wall-clock times (on the order of minutes) are achieved. Finally, application of the parallel code to novel uses of COMET (e.g., problems with high flux expansions) is discussed.
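The inner/outer structure described above can be sketched generically (the `local_response` update below is a made-up stand-in, not COMET's response expansion): independent per-mesh inner calculations are mapped across workers, with a serial global step between outer iterations.

```python
from concurrent.futures import ThreadPoolExecutor

def local_response(value: float, coupling: float) -> float:
    # Illustrative contraction: each mesh's update needs only its own
    # state plus one global coupling term from the previous outer step.
    return 0.5 * value + 0.25 * coupling

def outer_iterations(values, n_outer: int = 30, workers: int = 4):
    values = list(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_outer):
            coupling = sum(values) / len(values)   # serial global reduction
            # Inner iterations are independent per mesh, so they can be
            # mapped across workers in any order.
            values = list(pool.map(
                lambda v: local_response(v, coupling), values))
    return values

final = outer_iterations([1.0, 2.0, 3.0, 4.0])
print(all(abs(v) < 1e-3 for v in final))  # the iteration contracts: True
```

In a distributed setting the `pool.map` becomes per-processor work on locally owned coarse meshes, and the reduction becomes the inter-processor communication whose cost the thesis works to minimize.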
Advisors/Committee Members: Rahnema, Farzad (advisor), Petrovic, Bojan (committee member), Zhang, Dingkang (committee member), Morley, Tom (committee member), Haghighat, Alireza (committee member).
Subjects/Keywords: Coarse mesh transport; Parallel computing; Reactor physics
APA (6th Edition):
Remley, K. E. (2016). Development of methods for high performance computing applications of the deterministic stage of comet calculations. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/58610