You searched for subject:(high performance computing)
Showing records 1 – 30 of 899 total matches.
1.
Bowman, Clark Michael Riordan.
Data-Calibrated Modeling of Biological Soft Matter with Dissipative Particle Dynamics and High-Performance Bayesian Uncertainty Quantification.
Degree: Department of Applied Mathematics, 2018, Brown University
URL: https://repository.library.brown.edu/studio/item/bdr:792771/
Fluids play a key role in the mechanics of cells and the soft-matter structures composing them; from the movement of molecules such as amino acids via diffusion to the viscous forces propelling flagellating organisms, accounting for the role of the fluid environment is critical to understanding the mechanisms at play. One of the major challenges in mathematical modeling of such systems is developing an accurate, yet computationally feasible, representation of the fluid. Continuum approaches such as Navier-Stokes neglect the thermal fluctuations and boundary effects that can dominate flow behavior at the nanoscale, while explicit models of the fluid via, e.g., molecular dynamics become computationally infeasible for systems on the order of biological cells, which may have millions or billions of molecules. In this thesis, we use dissipative particle dynamics, a particle method in which coarse-grained particles interact through artificial forces chosen to generate accurate fluid behavior, to model a number of biofluidic systems of experimental interest, including the diffusion of DNA in constrained environments, the mechanics of poration in phospholipid membranes, and the mechanical force profile of polymerizing actin networks in the cytoskeleton. The simulation results are used to examine at the nanoscale the dynamics underlying macroscopic behaviors observed in experiment, answering a number of questions about the forces, energies, motions, and mechanisms of biological soft matter with measurements from the explicitly modeled fluid environment. We also introduce a framework for high-performance Bayesian uncertainty quantification and demonstrate an application to inferring structural properties of lipids in a bilayer membrane, illustrating the feasibility of data-driven model calibration for our complex dissipative particle simulations using parallel computing. Together, these methods allow for a uniquely fine-scale look at the mechanics underpinning a number of biologically relevant phenomena.
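For readers new to the method, a minimal sketch of the standard DPD pairwise force (the Groot-Warren form, with illustrative parameter values; not code from this thesis):

```python
import numpy as np

def dpd_pair_force(r_ij, v_ij, rc=1.0, a=25.0, gamma=4.5, kBT=1.0, dt=0.01,
                   rng=np.random.default_rng(42)):
    """Sum of the three DPD pairwise forces: conservative, dissipative, random.

    Parameter values are illustrative defaults, not those used in the thesis.
    """
    r = np.linalg.norm(r_ij)
    if r >= rc or r == 0.0:
        return np.zeros(3)                    # no interaction beyond the cutoff
    e = r_ij / r                              # unit vector from particle j to i
    w = 1.0 - r / rc                          # soft weight function
    sigma = np.sqrt(2.0 * gamma * kBT)        # fluctuation-dissipation relation
    f_cons = a * w * e                        # soft repulsion
    f_diss = -gamma * w**2 * np.dot(e, v_ij) * e                  # friction
    f_rand = sigma * w * rng.standard_normal() / np.sqrt(dt) * e  # thermal noise
    return f_cons + f_diss + f_rand

print(dpd_pair_force(np.array([0.5, 0.0, 0.0]), np.array([0.0, 0.1, 0.0])))
```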
Advisors/Committee Members: Matzavinos, Anastasios (Advisor), Stein, Derek (Reader), Chaplain, Mark (Reader).
Subjects/Keywords: High performance computing
2.
Rietmann, Max.
Local time stepping on high performance computing architectures: mitigating CFL bottlenecks for large-scale wave propagation.
Degree: 2015, Università della Svizzera italiana
URL: http://doc.rero.ch/record/255915
Modeling problems that require the simulation of hyperbolic PDEs (wave equations) on large heterogeneous domains have potentially many bottlenecks. We attack this problem through two techniques: the massively parallel capabilities of graphics processors (GPUs) and local time stepping (LTS) to mitigate any CFL bottlenecks on a multiscale mesh. Many modern supercomputing centers are installing GPUs due to their high performance, and extending existing seismic wave-propagation software to use GPUs is vitally important to give application scientists the highest possible performance. In addition to this architectural optimization, LTS schemes avoid performance losses in meshes with localized areas of refinement. Coupled with the GPU performance optimizations, the derivation and implementation of a Newmark LTS scheme enables next-generation performance for real-world applications. Included in this implementation is work addressing the load-balancing problem inherent to multi-level LTS schemes, enabling scalability to hundreds and thousands of CPUs and GPUs. These GPU, LTS, and scaling optimizations accelerate the performance of existing applications by a factor of 30 or more, and enable future modeling scenarios previously made infeasible by the cost of standard explicit time-stepping schemes.
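The core idea of a two-level LTS scheme can be sketched in a few lines: the coarse region advances with the largest CFL-stable step while only the refined cells take substeps. The `update` callback below is a placeholder for the actual (Newmark) element update, which this sketch does not attempt to reproduce:

```python
import numpy as np

def cfl_dt(dx, c, courant=0.5):
    """Largest stable explicit step for cell size dx and wave speed c."""
    return courant * dx / c

def lts_step(u, coarse_cells, fine_cells, dx_coarse, dx_fine, c, update):
    """Advance the solution u by one coarse step using two-level LTS."""
    dt = cfl_dt(dx_coarse, c)
    p = int(np.ceil(dt / cfl_dt(dx_fine, c)))   # substeps the fine region needs
    update(u, coarse_cells, dt)                 # one big step where it is stable
    for _ in range(p):
        update(u, fine_cells, dt / p)           # small steps only where required
    return dt

# With dx_fine = dx_coarse / 10, global time stepping would force *every* cell
# to take the small step; LTS confines the 10 substeps to the refined cells.
u = np.zeros(100)
noop = lambda u, cells, dt: None                # placeholder physics
lts_step(u, slice(0, 90), slice(90, 100), 1.0, 0.1, c=1.0, update=noop)
```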
Advisors/Committee Members: Olaf (Dir.).
Subjects/Keywords: High performance computing

Penn State University
3.
Frasca, Michael Robert.
Model-driven Memory Optimizations for High Performance Computing: From Caches to I/O.
Degree: 2012, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/15096
High performance systems are quickly evolving to keep pace with application demands, and we observe greater complexity in system design at all scales. Parallelism, in its many forms, is a fundamental change agent of current system and software architecture, and the greatest source of power and performance challenges. We understand that dynamic techniques are required to optimize computation in this environment and propose model-driven techniques to first understand performance inefficiencies, then respond with online and adaptive mechanisms. In this thesis, we recognize that the parallelism employed creates contention within and throughout the memory hierarchy, and we therefore focus our analysis in this domain.
The memory hierarchy extends from on-chip caches through persistent storage in I/O subsystems, and we analyze and develop models of shared data and cache use to understand how parallel applications interact with hardware and why parallel scalability is often poor. Through the lens of these memory models, we develop dynamic optimization techniques for disparate layers of the memory hierarchy. For on-chip multi-core caches, we seek to improve data sharing characteristics for sparse high performance algorithms. Our approach leverages model-driven insight to dynamically change inter-thread access behavior so that it efficiently maps to the given hardware topology. In the I/O subspace, we target the interference caused by concurrent applications accessing shared storage caches. We design model-driven techniques to both isolate application behavior and dynamically alter inefficient caching policies.
Advisors/Committee Members: Padma Raghavan, Dissertation Advisor/Co-Advisor, Mahmut Taylan Kandemir, Committee Member, Bhuvan Urgaonkar, Committee Member, Jia Li, Committee Member.
Subjects/Keywords: High Performance Computing; Performance Optimization

Rutgers University
4.
Qin, Yubo.
Exploring power-performance-quality tradeoffs for exascale combustion simulation.
Degree: MS, Electrical and Computer Engineering, 2016, Rutgers University
URL: https://rucore.libraries.rutgers.edu/rutgers-lib/50136/
The computational demand of high-performance computing (HPC) applications has brought major changes to HPC system architecture. As a result, it is now possible to run simulations faster and get more accurate results. Behind this, however, power and energy are becoming critical concerns for HPC systems; e.g., Titan's electricity cost is about $9 million per year. Energy efficiency has become a critical exascale research challenge, and the U.S. Department of Energy (DOE) has set the goal of achieving exascale performance within a power budget of 20 MW. Current research efforts have studied power and performance tradeoffs and how to balance them, e.g., using DVFS to meet power constraints, which significantly impacts performance. However, scientific applications may not tolerate degradation in performance, and other tradeoffs need to be explored to meet power budgets, e.g., involving the application in making energy-performance tradeoff decisions. In this research, we focus on studying the properties and exploring the performance and power/energy tradeoffs of the Low-Mach-Number Combustion (LMC) application, an Adaptive Mesh Refinement (AMR) algorithm. Our experiments provide an empirical evaluation of different application configurations that gives insights into the power-performance tradeoff space for LMC and other AMR-based application workflows. The key contribution of this work is a better understanding of the runtime behavior of this AMR-based application and a proposed power-performance tradeoff for it, which can be used to better schedule power budgets across HPC systems.
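As a rough illustration of the tradeoff space such a study explores, consider the common first-order DVFS model (dynamic power scaling as V^2·f, with V roughly proportional to f). The numbers below are invented, not measurements from this work:

```python
def scaled_run(time_s, power_w, f_scale, compute_bound=1.0):
    """Estimate (runtime, energy) after scaling CPU frequency by f_scale.

    compute_bound is the fraction of runtime that scales as 1/f;
    memory-bound phases are assumed insensitive to frequency.
    """
    new_time = time_s * (compute_bound / f_scale + (1.0 - compute_bound))
    new_power = power_w * f_scale**3       # V ~ f  =>  P_dyn ~ f^3
    return new_time, new_power * new_time  # seconds, joules

base = scaled_run(100.0, 200.0, 1.0)       # 100 s at 200 W -> 20 kJ
slow = scaled_run(100.0, 200.0, 0.8)       # ~125 s at ~102 W -> ~12.8 kJ
print(base, slow)  # 25% slower but ~36% less energy: a power/performance knob
```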
Advisors/Committee Members: Rodero, Ivan (chair).
Subjects/Keywords: Combustion; High performance computing

Georgia Tech
5.
Clay, Matthew Paul.
Strained turbulence and low-diffusivity turbulent mixing using high performance computing.
Degree: PhD, Aerospace Engineering, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/60184
In this thesis, turbulent flows are studied using the method of direct numerical simulation (DNS), whereby exact governing equations are computed without modeling. Beginning with isotropic turbulence and turbulent mixing under axisymmetric contraction, comparisons with experiments are made by directly modeling strain rates from wind tunnel facilities in the DNS. The simulations reproduce key findings from the experiments for the evolution of the one-dimensional component velocity spectra, which are strongly influenced by spectral transfer and pressure-strain mechanisms following the contraction. For simulations of low-diffusivity (i.e., high Schmidt number) turbulent mixing in isotropic turbulence, the increased resolution requirements of the Batchelor scales are addressed by adopting a dual-grid, dual-scheme numerical approach. The one-way coupling of the velocity and passive scalar fields, along with their disparate resolution requirements at high Schmidt number, are exploited in the design of the parallel code by computing each field separately in disjoint message-passing communicators. Good scalability of the code up to O(10^5) cores on machines at multiple national supercomputer centers is maintained by overlapping communication and computation through extensive use of shared-memory programming, both in homogeneous and heterogeneous (i.e., GPU-accelerated) computing environments. Simulations of passive scalars maintained under a uniform mean gradient in forced isotropic turbulence are conducted. The highest grid resolution employed is 8192^3 (0.5 trillion grid points) for a scalar of Schmidt number 512, which is comparable to salinity mixing in the ocean. The results give strong support to the emergence of Batchelor scaling in the scalar spectrum and an approach toward local isotropy with increasing Schmidt number.
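A quick calculation shows why high Schmidt number drives the dual-grid design: the Batchelor scale eta_B = eta / sqrt(Sc) shrinks below the Kolmogorov scale eta, so only the scalar grid needs refining by sqrt(Sc) per direction (illustrative values below, not the thesis's parameters):

```python
def kolmogorov_scale(nu, eps):
    """Smallest velocity scale: eta = (nu^3 / eps)^(1/4)."""
    return (nu**3 / eps) ** 0.25

def batchelor_scale(nu, eps, Sc):
    """Smallest scalar scale: eta_B = eta / sqrt(Sc)."""
    return kolmogorov_scale(nu, eps) / Sc**0.5

nu, eps = 1e-6, 1e-4          # viscosity (m^2/s), dissipation rate (m^2/s^3)
for Sc in (1, 64, 512):       # Sc = 512 is comparable to salinity in seawater
    print(f"Sc={Sc:4d}: eta_B = {batchelor_scale(nu, eps, Sc):.2e} m, "
          f"scalar grid ~{Sc**0.5:.0f}x finer per direction")
```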
Advisors/Committee Members: Yeung, P. K. (advisor), Aidun, Cyrus K. (committee member), Ranjan, Devesh (committee member), Smith, Marilyn J. (committee member), Chow, Edmond (committee member).
Subjects/Keywords: Turbulence; DNS; High performance computing

Georgia Tech
6.
Schieber, Matthew Cole.
Optimizing computational kernels in quantum chemistry.
Degree: MS, Computational Science and Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/59949
Density fitting is a rank-reduction technique popularly used in quantum chemistry to reduce the computational cost of evaluating, transforming, and processing the 4-center electron repulsion integrals (ERIs). By utilizing the resolution-of-the-identity technique, density fitting reduces the 4-center ERIs to a 3-center form. Doing so not only alleviates the high storage cost of the ERIs, but also reduces the computational cost of operations involving them. Still, these operations can remain the computational bottlenecks which commonly plague quantum chemistry procedures. The goal of this thesis is to investigate various optimizations for density-fitted versions of computational kernels used ubiquitously throughout quantum chemistry. First, we detail the spatial sparsity available in the 3-center integrals and the application of such sparsity to various operations, including integral computation, metric contractions, and integral transformations. Next, we investigate sparse memory layouts and their implications for the performance of the integral transformation kernel. We then analyze two transformation algorithms and how their performance varies depending on the context in which they are used. Finally, we propose two sparse memory layouts and evaluate the resulting performance of Coulomb and exchange evaluations. Since the memory required for these tensors grows rapidly, we frame these discussions in the context of their in-core and disk performance. We implement these methods in the Psi4 electronic structure package and find that the optimal algorithm for a kernel varies depending on whether a disk-based implementation must be used.
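The factorization being optimized can be stated in a few lines of NumPy: the 4-center ERIs are approximated as (pq|rs) ~ sum_Q B[Q,pq] B[Q,rs], with B built from the 3-center integrals and the inverse square root of the 2-center metric. Random tensors stand in for real integrals here; this is a sketch of the math, not Psi4 code:

```python
import numpy as np

n, naux = 10, 40                          # orbital and auxiliary basis sizes
rng = np.random.default_rng(0)
P = rng.standard_normal((naux, n, n))     # stand-in 3-center integrals (Q|pq)
P = (P + P.transpose(0, 2, 1)) / 2        # symmetric in p, q
A = rng.standard_normal((naux, naux))
J = A @ A.T + naux * np.eye(naux)         # SPD stand-in for the metric (P|Q)

# Form B = J^(-1/2) P via an eigendecomposition of the metric.
w, V = np.linalg.eigh(J)
Jinv_half = V @ np.diag(w**-0.5) @ V.T
B = np.einsum('PQ,Qpq->Ppq', Jinv_half, P)

eri_df = np.einsum('Qpq,Qrs->pqrs', B, B)  # reduced-rank 4-center ERIs
print(eri_df.shape)                        # (10, 10, 10, 10)
```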
Advisors/Committee Members: Sherrill, Charles D. (advisor), Chow, Edmond (committee member), McDaniel, Jesse (committee member).
Subjects/Keywords: Quantum chemistry; Parallel computing; High performance computing

Penn State University
7.
Park, Jeonghyung.
Reuse distance models for accelerating scientific computing workloads on multicore processors.
Degree: 2015, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/26411
As the number of cores increases in chip multiprocessor (CMP), or multicore, architectures, we often observe performance degradation due to complex memory behavior on such systems. To mitigate such inefficiencies, we develop schemes that can be used to characterize and improve the memory behavior of a multicore node for scientific computing applications that require high performance.
We leverage the fact that such scientific computing applications often comprise code blocks that are repeated, leading to certain periodic properties. We conjecture that their periodic properties, and their observable impacts on cache performance, can be characterized in sufficient detail by simple 'alpha + beta*sine' models. Additionally, starting from such a model of the observable reuse distances, we develop a predictive cache miss model, followed by appropriate extensions for predictive capability in the presence of interference.
We consider the utilization of our reuse distance and cache miss models for accelerating scientific workloads on multicore systems. We use our cache miss model to determine a set of preferred applications to be co-scheduled with a given application to minimize performance degradation from interference. Further, we propose a reuse-distance-reducing ordering that improves the performance of Laplacian mesh smoothing. We reorder mesh vertices based on the initial quality of each node and its neighboring nodes so that we can improve both temporal and spatial locality. The reordering results show that a 38.75% performance improvement in Laplacian mesh smoothing can be obtained by our reuse-distance-reducing ordering when running on a single core, and a 75x speedup is obtained when scaling up to 32 cores.
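For reference, the quantity being modeled: the reuse (LRU stack) distance of an access is the number of distinct addresses touched since the last access to the same address. A toy O(N*M) version follows; production tools use tree-based O(N log M) algorithms:

```python
def reuse_distances(trace):
    """Reuse distance per access; inf marks a cold (first-touch) miss."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            window = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(window)))   # distinct addresses between
        else:
            distances.append(float('inf'))       # never seen before
        last_seen[addr] = i
    return distances

print(reuse_distances(['a', 'b', 'c', 'a', 'b', 'b']))
# [inf, inf, inf, 2, 2, 0]
```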
Advisors/Committee Members: Padma Raghavan, Dissertation Advisor/Co-Advisor, Padma Raghavan, Committee Chair/Co-Chair, Mahmut Taylan Kandemir, Committee Member, Kamesh Madduri, Committee Member, Christopher J Duffy, Committee Member.
Subjects/Keywords: high performance computing; scientific computing; parallel computing; performance optimization

Texas State University – San Marcos
8.
Saha, Biplab Kumar.
Towards a Framework for Automating the Workflow for Building Machine Learning Based Performance Tuning.
Degree: MS, Computer Science, 2016, Texas State University – San Marcos
URL: https://digital.library.txstate.edu/handle/10877/6343
Recent interest in machine learning-based methods has produced many sophisticated models for performance modeling and optimization. These models tend to be sensitive to architectural parameters and are most effective when trained on the target platform. Training of these models, however, is a fairly involved process and requires knowledge of statistics and machine learning that the end-users of such models may not possess. This thesis presents a framework for automatically generating machine learning-based performance models. Leveraging existing open-source software, we provide a tool-chain with automated mechanisms for sample generation, dynamic feature extraction, feature selection, data labeling, validation, and model selection. We describe the design of the framework and demonstrate its effectiveness by developing a learning heuristic for register allocation of GPU kernels. The results show the newly created models are accurate and can predict register caps that lead to substantial improvements in execution time without incurring a penalty in power consumption.
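The workflow being automated reduces to a train-and-select loop of roughly this shape, shown here with synthetic data and generic scikit-learn models (a sketch of the idea, not the framework's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))          # features, e.g. occupancy, reg usage
y = 3 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)  # "runtime"

candidates = {"ridge": Ridge(),
              "forest": RandomForestRegressor(n_estimators=50)}
scores = {name: cross_val_score(m, X, y, cv=5).mean()   # validation step
          for name, m in candidates.items()}
best = max(scores, key=scores.get)                      # model selection step
print(scores, "->", best)
```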
Advisors/Committee Members: Qasem, Apan (advisor), Ekstrand, Michael (committee member), Metsis, Vangelis (committee member).
Subjects/Keywords: High performance computing; Machine learning

University of Tasmania
9.
Atkinson, AK.
Tupleware: a distributed tuple space for the development and execution of array-based applications in a cluster computing environment.
Degree: 2010, University of Tasmania
URL: https://eprints.utas.edu.au/9996/1/Alistair_Atkinson_PhD_Thesis.pdf
This thesis describes Tupleware, an implementation of a distributed tuple space which acts as a scalable and efficient cluster middleware for computationally intensive numerical and scientific applications. Tupleware is based on the Linda coordination language (Gelernter 1985), and incorporates additional techniques such as peer-to-peer communications and exploitation of data locality in order to address problems such as scalability and performance, which are commonly encountered by traditional centralised tuple space implementations.
Tupleware is implemented in such a way that, while processing is taking place, all communication between cluster nodes is decentralised in a peer-to-peer fashion. Communication events are initiated by a node requesting a tuple which is located on a remote node, and in order to make tuple retrieval as efficient as possible, a tuple search algorithm is used to minimise the number of communication instances required to retrieve a remote tuple. This algorithm is based on the locality of a remote tuple and the success of previous remote tuple requests. As Tupleware is targeted at numerical applications which generally involve the partitioning and processing of 1-D or 2-D arrays, a remote tuple can generally be determined to be located on one of a small number of nodes which are processing neighbouring partitions of the array.
Furthermore, unlike some other distributed tuple space implementations, Tupleware does not burden the programmer with any additional complexity due to this distribution. At the application level, the Tupleware middleware behaves exactly like a centralised tuple space, and provides much greater flexibility with regard to where components of a system are executed.
The design and implementation of Tupleware is described and placed in the context of other distributed tuple space implementations, along with the specific requirements of the applications that the system caters for. Finally, Tupleware is evaluated using several numerical and/or scientific applications, which show it to provide a sufficient level of scalability for a broad range of tasks.
The main contribution of this work is the identification of techniques which enable a tuple space to be efficiently and transparently distributed across the nodes of a cluster. Central to this is the use of an algorithm for tuple retrieval which minimises the number of communication instances which occur during system execution. Distribution transparency is ensured by the provision of a simple interface to the underlying system, so that the distributed tuple space appears to the programmer as a single unified resource.
It is hoped that this research in some way furthers the adoption of the tuple space programming model for distributed computing, by enhancing its ability to provide improved performance, scalability, flexibility and simplicity for a range of applications not traditionally suited to tuple space based systems.
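A single-process toy can illustrate the Linda-style operations and the locality heuristic described above; the class and method names are hypothetical, and the real system runs these lookups over cluster nodes:

```python
class Node:
    """Toy tuple-space node; neighbours stand in for remote cluster nodes."""
    def __init__(self, name):
        self.name, self.tuples, self.neighbours, self.hint = name, [], [], None

    def out(self, tup):
        """Linda 'out': publish a tuple into this node's local space."""
        self.tuples.append(tup)

    def rd(self, pattern):
        """Linda 'rd': non-destructive match; None fields are wildcards."""
        match = lambda t: len(t) == len(pattern) and all(
            p is None or p == v for p, v in zip(pattern, t))
        for t in self.tuples:                      # local space first
            if match(t):
                return t
        order = ([self.hint] if self.hint else []) + \
                [n for n in self.neighbours if n is not self.hint]
        for n in order:                            # hinted neighbour first
            for t in n.tuples:
                if match(t):
                    self.hint = n                  # remember who had the data
                    return t
        return None

a, b = Node("a"), Node("b")
a.neighbours = [b]
b.out(("row", 3, [1.0, 2.0]))
print(a.rd(("row", 3, None)))   # found on b; b becomes a's hint next time
```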
Subjects/Keywords: Distributed computing; parallel computing; concurrency; high-performance computing; tuple space.

Cape Peninsula University of Technology
10.
Killian, Rudi.
Dynamic superscalar grid for technical debt reduction.
Degree: 2018, Cape Peninsula University of Technology
URL: http://etd.cput.ac.za/handle/20.500.11838/2726
Organizations and private individuals look to technology advancements to increase their ability to make informed decisions, the motivation for technology adoption sprouting from an innate need for value generation. The technology currently heralded as the future platform to facilitate value addition is popularly termed cloud computing. The move to cloud computing, however, may conceivably increase the obsolescence cycle for currently retained Information Technology (IT) assets. The term obsolescence is applied here as the inability to repurpose or scale an information system resource for needed functionality. The incapacity to reconfigure, grow or shrink an IT asset, be it hardware or software, is a well-known narrative of technical debt. The notion of emergent technical debt realities is professed to be all but inevitable when informed by Moore's Law, as technology must inexorably advance. Of more imminent concern, however, is that major accelerating factors of technical debt are deemed to be non-holistic conceptualization and design conventions. Should management of IT assets fail to address technical debt continually, the technology platform would predictably require replacement. The unrealized value, functional and fiscal loss, together with the resultant e-waste generated by technical debt, is meaningfully unattractive. Historically, the cloud milieu evolved from the grid and clustering paradigms, which allowed for information sourcing across multiple and often dispersed computing platforms. The parallel operations in distributed computing environments are inherently value-adding, as enhanced effective use of resources and efficiency in data handling may be achieved. The predominant information processing solutions that implement parallel operations in distributed environments are abstracted constructs, styled as High Performance Computing (HPC) or High Throughput Computing (HTC). Regardless of the underlying distributed environment, the archetypes of HPC and HTC differ radically in standard implementation. The foremost contrasting factors of parallelism granularity, failover and locality in data handling have recently been the subject of greater academic discourse towards possible fusion of the two technologies. In this research, we uncover probable platforms of future technical debt and subsequently recommend redeployment alternatives. The suggested alternatives take the form of scalable grids, which should provide alignment with the contemporary nature of individual information processing needs. The potential of grids as efficient and effective information sourcing solutions across geographically dispersed heterogeneous systems is envisioned to reduce or delay aspects of technical debt. As part of an experimental investigation to test the plausibility of these concepts, artefacts are designed to generically implement HPC and HTC. The design features exposed by the experimental artefacts could provide insights towards the amalgamation of HPC and HTC.
Subjects/Keywords: Computational grids (Computer systems); Cloud computing; High performance computing; Heterogeneous computing

Penn State University
11.
Shantharam, Manu.
Algorithmic Approaches for Enhancing Speedup, Energy and Resiliency Measures of Sparse Scientific Computations.
Degree: 2012, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/15740
High performance computing systems have increasingly complex node and network architectures, including non-uniform memory subsystems, heterogeneous processors and hierarchical interconnects. The performance of scientific applications that run on such systems depends on several factors, including memory access pattern, memory bandwidth, load balancing and resiliency. Consequently, optimizing the performance of scientific applications for high performance computing systems is challenging. We seek to address this challenge for applications involving sparse scientific computations, because such computations form the basis for solving many large-scale models of physical phenomena. In this thesis, we seek to understand the interplay between sparse scientific applications and hardware, and develop algorithmic approaches to improve performance measures and resiliency for sparse scientific computations.
We organize the thesis into two parts. The first part concerns developing algorithmic approaches to enhance the performance of sparse scientific computations and has two main contributions: (i) a new sparse matrix representation and a corresponding sparse matrix-vector multiplication (SpMV) algorithm that enhances performance of the SpMV operation, and (ii) speedup-aware processor partitioning algorithms to manage sparse scientific workloads efficiently. The second part concerns analyzing the impact of transient errors on sparse scientific computing and developing a fault-tolerant algorithm, and has two main results: (i) characterizing the impact of a single transient error on iterative methods, and (ii) a new sparse checksum-encoded algorithm-based fault tolerance technique for the preconditioned conjugate gradients method.
In Chapter 2, we focus on SpMV, which is at the heart of many scientific applications involving sparse linear system solution. We develop a new sparse matrix representation and a corresponding SpMV algorithm that exploits the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. We show that our SpMV algorithm reduces the total number of load operations and enhances locality in accesses to the vector, consequently improving SpMV performance on average by a third compared to the traditional compressed sparse row scheme on the Intel Nehalem processor.
In Chapter 3, we consider improving the performance of sparse scientific workloads that are commonly executed on high performance computing systems. We observe that many applications in such workloads do not scale linearly with the number of cores, providing diminishing gains in execution time for larger numbers of cores. For a workload comprising multiple such applications, it is beneficial from a system perspective to reduce system energy consumption and decrease workload completion time. We develop speedup-aware processor partitioning algorithms that exploit individual application scaling features and optimize processor allocations per application. Our results indicate that the speedup-aware…
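For reference, the baseline that Chapter 2's representation improves on is the textbook compressed sparse row (CSR) SpMV kernel, which loads one element of x per nonzero:

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]   # one load of x per nonzero
    return y

# 3x3 example: [[4, 0, 1], [0, 2, 0], [1, 0, 3]]
indptr, indices = [0, 2, 3, 5], [0, 2, 1, 0, 2]
data, x = [4.0, 1.0, 2.0, 3.0], np.array([1.0, 1.0, 1.0])
print(spmv_csr(indptr, indices, data, x))       # [5. 2. 4.]
```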
Advisors/Committee Members: Padma Raghavan, Dissertation Advisor/Co-Advisor, Mahmut Taylan Kandemir, Committee Member, Mary Jane Irwin, Committee Member, Jia Li, Committee Member.
Subjects/Keywords: Sparse scientific computing; high performance computing; performance analysis and modeling

Louisiana State University
12.
Fang, Ye.
Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems.
Degree: PhD, Electrical and Computer Engineering, 2016, Louisiana State University
URL: etd-01132017-192355 ; https://digitalcommons.lsu.edu/gradschool_dissertations/4233
Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising solutions for extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle those workloads not well suited for accelerators. The combination of multiple types of processors in a single computer system is referred to as a heterogeneous system.
This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPUs, the Xeon Phi massively parallel processor, and NVIDIA GPUs; common tuning methods and platform-specific tuning techniques are presented. Analysis then demonstrates the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program.
The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work presents an extraction of features from the input data set and the target systems, and then uses various regression models to predict the respective computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the essential information for scheduling on heterogeneous systems.
In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the pros and cons of different heterogeneous processors can complement each other, so that higher performance can be achieved on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors.
Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and saves energy. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
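As a flavor of the scheduling problem, here is a greedy earliest-finish-time baseline of the kind the proposed ranking algorithms (ROB, MR, MRR, ASTR) refine; the task names and predicted runtimes are invented:

```python
def schedule(tasks, devices, pred):
    """Greedy assignment: each task goes where it finishes earliest.

    tasks: list of ids; devices: list of names;
    pred[(task, device)]: predicted runtime in seconds (e.g. from regression).
    """
    ready_at = {d: 0.0 for d in devices}
    plan = []
    # Let the most device-sensitive tasks (largest best/worst gap) pick first.
    for t in sorted(tasks, key=lambda t: min(pred[(t, d)] for d in devices) /
                                         max(pred[(t, d)] for d in devices)):
        d = min(devices, key=lambda d: ready_at[d] + pred[(t, d)])
        plan.append((t, d))
        ready_at[d] += pred[(t, d)]
    return plan, max(ready_at.values())   # assignment and makespan

pred = {("dock1", "gpu"): 2, ("dock1", "cpu"): 10,
        ("dock2", "gpu"): 3, ("dock2", "cpu"): 4,
        ("prep",  "gpu"): 6, ("prep",  "cpu"): 5}
print(schedule(["dock1", "dock2", "prep"], ["gpu", "cpu"], pred))
```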
Subjects/Keywords: machine learning; task scheduling; heterogeneous computing; performance optimizations; high-performance computing

Utah State University
13.
Steven Monteiro, Steena Dominica.
Statistical Techniques to Model and Optimize Performance of Scientific, Numerically Intensive Workloads.
Degree: PhD, Computer Science, 2016, Utah State University
URL: https://digitalcommons.usu.edu/etd/5228
Projecting performance of applications and hardware is important to several market segments: hardware designers, software developers, supercomputing centers, and end users. Hardware designers estimate performance of current applications on future systems when designing new hardware. Software developers make performance estimates to evaluate performance of their code on different architectures and input datasets. Supercomputing centers try to optimize the process of matching computing resources to computing needs. End users requesting time on supercomputers must provide estimates of their application's run time, and incorrect estimates can lead to wasted supercomputing resources and time. However, application performance is challenging to predict because it is affected by several factors in application code, specifications of system hardware, choice of compilers, compiler flags, and libraries.
This dissertation uses statistical techniques to model and optimize performance of scientific applications across different computer processors. The first study in this research offers statistical models that predict performance of an application across different input datasets prior to application execution. These models guide end users to select parameters that produce optimal application performance during execution. The second study offers a suite of statistical models that predict performance of a new application on a new processor. Both studies present statistical techniques that can be generalized to analyze, optimize, and predict performance of diverse computation- and data-intensive applications on different hardware.
Advisors/Committee Members: Amanda Lee Hughes.
Subjects/Keywords: performance prediction; performance analysis; performance modeling; high performance computing; Computer Sciences

Georgia Tech
14.
Witte, Philipp Andre.
Software and algorithms for large-scale seismic inverse problems.
Degree: PhD, Computational Science and Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/62754
▼ Seismic imaging and parameter estimation are an important class of inverse problems with practical relevance in resource exploration, carbon control, and monitoring systems for geohazards. Seismic inverse problems involve solving a large number of partial differential equations (PDEs) during numerical optimization using finite difference modeling, making them computationally expensive. Additionally, problems of this type are typically ill-posed, non-convex, or ill-conditioned, thus making them challenging from a mathematical standpoint as well. Similar to the field of deep learning, this calls for software that is not only optimized for performance, but also enables geophysical domain specialists to experiment with algorithms in high-level programming languages and using different computing environments, such as high-performance computing (HPC) clusters or the cloud. It also calls for the adoption of dimensionality reduction techniques and stochastic algorithms to address computational cost from the algorithmic side. This thesis makes three distinct contributions that address computational challenges encountered in seismic inverse problems and facilitate algorithmic development in this field. Part one introduces a large-scale framework for seismic modeling and inversion based on the paradigm of separation of concerns, which combines a user interface based on domain-specific abstractions with a Python package for automatic code generation to solve the underlying PDEs. The modular code structure makes it possible to manage the complexity of a seismic inversion code, while matrix-free linear operators and data containers enable the implementation of algorithms in a fashion that closely resembles the underlying mathematical notation. The second contribution of this thesis is an algorithm for seismic imaging that addresses its high computational cost and large memory footprint through a combination of on-the-fly Fourier transforms, stochastic sampling techniques, and sparsity-promoting optimization. The algorithm combines the best of both time- and frequency-domain inversion: the memory footprint is independent of the number of modeled time steps, while time-to-frequency conversions avoid the need to solve Helmholtz equations, which involve inverting ill-conditioned matrices. Part three of this thesis introduces a novel approach for adapting the cloud to high-performance computing applications like seismic imaging, one that does not rely on a fixed cluster of permanently running virtual machines. Instead, computational resources are automatically started and terminated by the cloud environment during runtime, and the workflow takes advantage of cloud-native technologies such as event-driven computations and containerized batch processing.
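The matrix-free operator abstraction mentioned above can be sketched with SciPy: the iterative solver below only ever calls a matvec routine, never forming a matrix. The 1D Laplacian stands in for the thesis's wave-equation operators, which are not shown in the abstract:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 256

def laplacian_matvec(u):
    # Matrix-free action of a 1D Laplacian with Dirichlet boundaries:
    # no matrix is ever stored, mirroring how seismic modeling operators
    # apply PDE solves instead of assembling explicit matrices.
    v = 2.0 * u
    v[:-1] -= u[1:]
    v[1:] -= u[:-1]
    return v

A = LinearOperator((n, n), matvec=laplacian_matvec, dtype=np.float64)
b = np.random.default_rng(0).standard_normal(n)

x, info = cg(A, b)   # the Krylov solver needs only matvec products
print(info, np.linalg.norm(A @ x - b))
```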
Advisors/Committee Members: Herrmann, Felix J. (advisor), Chow, Edmond (advisor), Vuduc, Richard (advisor), Peng, Zhigang (advisor), Romberg, Justin (advisor).
Subjects/Keywords: Seismic; Algorithm; Cloud; High-performance-computing; Geophysics
APA (6th Edition):
Witte, P. A. (2020). Software and algorithms for large-scale seismic inverse problems. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62754
Chicago Manual of Style (16th Edition):
Witte, Philipp Andre. “Software and algorithms for large-scale seismic inverse problems.” 2020. Doctoral Dissertation, Georgia Tech. Accessed March 04, 2021.
http://hdl.handle.net/1853/62754.
MLA Handbook (7th Edition):
Witte, Philipp Andre. “Software and algorithms for large-scale seismic inverse problems.” 2020. Web. 04 Mar 2021.
Vancouver:
Witte PA. Software and algorithms for large-scale seismic inverse problems. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1853/62754.
Council of Science Editors:
Witte PA. Software and algorithms for large-scale seismic inverse problems. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/62754

Rice University
15.
Fabien, Maurice S.
Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing.
Degree: PhD, Engineering, 2019, Rice University
URL: http://hdl.handle.net/1911/105967
▼ This thesis explores efficient computational methods for the approximation of solutions to partial differential equations that model flow and transport phenomena in porous media. These problems can be challenging to solve, as the governing equations are coupled and nonlinear, and material properties are often highly varying and discontinuous. The high-order implicit hybridizable discontinuous Galerkin (HDG) method is utilized for the discretization, which significantly reduces the computational cost. To our knowledge, HDG methods have not been previously applied to this class of complex problems in porous media. The HDG method is high-order accurate, locally mass-conservative, allows us to use unstructured complicated meshes, and enables the use of static condensation. We demonstrate that the HDG method is able to efficiently generate high-fidelity simulations of flow and transport phenomena in porous media. Several challenging benchmarks are used to verify and validate the method in heterogeneous porous media.
High-order methods give rise to less sparse discretization matrices, which is problematic for linear solvers. To address the issue of less sparse discretization matrices (compared to low-order methods), we develop and deploy a novel nested multigrid method based on a combination of p-multigrid, h-multigrid, and algebraic multigrid. The method is demonstrated to be algorithmically efficient, achieving convergence rates of at most 0.2. We also show how to implement the multigrid technique on many-core parallel architectures. Parallel computing is a critical step in the simulation process, as it allows us to consider larger problems and potentially generate simulations faster. Traditional performance measures like FLOPs or run-time are not entirely appropriate for finite element problems, as they ignore solution accuracy. A new accuracy-inclusive performance measure has been investigated as a part of my research. This performance measure, called the Time-Accuracy-Size spectrum (TAS), allows us to have a more complete assessment of how efficient our algorithms are. Utilizing TAS also enables a systematic way of determining which discretization is best suited for a given application.
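As a rough illustration of the coarse-grid correction idea behind such a p-/h-/algebraic multigrid hierarchy (not the thesis's actual nested solver), a minimal two-grid cycle for a 1D Poisson problem with weighted Jacobi smoothing:

```python
import numpy as np

def poisson_matrix(n, h):
    # Tridiagonal (2, -1, -1)/h^2 matrix for -u'' on n interior points.
    return (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def jacobi(u, f, h, sweeps=3, w=2/3):
    for _ in range(sweeps):
        u[1:-1] += w * 0.5 * (h * h * f[1:-1] + u[:-2] + u[2:] - 2 * u[1:-1])
    return u

def two_grid(u, f, h):
    u = jacobi(u.copy(), f, h)                   # pre-smoothing
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    rc = r[::2]                                  # restriction by injection
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(poisson_matrix(len(rc) - 2, 2 * h), rc[1:-1])
    # linear prolongation of the coarse-grid error correction
    u = u + np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)
    return jacobi(u, f, h)                       # post-smoothing

n, h = 65, 1.0 / 64
f, u = np.ones(n), np.zeros(n)
for _ in range(5):
    u = two_grid(u, f, h)                        # residual shrinks each cycle
```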
Advisors/Committee Members: Riviere, Beatrice M. (advisor), Knepley, Matthew G. (committee member).
Subjects/Keywords: High performance computing; finite elements; linear solvers
APA (6th Edition):
Fabien, M. S. (2019). Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing. (Doctoral Dissertation). Rice University. Retrieved from http://hdl.handle.net/1911/105967
Chicago Manual of Style (16th Edition):
Fabien, Maurice S. “Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing.” 2019. Doctoral Dissertation, Rice University. Accessed March 04, 2021.
http://hdl.handle.net/1911/105967.
MLA Handbook (7th Edition):
Fabien, Maurice S. “Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing.” 2019. Web. 04 Mar 2021.
Vancouver:
Fabien MS. Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing. [Internet] [Doctoral dissertation]. Rice University; 2019. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1911/105967.
Council of Science Editors:
Fabien MS. Hybridizable Discontinuous Galerkin Methods for Flow and Transport: Applications, Solvers, and High Performance Computing. [Doctoral Dissertation]. Rice University; 2019. Available from: http://hdl.handle.net/1911/105967

University of Illinois – Urbana-Champaign
16.
Jha, Saurabh.
Analysis of Gemini interconnect recovery mechanisms: methods and observations.
Degree: MS, Computer Science, 2016, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/95450
▼ This thesis focuses on the resilience of network components and the recovery capabilities of extreme-scale high-performance computing (HPC) systems, specifically petaflop-level supercomputers, aimed at solving complex science, engineering, and business problems that require high bandwidth, enhanced networking, and high compute capabilities. The resilience of the network is critical for ensuring successful execution of applications and overall system availability. Failures of interconnect components such as links, routers, and power supplies pose a threat to the resilience of the interconnect network, causing application failures and, in the worst case, system-wide failure. An extreme-scale system is designed to manage these failures and automatically recover from them to ensure successful application execution and avoid system-wide failure. Thus, in this thesis, we characterize the success probability of the recovery procedures as well as the impact of the recovery procedures on applications.
We developed an interconnect recovery mechanisms analysis tool (I-RAT), a plugin built on top of LogDiver, to characterize and assess the impact of recovery mechanisms. The tool was used to analyze more than two years of network/system logs from Blue Waters, a supercomputer operated by the NCSA at the University of Illinois. Our analyses show that interconnect recovery mechanisms are frequently triggered (the mean time between triggers is as short as 36 hours for link failovers) and that the initiated recovery fails with relatively high probability (as much as 0.25 for link failover). Furthermore, the analyses show that system resilience does not equate to application resilience, since executing applications can fail with non-negligible probability during (or just after) a successful recovery; as many as 20% of the executing applications fail during the recovery phase.
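The kind of measurement behind these numbers—estimating the mean time between recovery triggers from system logs—can be sketched as follows. The log format and event strings here are hypothetical, since the Blue Waters log schema is not given in the abstract:

```python
import re
from datetime import datetime

# Hypothetical log excerpt; the real Blue Waters logs and the
# LogDiver/I-RAT tooling are not reproduced in the abstract.
LOG = """\
2015-03-01 04:12:55 gemini lcb failover initiated blade=c0-0c1s4
2015-03-02 16:40:02 gemini lcb failover initiated blade=c2-3c0s7
2015-03-04 09:05:13 gemini lcb failover initiated blade=c1-1c2s2
"""

pattern = re.compile(r"^(\S+ \S+) .*failover initiated")
times = [datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
         for line in LOG.splitlines()
         if (m := pattern.match(line))]

# Mean time between triggers, in hours.
gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
print(f"mean time between failover triggers: {sum(gaps) / len(gaps):.1f} h")
```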
Advisors/Committee Members: Iyer, Ravishankar K. (advisor).
Subjects/Keywords: High Performance Computing; Fault Tolerance; Interconnects
APA (6th Edition):
Jha, S. (2016). Analysis of Gemini interconnect recovery mechanisms: methods and observations. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/95450
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Jha, Saurabh. “Analysis of Gemini interconnect recovery mechanisms: methods and observations.” 2016. Thesis, University of Illinois – Urbana-Champaign. Accessed March 04, 2021.
http://hdl.handle.net/2142/95450.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Jha, Saurabh. “Analysis of Gemini interconnect recovery mechanisms: methods and observations.” 2016. Web. 04 Mar 2021.
Vancouver:
Jha S. Analysis of Gemini interconnect recovery mechanisms: methods and observations. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2016. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/2142/95450.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Jha S. Analysis of Gemini interconnect recovery mechanisms: methods and observations. [Thesis]. University of Illinois – Urbana-Champaign; 2016. Available from: http://hdl.handle.net/2142/95450
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of New Mexico
17.
Levy, Scott.
Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems.
Degree: Department of Computer Science, 2016, University of New Mexico
URL: http://hdl.handle.net/1928/32314
▼ High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, I examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, I evaluate the potential impact of rollback avoidance on these systems. I then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, I examine the feasibility of using this technique to protect against memory faults in kernel memory.
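For context on why checkpoint/restart grows costly at scale, the classic first-order Young/Daly estimate of the optimal checkpoint interval can be computed directly. The node counts and costs below are illustrative assumptions, and this formula is standard background rather than the thesis's own model:

```python
import math

def young_daly_interval(checkpoint_cost_s, system_mtbf_s):
    """First-order optimal checkpoint interval (Young/Daly)."""
    return math.sqrt(2.0 * checkpoint_cost_s * system_mtbf_s)

# Example: a per-node MTBF of 5 years across 100,000 nodes gives a
# system MTBF of roughly 26 minutes (assuming independent failures);
# with 60 s checkpoints, the optimal interval shrinks to minutes,
# which is why checkpointing alone becomes costly at extreme scale.
node_mtbf_s = 5 * 365 * 24 * 3600
system_mtbf_s = node_mtbf_s / 100_000
print(young_daly_interval(60.0, system_mtbf_s) / 60.0, "minutes")
```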
Advisors/Committee Members: Bridges, Patrick G., Ferreira, Kurt B., Arnold, Dorian, Lowenthal, David.
Subjects/Keywords: High-performance computing; Fault tolerance; Simulation; Modeling
APA (6th Edition):
Levy, S. (2016). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems. (Doctoral Dissertation). University of New Mexico. Retrieved from http://hdl.handle.net/1928/32314
Chicago Manual of Style (16th Edition):
Levy, Scott. “Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems.” 2016. Doctoral Dissertation, University of New Mexico. Accessed March 04, 2021.
http://hdl.handle.net/1928/32314.
MLA Handbook (7th Edition):
Levy, Scott. “Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems.” 2016. Web. 04 Mar 2021.
Vancouver:
Levy S. Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems. [Internet] [Doctoral dissertation]. University of New Mexico; 2016. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1928/32314.
Council of Science Editors:
Levy S. Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems. [Doctoral Dissertation]. University of New Mexico; 2016. Available from: http://hdl.handle.net/1928/32314

Queens University
18.
Grant, Ryan Eric.
Improving High Performance Networking Technologies for Data Center Clusters.
Degree: Electrical and Computer Engineering, 2012, Queens University
URL: http://hdl.handle.net/1974/7502
▼ This dissertation demonstrates new methods for increasing the performance and scalability of high performance networking technologies for use in clustered computing systems, concentrating on Ethernet/High-Speed networking convergence. The motivation behind the improvement of high performance networking technologies and their importance to the viability of modern data centers is discussed first. It then introduces the concepts of high performance networking in a commercial data center context as well as high performance computing (HPC) and describes some of the most important challenges facing such networks in the future. It reviews current relevant literature and discusses problems that are not yet solved.
Through a study of existing high performance networks, the most promising features for future networks are identified. Sockets Direct Protocol (SDP) is shown to have unexpected performance issues for commercial applications, due to inefficiencies in handling large numbers of simultaneous connections. The first SDP over eXtended Reliable Connections implementation is developed to reduce connection management overhead, demonstrating that performance issues are related to protocol overhead at the SDP level. Datagram offloading for IP over InfiniBand (IPoIB) is found to work well.
In the first work of its kind, hybrid high-speed/Ethernet networks are shown to resolve the issues of SDP underperformance and to demonstrate the potential of combining local-area Remote Direct Memory Access (RDMA) technologies with Ethernet wide-area networking for data centers.
Given the promising results from these studies, a set of solutions to enhance performance at the local and wide area network level for Ethernet is introduced, providing a scalable, connectionless, socket-compatible, fully RDMA-capable networking technology, datagram-iWARP. A novel method of performing RDMA Write operations (called RDMA Write-Record) and RDMA Read over unreliable datagrams over Ethernet is designed, implemented and tested. It shows its applicability in scientific and commercial application spaces and is applicable to other verbs-based networking interfaces such as InfiniBand.
The newly proposed RDMA methods, both for send/recv and RDMA Write-Record, are supplemented with interfaces for both socket-based applications and Message Passing Interface (MPI) applications. An MPI implementation is adapted to support datagram-iWARP. Both scalability and performance improvements are demonstrated for HPC and commercial applications.
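As background, the connectionless, socket-compatible semantics that datagram-iWARP preserves can be illustrated with ordinary UDP sockets; the actual RDMA verbs and the RDMA Write-Record operation are hardware-level mechanisms with no direct analogue in this sketch:

```python
import socket

# A plain UDP exchange: no per-client connection state and no handshake,
# which is the property that makes datagram transports scale to large
# numbers of peers (the motivation for datagram-iWARP).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))        # OS assigns a free port
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"payload", addr)      # send without establishing a connection

data, peer = server.recvfrom(2048)
print(data, peer)
```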
Subjects/Keywords: Data centers; Cluster Computing; High Performance Networks
APA (6th Edition):
Grant, R. E. (2012). Improving High Performance Networking Technologies for Data Center Clusters. (Thesis). Queens University. Retrieved from http://hdl.handle.net/1974/7502
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Grant, Ryan Eric. “Improving High Performance Networking Technologies for Data Center Clusters.” 2012. Thesis, Queens University. Accessed March 04, 2021.
http://hdl.handle.net/1974/7502.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Grant, Ryan Eric. “Improving High Performance Networking Technologies for Data Center Clusters.” 2012. Web. 04 Mar 2021.
Vancouver:
Grant RE. Improving High Performance Networking Technologies for Data Center Clusters. [Internet] [Thesis]. Queens University; 2012. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1974/7502.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Grant RE. Improving High Performance Networking Technologies for Data Center Clusters. [Thesis]. Queens University; 2012. Available from: http://hdl.handle.net/1974/7502
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
19.
-6059-0490.
Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33276
▼ A goal of computer science is to develop practical methods to automate tasks that are otherwise too complex or tedious to perform manually. Complex tasks can include determining a practical algorithm and creating the associated implementation for a given problem specification. Goal-oriented programming can make this systematic; therefore, we can rely on automated tools to create implementations by expressing the task of creating implementations in terms of goal-oriented programming. To do so, pertinent knowledge must be encoded, which requires a notation and language to define relevant abstractions.
This dissertation focuses on distributed-memory parallel tensor computations arising from computational chemistry. Specifically, we focus on applications based on the tensor contraction operation of dense, non-symmetric tensors. Creating an efficient algorithm for a given problem specification in this domain is complex; creating an optimized implementation of a developed algorithm is even more complex, tedious, and error-prone. To this end, we encode pertinent knowledge for distributed-memory parallel algorithms for tensor contractions of dense, non-symmetric tensors. We do this by developing a notation for data distribution and redistribution that exposes a systematic procedure for deriving a family of algorithms for this operation for which efficient implementations exist.
We validate the developed ideas by implementing them in the Redistribution Operations and Tensor Expressions application programming interface (ROTE API) and encoding them into an automated system, DxTer, for systematically generating efficient implementations from problem specifications. Experiments performed on the IBM Blue Gene/Q and Cray XC30 architectures, testing generated implementations for the spin-adapted coupled cluster singles and doubles method from computational chemistry, demonstrate impact both in terms of performance and storage requirements.
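The core operation—a contraction of dense, non-symmetric tensors over a shared index—can be written on a single node with einsum; the distributed version assigns each tensor mode a distribution over the process grid and redistributes data so the local contraction can run everywhere, which this single-node sketch omits:

```python
import numpy as np

# One dense, non-symmetric tensor contraction of the kind the
# dissertation's notation distributes across processes (the ROTE API
# itself is not shown here).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6, 4))   # A[a, b, k]
B = rng.standard_normal((4, 5))      # B[k, c]

# Contract over the shared index k: C[a, b, c] = sum_k A[a, b, k] * B[k, c]
C = np.einsum("abk,kc->abc", A, B)
print(C.shape)                       # (8, 6, 5)
```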
Advisors/Committee Members: Van de Geijn, Robert A. (advisor), Kolda, Tamara G. (advisor), Stanton, John F (committee member), Pingali, Keshav (committee member), Hammond, Jeff R (committee member), Batory, Don S (committee member).
Subjects/Keywords: High-performance computing; Parallel programming; Tensor computations
APA (6th Edition):
-6059-0490. (2015). Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33276
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-6059-0490. “Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed March 04, 2021.
http://hdl.handle.net/2152/33276.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-6059-0490. “Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.” 2015. Web. 04 Mar 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-6059-0490. Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/2152/33276.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-6059-0490. Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/33276
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
20.
Malas, Tareq Majed Yasin.
Tiling and Asynchronous Communication Optimizations for Stencil Computations.
Degree: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, 2015, King Abdullah University of Science and Technology
URL: http://hdl.handle.net/10754/583807
▼ The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. Most of the established work concentrates on updating separate cache blocks per thread, which works on all types of shared memory systems, regardless of whether there is a shared cache among the cores. This approach is memory-bandwidth limited in several situations, where the cache space for each thread can be too small to provide sufficient in-cache data reuse.
We introduce a generalized multi-dimensional intra-tile parallelization scheme for shared-cache multicore processors that results in a significant reduction of cache size requirements and shows a large saving in memory bandwidth usage compared to existing approaches. It also provides data access patterns that allow efficient hardware prefetching. Our parameterized thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the Central Processing Unit (CPU). We also introduce an efficient diamond tiling structure for both shared memory cache blocking and distributed memory relaxed-synchronization communication, demonstrated using one-dimensional domain decomposition. We describe the approach and our open-source testbed implementation details (called Girih), present performance results on contemporary Intel processors, and apply advanced performance modeling techniques to reconcile the observed performance with hardware capabilities. Furthermore, we conduct a comparison with the state-of-the-art stencil frameworks PLUTO and Pochoir in shared memory, using corner-case stencil operators. We study the impact of the diamond tile size on computational intensity, cache block size, and energy consumption. The impact of computational intensity on power dissipation on the CPU and in the DRAM is investigated and shows that DRAM power is a decisive factor for energy consumption in the Intel Ivy Bridge processor, which is strongly influenced by the computational intensity. Moreover, we show that the highest performance does not necessarily lead to the lowest energy, even if the clock speed is fixed. We apply our approach to an electromagnetic simulation application for solar cell development, demonstrating several-fold speedup compared to an efficient spatially blocked variant. Finally, we discuss the integration of our approach with other techniques for future High Performance Computing (HPC) systems, which are expected to be more memory-bandwidth-starved with a deeper memory hierarchy.
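A minimal sketch of the blocking idea for a 1D three-point stencil follows; it shows only spatial blocking into cache-sized chunks, whereas the thesis's temporal/diamond tiling additionally fuses multiple time steps per tile:

```python
import numpy as np

def jacobi_sweep_blocked(a, block=1024):
    """One three-point stencil sweep over a 1D array, processed in
    cache-sized blocks so each block's working set stays resident in
    cache. Real temporal/diamond tiling would also fuse several time
    steps per block, which this spatial-only sketch omits."""
    out = a.copy()
    n = len(a)
    for start in range(1, n - 1, block):
        stop = min(start + block, n - 1)
        out[start:stop] = (a[start - 1:stop - 1] + a[start:stop]
                           + a[start + 1:stop + 1]) / 3.0
    return out

x = np.linspace(0.0, 1.0, 1 << 20)
x = jacobi_sweep_blocked(x)
```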
Advisors/Committee Members: Keyes, David E. (advisor), Ltaief, Hatem (committee member), Ketcheson, David I. (committee member), Shihada, Basem (committee member), Elnozahy, Mootaz (committee member), Matsuoka, Satoshi (committee member).
Subjects/Keywords: High Performance Computing; stencil computations; tiling
APA (6th Edition):
Malas, T. M. Y. (2015). Tiling and Asynchronous Communication Optimizations for Stencil Computations. (Thesis). King Abdullah University of Science and Technology. Retrieved from http://hdl.handle.net/10754/583807
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Malas, Tareq Majed Yasin. “Tiling and Asynchronous Communication Optimizations for Stencil Computations.” 2015. Thesis, King Abdullah University of Science and Technology. Accessed March 04, 2021.
http://hdl.handle.net/10754/583807.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Malas, Tareq Majed Yasin. “Tiling and Asynchronous Communication Optimizations for Stencil Computations.” 2015. Web. 04 Mar 2021.
Vancouver:
Malas TMY. Tiling and Asynchronous Communication Optimizations for Stencil Computations. [Internet] [Thesis]. King Abdullah University of Science and Technology; 2015. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/10754/583807.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Malas TMY. Tiling and Asynchronous Communication Optimizations for Stencil Computations. [Thesis]. King Abdullah University of Science and Technology; 2015. Available from: http://hdl.handle.net/10754/583807
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Penn State University
21.
Labarbera, Nicholas Andrew.
Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations.
Degree: 2018, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/15917nal5047
▼ In the past few years, interest has drastically increased in using surgically inserted electrodes to ablate cancer cells. The treatment is referred to as irreversible electroporation (IRE) and has the advantage of being a minimally invasive procedure that can be used to treat tumors while inflicting minimal damage to surrounding tissue and preserving blood vessels. However, treatment planning is required to ensure the electrodes are placed in the correct location and at the proper voltages such that all cancer cells are killed while damaging as few healthy cells as possible. This treatment planning is accomplished through the use of computer simulations.
The accuracy of models to predict tissue ablation from IRE is an important component to IRE treatments. It has been well established that the conductivity of tissue increases as the electrical field increases, and a conductivity dependent on electrical field strength is often included in treatment planning models.
However, previous work increases conductivity equally in all directions. This dissertation presents a novel formulation that increases the conductivity more in the direction of the electrical field. There is both theoretical and experimental evidence previously published to support this formulation. Results using this novel formulation are compared to previously published models.
The second part of the dissertation focuses on performing uncertainty quantification to determine how uncertainty in physical parameters affects the extent of ablation.
There is a degree of uncertainty in the material properties of each specific person, as no two humans are identical. There is further uncertainty due to the incomplete knowledge of how the tissue's properties vary during exposure to strong electrical fields. The goal of this research is to provide more knowledge that can be used for the continued development of treatment planning protocols for irreversible electroporation.
The third part of the dissertation presents a novel workflow that allows medical doctors to perform treatment planning in such a way that they can easily get feedback on how adjustments in electrode number and placement affect possible ablation shapes. The workflow utilizes linear models to determine possible ablation zones before using nonlinear models to determine the voltages necessary to ablate the target zone. Enabling medical doctors to have a more active role in the treatment planning phase, compared to relying on optimization algorithms alone, should improve the treatment planning process.
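The anisotropic formulation can be sketched as a conductivity tensor that adds extra conductivity along the local field direction; the scalar field response and gain factor below are hypothetical placeholders, not the dissertation's calibrated model:

```python
import numpy as np

def conductivity_tensor(E, sigma0=0.2, sigma_max=0.6, gain=0.5):
    """Sketch of an anisotropic conductivity: a smooth isotropic rise
    with field strength, plus extra conductivity only along the local
    electric-field direction e_hat (all parameter values hypothetical)."""
    e_norm = np.linalg.norm(E)
    # sigmoid increase of the isotropic part with |E| (placeholder shape)
    sigma_iso = sigma0 + (sigma_max - sigma0) / (
        1.0 + np.exp(-(e_norm - 5e4) / 1e4))
    if e_norm == 0.0:
        return sigma_iso * np.eye(3)
    e_hat = E / e_norm
    # rank-one term e_hat (x) e_hat boosts conductivity along the field
    return sigma_iso * np.eye(3) + gain * sigma_iso * np.outer(e_hat, e_hat)

E = np.array([6e4, 0.0, 0.0])        # V/m, illustrative field vector
J = conductivity_tensor(E) @ E       # current density under the tensor law
```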
Advisors/Committee Members: Huanyu Cheng, Dissertation Advisor/Co-Advisor, Huanyu Cheng, Committee Chair/Co-Chair, Joseph Paul Cusumano, Committee Member, Corina Stefania Drapaca, Committee Member, Michael Kinzel, Outside Member, Sean McInytre, Special Member.
Subjects/Keywords: Simulation; High Performance Computing; Electroporation; Mathematical Modelling
APA (6th Edition):
Labarbera, N. A. (2018). Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations. (Thesis). Penn State University. Retrieved from https://submit-etda.libraries.psu.edu/catalog/15917nal5047
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Labarbera, Nicholas Andrew. “Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations.” 2018. Thesis, Penn State University. Accessed March 04, 2021.
https://submit-etda.libraries.psu.edu/catalog/15917nal5047.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Labarbera, Nicholas Andrew. “Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations.” 2018. Web. 04 Mar 2021.
Vancouver:
Labarbera NA. Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations. [Internet] [Thesis]. Penn State University; 2018. [cited 2021 Mar 04].
Available from: https://submit-etda.libraries.psu.edu/catalog/15917nal5047.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Labarbera NA. Anisotropic Conductivity and Uncertainty Quantification in Irreversible Electroporation Simulations. [Thesis]. Penn State University; 2018. Available from: https://submit-etda.libraries.psu.edu/catalog/15917nal5047
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
22.
Chapuis, Guillaume.
Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique.
Degree: Docteur es, Informatique, 2013, Cachan, Ecole normale supérieure
URL: http://www.theses.fr/2013DENS0068
▼ The exponential growth in bioinformatics data generation and the stagnation of processor frequencies in modern processors stress the need for efficient implementations that fully exploit the parallel capabilities offered by modern computers. This thesis focuses on parallel algorithms and implementations for bioinformatics problems. Various types of parallelism are described and exploited. This thesis presents applications in genetics, with a GPU-parallel tool for QTL detection; in protein structure comparison, with a multicore parallel tool for finding similar regions between proteins; and in large graph analysis, with a multi-GPU parallel implementation of a novel algorithm for the All-Pairs Shortest Path problem.
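The All-Pairs Shortest Path kernel that the multi-GPU implementation parallelizes is classically expressed by the Floyd-Warshall recurrence; a single-core vectorized sketch, without the multi-GPU decomposition:

```python
import numpy as np

def floyd_warshall(dist):
    """Dense All-Pairs Shortest Path: relax every pair (i, j) through
    each intermediate vertex k in turn."""
    n = dist.shape[0]
    for k in range(n):
        # broadcast column k against row k to relax all pairs at once
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist

INF = np.inf
D = np.array([[0.0, 3.0, INF],
              [INF, 0.0, 1.0],
              [2.0, INF, 0.0]])
print(floyd_warshall(D))
```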
Advisors/Committee Members: Lavenier, Dominique (thesis director).
Subjects/Keywords: Bioinformatique; Calcul parallèle; Bioinformatics; High performance computing
APA (6th Edition):
Chapuis, G. (2013). Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique. (Doctoral Dissertation). Cachan, Ecole normale supérieure. Retrieved from http://www.theses.fr/2013DENS0068
Chicago Manual of Style (16th Edition):
Chapuis, Guillaume. “Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique.” 2013. Doctoral Dissertation, Cachan, Ecole normale supérieure. Accessed March 04, 2021.
http://www.theses.fr/2013DENS0068.
MLA Handbook (7th Edition):
Chapuis, Guillaume. “Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique.” 2013. Web. 04 Mar 2021.
Vancouver:
Chapuis G. Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique. [Internet] [Doctoral dissertation]. Cachan, Ecole normale supérieure; 2013. [cited 2021 Mar 04].
Available from: http://www.theses.fr/2013DENS0068.
Council of Science Editors:
Chapuis G. Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis : Exploiter les capacités de calcul parallèle des architectures modernes en bioinformatique. [Doctoral Dissertation]. Cachan, Ecole normale supérieure; 2013. Available from: http://www.theses.fr/2013DENS0068

Oregon State University
23.
Bae, Myung Mun.
Resource placement, data rearrangement, and Hamiltonian cycles in torus networks.
Degree: PhD, Computer Science, 1996, Oregon State University
URL: http://hdl.handle.net/1957/34129
▼ Many parallel machines, both commercial and experimental, have been or are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively larger diameter, but better cost/performance tradeoffs, such as higher channel bandwidth and lower node degree, when compared to the hypercube. Thus, the torus is becoming a popular topology for the interconnection network of high performance parallel computers.
In a multicomputer, resources such as I/O devices or software packages are distributed over the network. The first part of the thesis investigates efficient methods of distributing resources in a torus network. Three classes of placement methods are studied: (1) the distant-t placement problem, in which any non-resource node is at a distance of at most t from some resource node; (2) the j-adjacency problem, in which a non-resource node is adjacent to at least j resource nodes; and (3) the generalized placement problem, in which a non-resource node must be at a distance of at most t from at least j resource nodes.
This resource placement technique can be applied to allocating spare processors to provide fault tolerance in the case of processor failures. Some efficient spare processor placement methods and reconfiguration schemes for processor failures are also described.
In a torus-based parallel system, some algorithms give best performance if the data are distributed to processors numbered in Cartesian order; in some other cases, it is better to distribute the data to processors numbered in Gray code order. Since the placement patterns may be changed dynamically, it is essential to find efficient methods of rearranging the data from Gray code order to Cartesian order and vice versa. In the second part of the thesis, some efficient methods for data transfer from Cartesian order to radix order and vice versa are developed.
The last part of the thesis gives results on generating edge-disjoint Hamiltonian cycles in k-ary n-cubes, hypercubes, and 2D tori. These edge-disjoint cycles are quite useful for many communication algorithms.
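The distant-t placement condition is easy to state computationally: in a torus, the distance between two nodes is the sum of per-dimension wrap-around (Lee) distances. A small sketch checking which nodes a resource node covers:

```python
from itertools import product

def torus_distance(u, v, dims):
    """Shortest-path (Lee) distance between nodes u and v in a
    k1 x k2 x ... torus: each coordinate wraps around."""
    return sum(min((a - b) % k, (b - a) % k)
               for a, b, k in zip(u, v, dims))

def covered_by(resource, t, dims):
    """Nodes within distance t of a resource node (distant-t placement)."""
    return [node for node in product(*(range(k) for k in dims))
            if torus_distance(node, resource, dims) <= t]

# In a 2D 4x4 torus, a resource at (0, 0) with t = 1 covers itself
# plus its four torus neighbours, including wrap-around ones.
print(covered_by((0, 0), 1, (4, 4)))
```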
Advisors/Committee Members: Bose, Bella (advisor), Cook, Curtis (committee member).
Subjects/Keywords: High performance computing
APA (6th Edition):
Bae, M. M. (1996). Resource placement, data rearrangement, and Hamiltonian cycles in torus networks. (Doctoral Dissertation). Oregon State University. Retrieved from http://hdl.handle.net/1957/34129
Chicago Manual of Style (16th Edition):
Bae, Myung Mun. “Resource placement, data rearrangement, and Hamiltonian cycles in torus networks.” 1996. Doctoral Dissertation, Oregon State University. Accessed March 04, 2021.
http://hdl.handle.net/1957/34129.
MLA Handbook (7th Edition):
Bae, Myung Mun. “Resource placement, data rearrangement, and Hamiltonian cycles in torus networks.” 1996. Web. 04 Mar 2021.
Vancouver:
Bae MM. Resource placement, data rearrangement, and Hamiltonian cycles in torus networks. [Internet] [Doctoral dissertation]. Oregon State University; 1996. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1957/34129.
Council of Science Editors:
Bae MM. Resource placement, data rearrangement, and Hamiltonian cycles in torus networks. [Doctoral Dissertation]. Oregon State University; 1996. Available from: http://hdl.handle.net/1957/34129

Oregon State University
24.
Moore, Jason Andrew.
High-performance data-parallel input/output.
Degree: PhD, Computer Science, 1996, Oregon State University
URL: http://hdl.handle.net/1957/34460
▼ Existing parallel file systems are proving inadequate in two important arenas: programmability and performance. Both of these inadequacies can largely be traced to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented, single-stream, block-at-a-time approach to file I/O. This one-size-fits-all approach to parallel file systems is inadequate for supporting applications running on distributed-memory parallel computers.
This research provides a migration path away from the traditional approaches to parallel I/O at two levels. At the level seen by the programmer, we show how file operations can be closely integrated with the semantics of a parallel language. Principles for this integration are illustrated in their application to C*, a virtual-processor-oriented language. The result is that traditional C file operations with familiar semantics can be used in C* at the level where the programmer works: the virtual-processor level. To facilitate high performance within this framework, machine-independent modes are used. Modes change the performance of file operations, not their semantics, so programmers need not use the ambiguous operations found in many parallel file systems. An automatic mode detection technique is presented that saves the programmer from extra syntax and low-level file system details. This mode detection system ensures that the most commonly encountered file operations are performed using high-performance modes.
While the high-performance modes allow fast collective movement of file data, they must include optimizations for redistribution of file data, a common operation in production scientific code. This need is addressed at the file system level, where we provide enhancements to Disk-Directed I/O for redistributing file data. Two enhancements are geared to speeding fine-grained redistributions: one uses a two-phase, or indirect, approach to redistributing data among compute nodes; the other relies on I/O nodes to guide the redistribution by building packets bound for compute nodes. We model the performance of these enhancements and determine the key parameters determining when each approach should be used. Finally, we introduce the notion of collective prefetching and identify its performance benefits and implementation tradeoffs.
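The two-phase (indirect) redistribution idea can be sketched with MPI: exchange data in a few large contiguous messages, then perform the fine-grained permutation locally in memory. This assumes mpi4py is available and is only an illustration of the concept, not the Disk-Directed I/O enhancement itself:

```python
# Run with, e.g.: mpiexec -n 4 python redistribute.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

chunk = 2  # items sent to each destination rank
local = (np.arange(size * chunk, dtype=np.float64)
         + rank * size * chunk)      # this rank's slice of a global array

# Phase 1: one bulk all-to-all moves equal contiguous slices between
# ranks (coarse-grained communication instead of many small messages).
recv = np.empty_like(local)
comm.Alltoall(local, recv)

# Phase 2: the fine-grained interleaving happens locally, in memory,
# avoiding a network transfer per element.
reordered = recv.reshape(size, chunk).T.reshape(-1)
```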
Advisors/Committee Members: Quinn, Michael J. (advisor).
Subjects/Keywords: High performance computing
APA (6th Edition):
Moore, J. A. (1996). High-performance data-parallel input/output. (Doctoral Dissertation). Oregon State University. Retrieved from http://hdl.handle.net/1957/34460
Chicago Manual of Style (16th Edition):
Moore, Jason Andrew. “High-performance data-parallel input/output.” 1996. Doctoral Dissertation, Oregon State University. Accessed March 04, 2021.
http://hdl.handle.net/1957/34460.
MLA Handbook (7th Edition):
Moore, Jason Andrew. “High-performance data-parallel input/output.” 1996. Web. 04 Mar 2021.
Vancouver:
Moore JA. High-performance data-parallel input/output. [Internet] [Doctoral dissertation]. Oregon State University; 1996. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/1957/34460.
Council of Science Editors:
Moore JA. High-performance data-parallel input/output. [Doctoral Dissertation]. Oregon State University; 1996. Available from: http://hdl.handle.net/1957/34460

University of California – Santa Cruz
25.
Ionkov, Latchesar.
Optimizing Access to Scientific Data for Storage, Analysis and Visualization.
Degree: Computer Science, 2018, University of California – Santa Cruz
URL: http://www.escholarship.org/uc/item/4vs7g3pk
▼ Scientific workflows contain an increasing number of interacting applications, often with a big disparity between the formats of data being produced and consumed by different applications. This mismatch can result in performance degradation as data retrieval causes multiple read operations (often to a remote storage system) in order to convert the data. In recent years, with the large increase in the amount of data and computational power available, there is demand for applications to support data access in situ, or close to the simulation, to provide application steering, analytics, and visualization. Although some parallel filesystems and middleware libraries attempt to identify access patterns and optimize data retrieval, they frequently fail if the patterns are complex. It is evident that more knowledge of the structure of the datasets at the storage-system level will provide many opportunities for further performance improvements.
For most developers of scientific applications, storing the application data, and its particular format on disk, is not an essential part of the application. Although they acknowledge the importance of I/O performance, their expertise lies mostly in numerical simulations and the particular models their application simulates. Most of their effort is spent on ensuring that the application produces correct numerical results. Ideally, they would like to have a library call that reads a subset of the data from storage (no matter what its format is) and places it in the data structures the simulation defines in the computer memory. Since the data needs to be analyzed and visualized, and the data has to be accessible from third-party tools, the scientists are forced to know more about the data formats.
In this dissertation we investigate multiple techniques for utilizing dataset descriptions to improve performance and overall data availability for HPC applications. We introduce a declarative data description language that can be used to define the complete dataset as well as parts of it. These descriptions are used to generate transformation rules that allow data to be converted between different physical layouts on storage and in memory.
First, we define the DRepl dataset description language and use it to implement divergent data views and replicas as POSIX files. We evaluate the performance of this approach and demonstrate its advantages, both because of the transparent application use and because of the combined performance when the application is run together with analytics and/or visualization code that reads the data in a different format. DRepl decouples the data producers and consumers, and the data layouts they use, from the way the data is stored on the storage system. DRepl has shown up to 2x improvement in cumulative performance when data is accessed using optimized replicas.
Second, we extend the previous approach to the parallel environment used in HPC. Instead of using POSIX files, the new method allows data to be accessed in larger chunks (fragments) in the way it will be laid out in memory. The developers can define what data structures they have in the…
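Since the abstract does not reproduce DRepl's syntax, the following is a hypothetical stand-in showing the idea of a declarative dataset description from which physical layouts are derived:

```python
from dataclasses import dataclass

# Hypothetical description objects standing in for DRepl's language:
# one declarative dataset definition, from which different physical
# layouts (views, replicas) could be derived.
@dataclass
class Field:
    name: str
    dtype: str
    shape: tuple

dataset = [Field("pressure", "float64", (1024, 1024)),
           Field("velocity", "float64", (1024, 1024, 3))]

def row_major_offsets(fields, itemsize=8):
    """Derive one possible physical layout (contiguous, row-major byte
    offsets) from the declarative description; a column-major or
    replicated view would be derived from the same definitions."""
    offsets, cursor = {}, 0
    for f in fields:
        offsets[f.name] = cursor
        count = 1
        for dim in f.shape:
            count *= dim
        cursor += count * itemsize
    return offsets

print(row_major_offsets(dataset))
```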
Subjects/Keywords: Computer science; high performance computing; storage
APA (6th Edition):
Ionkov, L. (2018). Optimizing Access to Scientific Data for Storage, Analysis and Visualization. (Thesis). University of California – Santa Cruz. Retrieved from http://www.escholarship.org/uc/item/4vs7g3pk
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Ionkov, Latchesar. “Optimizing Access to Scientific Data for Storage, Analysis and Visualization.” 2018. Thesis, University of California – Santa Cruz. Accessed March 04, 2021.
http://www.escholarship.org/uc/item/4vs7g3pk.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Ionkov, Latchesar. “Optimizing Access to Scientific Data for Storage, Analysis and Visualization.” 2018. Web. 04 Mar 2021.
Vancouver:
Ionkov L. Optimizing Access to Scientific Data for Storage, Analysis and Visualization. [Internet] [Thesis]. University of California – Santa Cruz; 2018. [cited 2021 Mar 04].
Available from: http://www.escholarship.org/uc/item/4vs7g3pk.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Ionkov L. Optimizing Access to Scientific Data for Storage, Analysis and Visualization. [Thesis]. University of California – Santa Cruz; 2018. Available from: http://www.escholarship.org/uc/item/4vs7g3pk
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Indiana University
26.
Gilmanov, Timur.
Lower bound resource requirements for machine intelligence.
Degree: 2018, Indiana University
URL: http://hdl.handle.net/2022/22595
▼ Recent advancements in technology and the field of artificial intelligence provide a platform for new applications in a wide range of areas, including healthcare, engineering, vision, and natural language processing, that would be considered unattainable one or two decades ago. With an expected compound annual growth rate of 50% during the years 2017–2021, the field of global artificial intelligence is set to observe increases in computational complexity and in the amount of sensor data processed. In spite of the advancements in the field, truly intelligent machine behavior operating in real time is yet an unachieved milestone. First, in order to quantify such behavior, a definition of machine intelligence would be required, which has not been agreed upon by the community at large. Second, delivering full machine intelligence, as defined in this work, is beyond the scope of today's cutting-edge high-performance computing machines. One important aspect of machine-intelligent systems is resource requirements and the limitations that today's and future machines could impose on such systems. The goal of this research effort is to provide an estimate of the lower bound resource requirements for machine intelligence. A working definition of machine intelligence for purposes of this research is provided, along with definitions of an abstract architecture, workflow, and performance model. Combined, these tools allow an estimate of resource requirements for problems of machine intelligence and provide an estimate of such requirements in the future.
Advisors/Committee Members: Sterling, Thomas (advisor).
Subjects/Keywords: machine intelligence; artificial intelligence; high performance computing
APA (6th Edition):
Gilmanov, T. (2018). Lower bound resource requirements for machine intelligence. (Thesis). Indiana University. Retrieved from http://hdl.handle.net/2022/22595
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Gilmanov, Timur. “Lower bound resource requirements for machine intelligence.” 2018. Thesis, Indiana University. Accessed March 04, 2021.
http://hdl.handle.net/2022/22595.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Gilmanov, Timur. “Lower bound resource requirements for machine intelligence.” 2018. Web. 04 Mar 2021.
Vancouver:
Gilmanov T. Lower bound resource requirements for machine intelligence. [Internet] [Thesis]. Indiana University; 2018. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/2022/22595.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Gilmanov T. Lower bound resource requirements for machine intelligence. [Thesis]. Indiana University; 2018. Available from: http://hdl.handle.net/2022/22595
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Virginia Tech
27.
Gabriel, Matthew Frederick.
An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design.
Degree: MS, Computer Engineering, 2013, Virginia Tech
URL: http://hdl.handle.net/10919/22053
▼ The size and complexity of many scientific and enterprise-level applications require a high degree of parallelization in order to produce outputs within an acceptable period of time. This often necessitates the use of high performance computing clusters (HPCCs) and parallelized applications which are carefully designed and optimized. A myriad of papers study the various factors which influence performance and then attempt to quantify the maximum theoretical speedup that can be achieved by a cluster relative to a sequential processor.
These studies tend to investigate each influence in isolation, but in practice the factors are interdependent. It is the interaction, rather than any solitary influence, that normally creates the bounds of the design trade space. To address this disconnect, this thesis blends the studies into an expanded speedup model which captures the interplay. The model is intended to help the cluster engineer make initial estimates during the early phases of design, while the system is not yet mature enough for refinement using timing studies.
The model pulls together factors such as problem scaling, resource allocation, critical sections, and the problem’s inherent parallelizability. The derivation was examined theoretically and then validated by timing studies on a physical HPCC. The validation studies found that the model was an adequate generic first approximation. However, it was also found that customizations may be needed to account for application-specific influences, such as bandwidth limitations and communication delays, which are not readily incorporated into a generic model.
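For context (these are the two classical bounds named in the keywords below, not the thesis’s expanded model): with parallelizable work fraction p and n processors, the textbook speedup limits are

```latex
S_{\text{Amdahl}}(n) = \frac{1}{(1 - p) + p/n},
\qquad
S_{\text{Gustafson}}(n) = (1 - p) + p\,n .
```

For example, with p = 0.95 and n = 64, Amdahl’s fixed-size bound gives S ≈ 15.4 while Gustafson’s scaled-size bound gives S ≈ 60.9; an interdependent model of the kind described above lands between such extremes depending on how the problem scales with the cluster.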
Advisors/Committee Members: Hsiao, Michael S. (committeechair), Pratt, Timothy J. (committee member), Silva, Luiz A. (committee member).
Subjects/Keywords: High Performance Computing; Speedup; Amdahl; Gustafson
APA (6th Edition):
Gabriel, M. F. (2013). An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/22053
Chicago Manual of Style (16th Edition):
Gabriel, Matthew Frederick. “An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design.” 2013. Masters Thesis, Virginia Tech. Accessed March 04, 2021.
http://hdl.handle.net/10919/22053.
MLA Handbook (7th Edition):
Gabriel, Matthew Frederick. “An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design.” 2013. Web. 04 Mar 2021.
Vancouver:
Gabriel MF. An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design. [Internet] [Masters thesis]. Virginia Tech; 2013. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/10919/22053.
Council of Science Editors:
Gabriel MF. An Expanded Speedup Model for the Early Phases of High Performance Computing Cluster (HPCC) Design. [Masters Thesis]. Virginia Tech; 2013. Available from: http://hdl.handle.net/10919/22053

University of Oregon
28.
Binyahib, Roba.
Evaluating Parallel Particle Advection Algorithms Over Various Workloads.
Degree: PhD, Department of Computer and Information Science, 2020, University of Oregon
URL: https://scholarsbank.uoregon.edu/xmlui/handle/1794/25595
▼ We consider the problem of efficient particle advection in a distributed-memory parallel setting, focusing on four popular parallelization algorithms. The performance of each of these algorithms varies with the workload. Our research focuses on two important questions: (1) which parallelization techniques perform best for a given workload? and (2) what are the unsolved problems in parallel particle advection? To answer these questions, we ran a “bake off” study between the algorithms comprising 216 tests, reaching concurrencies of up to 8,192 cores and considering data sets as large as 34 billion cells with 300 million particles. We also performed a variety of optimizations to the algorithms, including fundamental enhancements to the “work requesting” algorithm, and we introduce a new hybrid algorithm that we call “HyLiPoD.” Our findings inform the tradeoffs between the algorithms and when domain scientists should switch between them to obtain better performance. Finally, we consider the future of parallel particle advection, i.e., how these algorithms will be run with in situ processing.
This dissertation includes previously published co-authored material.
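To make the workload concrete: the serial kernel that all four parallelization algorithms distribute is the numerical integration of particle trajectories through a velocity field. A minimal sketch follows; the function names and the choice of fourth-order Runge-Kutta are illustrative, not taken from the dissertation.

```python
import numpy as np

def advect_rk4(seeds, velocity, dt, steps):
    """Trace particle trajectories through a steady velocity field.

    seeds:    (N, 3) array of initial particle positions.
    velocity: callable mapping an (N, 3) array of positions to an
              (N, 3) array of velocities -- a stand-in for
              interpolation into a distributed simulation mesh.
    Returns the (N, 3) particle positions after `steps` fourth-order
    Runge-Kutta steps of size `dt`.
    """
    p = np.asarray(seeds, dtype=float).copy()
    for _ in range(steps):
        k1 = velocity(p)
        k2 = velocity(p + 0.5 * dt * k1)
        k3 = velocity(p + 0.5 * dt * k2)
        k4 = velocity(p + dt * k3)
        p += (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# Example: rigid rotation about the z-axis; after 100 steps of 0.01
# the seed (1, 0, 0) rotates by ~1 radian to ~(0.540, 0.841, 0).
swirl = lambda p: np.stack([-p[:, 1], p[:, 0], np.zeros(len(p))], axis=1)
print(advect_rk4(np.array([[1.0, 0.0, 0.0]]), swirl, dt=0.01, steps=100))
```

The parallelization question the study evaluates is then how to distribute the particles and the mesh blocks backing `velocity` across ranks, since a trajectory may wander between blocks owned by different processors.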
Advisors/Committee Members: Childs, Hank (advisor).
Subjects/Keywords: Flow Visualization; High Performance Computing; Scientific Visualization
APA (6th Edition):
Binyahib, R. (2020). Evaluating Parallel Particle Advection Algorithms Over Various Workloads. (Doctoral Dissertation). University of Oregon. Retrieved from https://scholarsbank.uoregon.edu/xmlui/handle/1794/25595
Chicago Manual of Style (16th Edition):
Binyahib, Roba. “Evaluating Parallel Particle Advection Algorithms Over Various Workloads.” 2020. Doctoral Dissertation, University of Oregon. Accessed March 04, 2021.
https://scholarsbank.uoregon.edu/xmlui/handle/1794/25595.
MLA Handbook (7th Edition):
Binyahib, Roba. “Evaluating Parallel Particle Advection Algorithms Over Various Workloads.” 2020. Web. 04 Mar 2021.
Vancouver:
Binyahib R. Evaluating Parallel Particle Advection Algorithms Over Various Workloads. [Internet] [Doctoral dissertation]. University of Oregon; 2020. [cited 2021 Mar 04].
Available from: https://scholarsbank.uoregon.edu/xmlui/handle/1794/25595.
Council of Science Editors:
Binyahib R. Evaluating Parallel Particle Advection Algorithms Over Various Workloads. [Doctoral Dissertation]. University of Oregon; 2020. Available from: https://scholarsbank.uoregon.edu/xmlui/handle/1794/25595

Rutgers University
29.
Gamell Balmana, Marc, 1989-.
Application-aware on-line failure recovery for extreme-scale HPC environments.
Degree: PhD, Electrical and Computer Engineering, 2017, Rutgers University
URL: https://rucore.libraries.rutgers.edu/rutgers-lib/53618/
▼ High Performance Computing (HPC) brings with it the promise of deeper insight into complex phenomena through the execution of various extreme-scale applications, especially those in the fields of science and engineering. The increasing computational demands of these applications continue to push the limits of current extreme-scale HPC systems. As a result, the community is working toward achieving exascale systems able to compute 10^18 floating point operations per second (FLOPS). Since these systems are expected to contain a large number of components, reliability is one of the key anticipated challenges. Due to the extensive periods of time that complex applications require, future systems will likely see an increase in process and node failures during application execution. These failures, also known as hard failures, are currently handled by terminating the execution and restarting it from the last stored checkpoint. This checkpoint-restart methodology requires the application to periodically save its distributed state into centralized, stable storage – an approach that is not expected to scale to future extreme-scale systems. While the illusion of a failure-free machine – implemented either via hardware or system software strategies – is adequate for current HPC systems, it may prove too costly in future extreme-scale machines. Resilience is, therefore, a key challenge that must be addressed in order to realize the exascale vision.
This dissertation explores new models that leverage application-awareness to enable on-line failure recovery. On-line recovery, which does not require interrupting surviving processes in order to collectively restart the entire application, offers better cost/performance tradeoffs by reducing recovery overheads. Recovering processes on-line enables application-specific data recovery strategies and optimized in-memory checkpointing, while avoiding the repetition of initialization procedures – the least optimized part of most production-level applications – on all processes.
This dissertation addresses three areas of research in on-line failure recovery. First, it explores a generic global on-line recovery model, involving all processes in the recovery. Second, it explores optimized local recovery, in which the communication characteristics of certain application classes are leveraged to reduce the overheads due to failure; in particular, finite-difference partial differential equation solvers using stencil operators are used as the driving application class. Third, it demonstrates how the overhead of multiple, independent failures can be masked to effectively reduce the impact on total execution time. The models presented in this dissertation are implemented and evaluated in Fenix and FenixLR, a pair of generic and extensible frameworks used to demonstrate the concepts.
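The scaling argument against global checkpoint-restart can be made quantitative with a standard first-order model (Young’s approximation, well-known background rather than a result of this dissertation): if writing a checkpoint costs δ seconds and the system’s mean time between failures is M seconds, the overhead-minimizing checkpoint interval is roughly

```latex
\tau_{\text{opt}} \approx \sqrt{2\,\delta M}.
```

Since system-level M shrinks roughly in proportion to the node count while δ grows with the state that must reach stable storage, τ_opt collapses at extreme scale and the machine spends an ever larger fraction of its time checkpointing and recovering, which motivates the on-line, application-aware alternatives explored here.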
Advisors/Committee Members: Parashar, Manish (chair), Marsic, Ivan (internal member), Silver, Deborah (internal member), Teranishi, Keita (outside member).
Subjects/Keywords: Fault tolerance (Engineering); High performance computing
APA (6th Edition):
Gamell Balmana, M. (2017). Application-aware on-line failure recovery for extreme-scale HPC environments. (Doctoral Dissertation). Rutgers University. Retrieved from https://rucore.libraries.rutgers.edu/rutgers-lib/53618/
Chicago Manual of Style (16th Edition):
Gamell Balmana, Marc, 1989-. “Application-aware on-line failure recovery for extreme-scale HPC environments.” 2017. Doctoral Dissertation, Rutgers University. Accessed March 04, 2021.
https://rucore.libraries.rutgers.edu/rutgers-lib/53618/.
MLA Handbook (7th Edition):
Gamell Balmana, Marc, 1989-. “Application-aware on-line failure recovery for extreme-scale HPC environments.” 2017. Web. 04 Mar 2021.
Vancouver:
Gamell Balmana M. Application-aware on-line failure recovery for extreme-scale HPC environments. [Internet] [Doctoral dissertation]. Rutgers University; 2017. [cited 2021 Mar 04].
Available from: https://rucore.libraries.rutgers.edu/rutgers-lib/53618/.
Council of Science Editors:
Gamell Balmana M. Application-aware on-line failure recovery for extreme-scale HPC environments. [Doctoral Dissertation]. Rutgers University; 2017. Available from: https://rucore.libraries.rutgers.edu/rutgers-lib/53618/

University of St. Andrews
30.
Yu, Teng.
Heterogeneity-aware scheduling and data partitioning for system performance acceleration.
Degree: 2020, University of St. Andrews
URL: http://hdl.handle.net/10023/19797
▼ Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating-systems optimisation problems, such as task scheduling, and application-specific optimisation techniques, such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity.
Significant effort has been invested in this problem, but it either focuses on a specific type of heterogeneous system or algorithm, or on a high-level framework without insight into the differences in heterogeneity between types of system. A general software framework is required which can not only be adapted to multiple types of systems and workloads, but is also equipped with techniques to address a variety of hardware heterogeneity.
This thesis presents approaches to designing general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer, and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity in on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative approach to co-designing the task selector and core allocator of the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations, and asymmetric physical connections, it proposes an application-specific automatic data partitioning method for a modern supercomputer and a topological-ranking-heuristic-based scheduler for a multi-FPGA reconfigurable cluster.
Experiments on both a full-system simulator (GEM5) and real systems (the Sunway TaihuLight supercomputer and Xilinx multi-FPGA clusters) demonstrate the significant advantages of the suggested approaches against the state of the art on a variety of workloads.
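As a minimal illustration of the kind of decision an adaptive data partitioner makes (the function and the device rates below are hypothetical, not from the thesis): give each device a share of the data proportional to its measured throughput, so that fast and slow devices finish at roughly the same time.

```python
def partition_by_throughput(n_items, throughputs):
    """Split n_items across devices in proportion to measured throughput.

    throughputs: items/second each device achieved in a short
    profiling run. Faster devices receive proportionally larger
    shares, balancing completion times across heterogeneous hardware.
    """
    total = float(sum(throughputs))
    shares = [int(n_items * t / total) for t in throughputs]
    shares[-1] += n_items - sum(shares)  # hand any rounding remainder to the last device
    return shares

# Example (hypothetical rates): a CPU at 2,000 items/s, a GPU at
# 30,000 items/s, and an FPGA at 8,000 items/s splitting 1,000,000
# items -> [50000, 750000, 200000].
print(partition_by_throughput(1_000_000, [2_000, 30_000, 8_000]))
```

Real partitioners of the kind the thesis describes would additionally weigh bandwidth limits and connection asymmetry, not raw throughput alone.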
Advisors/Committee Members: Thomson, John Donald (advisor).
Subjects/Keywords: Heterogeneous systems; High performance computing; System software
APA (6th Edition):
Yu, T. (2020). Heterogeneity-aware scheduling and data partitioning for system performance acceleration. (Thesis). University of St. Andrews. Retrieved from http://hdl.handle.net/10023/19797
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Yu, Teng. “Heterogeneity-aware scheduling and data partitioning for system performance acceleration.” 2020. Thesis, University of St. Andrews. Accessed March 04, 2021.
http://hdl.handle.net/10023/19797.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Yu, Teng. “Heterogeneity-aware scheduling and data partitioning for system performance acceleration.” 2020. Web. 04 Mar 2021.
Vancouver:
Yu T. Heterogeneity-aware scheduling and data partitioning for system performance acceleration. [Internet] [Thesis]. University of St. Andrews; 2020. [cited 2021 Mar 04].
Available from: http://hdl.handle.net/10023/19797.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Yu T. Heterogeneity-aware scheduling and data partitioning for system performance acceleration. [Thesis]. University of St. Andrews; 2020. Available from: http://hdl.handle.net/10023/19797
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation