You searched for +publisher:"Georgia Tech" +contributor:("Pande, Santosh")
Showing records 1 – 30 of 54 total matches.

Georgia Tech
1.
Gupta, Meghana.
Code generation and adaptive control divergence management for light weight SIMT processors.
Degree: MS, Computer Science, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/55044
The energy costs of data movement are limiting the performance scaling of future generations of high performance computing architectures targeted to data intensive applications. The result has been a resurgence of interest in processing-in-memory (PIM) architectures. This challenge has spawned the development of a scalable, parametric data parallel architecture referred to as the Heterogeneous Architecture Research Prototype (HARP) - a single instruction multiple thread (SIMT) architecture for integration into DRAM systems, particularly 3D memory stacks, as a distinct processing layer that exploits the enormous internal memory bandwidth. However, this potential can only be realized with an optimizing compilation environment. This thesis addresses this challenge by i) constructing an open source compiler for HARP, and ii) integrating optimizations for handling control flow divergence on HARP instances. The HARP compiler is built on the LLVM open source compiler infrastructure. Apart from traditional code generation, the HARP compiler backend handles unique challenges associated with the HARP instruction set, chief among them code generation for control divergence management techniques. The HARP architecture and compiler support i) a hardware reconvergence stack and ii) predication to handle divergent branches. The HARP compiler addresses several challenges associated with generating code for these two control divergence management techniques and implements multiple analyses and transformations for code generation. Each technique has unique advantages and disadvantages depending on whether the conditional branch is likely to be unanimous. Two decision frameworks, guided by static analysis and dynamic profile information respectively, are implemented to choose between the control divergence management techniques by analyzing the nature of the conditional branches and utilizing this information during compilation.
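To make the predication option concrete, here is a toy SIMT execution sketch. A "warp" of lanes runs one instruction stream; under predication every lane executes both sides of a divergent branch and a predicate mask selects which result each lane commits. The function names and structure are illustrative only, not HARP's actual ISA or the thesis's implementation.

```python
# Toy model of predicated execution for a divergent branch in a SIMT
# machine. A reconvergence stack (the other technique the abstract
# names) would instead serialize the taken/not-taken lane subsets.

def predicated_branch(values, pred, then_fn, else_fn):
    # Under predication every lane executes BOTH paths; the predicate
    # mask decides which result each lane actually commits.
    mask = [pred(v) for v in values]
    then_vals = [then_fn(v) for v in values]   # all lanes run "then"
    else_vals = [else_fn(v) for v in values]   # all lanes run "else"
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

# A split warp: even lanes take one path, odd lanes the other. For a
# unanimous branch, predication wastes a full pass over the dead path,
# which is why the choice between techniques depends on branch behavior.
result = predicated_branch([1, 2, 3, 4], lambda v: v % 2 == 0,
                           lambda v: v * 10, lambda v: -v)
```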
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Kim, Hyesoon (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Compiler; SIMT; Control divergence

2.
Na, Taesik.
Energy efficient, secure and noise robust deep learning for the internet of things.
Degree: PhD, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/60293
The objective of this research is to design an energy efficient, secure and noise robust deep learning system for the Internet of Things (IoT). The research particularly focuses on energy efficient training of deep learning, adversarial machine learning, and noise robust deep learning. To enable energy efficient training, the research studies the impact of limited-precision training on various types of neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). For CNNs, the work proposes a dynamic precision scaling algorithm and a precision-flexible computing unit to accelerate CNN training. For RNNs, the work studies the impact of various hyper-parameters to enable low precision training and proposes a low precision computing unit with stochastic rounding. To enhance the security of deep learning, the research proposes cascade adversarial machine learning and additional regularization using a unified embedding for image classification and low level (pixel level) similarity learning. Noise robust and resolution-invariant image classification is also achieved by adding this low level similarity learning. A mixture of pre-processing experts model is proposed for noise robust object detection without sacrificing accuracy on clean images.
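Stochastic rounding, mentioned above for low-precision RNN training, is simple to sketch: a value is rounded up or down to the nearest representable step with probability proportional to its distance from each, so the rounding error is unbiased in expectation. The step size below is invented for illustration and is not tied to any format from the thesis.

```python
import random

def stochastic_round(x, step=0.25):
    """Round x to a multiple of `step`, upward with probability equal
    to the fractional position between the two candidates."""
    lo = (x // step) * step          # nearest representable value below x
    frac = (x - lo) / step           # how far x sits toward the upper value
    return lo + step if random.random() < frac else lo

# Deterministic nearest-rounding would map 0.1 -> 0.0 every time and
# accumulate bias; stochastic rounding preserves the mean.
random.seed(0)
samples = [stochastic_round(0.1) for _ in range(10_000)]
mean = sum(samples) / len(samples)   # close to 0.1 in expectation
```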
Advisors/Committee Members: Yalamanchili, Sudhakar (committee member), Krishna, Tushar (committee member), Burger, Doug (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Deep learning; Adversarial machine learning; Energy efficient training; Noise robust machine learning; IoT

3.
Kulkarni, Sulekha Raghavendra.
Accelerating program analyses by cross-program training.
Degree: MS, Computer Science, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56359
Practical programs share large modules of code. However, many program analyses are ineffective at reusing analysis results for shared code across programs. We present POLYMER, an analysis optimizer to address this problem. POLYMER runs the analysis offline on a corpus of training programs and learns analysis facts over shared code. It prunes the learnt facts to eliminate intermediate computations and then reuses these pruned facts to accelerate the analysis of other programs that share code with the training corpus. We have implemented POLYMER to accelerate analyses specified in Datalog, and apply it to optimize two analyses for Java programs: a call-graph analysis that is flow- and context-insensitive, and a points-to analysis that is flow- and context-sensitive. We evaluate the resulting analyses on ten programs from the DaCapo suite that share the JDK library. POLYMER achieves average speedups of 2.6x for the call-graph analysis and 5.2x for the points-to analysis.
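The reuse idea can be reduced to a sketch: run an analysis once over shared library code, keep the derived facts, and seed later analyses with them instead of recomputing. The "analysis" here is a toy transitive call-graph reachability, and all function names are hypothetical; POLYMER itself operates on Datalog-specified analyses.

```python
def reachable(call_edges, roots, seed_facts=None):
    """Set of functions reachable from `roots`. Seeded facts are taken
    as already established, so their edges are never re-traversed."""
    facts = set(seed_facts or ())
    work = list(roots)
    while work:
        f = work.pop()
        if f in facts:
            continue                       # reuse: skip precomputed facts
        facts.add(f)
        work.extend(call_edges.get(f, ()))
    return facts

# Offline phase: analyze the shared library once over a training corpus.
lib_edges = {"lib_a": ["lib_b"], "lib_b": ["lib_c"]}
lib_facts = reachable(lib_edges, ["lib_a"])

# Online phase: a new program calling into the library reuses lib_facts,
# so only the application-specific part is actually analyzed.
app_edges = {"main": ["lib_a"], **lib_edges}
app_facts = reachable(app_edges, ["main"], seed_facts=lib_facts)
```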
Advisors/Committee Members: Naik, Mayur H. (advisor), Orso, Alessandro (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Program analysis; Optimization; Datalog

4.
Sahin, Semih.
Memory optimizations for distributed executors in big data clouds.
Degree: PhD, Computer Science, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62669
The amount of data generated from software and hardware sensors continues to grow exponentially as the world becomes more instrumented and interconnected. Our ability to analyze this huge and growing amount of data is critical. Real-time processing of big data enables us to identify frequent patterns, gain a better understanding of happenings around us, and increase the accuracy of our predictions about future activities, events, and trends. Hadoop and Spark have been the dominant distributed computing platforms for big data processing and analytics on clusters of commodity servers. Distributed executors are widely used as the computation abstraction for providing data parallelism and computation parallelism in large computing clusters. Each executor is typically a multi-threaded Java Virtual Machine (JVM) instance on Spark clusters, and the Spark runtime supports memory-intensive parallel computation for iterative machine learning applications by launching multiple executors on every cluster node and enabling explicit caching of intermediate data as Resilient Distributed Datasets (RDDs). It is well known that JVM executors may not be effective at utilizing available memory to improve application runtime performance, due to the high cost of garbage collection (GC). Such situations may get worse when the dataset contains a large number of small objects, leading to frequent GC overhead. Spark addresses such problems by relying on multi-threaded executors with the support of three fundamental storage modes for RDDs: memory-only, disk-only, and memory-disk. When RDD partitions are fully cached in available DRAM, Spark applications enjoy excellent performance on iterative big data analytics workloads, as expected. However, these applications start to experience drastic performance degradation when they have heterogeneous tasks or highly skewed datasets, or when their RDD working sets can no longer be fully cached in memory.
In these scenarios, we identify three serious performance bottlenecks: (1) As the amount of cached data increases, application performance suffers from high garbage collection overhead. (2) Depending on the heterogeneity of the application, or non-uniformity in the data, the distribution of tasks over executors may differ, leading to different memory utilization across executors. Such temporal imbalance of memory usage can cause out-of-memory errors for executors under memory pressure, even though other executors on the same host or in the same cluster have sufficient unused memory. (3) Depending on the task granularity, the partition granularity of data to be cached may be too large relative to the working set size at runtime, causing executor thrashing and out-of-memory errors, even though there is plenty of unused memory on Spark nodes in the cluster and the total physical memory of the node or cluster is not fully utilized. This dissertation research takes a holistic approach to tackle the above problems from three dimensions. First, we analyze JVM heap structure, components of garbage…
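The memory-disk storage mode can be sketched as a two-tier cache: partitions occupy a fixed memory budget and the least-recently-used ones spill to a disk tier rather than being dropped. The class, partition names, and sizes below are invented for illustration; real Spark memory accounting is far more involved.

```python
from collections import OrderedDict

class TieredCache:
    """Toy memory-disk cache: LRU partitions spill to disk on pressure."""

    def __init__(self, memory_budget):
        self.memory_budget = memory_budget
        self.mem = OrderedDict()   # partition -> size, in LRU order
        self.disk = {}

    def put(self, key, size):
        self.mem[key] = size
        self.mem.move_to_end(key)                      # mark most recent
        while sum(self.mem.values()) > self.memory_budget:
            victim, vsize = self.mem.popitem(last=False)  # evict LRU
            self.disk[victim] = vsize                     # spill, not drop

    def location(self, key):
        if key in self.mem:
            return "memory"
        return "disk" if key in self.disk else None

cache = TieredCache(memory_budget=100)
cache.put("p0", 60)
cache.put("p1", 60)   # over budget, so p0 spills to the disk tier
```

Under memory-only semantics the evicted partition would instead be recomputed on next access, trading CPU for memory.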
Advisors/Committee Members: Liu, Ling (advisor), Pu, Calton (committee member), Eisenhauer, Greg (committee member), Pande, Santosh (committee member), Devecsery, David (committee member), Lofstead, Gerald (committee member).
Subjects/Keywords: Memory management; Cloud computing; Big data processing; Spark

5.
Park, Sunjae Young.
Bridging the gap for hardware transactional memory.
Degree: PhD, Computer Science, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/62218
Transactional memory (TM) is a promising new tool for shared memory application development. Unlike mutual exclusion locks, TM allows atomic sections to execute concurrently, optimistically predicting that threads will not conflict. Commercial releases of hardware TM (HTM) bring this functionality to the mainstream. However, the commercial implementations aim to provide TM functionality with the minimum amount of hardware changes required, unlike research prototypes that can work from a clean slate. As a result, there are significant gaps in performance between the commercial implementations and those proposed by the research community. In this thesis, I propose several ideas that keep with this mindset but still close the gap in performance. First, I introduce plea bits that can be used to provide enhanced conflict resolution policies, compared to the basic "requester-wins" policy used in commercial HTM implementations. Second, I propose calling a pre-abort handler instead of performing automatic state rollback when encountering abort-causing conditions. Last, I propose changing how speculative writes are handled within a transaction, allowing for lazy conflict detection. Using these techniques, I show that it is possible to support more sophisticated HTM functionality while keeping the required hardware changes minimal.
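The "requester-wins" baseline the thesis improves on is easy to model: when a transaction touches a cache line that another in-flight transaction has written, the current owner is aborted in favor of the requester. This is a deliberately simplified software sketch with invented class and field names, not a model of any real HTM or of the plea-bit mechanism itself.

```python
class Txn:
    """A toy transaction: a name, a write set of addresses, and an
    aborted flag set when it loses a conflict."""
    def __init__(self, name):
        self.name = name
        self.write_set = set()
        self.aborted = False

def tx_write(txn, addr, live_txns):
    # Requester-wins: any live transaction already holding `addr`
    # in its write set is aborted so the requester can proceed.
    for other in live_txns:
        if other is not txn and not other.aborted and addr in other.write_set:
            other.aborted = True
    txn.write_set.add(addr)

t1, t2 = Txn("t1"), Txn("t2")
live = [t1, t2]
tx_write(t1, 0x10, live)
tx_write(t2, 0x10, live)   # conflict on 0x10: t1 loses to the requester
```

The weakness this policy exhibits, and which richer conflict-resolution hints aim to fix, is that a long-running transaction can be repeatedly aborted by short newcomers.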
Advisors/Committee Members: Prvulovic, Milos (advisor), Kim, Hyesoon (committee member), Pande, Santosh (committee member), Qureshi, Moinuddin (committee member), Hughes, Christopher J. (committee member).
Subjects/Keywords: Transactional memory; Multithread; Multicore

6.
Kang, Suk Chan.
Optimizing high locality memory references in cache coherent shared memory multi-core processors.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62641
Optimizing memory references has been a primary research area of computer systems ever since the advent of stored-program computers. The objective of this thesis research is to identify and optimize two classes of high-locality data memory reference streams in cache coherent shared memory multi-processors. More specifically, this thesis classifies such memory objects into spatially and temporally false shared memory objects. The underlying hypothesis is that the policy of treating all memory objects as permanently shared significantly hinders the optimization of high-locality memory objects in modern cache coherent shared memory multi-processor systems: the policy forces the systems to unconditionally prepare to incur shared-memory-related overheads for every memory reference. To verify the hypothesis, this thesis explores two schemes to minimize the shared memory abstraction overheads associated with memory reference streams of spatially and temporally false shared memory objects, respectively. The schemes implement exception rules that enable isolating false shared memory objects from the shared memory domain, in a spatial or temporal manner. However, the exception rules require special consideration in cache coherent shared memory multi-processors regarding data consistency, cache coherence, and the memory consistency model. Thus, this thesis not only implements the schemes with such consideration, but also corrects a widespread faulty assumption in prior academic work. This approach ultimately aims at improving the scalability of large scale systems, such as multi-socket cache coherent shared memory multi-processors, by improving performance and reducing energy/power consumption. This thesis demonstrates the efficacy and efficiency of the schemes in terms of performance improvement and energy/power reduction.
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Wills, Linda (committee member), Gavrilovska, Ada (committee member), Krishna, Tushar (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Shared memory system; Cache coherence; Memory consistency; Synchronization

7.
Ravichandran, Kaushik.
Programming frameworks for performance driven speculative parallelization.
Degree: PhD, Computer Science, 2014, Georgia Tech
URL: http://hdl.handle.net/1853/52985
Effectively utilizing available parallelism is becoming harder and harder as systems evolve to many-core processors with many tens of cores per chip. Automatically extracting parallelism has limitations whereas completely redesigning software using traditional parallel constructs is a daunting task that significantly jeopardizes programmer productivity. On the other hand, many studies have shown that a good amount of parallelism indeed exists in sequential software that remains untapped. How to unravel and utilize it successfully remains an open research question.
Speculation fortunately provides a potential answer to this question. Speculation provides a golden bridge for a quick expression of "potential" parallelism in a given program. While speculation at extremely fine granularities has been shown to provide good speed-ups, speculation at larger granularities has only been attempted on a very small scale due to the potentially large overheads that render it useless. The transactional construct used by STMs can be used by programmers to express speculation since it provides atomicity and isolation while writing parallel code. However, it was not designed to deal with the semantics of speculation. This thesis contends that by incorporating the semantics of speculation new solutions can be constructed and speculation can provide a powerful means to the hard problem of efficiently utilizing many-cores with very low programmer efforts.
This thesis takes a multi-faceted view of the problem of speculation through a combination of programming models, compiler analysis, scheduling and runtime systems and tackles the semantic issues that surround speculation such as determining the right degree of speculation to maximize performance, reuse of state in rollbacks, providing probabilistic guidance for minimizing conflicts, deterministic execution for debugging and development, and providing very large scale speculations across distributed nodes.
First, we present F2C2-STM, a high performance flux-based feedback-driven concurrency control technique which automatically selects and adapts the degree of speculation in transactional applications for best performance. Second, we present the Merge framework which is capable of salvaging useful work performed during an incorrect speculation and incorporates it towards the final commit. Third, we present a framework which has the ability to leverage semantics of data structures and algorithmic properties to guide the scheduling of concurrent speculative transactions to minimize conflicts and performance loss. Fourth, we present DeSTM, a deterministic STM designed to aid the development of speculative transactional applications for repeatability without undue performance loss.
These contributions significantly enhance the use of transactional memory as a speculative idiom improving the efficiency of speculative execution as well as simplify the development process.
Finally, we focus on a performance oriented view of speculation, namely choose one of many speculative semantics,…
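The feedback-driven concurrency control idea behind F2C2-STM can be sketched as a simple loop that adjusts the number of threads allowed to run transactions based on the observed commit rate, backing off when aborts dominate. The thresholds, step sizes, and function name below are invented for illustration and do not reflect the flux-based policy's actual internals.

```python
def adjust_concurrency(current, commits, aborts, lo=0.5, hi=0.9,
                       min_threads=1, max_threads=64):
    """One feedback step: shrink the speculation degree under heavy
    contention, grow it when nearly everything commits."""
    total = commits + aborts
    if total == 0:
        return current                 # no signal this interval
    commit_rate = commits / total
    if commit_rate < lo:
        return max(min_threads, current - 1)   # contention: back off
    if commit_rate > hi:
        return min(max_threads, current + 1)   # headroom: speculate more
    return current                              # sweet spot: hold steady

level = 8
level = adjust_concurrency(level, commits=10, aborts=30)  # high contention
level = adjust_concurrency(level, commits=95, aborts=5)   # mostly commits
```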
Advisors/Committee Members: Pande, Santosh (advisor), Vuduc, Richard (committee member), Schwan, Karsten (committee member), Kim, Hyesoon (committee member), Yalamanchili, Sudhakar (committee member).
Subjects/Keywords: Speculation; Transactional memory

8.
Kumar, Tushar.
Characterizing and controlling program behavior using execution-time variance.
Degree: PhD, Electrical and Computer Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/55000
Immersive applications, such as computer gaming, computer vision and video codecs, are an important emerging class of applications with QoS requirements that are difficult to characterize and control using traditional methods. This thesis proposes new techniques reliant on execution-time variance to both characterize and control program behavior. The proposed techniques are intended to be broadly applicable to a wide variety of immersive applications and to be easy for programmers to apply without needing specialized expertise. First, we create new QoS controllers that programmers can easily apply to their applications to achieve desired application-specific QoS objectives on any platform or application data-set, provided they verify that their applications satisfy some simple domain requirements specific to immersive applications. The controllers adjust programmer-identified knobs every application frame to effect desired values for programmer-identified QoS metrics. The control techniques are novel in that they do not require the user to provide any kind of application behavior model, and they are effective for immersive applications that defy the traditional requirements for feedback controller construction. Second, we create new profiling techniques that provide visibility into the behavior of a large, complex application, inferring behavior relationships across application components from the execution-time variance observed at all levels of granularity of the application's functionality. Additionally, for immersive applications, some of the most important QoS requirements relate to managing the execution-time variance of key application components, for example, the frame rate.
The profiling techniques not only identify and summarize behavior directly relevant to the QoS aspects related to timing, but also indirectly reveal non-timing related properties of behavior, such as the identification of components that are sensitive to data, or those whose behavior changes based on the call-context.
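The per-frame knob adjustment described above can be sketched as a proportional feedback loop: each frame, a quality knob is nudged so the measured frame time converges on a target. The linear cost model and the gain below are stand-ins invented for illustration; the thesis's controllers are explicitly model-free.

```python
def control_step(knob, measured_ms, target_ms, gain=0.05):
    """One frame of proportional control over a quality knob in [0, 1].
    Too slow lowers quality; spare time is spent raising it."""
    error = measured_ms - target_ms
    return max(0.0, min(1.0, knob - gain * error))

def simulate(frames=50, target_ms=16.0):
    knob = 1.0                            # start at maximum quality
    for _ in range(frames):
        measured = 10.0 + 10.0 * knob     # hypothetical per-frame cost
        knob = control_step(knob, measured, target_ms)
    return knob, 10.0 + 10.0 * knob

knob, frame_ms = simulate()               # settles where cost == target
```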
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Pande, Santosh (committee member), Vela, Patricio (committee member), Vuduc, Richard (committee member), Chatterjee, Abhijit (committee member), Ramachandran, Umakishore (committee member).
Subjects/Keywords: Profiling; QoS tuning; Adaptive control; Optimal control; Gain scheduling; LQR; Machine learning; System identification; Parameter estimation; Online training; Multimedia; Video; Gaming; Computer vision; Statistical analysis; Best effort; Probabilistic; Program analysis; Linear fit; Dynamic tuning

9.
Lee, Sangho.
Mitigating the performance impact of memory bloat.
Degree: PhD, Computer Science, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/56174
Memory bloat is loosely defined as excessive memory usage by an application during its execution. Due to the complexity of the efficient memory management that developers have to deal with, memory bloat is pervasive and often neglected in favor of lower application development cost. Unfortunately, when the bloat becomes severe, unwanted performance issues may occur due to its impact on memory management mechanisms and memory layout.
In light of this, this dissertation identifies three pervasive causes of performance issues due to memory bloat and presents feedback-driven solutions for each.
First, in certain languages like C/C++, applications have to manually manage memory in terms of allocation and deallocation. When users forget to free an allocated memory (due to a bug), this leads to a form of memory bloat known as memory leak. The presence of memory leaks causes gradual exhaustion of system memory and eventually leads to serious performance degradation of production systems. To prevent the obvious consequences of memory leaks, we present a memory leak detection framework that relies on object behavior introspection. Our framework models behavioral changes of hypothetically leaked objects in terms of their staleness and coexistence patterns among the allocated objects. With the introspective memory leak detection framework, we observed significant memory bloat savings upon weeding out the discovered memory leaks.
Second, the bloat prevention mechanisms in multi-threaded memory allocators are another source of performance issues. When the bloat prevention mechanism is triggered frequently and unnecessarily as an artifact of intensive memory allocation/deallocation, an application may experience suboptimal performance. To address this, we present a feedback-directed tuning mechanism for TCMalloc, a widely used memory allocator for high-performance systems. Our optimization technique tunes the thread cache management mechanism in TCMalloc to the memory allocation behavior of an application and reduces the management cost of TCMalloc's internal data structures. With the proposed technique integrated into FDO in GCC, we observed up to 10% improvement in application performance.
Third, in some languages like Java, memory is managed automatically through garbage collection. Memory bloat in Java applications arises from performance-unconscious designs and implementations. When an application uses an excessive amount of memory by creating more objects than necessary, negative performance impacts such as high garbage collection overhead may arise. To address this issue, we present an object recycling optimization for Java applications. Our technique uses static analysis to identify safe deallocation sites for objects and dynamic profiling to select allocation sites for code transformation. With this optimization, we observed up to 10% improvement in application performance on the DaCapo 2006 benchmark applications.
In summary, this dissertation comprehensively analyzes and…
Advisors/Committee Members: Pande, Santosh (advisor), Orso, Alessandro (committee member), Kim, Hyesoon (committee member), Schwan, Karsten (committee member), Yalamanchili, Sudhakar (committee member).
Subjects/Keywords: Memory bloat; optimization
APA (6th Edition):
Lee, S. (2015). Mitigating the performance impact of memory bloat. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/56174
Chicago Manual of Style (16th Edition):
Lee, Sangho. “Mitigating the performance impact of memory bloat.” 2015. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/56174.
MLA Handbook (7th Edition):
Lee, Sangho. “Mitigating the performance impact of memory bloat.” 2015. Web. 13 Apr 2021.
Vancouver:
Lee S. Mitigating the performance impact of memory bloat. [Internet] [Doctoral dissertation]. Georgia Tech; 2015. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/56174.
Council of Science Editors:
Lee S. Mitigating the performance impact of memory bloat. [Doctoral Dissertation]. Georgia Tech; 2015. Available from: http://hdl.handle.net/1853/56174

Georgia Tech
10.
Hassan, Syed Minhaj.
Exploiting on-chip memory concurrency in 3d manycore architectures.
Degree: PhD, Electrical and Computer Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56251
The objective of this thesis is to optimize the uncore of 3D many-core architectures. More specifically, we note that technology trends point to large increases in memory-level concurrency, which in turn affects the design of the multi-core interconnect and the organization of the memory hierarchy. The work addresses the need for re-optimization in the presence of this increase in memory-system concurrency. First, we observe that 2D network latency and inefficient parallelism management in current 3D designs are the main bottlenecks to fully exploiting the potential of 3D. To that end, we propose an extremely low-latency, low-power, high-radix router and present versions of it for different network topologies and configurations. We also explore optimizations and techniques to reduce traffic in the network. Second, we propose a reorganization of the memory hierarchy and use simple address-space translations to regulate locality, bandwidth, and energy trade-offs in highly concurrent 3D memory systems. Third, we analyze the rise in temperature of 3D memories and propose variable-rate per-bank refresh management that exploits variability in temperature to reduce 3D DRAM's refresh power and extend its operating range to higher temperatures.
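The core of the third contribution is choosing a refresh rate per bank from its temperature rather than refreshing the whole device at the worst-case rate. A minimal sketch: the 85 °C / 95 °C thresholds follow the common DRAM convention of halving the refresh interval in the extended temperature range, but the exact values and the extra derating step below are illustrative assumptions, not the thesis's policy.

```python
def refresh_interval_us(temp_c):
    """Map a bank's temperature to a refresh interval (microseconds)."""
    if temp_c <= 85:
        return 7.8   # nominal refresh interval (tREFI-like)
    if temp_c <= 95:
        return 3.9   # extended range: refresh twice as often
    return 1.95      # hypothetical further derating for hot 3D stacks

def per_bank_refresh_schedule(bank_temps):
    # Cooler banks refresh less often; this gap between per-bank rates
    # and the worst-case device rate is where refresh power is saved.
    return {bank: refresh_interval_us(t) for bank, t in bank_temps.items()}
```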
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Mukhopadhyay, Saibal (committee member), Krishna, Tushar (committee member), Kim, Hyesoon (committee member), Pande, Santosh (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: 3D memory systems; Network-on-chip; 3D system thermal analysis; Memory-level parallelism; DRAM
APA (6th Edition):
Hassan, S. M. (2016). Exploiting on-chip memory concurrency in 3d manycore architectures. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/56251
Chicago Manual of Style (16th Edition):
Hassan, Syed Minhaj. “Exploiting on-chip memory concurrency in 3d manycore architectures.” 2016. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/56251.
MLA Handbook (7th Edition):
Hassan, Syed Minhaj. “Exploiting on-chip memory concurrency in 3d manycore architectures.” 2016. Web. 13 Apr 2021.
Vancouver:
Hassan SM. Exploiting on-chip memory concurrency in 3d manycore architectures. [Internet] [Doctoral dissertation]. Georgia Tech; 2016. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/56251.
Council of Science Editors:
Hassan SM. Exploiting on-chip memory concurrency in 3d manycore architectures. [Doctoral Dissertation]. Georgia Tech; 2016. Available from: http://hdl.handle.net/1853/56251

Georgia Tech
11.
Wang, Jin.
Acceleration and optimization of dynamic parallelism for irregular applications on GPUs.
Degree: PhD, Electrical and Computer Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56294
The objective of this thesis is the development, implementation, and optimization of a GPU execution-model extension that efficiently supports the time-varying, nested, fine-grained dynamic parallelism occurring in irregular, data-intensive applications. These dynamically formed pockets of structured parallelism can utilize the recently introduced device-side nested kernel launch capabilities on GPUs. However, the low utilization of GPU resources and the high cost of device-side kernel launches still make it difficult to harness dynamic parallelism on GPUs. This thesis therefore presents an extension to the common Bulk Synchronous Parallel (BSP) GPU execution model, Dynamic Thread Block Launch (DTBL), which provides the capability of spawning light-weight thread blocks from GPU threads on demand and coalescing them onto existing, natively executing kernels. The finer granularity of a thread block provides effective and efficient control of smaller-scale, dynamically occurring nested pockets of structured parallelism during the computation. Evaluations of DTBL show an average 1.21x speedup over the baseline implementations. The thesis proposes two classes of optimizations of this model. The first is a thread block scheduling strategy that exploits spatial and temporal reference locality between parent kernels and dynamically launched child kernels; the locality-aware thread block scheduler achieves a further 27% increase in overall performance. The second is an energy-efficiency optimization that detects SMX occupancy bubbles during the execution of a DTBL application and converts them into SMX idle periods, where a flexible DVFS technique can be applied to reduce dynamic and leakage power for better energy efficiency. By presenting the implementations, measurements, and key insights, this thesis takes a step toward addressing the challenges and issues in emerging irregular applications.
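The coalescing idea behind DTBL can be modeled abstractly: a dynamically spawned thread block joins a native kernel already running the same function instead of paying for a device-side kernel launch. The toy scheduler below is a behavioral sketch only (the class and method names are assumptions); DTBL itself is a hardware/execution-model mechanism.

```python
class Kernel:
    def __init__(self, fn_name):
        self.fn_name = fn_name
        self.blocks = []  # pending thread blocks for this kernel

class DTBLScheduler:
    """Toy model of Dynamic Thread Block Launch coalescing."""

    def __init__(self):
        self.running = {}        # fn_name -> running native Kernel
        self.kernel_launches = 0  # count of (expensive) kernel launches

    def launch_kernel(self, fn_name, blocks):
        k = Kernel(fn_name)
        k.blocks.extend(blocks)
        self.running[fn_name] = k
        self.kernel_launches += 1
        return k

    def spawn_block(self, fn_name, block):
        if fn_name in self.running:
            # Coalesce onto the existing kernel: no new launch needed.
            self.running[fn_name].blocks.append(block)
        else:
            # No matching native kernel: fall back to a full launch.
            self.launch_kernel(fn_name, [block])
```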
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Kim, Hyesoon (committee member), Vuduc, Richard (committee member), Krishna, Tushar (committee member), Pande, Santosh (committee member).
Subjects/Keywords: General-purpose GPU; Dynamic parallelism; Irregular applications
APA (6th Edition):
Wang, J. (2016). Acceleration and optimization of dynamic parallelism for irregular applications on GPUs. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/56294
Chicago Manual of Style (16th Edition):
Wang, Jin. “Acceleration and optimization of dynamic parallelism for irregular applications on GPUs.” 2016. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/56294.
MLA Handbook (7th Edition):
Wang, Jin. “Acceleration and optimization of dynamic parallelism for irregular applications on GPUs.” 2016. Web. 13 Apr 2021.
Vancouver:
Wang J. Acceleration and optimization of dynamic parallelism for irregular applications on GPUs. [Internet] [Doctoral dissertation]. Georgia Tech; 2016. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/56294.
Council of Science Editors:
Wang J. Acceleration and optimization of dynamic parallelism for irregular applications on GPUs. [Doctoral Dissertation]. Georgia Tech; 2016. Available from: http://hdl.handle.net/1853/56294

Georgia Tech
12.
Mandal, Ankush.
Enabling parallelism and optimizations in data mining algorithms for power-law data.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/63692
Today's data mining tasks aim to extract meaningful information from a large amount of data in a reasonable time, mainly by means of a) algorithmic advances, such as fast approximate algorithms and efficient learning algorithms, and b) architectural advances, such as machines with massive compute capacity involving distributed multi-core processors and high-throughput accelerators. For current and future generations of processors, parallel algorithms are critical for fully utilizing computing resources. Furthermore, exploiting data properties for performance gain becomes crucial for data mining applications. In this work, we focus our attention on power-law behavior, a common property found in a large class of data, such as text data, internet traffic, and click-stream data. Specifically, we address the following questions in the context of power-law data: How well do the critical data mining algorithms of current interest fit today's parallel architectures? Which algorithmic and mapping opportunities can be leveraged to further improve performance? And what are the relative challenges and gains of such approaches? We first investigate the suitability of the "frequency estimation" problem for GPU-scale parallelism. Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most past work on sketch-based frequency estimation has focused on CPU implementations. In our work, we propose a novel approach for sketches that exploits the natural skewness in power-law data to efficiently utilize the massive parallelism of modern GPUs. Next, we explore the problem of identifying the top-K frequent elements in distributed data streams on modern distributed settings with both multi-core and multi-node CPU parallelism.
Sketch-based approaches, such as Count-Min Sketch (CMS) with a top-K heap, have an excellent update time but lack the important property of reducibility, which is needed for exploiting data parallelism. At the other end, the popular Frequent Algorithm (FA) leads to reducible summaries, but its update costs are high. Our approach, Topkapi, gives the best of both worlds: it is reducible like FA and has an efficient update time similar to CMS. For power-law data, Topkapi possesses strong theoretical guarantees and leads to significant performance gains relative to past work. Finally, we study Word2Vec, a popular word embedding method widely used in machine learning and natural language processing applications such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data (SIMD) parallelism. With the increasing vector lengths in commodity CPUs, such as AVX-512 with a vector length of 512 bits, efficient utilization of the vector processing units becomes a major performance game-changer. By employing a static multi-version code generation strategy coupled with an algorithmic approximation based on the power-law frequency…
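The CMS-plus-top-K baseline that Topkapi is compared against can be sketched compactly. This is a minimal single-node illustration of the standard technique, not Topkapi itself; the candidate dictionary stands in for the top-K heap, and the parameter defaults are arbitrary.

```python
import random

class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` hashed counter rows of fixed
    `width`; an item's estimate is the minimum counter across rows."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def add(self, item):
        for row, salt in zip(self.rows, self.salts):
            row[hash((salt, item)) % self.width] += 1

    def estimate(self, item):
        return min(row[hash((salt, item)) % self.width]
                   for row, salt in zip(self.rows, self.salts))

def top_k(stream, k, width=256, depth=4):
    """CMS tracks approximate counts; a small candidate set (standing in
    for the top-K heap) keeps the k items with the largest estimates."""
    cms, best = CountMinSketch(width, depth), {}
    for item in stream:
        cms.add(item)
        est = cms.estimate(item)
        if item in best or len(best) < k:
            best[item] = est
        else:
            victim = min(best, key=best.get)
            if est > best[victim]:
                del best[victim]
                best[item] = est
    return sorted(best, key=best.get, reverse=True)
```

Note that `best` is state tied to one stream and cannot simply be merged across workers, which is exactly the reducibility gap the abstract describes.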
Advisors/Committee Members: Sarkar, Vivek (advisor), Shrivastava, Anshumali (advisor), Kim, Hyesoon (committee member), Pande, Santosh (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: Data mining; Performance optimization; Parallel approximate algorithms; Power-law data; Sketches; Word embedding; Multi-core; GPU
APA (6th Edition):
Mandal, A. (2020). Enabling parallelism and optimizations in data mining algorithms for power-law data. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/63692
Chicago Manual of Style (16th Edition):
Mandal, Ankush. “Enabling parallelism and optimizations in data mining algorithms for power-law data.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/63692.
MLA Handbook (7th Edition):
Mandal, Ankush. “Enabling parallelism and optimizations in data mining algorithms for power-law data.” 2020. Web. 13 Apr 2021.
Vancouver:
Mandal A. Enabling parallelism and optimizations in data mining algorithms for power-law data. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/63692.
Council of Science Editors:
Mandal A. Enabling parallelism and optimizations in data mining algorithms for power-law data. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/63692

Georgia Tech
13.
Mururu, Girish.
Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64096
Modern software executes on multi-core systems that share resources such as several levels of the memory hierarchy (caches, main memory, secondary storage), I/O devices, and network interfaces. In such a co-execution environment, the performance of modern software is critically affected by resource conflicts arising from the sharing of these resources. Resource requirements vary not only across processes but also during the execution of a process. Current resource management techniques involving OS schedulers have evolved from, and mainly rely on, the principles of fairness (achieved through time-multiplexing) and load balancing, and are oblivious to the dynamic resource requirements of individual processes. On the other hand, compiler research has traditionally evolved around optimizing single- and multi-threaded programs limited to one process. However, compilers can analyze a process's resource requirements. This thesis contends that a significant performance enhancement can be achieved through compiler guidance of schedulers in terms of dynamic program characteristics and resource needs.
Toward compiler-guided scheduling, we first look at the problem of process migration. For load-balancing purposes, OS schedulers such as CFS can migrate threads while they are in the middle of an intense memory-reuse region, destroying warmed-up caches and TLBs. To solve this problem while providing enough flexibility for load balancing, we propose PinIt, which determines the regions of a program in which the process should be pinned onto a core so that adverse migrations causing excessive cache and TLB misses are avoided. The thesis proposes new measures, such as unique memory reuse and memory reuse density, that capture the performance penalties incurred due to migration. Regions with high penalties are encapsulated by the compiler with pin/unpin calls that prevent migrations. In an overloaded environment, compared to priority-cfs, PinIt speeds up high-priority applications in mediabench workloads by 1.16x and 2.12x and in computer-vision workloads by 1.35x and 1.23x on 8 and 16 cores, respectively, with almost the same or better throughput for low-priority applications.
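The pin-region decision can be illustrated over a memory-access trace. The density formula below (fraction of accesses that touch an already-seen address) is a crude stand-in for the thesis's actual "memory reuse density" measure, and the function names and threshold are illustrative assumptions.

```python
def reuse_density(trace):
    """Fraction of accesses in a region that hit an already-seen address."""
    seen, reused = set(), 0
    for addr in trace:
        if addr in seen:
            reused += 1
        seen.add(addr)
    return reused / len(trace) if trace else 0.0

def regions_to_pin(region_traces, threshold=0.5):
    # Regions with dense reuse get pin()/unpin() calls inserted around
    # them so the scheduler will not migrate the thread mid-region and
    # throw away the warmed-up cache and TLB state.
    return [name for name, trace in region_traces.items()
            if reuse_density(trace) > threshold]
```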
The problem of co-scheduling and co-locating processes that share resources must be solved for efficiency in a co-execution environment. Several approaches proposed in the literature rely on static profile data or dynamic performance-counter information, which inherently cannot be used in an anticipatory (proactive) manner, leading to suboptimal scheduling. This thesis proposes Beacons, a generic framework that instruments programs with generated models or equations of specific program characteristics and provides a runtime counterpart that delivers the dynamically generated information to the scheduler. We develop a novel timing analysis of loop durations that is, on average, 84% accurate on the Polybench and Rodinia benchmarks, and embed it along with memory footprint and locality…
Advisors/Committee Members: Pande, Santosh (advisor), Gavrilovska, Ada (committee member), Ramachandran, Umakishore (committee member), Sarkar, Vivek (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Compilers; Scheduling; Co-location
APA (6th Edition):
Mururu, G. (2020). Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/64096
Chicago Manual of Style (16th Edition):
Mururu, Girish. “Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/64096.
MLA Handbook (7th Edition):
Mururu, Girish. “Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation.” 2020. Web. 13 Apr 2021.
Vancouver:
Mururu G. Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/64096.
Council of Science Editors:
Mururu G. Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/64096

Georgia Tech
14.
Chatarasi, Prasanth.
ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64099
Computer hardware is undergoing a major disruption as we approach the end of Moore's law, in the form of new advancements to general-purpose and domain-specific parallel architectures. Contemporaneously, the demand for higher performance is broadening across multiple application domains, ranging from scientific computing to deep learning and graph analytics. These trends raise a plethora of challenges for the de facto approach to achieving higher performance, namely application development using high-performance libraries. The challenges include porting/adapting to multiple parallel architectures, supporting rapidly advancing domains, and inhibited optimization across library calls. Hence, there is a renewed focus in industry and academia on advancing optimizing compilers to address these trends, but doing so requires enabling compilers to work effectively on a wide range of applications and to better exploit current and future parallel architectures. As summarized below, this thesis focuses on compiler advancements for current and future hardware trends.
First, we observe that software with explicit parallelism for general-purpose multi-core CPUs and GPUs is on the rise, but the foundation of current compiler frameworks is based on optimizing sequential code. Our approach uses explicit parallelism specified by the programmer as logical parallelism to refine the conservative dependence analysis inherent in compilers (arising from the presence of program constructs such as pointer aliasing, unknown function calls, non-affine subscript expressions, recursion, and unstructured control flow). This approach makes it possible to combine user-specified parallelism and compiler-generated parallelism in a new unified polyhedral compilation framework (PoPP).
Second, despite the fact that compiler technologies for automatic vectorization for general-purpose vector processing (SIMD) units have been under development for over four decades, there are still considerable gaps in the capabilities of modern compilers to perform automatic vectorization. One such gap can be found in the handling of loops with dependence cycles that involve memory-based anti (write-after-read) and output (write-after-write) dependences. A significant limitation in past work is the lack of a unified formulation that synergistically integrates multiple storage transformations to break the cycles and further unify the formulation with loop transformations to enable vectorization. To address this limitation, we propose the PolySIMD approach.
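The storage-transformation idea can be shown on a tiny loop. The example loop and function names below are my own illustration of breaking an anti-dependence by renaming (copying) the read storage, not code from PolySIMD, which unifies such transformations with loop transformations in a single formulation.

```python
def scalar_loop(a):
    """a[i] = a[i+1] + 1 carries an anti (write-after-read) dependence:
    iteration i must read a[i+1] before iteration i+1 overwrites it,
    which blocks naive vectorization of the in-place loop."""
    a = list(a)
    for i in range(len(a) - 1):
        a[i] = a[i + 1] + 1
    return a

def renamed_loop(a):
    """Storage transformation: all reads go to a renamed copy, so every
    element is independent and the body can run as one vector operation."""
    src = list(a)  # the renamed storage that breaks the dependence cycle
    return [src[i + 1] + 1 for i in range(len(a) - 1)] + [a[-1]]
```

Because `renamed_loop` reads only from `src`, its iterations have no ordering constraint; a compiler can emit a single SIMD shift-and-add for it.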
Third, the efficiency of domain-specific spatial accelerators for Deep Learning (DL) solutions depends heavily on the compiler's ability to generate optimized mappings or code for various DL operators (building blocks of DL models, e.g., CONV2D, GEMM) on the accelerator's compute and memory resources. However, the rapid emergence of new operators and new accelerators pose two key challenges/requirements to the existing compilers: 1) Ability to perform fine-grained reasoning of…
Advisors/Committee Members: Sarkar, Vivek (advisor), Shirako, Jun (advisor), Pande, Santosh (committee member), Krishna, Tushar (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: Compiler Optimizations; General-Purpose Architectures; Domain-Specific Architectures; Deep Learning; Graph Analytics; Accelerators; Polyhedral Model
APA (6th Edition):
Chatarasi, P. (2020). ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/64099
Chicago Manual of Style (16th Edition):
Chatarasi, Prasanth. “ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/64099.
MLA Handbook (7th Edition):
Chatarasi, Prasanth. “ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES.” 2020. Web. 13 Apr 2021.
Vancouver:
Chatarasi P. ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/64099.
Council of Science Editors:
Chatarasi P. ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/64099

Georgia Tech
15.
Chen, Chao.
Compiler-Assisted Resilience Framework for Recovery from Transient Faults.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64214
Due to system scaling trends toward smaller transistor sizes, higher circuit density, and the use of near-threshold voltage (NTV) techniques, transient hardware faults introduced by external disturbances, e.g., heat fluxes and particle strikes, have become a growing concern for current and upcoming extreme-scale high-performance computing (HPC) systems. Applications running on these systems are projected to experience transient errors more frequently than ever before, which will either lead them to generate incorrect outputs without warning users or cause them to crash. Therefore, efficient resilience techniques against transient hardware faults are required for modern HPC applications.
This dissertation is concerned with the design, implementation, and evaluation of a lightweight resilience framework that mitigates the impact of transient hardware faults on large-scale scientific applications. It consists of three novel techniques: 1) LADR, a lightweight anomaly-based approach that protects scientific applications against transient-fault-induced silent data corruptions (SDCs); 2) CARE, a low-cost compiler-assisted technique that repairs a crashed process on the fly when a crash-causing transient error is detected, so that applications can continue executing instead of simply being terminated and restarted; and 3) IterPro, which targets recovery from corruption of induction variables by exploiting side effects of modern compiler optimization techniques.
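One such compiler side effect is strength reduction, which often leaves a derived address alongside the induction variable. A minimal sketch of exploiting that redundancy; the recurrence and names here are illustrative assumptions, and IterPro's actual analysis is more general.

```python
def recover_induction_variable(base, stride, derived_ptr):
    """After strength reduction, a loop may maintain a derived address
    p = base + i * stride alongside the induction variable i. If a
    transient fault corrupts i, it can be recomputed from the intact
    derived value."""
    return (derived_ptr - base) // stride

# Simulated iteration state hit by a single bit flip in i:
base, stride = 0x1000, 8
i = 37
p = base + i * stride           # derived pointer kept by the compiler
i_corrupted = i ^ (1 << 4)      # transient fault flips bit 4 of i
i_recovered = recover_induction_variable(base, stride, p)
```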
To limit runtime overheads during the normal execution of applications, these approaches exploit properties of scientific applications via compiler techniques. By design, they incur only negligible (<3%) or even zero runtime overhead during normal execution, yet still achieve high fault coverage.
Advisors/Committee Members: Pande, Santosh (advisor), Eisenhauer, Greg (advisor), Liu, Ling (committee member), Sarkar, Vivek (committee member), Vuduc, Richard (committee member), Cappello, Franck (committee member).
Subjects/Keywords: Resilience; Compiler; HPC; High Performance Computing; Fault Tolerance; SDC; Silent Data Corruption; Transient Hardware Fault; Transient Fault; Soft Failure; Crash; Failure
APA (6th Edition):
Chen, C. (2020). Compiler-Assisted Resilience Framework for Recovery from Transient Faults. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/64214
Chicago Manual of Style (16th Edition):
Chen, Chao. “Compiler-Assisted Resilience Framework for Recovery from Transient Faults.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/64214.
MLA Handbook (7th Edition):
Chen, Chao. “Compiler-Assisted Resilience Framework for Recovery from Transient Faults.” 2020. Web. 13 Apr 2021.
Vancouver:
Chen C. Compiler-Assisted Resilience Framework for Recovery from Transient Faults. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/64214.
Council of Science Editors:
Chen C. Compiler-Assisted Resilience Framework for Recovery from Transient Faults. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/64214
16.
Marquez, Nicholas Alexander.
OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs.
Degree: MS, Computer Science, 2013, Georgia Tech
URL: http://hdl.handle.net/1853/47556
The application of higher-order PDA-based flow analyses to object-oriented languages enables comprehensive and precise characterization of program behavior while retaining practicality and efficiency.
We implement one such flow analysis which we've named OOCFA2.
While many advancements in flow analysis have been made over the years, they have almost exclusively targeted functional languages, often modeled with the λ-calculus.
Object-oriented semantics – while also able to be modeled in a functional setting – provide certain structural guarantees and common idioms which we believe are valuable to reason over in a first-class manner.
By tailoring modern, advanced flow analyses to object-oriented semantics, we believe it is possible to achieve greater precision and efficiency than could be had using a functional modeling.
This, in turn, reflects upon the possible classes of higher-level analyses using the underlying flow analysis: the more powerful, efficient, and flexible the flow analysis, the more classes of higher-level analyses – e.g., security analyses – can be practically expressed.
The growing trend is that smartphone and mobile-device (e.g., tablet) users are integrating these devices into their lives in more frequent and more personal ways.
Accordingly, the primary application and proof of concept for this work is the analysis of the Android operating system's permissions-based security system vis-à-vis potentially malicious applications.
It is implemented atop OOCFA2.
The use of such a powerful higher-order flow analysis allows one to apply its knowledge to create a wide variety of powerful and practical security-analysis "front-ends" – not only the permissions-checking analysis in this work but also, e.g., information-flow analyses.
OOCFA2 is the first PDA-based higher-order flow analysis in an object-oriented setting.
We empirically evaluate its accuracy and performance to prove its practical viability.
We also evaluate the proof-of-concept security analysis' accuracy as directly related to OOCFA2; this shows promising results for the potential of building security-oriented "front-ends" atop OOCFA2.
Advisors/Committee Members: Pande, Santosh (Committee Chair), Shivers, Olin (Committee Co-Chair), Isbell, Charles (Committee Member).
Subjects/Keywords: CFA2; kCFA; Dalvik; Java; JVM; Security analysis; Static analysis; Object-oriented programming (Computer science); Object-oriented programming languages; Operating systems (Computers); Data protection
APA (6th Edition):
Marquez, N. A. (2013). OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/47556
Chicago Manual of Style (16th Edition):
Marquez, Nicholas Alexander. “OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs.” 2013. Masters Thesis, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/47556.
MLA Handbook (7th Edition):
Marquez, Nicholas Alexander. “OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs.” 2013. Web. 13 Apr 2021.
Vancouver:
Marquez NA. OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs. [Internet] [Masters thesis]. Georgia Tech; 2013. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/47556.
Council of Science Editors:
Marquez NA. OOCFA2: a PDA-based higher-order flow analysis for object-oriented programs. [Masters Thesis]. Georgia Tech; 2013. Available from: http://hdl.handle.net/1853/47556
17.
Bhagwat, Ashwini.
Methodologies and tools for computation offloading on heterogeneous multicores.
Degree: MS, Computing, 2009, Georgia Tech
URL: http://hdl.handle.net/1853/29688
► Frequency scaling in traditional computing systems has hit the power wall and multicore computing is here to stay. Unlike homogeneous multicores which have uniform architecture…
(more)
▼ Frequency scaling in traditional computing systems has hit the power wall, and multicore computing is here to stay. Unlike homogeneous multicores, which have a uniform architecture and instruction set across cores, heterogeneous multicores have differentially capable cores to provide optimal performance for specialized functionality. However, this heterogeneity also translates into difficult programming models, and extracting its potential is not trivial. The Cell Broadband Engine by the Sony Toshiba IBM (STI) consortium was amongst the first heterogeneous multicore systems, with a single Power Processing Unit (PPU) and 8 Synergistic Processor Units (SPUs).
We address the issue of porting an existing sequential C/C++ codebase onto the Cell BE through compiler-driven program analysis and profiling. Until parallel programming models evolve, the "interim" solution to performance involves speeding up legacy code by offloading computationally intense parts of a sequential thread to the co-processor, thus using it as an accelerator. The unique architectural characteristics of an accelerator make this problem quite challenging. On the Cell, these characteristics include the limited local store of the SPU, the high latency of data transfer between the PPU and SPU, the lack of a branch prediction unit, limited SIMDizability, and expensive scalar code. In particular, the designers of the Cell have opted for software-controlled memory on its SPUs to reduce power consumption and to give programmers more control over the predictability of latency. The lack of a hardware cache on the SPU can create performance bottlenecks because any data that needs to be brought into the SPU must be transferred using a DMA call. The need for a software-controlled cache is thus evident for irregular memory accesses on the SPU. For such a cache to result in improved performance, the amount of time spent in book-keeping and tracking at run time should be minimal. Traditional algorithms like LRU, when implemented in software, incur overheads on every cache hit because the appropriate data structures need to be updated. Such overheads are off the critical path for a traditional hardware cache but on the critical path for a software-controlled cache. Thus there is a need for better management of "data movement" for the code that is offloaded onto the SPU.
This thesis addresses the "code partitioning" problem as well as the "data movement" problem. We present:
- GLIMPSES, a compiler-driven profiling tool that analyzes existing C/C++ code for its suitability for porting to the Cell and presents its results in an interactive visualizer.
- A software-controlled cache with an improved eviction policy that exploits information gleaned from memory traces generated through offline profiling. The trace is analyzed to provide guidance for a run-time state machine within the cache manager, resulting in reduced run-time overhead and better performance. The design tradeoffs and several pros and cons of this approach are brought forth as well. It is shown that with just about the right…
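The eviction idea above, replacing LRU's per-hit bookkeeping with guidance precomputed from an offline memory trace, might be sketched as follows. This is an illustrative toy: the class name, the `next_use` hint table, and the Belady-style victim choice are assumptions, not the thesis's actual design.

```python
class ProfileGuidedCache:
    """Toy software-controlled cache whose eviction policy is driven by
    offline profile data (a hypothetical next_use table) instead of LRU,
    so hits require no bookkeeping on the critical path."""

    def __init__(self, capacity, next_use):
        self.capacity = capacity
        self.next_use = next_use  # addr -> sorted list of profiled access times
        self.lines = {}           # addr -> cached data
        self.misses = 0

    def access(self, addr, time, fetch):
        if addr in self.lines:    # hit: no metadata update, unlike software LRU
            return self.lines[addr]
        self.misses += 1
        if len(self.lines) >= self.capacity:
            # Evict the line whose next profiled use is farthest away
            # (Belady-style guidance gleaned from the offline trace).
            victim = max(self.lines, key=lambda a: self._next_use_after(a, time))
            del self.lines[victim]
        self.lines[addr] = fetch(addr)  # fetch() stands in for a DMA transfer
        return self.lines[addr]

    def _next_use_after(self, addr, time):
        future = [t for t in self.next_use.get(addr, []) if t > time]
        return future[0] if future else float("inf")
```

In a real deployment the hints would come from a separate profiling run, so they may diverge from the actual execution; the thesis's run-time state machine addresses exactly that gap.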
Advisors/Committee Members: Pande, Santosh (Committee Chair), Clark, Nate (Committee Member), Yalamanchili, Sudhakar (Committee Member).
Subjects/Keywords: Parallel Computing; Coprocessors; Coding theory
18.
Ozarde, Sarang Anil.
Performance understanding and tuning of iterative computation using profiling techniques.
Degree: MS, Computing, 2010, Georgia Tech
URL: http://hdl.handle.net/1853/34757
► Most applications spend a significant amount of time in the iterative parts of a computation. They typically iterate over the same set of operations with…
(more)
▼ Most applications spend a significant amount of time in the iterative parts of a computation. They typically iterate over the same set of operations with different values. These values either depend on inputs or on values calculated in previous iterations. While loops capture some iterative behavior, in many cases such behavior is spread over the whole program, sometimes through recursion. Understanding the iterative behavior of a computation can be very useful for fine-tuning it. In this thesis, we present a profiling-based framework to understand and improve the performance of iterative computation. We capture the state of iterations in two aspects: 1) algorithmic state and 2) program state. We demonstrate the applicability of our framework for capturing algorithmic state by applying it to SAT solvers, and program state by applying it to a variety of benchmarks exhibiting completely parallelizable loops. Further, we show that such a performance characterization can be successfully used to improve the performance of the underlying application.
Many high-performance combinatorial optimization applications involve SAT solving. A variety of SAT solvers have been developed that employ different data structures and different propagation methods for converging on a fixed point that yields a satisfiable solution. The performance debugging and tuning of SAT solvers for a given domain is an important problem encountered in practice. Unfortunately, not much work has been done to quantify the iterative efficiency of SAT solvers. In this work, we develop quantifiable measures for calculating the convergence efficiency of SAT solvers. Here, we capture the algorithmic state of the application by tracking the assignment of variables for each iteration. A compact representation of profile data is developed to track the rate of progress and convergence. The novelty of this approach is that it is independent of the specific strategies used in individual solvers, yet it gives key insights into the "progress" and "convergence behavior" of the solver in terms of the specific implementation at hand. An analysis tool interprets the profile data and extracts metrics such as average convergence rate, efficiency of iteration, and variable stabilization. Finally, using this system, we produce a study of four well-known SAT solvers, comparing their iterative efficiency on random as well as industrial benchmarks. Using the framework, iterative inefficiencies that lead to slow convergence are identified. We also show how to fine-tune the solvers by adapting the key steps.
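In that spirit, per-iteration assignment snapshots suffice to compute simple solver-independent progress metrics. The two metrics below, a per-iteration change rate and each variable's last-flip iteration, are illustrative stand-ins for the thesis's measures, not its exact definitions:

```python
def convergence_metrics(snapshots):
    """From per-iteration variable assignments (a list of dicts: var -> bool),
    derive a change rate per iteration and the iteration at which each
    variable last flipped (its 'stabilization point')."""
    n = len(snapshots[0])
    change_rates = []
    last_flip = {v: 0 for v in snapshots[0]}
    for i in range(1, len(snapshots)):
        flipped = [v for v in snapshots[i] if snapshots[i][v] != snapshots[i - 1][v]]
        change_rates.append(len(flipped) / n)
        for v in flipped:
            last_flip[v] = i  # variable still unstable at iteration i
    return change_rates, last_flip
```

A falling change rate and early stabilization points would indicate efficient convergence, regardless of the propagation strategy a particular solver uses.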
We also show that a similar profile-data representation can be applied to loops in general to capture their program state. One of the key attributes of the program state inside loops is their branch behavior. We demonstrate the applicability of the framework by profiling completely parallelizable loops (no cross-iteration dependence) and by storing the branching behavior of each iteration. The branch behavior across a group of iterations is…
Advisors/Committee Members: Pande, Santosh (Committee Chair), Clark, Nate (Committee Member), Yalamanchili, Sudhakar (Committee Member).
Subjects/Keywords: Performance debugging; Performance analysis; Combinatorial optimization; Algorithms; Heuristic algorithms
19.
Kerr, Andrew.
A model of dynamic compilation for heterogeneous compute platforms.
Degree: PhD, Electrical and Computer Engineering, 2012, Georgia Tech
URL: http://hdl.handle.net/1853/47719
► Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability,…
(more)
▼ Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software experiences obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures presents an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling that attempts to maximize control locality.
Advisors/Committee Members: Yalamanchili, Sudha (Committee Chair), Lanterman, Aaron (Committee Member), Pande, Santosh (Committee Member), Richards, Mark (Committee Member), Shamma, Jeff (Committee Member).
Subjects/Keywords: Dynamic compilation; GPU computing; Cuda; Opencl; SIMD; Vector; Multicore; Parallel computing; Parallel computers; Parallel programs (Computer programs); Heterogeneous computing; Parallel processing (Electronic computers); High performance computing
20.
Hong, Kirak.
A distributed framework for situation awareness on camera networks.
Degree: PhD, Computer Science, 2014, Georgia Tech
URL: http://hdl.handle.net/1853/52263
► With the proliferation of cameras and advanced video analytics, situation awareness applications that automatically generate actionable knowledge from live camera streams has become an important…
(more)
▼ With the proliferation of cameras and advanced video analytics, situation awareness applications that automatically generate actionable knowledge from live camera streams have become an important class of applications in various domains, including surveillance, marketing, sports, health care, and traffic monitoring. However, despite the wide range of use cases, developing these applications on large-scale camera networks is extremely challenging because they involve both compute- and data-intensive workloads, have latency-sensitive quality-of-service requirements, and deal with inherent dynamism (e.g., the number of faces detected in a certain area) from the real world. To support developing large-scale situation awareness applications, this dissertation presents a distributed framework that makes two key contributions: 1) it provides a programming model that ensures scalability of applications, and 2) it supports low-latency computation and dynamic workload handling through opportunistic event processing and workload distribution over different locations and levels of the network hierarchy.
To provide a scalable programming model, two programming abstractions for different levels of application logic are proposed: the first abstraction at the level of real-time target detection and tracking, and the second abstraction for answering spatio-temporal queries at a higher level. The first programming abstraction, Target Container (TC), elevates the target to a first-class citizen, allowing domain experts to simply provide handlers for detection, tracking, and comparison of targets. With those handlers, the TC runtime system performs priority-aware scheduling to ensure real-time tracking of important targets when resources are insufficient to track all targets. The second abstraction, Spatio-temporal Analysis (STA), supports applications that answer queries related to space, time, and occupants using a global state transition table and probabilistic events. To ensure scalability, STA bounds the communication overhead of state updates by providing tuning parameters for information propagation among distributed workers.
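The core of the priority-aware scheduling decision can be sketched in a few lines, assuming each target carries a numeric priority and tracking resources cover only `capacity` targets at once. The function name and the top-k policy are illustrative, not TC's actual scheduler:

```python
import heapq

def targets_to_track(targets, capacity):
    """Keep the highest-priority targets when tracking resources are
    insufficient for all of them.  targets: list of (priority, target_id),
    with larger priority meaning more important."""
    return [tid for _, tid in heapq.nlargest(capacity, targets)]
```

A real runtime would re-run this decision as targets appear, disappear, and change priority between frames.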
The second part of this work explores two optimization strategies that reduce latency for stream processing and handle dynamic workload. The first strategy, an opportunistic event processing mechanism, performs event processing on predicted locations to provide just-in-time situational information to mobile users. Since location prediction algorithms are inherently inaccurate, the system selects multiple regions using a greedy algorithm to provide highly meaningful information at the given amount of computing resources. The second strategy is to distribute application workload over computing resources that are placed at different locations and various levels of network hierarchy. To support this strategy, the framework provides hierarchical communication primitives and a decentralized resource discovery protocol that allow scalable and highly adaptive load balancing over space and time.
Advisors/Committee Members: Ramachandran, Umakishore (advisor), Pande, Santosh (committee member), Ammar, Mostafa (committee member), Iftode, Liviu (committee member), Jayaraman, Bharat (committee member).
Subjects/Keywords: Distributed framework; Situation awareness; Camera networks; Stream processing
21.
Cledat, Romain.
Programming models for speculative and optimistic parallelism based on algorithmic properties.
Degree: PhD, Computing, 2011, Georgia Tech
URL: http://hdl.handle.net/1853/42749
► Today's hardware is becoming more and more parallel. While embarrassingly parallel codes, such as high-performance computing ones, can readily take advantage of this increased number…
(more)
▼ Today's hardware is becoming more and more parallel. While embarrassingly parallel codes, such as high-performance computing ones, can readily take advantage of this increased number of cores, most other types of code cannot easily scale using traditional data and/or task parallelism; cores are therefore left idling, resulting in lost opportunities to improve performance. The opportunistic computing paradigm, on which this thesis rests, is the idea that computations should dynamically adapt to and exploit the opportunities that arise from idling resources to enhance their performance or quality.
In this thesis, I propose to utilize algorithmic properties to develop programming models that leverage this idea, thereby increasing and improving the parallelism that can be exploited. I exploit three distinct algorithmic properties: i) algorithmic diversity, ii) the semantic content of data structures, and iii) the variable nature of results in certain applications.
This thesis presents three main contributions: i) the N-way model which leverages algorithmic diversity to speed up hitherto sequential code, ii) an extension to the N-way model which opportunistically improves the quality of computations and iii) a framework allowing the programmer to specify the semantics of data-structures to improve the performance of optimistic parallelism.
Advisors/Committee Members: Pande, Santosh (Committee Chair), Kim, Hyesoon (Committee Member), Ramachandran, Umakishore (Committee Member), Schwan, Karsten (Committee Member), Yalamanchili, Sudhakar (Committee Member).
Subjects/Keywords: Parallel programming; Algorithmic properties; Algorithmic diversity; Variable semantics; Data-structure semantics; N-way model; Algorithms; Parallel computers; Computer science
22.
Lillethun, David.
ssIoTa: A system software framework for the internet of things.
Degree: PhD, Computer Science, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/53531
► Sensors are widely deployed in our environment, and their number is increasing rapidly. In the near future, billions of devices will all be connected to…
(more)
▼ Sensors are widely deployed in our environment, and their number is increasing rapidly. In the near future, billions of devices will all be connected to each other, creating an Internet of Things. Furthermore, computational intelligence is needed to make applications involving these devices truly exciting. In IoT, however, the vast amounts of data will not be statically prepared for batch processing, but rather continually produced and streamed live to data consumers and intelligent algorithms. We refer to applications that perform live analysis on live data streams, bringing intelligence to IoT, as the Analysis of Things.
However, the Analysis of Things also comes with a new set of challenges.
The data sources are not collected in a single, centralized location, but rather distributed widely across the environment. AoT applications need to be able to access (consume, produce, and share with each other) this data in a way that is natural considering its live streaming nature. The data transport mechanism must also allow easy access to sensors, actuators, and analysis results. Furthermore, analysis applications require computational resources on which to run. We claim that system support for AoT can reduce the complexity of developing and executing such applications.
To address this, we make the following contributions:
- A framework for systems support of Live Streaming Analysis in the Internet of Things, which we refer to as the Analysis of Things (AoT), including a set of requirements for system design
- A system implementation that validates the framework by supporting Analysis of Things applications at a local scale, and a design for a federated system that supports AoT on a wide geographical scale
- An empirical system evaluation that validates the system design and implementation, including simulation experiments across a wide-area distributed system
We present five broad requirements for the Analysis of Things and discuss one set of specific system support features that can satisfy these requirements. We have implemented a system, called ssIoTa, that implements these features and supports AoT applications running on local resources. The programming model for the system allows applications to be specified simply as operator graphs, by connecting operator inputs to operator outputs and sensor streams. Operators are code components that run arbitrary continuous analysis algorithms on streaming data. By conforming to a provided interface, operators may be developed that can be composed into operator graphs and executed by the system. The system consists of an Execution Environment, in which a Resource Manager manages the available computational resources and the applications running on them; a Stream Registry, in which available data streams can be registered so that they may be discovered and used by applications; and an Operator Store, which serves as a repository for operator code so that components can be shared and reused. Experimental results for the system implementation validate its…
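The operator-graph model described above, code components conforming to a small interface and wired output-to-input, can be approximated in a few lines. The `Operator` class and its `connect`/`push` methods are illustrative assumptions, not ssIoTa's actual API:

```python
class Operator:
    """A code component in an operator graph: applies a continuous
    analysis function to each stream item and forwards the result."""
    def __init__(self, fn):
        self.fn = fn
        self.downstream = []

    def connect(self, op):
        self.downstream.append(op)
        return op  # returning the child enables chained graph construction

    def push(self, item):
        out = self.fn(item)
        if out is not None:          # None models a filtered-out item
            for op in self.downstream:
                op.push(out)

# Wire a tiny graph: sensor stream -> smooth -> alarm filter -> sink.
results = []
smooth = Operator(lambda x: round(x * 0.5, 2))
alarm = Operator(lambda x: x if x > 10 else None)
sink = Operator(results.append)
smooth.connect(alarm).connect(sink)
for reading in [8, 30, 14, 50]:  # stands in for a live sensor stream
    smooth.push(reading)
```

In the real system the graph would span distributed resources, with the Stream Registry supplying inputs and the Resource Manager placing operators; here everything runs in one process purely to show the composition model.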
Advisors/Committee Members: Ramachandran, Umakishore (advisor), Ahamad, Mustaque (committee member), Pande, Santosh (committee member), Schwan, Karsten (committee member), Bonomi, Flavio (committee member).
Subjects/Keywords: Complex event processing; Situation awareness; Cyberphysical systems; Live streaming analysis; Internet of things; Distributed systems; Distributed scheduling; Fog computing
23.
Wu, Haicheng.
Acceleration and execution of relational queries using general purpose graphics processing unit (GPGPU).
Degree: PhD, Electrical and Computer Engineering, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/54405
► This thesis first maps the relational computation onto Graphics Processing Units (GPU)s by designing a series of tools and then explores the different opportunities of…
(more)
▼ This thesis first maps relational computation onto Graphics Processing Units (GPUs) by designing a series of tools, and then explores different opportunities for reducing the limitations imposed by the memory hierarchy across the CPU-GPU system. First, a complete end-to-end compiler and runtime infrastructure, Red Fox, is proposed. Evaluation on the full set of industry-standard TPC-H queries on a single-node GPU shows that, on average, Red Fox is 11.20x faster than a commercial database system on a state-of-the-art CPU machine. Second, a new compiler technique called kernel fusion is designed to fuse the code bodies of several relational operators to reduce data movement. Third, a multi-predicate join algorithm is designed for GPUs that can provide much better performance and be used with more flexibility compared with kernel fusion. Fourth, the GPU-optimized multi-predicate join is integrated into a multi-threaded CPU database runtime system that supports out-of-core data sets to solve real-world problems. This thesis presents key insights, lessons learned, measurements from the implementations, and opportunities for further improvements.
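Kernel fusion's payoff, eliminating the materialized intermediate between relational operators, can be illustrated with a scalar analogue. Red Fox fuses actual GPU kernels; the sketch below only mirrors the data-movement argument, and the operator names are illustrative:

```python
# Unfused pipeline: each operator materializes its full output relation,
# which on a GPU means extra round trips through device memory.
def select(rows, pred):
    return [r for r in rows if pred(r)]

def project(rows, cols):
    return [{c: r[c] for c in cols} for r in rows]

# Fused kernel body: one pass over the input, no intermediate relation.
def fused_select_project(rows, pred, cols):
    return [{c: r[c] for c in cols} for r in rows if pred(r)]

rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}, {"id": 3, "qty": 9}]
big = lambda r: r["qty"] > 6
assert fused_select_project(rows, big, ["id"]) == project(select(rows, big), ["id"])
```

The fused form produces identical results while touching each input row once, which is exactly the property that reduces memory traffic when the loop bodies are GPU kernels.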
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Kim, Hyesoon (committee member), Wills, Linda (committee member), Vuduc, Richard (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Database; GPU
24.
Railing, Brian Paul.
Collecting and representing parallel programs with high performance instrumentation.
Degree: PhD, Computer Science, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/54431
► Computer architecture has looming challenges with finding program parallelism, process technology limits, and limited power budget. To navigate these challenges, a deeper understanding of parallel…
(more)
▼ Computer architecture faces looming challenges: finding program parallelism, process technology limits, and a limited power budget. To navigate these challenges, a deeper understanding of parallel programs is required. I will discuss the task graph representation and how it enables programmers and compiler optimizations to understand and exploit dynamic aspects of the program.
I will present Contech, which is a high performance framework for generating dynamic task graphs from arbitrary parallel programs. The Contech framework supports a variety of languages and parallelization libraries, and has been tested on both x86 and ARM. I will demonstrate how this framework encompasses a diversity of program analyses, particularly by modeling a dynamically reconfigurable, heterogeneous multi-core processor.
Advisors/Committee Members: Conte, Thomas M. (advisor), Pande, Santosh (committee member), Vuduc, Richard (committee member), Worthington, Bruce (committee member), Yalamanchili, Sudhakar (committee member).
Subjects/Keywords: Computer architecture; Compilers; Compiler-based instrumentation; Parallel programming; Parallel program analysis; Instrumentation performance; Task graph; Program representation; Heterogeneous computing
APA (6th Edition):
Railing, B. P. (2015). Collecting and representing parallel programs with high performance instrumentation. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/54431
Chicago Manual of Style (16th Edition):
Railing, Brian Paul. “Collecting and representing parallel programs with high performance instrumentation.” 2015. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/54431.
MLA Handbook (7th Edition):
Railing, Brian Paul. “Collecting and representing parallel programs with high performance instrumentation.” 2015. Web. 13 Apr 2021.
Vancouver:
Railing BP. Collecting and representing parallel programs with high performance instrumentation. [Internet] [Doctoral dissertation]. Georgia Tech; 2015. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/54431.
Council of Science Editors:
Railing BP. Collecting and representing parallel programs with high performance instrumentation. [Doctoral Dissertation]. Georgia Tech; 2015. Available from: http://hdl.handle.net/1853/54431
25.
Mohapatra, Dushmanta.
Coordinated memory management in virtualized environments.
Degree: PhD, Computer Science, 2015, Georgia Tech
URL: http://hdl.handle.net/1853/54454
► Two recent advances are the primary motivating factors for the research in my dissertation. First, virtualization is no longer confined to the powerful server class…
(more)
▼ Two recent advances are the primary motivating factors for the research in my dissertation. First, virtualization is no longer confined to powerful server-class machines; it has already been introduced into smartphones and will be part of other high-end embedded systems, like automobiles, in the near future. Second, more and more resource-intensive and latency-sensitive applications are being used on devices that are rather resource constrained, and introducing virtualization into the software stack only exacerbates the resource allocation issue.
The focus of my research is on memory management in virtualized environments. Existing memory-management mechanisms were designed for server-class machines, and their implementations are geared towards applications running primarily in data centers and cloud setups. In these setups, appropriate load balancing and a fair division of resources are the goals, over-provisioning may be the norm, and the latency of resource management mechanisms may not be a big concern. But in the case of smartphones and other handheld devices, applications like media streaming and social networking are prevalent, which are both resource intensive and latency sensitive. Moreover, the bursty nature of their memory requirements results in spikes in the memory needs of the virtual machines. As over-provisioning is not an option in these domains, fast and effective memory resource management mechanisms are necessary.
The overall thesis of my dissertation is: with appropriate design and implementation, it is possible to achieve inter-VM memory management with a latency comparable to that of intra-VM memory management mechanisms like 'malloc'. Towards realizing and validating this goal, I have made the following research contributions through my dissertation: (1) I analyzed the memory requirement patterns of prevalent applications, which exhibit bursty behavior, and showcased the need for fast memory management mechanisms. (2) I designed and implemented a coordinated memory management mechanism in a Xen-based virtualized setup, based on the split-driver principle. (3) I analyzed this mechanism and performed a comparative evaluation against parallel memory management mechanisms. (4) I analyzed the extent of interference from the schedulers in the operation of the mechanism and implemented constructs that help reduce the interference and latency. (5) Based on my analysis, I revised the implementation to one in which the Xen hypervisor plays a more significant and active role in coordinating the mechanism, and I did a detailed analysis to showcase the resulting latency improvements. (6) To validate my hypothesis, I performed a comparative analysis of inter-VM and intra-VM memory management mechanisms as the final part of my dissertation.
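To illustrate the shape of inter-VM memory coordination (a deliberately simplified sketch, not the dissertation's Xen split-driver implementation; all names are invented), the example below has a central coordinator grant bursty memory requests from a shared pool and reclaim pages on release, in the spirit of ballooning:

```python
# Hypothetical coordinator: VMs request pages during bursts; the coordinator
# grants what the shared pool can cover and reclaims pages on release.
class MemoryCoordinator:
    def __init__(self, total_pages):
        self.free = total_pages
        self.granted = {}                    # vm id -> pages currently held

    def request(self, vm, pages):
        # Grant as much of the burst as the pool can cover right now.
        grant = min(pages, self.free)
        self.free -= grant
        self.granted[vm] = self.granted.get(vm, 0) + grant
        return grant

    def release(self, vm, pages):
        # A VM can only give back pages it actually holds.
        give_back = min(pages, self.granted.get(vm, 0))
        self.granted[vm] -= give_back
        self.free += give_back
        return give_back

coord = MemoryCoordinator(total_pages=1000)
```

The dissertation's point is that the round trip through such a coordinator must be fast enough to be comparable to an in-VM `malloc`, which is why the hypervisor's role in the coordination path matters.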
Advisors/Committee Members: Ramachandran, Umakishore (advisor), Ahamad, Mustaque (committee member), Prvulovic, Milos (committee member), Pande, Santosh (committee member), Perumalla, Kalyan (committee member).
Subjects/Keywords: Memory management; Virtualization
APA (6th Edition):
Mohapatra, D. (2015). Coordinated memory management in virtualized environments. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/54454
Chicago Manual of Style (16th Edition):
Mohapatra, Dushmanta. “Coordinated memory management in virtualized environments.” 2015. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/54454.
MLA Handbook (7th Edition):
Mohapatra, Dushmanta. “Coordinated memory management in virtualized environments.” 2015. Web. 13 Apr 2021.
Vancouver:
Mohapatra D. Coordinated memory management in virtualized environments. [Internet] [Doctoral dissertation]. Georgia Tech; 2015. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/54454.
Council of Science Editors:
Mohapatra D. Coordinated memory management in virtualized environments. [Doctoral Dissertation]. Georgia Tech; 2015. Available from: http://hdl.handle.net/1853/54454
26.
Jung, Changhee.
Effective techniques for understanding and improving data structure usage.
Degree: PhD, Computer Science, 2013, Georgia Tech
URL: http://hdl.handle.net/1853/49101
► Turing Award winner Niklaus Wirth famously noted, `Algorithms + Data Structures = Programs', and it follows that data structures should be carefully considered for effective…
(more)
▼ Turing Award winner Niklaus Wirth famously noted, 'Algorithms + Data Structures = Programs', and it follows that data structures should be carefully considered for effective application development. In fact, data structures are central to program understanding, performance engineering, bug detection, and security enhancement.
Our research aims to provide effective techniques for analyzing and improving data structure usage through fundamentally new approaches. First, detecting data structures: identifying which data structures are used within an application is a critical step toward application understanding and performance engineering. Second, selecting efficient data structures: analyzing a data structure's behavior can reveal improper use and suggest alternatives better suited to the situation in which the application runs. Third, detecting memory leaks in data structures: tracking data accesses with little overhead, and analyzing them carefully, enables practical and accurate memory leak detection. Finally, offloading time-consuming data structure operations: by leveraging a dedicated helper thread that executes the operations on behalf of the application thread, we can improve the overall performance of the application.
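The "selecting efficient data structures" step can be pictured as profiling the operation mix a container actually receives and matching it to a better-suited alternative. The sketch below is purely illustrative (thresholds, operation names, and the `suggest_container` helper are invented, not the dissertation's tool):

```python
# Hypothetical behavior-driven selection: count the operations observed at
# runtime and suggest a container whose strengths match the dominant mix.
from collections import Counter

def suggest_container(op_trace):
    counts = Counter(op_trace)
    total = sum(counts.values())
    # Dominated by membership tests -> a hash set beats a list's O(n) scans.
    if counts["contains"] / total > 0.5:
        return "set"
    # Dominated by front insert/remove -> a deque avoids O(n) element shifts.
    if (counts["push_front"] + counts["pop_front"]) / total > 0.5:
        return "deque"
    return "list"

# A workload that mostly tests membership: a list would be a poor fit here.
trace = ["append"] * 10 + ["contains"] * 90
```

Real selection frameworks also weigh input sizes and hardware effects (e.g., via performance counters), but the core idea is the same mapping from observed behavior to a recommendation.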
Advisors/Committee Members: Pande, Santosh (advisor), Kim, Hyesoon (committee member), Yalamanchili, Sudhakar (committee member), Clark, Nathan (committee member), Rus, Silvius (committee member).
Subjects/Keywords: Data structure identification; Memory graphs; Interface functions; Data structure selection; Application generator; Training framework; Performance counters; Data structures (Computer science)
APA (6th Edition):
Jung, C. (2013). Effective techniques for understanding and improving data structure usage. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/49101
Chicago Manual of Style (16th Edition):
Jung, Changhee. “Effective techniques for understanding and improving data structure usage.” 2013. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/49101.
MLA Handbook (7th Edition):
Jung, Changhee. “Effective techniques for understanding and improving data structure usage.” 2013. Web. 13 Apr 2021.
Vancouver:
Jung C. Effective techniques for understanding and improving data structure usage. [Internet] [Doctoral dissertation]. Georgia Tech; 2013. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/49101.
Council of Science Editors:
Jung C. Effective techniques for understanding and improving data structure usage. [Doctoral Dissertation]. Georgia Tech; 2013. Available from: http://hdl.handle.net/1853/49101
27.
Hou, Cong.
Automated synthesis for program inversion.
Degree: PhD, Computer Science, 2013, Georgia Tech
URL: http://hdl.handle.net/1853/49037
► We consider the problem of synthesizing program inverses for imperative languages. Our primary motivation comes from optimistic parallel discrete event simulation (OPDES). There, a simulator…
(more)
▼ We consider the problem of synthesizing program inverses for imperative languages. Our primary motivation comes from optimistic parallel discrete event simulation (OPDES). There, a simulator must process events while respecting logical temporal event-ordering constraints; to extract parallelism, an OPDES simulator may speculatively execute events and only rollback execution when event-ordering violations occur. In this context, the ability to perform rollback by running time- and space-efficient reverse programs, rather than saving and restoring large amounts of state, can make OPDES more practical. Synthesizing inverses also appears in numerous other software engineering contexts, such as debugging, synthesizing “undo” code, or even generating decompressors automatically given only lossless compression code.
This thesis mainly contains three chapters. In the first chapter, we focus on handling programs with only scalar data and arbitrary control flows. By building a value search graph (VSG) that represents recoverability relationships between variable values, we turn the problem of recovering previous values into a graph search one. Forward and reverse programs are generated according to the search results. For any loop that produces an output state given a particular input state, our method can synthesize an inverse loop that reconstructs the input state given the original loop's output state. The synthesis process consists of two major components: (a) building the inverse loop's body, and (b) building the inverse loop's predicate. Our method works for all natural loops, including those that take early exits (e.g., via breaks, gotos, returns).
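As a minimal illustration of loop inversion (the idea only, not the VSG search algorithm; both functions are invented for this example): when every statement in a loop body is invertible, an inverse loop can rebuild the input state by running the inverted body in the opposite iteration order.

```python
# Forward loop: each step applies an invertible affine update to x.
def forward(x, n):
    for i in range(1, n + 1):
        x = (x + i) * 2
    return x

# Synthesized inverse: invert the body ((x + i) * 2  ->  x / 2 - i) and
# reverse the iteration order. Every intermediate value is even, so the
# integer division is exact and the input state is recovered precisely.
def inverse(x, n):
    for i in range(n, 0, -1):
        x = x // 2 - i
    return x
```

In the OPDES setting, running `inverse` on rollback replaces saving and restoring the pre-event state, trading storage for a small amount of recomputation.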
In the second chapter we extend our method to handle programs containing arrays. Based on Array SSA, we develop a modified Array SSA from which we can easily build equalities between arrays and array elements. Specifically, to represent the equality between two arrays, we employ the array subregion as the constraint. During the search, those subregions are calculated to guarantee that all array elements will be retrieved. We also develop a demand-driven method to retrieve array elements from a loop, in which we only try to retrieve an array element from an iteration if that element has not been modified in previous iterations. To ensure the correctness of each retrieval, boundary conditions are created and checked at the entry and the exit of the loop.
In the last chapter, we introduce several techniques of handling high-level constructs of C++ programs, including virtual functions, copying C++ objects, C++ STL containers, C++ source code normalization, inter-procedural function calls, etc. Since C++ is an object-oriented (OO) language, our discussion in this chapter can also be extended to other OO languages like Java.
Advisors/Committee Members: Vuduc, Richard (advisor), Fujimoto, Richard (committee member), Quinlan, Daniel (committee member), Jefferson, David (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Program inversion; Program synthesis; SSA; Compiler; Computer simulation; Reversible computing
APA (6th Edition):
Hou, C. (2013). Automated synthesis for program inversion. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/49037
Chicago Manual of Style (16th Edition):
Hou, Cong. “Automated synthesis for program inversion.” 2013. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/49037.
MLA Handbook (7th Edition):
Hou, Cong. “Automated synthesis for program inversion.” 2013. Web. 13 Apr 2021.
Vancouver:
Hou C. Automated synthesis for program inversion. [Internet] [Doctoral dissertation]. Georgia Tech; 2013. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/49037.
Council of Science Editors:
Hou C. Automated synthesis for program inversion. [Doctoral Dissertation]. Georgia Tech; 2013. Available from: http://hdl.handle.net/1853/49037
28.
Dayal, Jai.
Middleware for large scale in situ analytics workflows.
Degree: PhD, Computer Science, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56355
► The trend to exascale is causing researchers to rethink the entire computational science stack, as future generation machines will contain both diverse hardware environments…
(more)
▼ The trend to exascale is causing researchers to rethink the entire computational science stack, as future generation machines will contain both diverse hardware environments and runtimes that manage them. Additionally, the science applications themselves are stepping away from the traditional bulk-synchronous model and moving towards a more dynamic and decoupled environment where analysis routines run in situ alongside the large scale simulations. This thesis presents CoApps, a middleware that allows in situ science analytics applications to operate in a location-flexible manner. Additionally, CoApps explores methods to extract information from, and issue management operations to, the lower-level runtimes that manage the diverse hardware expected on next generation exascale machines. This work leverages experience with several extremely scalable applications in materials and fusion, and has been evaluated on machines ranging from local Linux clusters to the supercomputer Titan.
Advisors/Committee Members: Wolf, Matthew (advisor), Gavrilovska, Ada (committee member), Lofstead, Gerald (committee member), Pande, Santosh (committee member), Liu, Ling (committee member).
Subjects/Keywords: In situ; High performance computing; Big data; Code coupling; Workflows
…Application Level Throughput on Sith and across Georgia Tech clusters… …Ridge National Labs, and on the Windu and Jedi clusters hosted at Georgia Tech. The Sith…
APA (6th Edition):
Dayal, J. (2016). Middleware for large scale in situ analytics workflows. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/56355
Chicago Manual of Style (16th Edition):
Dayal, Jai. “Middleware for large scale in situ analytics workflows.” 2016. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/56355.
MLA Handbook (7th Edition):
Dayal, Jai. “Middleware for large scale in situ analytics workflows.” 2016. Web. 13 Apr 2021.
Vancouver:
Dayal J. Middleware for large scale in situ analytics workflows. [Internet] [Doctoral dissertation]. Georgia Tech; 2016. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/56355.
Council of Science Editors:
Dayal J. Middleware for large scale in situ analytics workflows. [Doctoral Dissertation]. Georgia Tech; 2016. Available from: http://hdl.handle.net/1853/56355
29.
Zhang, Xin.
Combining logical and probabilistic reasoning in program analysis.
Degree: PhD, Computer Science, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/59200
► Software is becoming increasingly pervasive and complex. These trends expose masses of users to unintended software failures and deliberate cyber-attacks. A widely adopted solution to…
(more)
▼ Software is becoming increasingly pervasive and complex. These trends expose masses of users to unintended software failures and deliberate cyber-attacks. A widely adopted solution to enforce software quality is automated program analysis. Existing program analyses are expressed in the form of logical rules that are handcrafted by experts. While such a logic-based approach provides many benefits, it cannot handle uncertainty and lacks the ability to learn and adapt. This in turn hinders the accuracy, scalability, and usability of program analysis tools in practice.
We seek to address these limitations by proposing a methodology and framework for incorporating probabilistic reasoning directly into existing program analyses that are based on logical reasoning. The framework consists of a frontend, which automatically integrates probabilities into a logical analysis by synthesizing a system of weighted constraints, and a backend, which is a learning and inference engine for such constraints. We demonstrate that the combined approach can benefit a number of important applications of program analysis and thereby facilitate more widespread adoption of this technology. We also describe new algorithmic techniques to solve very large instances of weighted constraints that arise not only in our domain but also in other domains such as Big Data analytics and statistical AI.
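The backend's job can be pictured as weighted constraint solving: satisfy every hard rule while maximizing the total weight of satisfied soft rules. The toy below uses brute-force enumeration over two booleans rather than a real MaxSAT solver, and the rule and weights are invented for illustration:

```python
# Hypothetical weighted-constraint solver: enumerate assignments, keep only
# those satisfying all hard constraints, and maximize total soft weight.
from itertools import product

def solve(variables, hard, soft):
    best, best_w = None, -1
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if not all(c(a) for c in hard):
            continue                          # hard constraints are inviolable
        w = sum(weight for weight, c in soft if c(a))
        if w > best_w:
            best, best_w = a, w
    return best, best_w

# Hard rule: a derivation step (premise implies alarm). Soft rules: the
# analysis believes the premise (weight 5); a user disbelieves the alarm
# (weight 3). The solver must trade these off consistently.
variables = ["premise", "alarm"]
hard = [lambda a: (not a["premise"]) or a["alarm"]]
soft = [(5, lambda a: a["premise"]), (3, lambda a: not a["alarm"])]
```

Here the heavier evidence for the premise wins, so the alarm is kept despite the user's feedback; scaling this trade-off to millions of constraints is what the new algorithmic techniques address.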
Advisors/Committee Members: Naik, Mayur (advisor), Pande, Santosh (committee member), Harris, William (committee member), Yang, Hongseok (committee member), Nori, Aditya (committee member).
Subjects/Keywords: Program analysis; Logic; Probability; Combined logical and probabilistic reasoning; Markov logic networks; Datalog; MaxSAT; Verification; Bug finding; Programming languages; Software engineering
APA (6th Edition):
Zhang, X. (2017). Combining logical and probabilistic reasoning in program analysis. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/59200
Chicago Manual of Style (16th Edition):
Zhang, Xin. “Combining logical and probabilistic reasoning in program analysis.” 2017. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/59200.
MLA Handbook (7th Edition):
Zhang, Xin. “Combining logical and probabilistic reasoning in program analysis.” 2017. Web. 13 Apr 2021.
Vancouver:
Zhang X. Combining logical and probabilistic reasoning in program analysis. [Internet] [Doctoral dissertation]. Georgia Tech; 2017. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/59200.
Council of Science Editors:
Zhang X. Combining logical and probabilistic reasoning in program analysis. [Doctoral Dissertation]. Georgia Tech; 2017. Available from: http://hdl.handle.net/1853/59200
30.
Kim, Minjang.
Dynamic program analysis algorithms to assist parallelization.
Degree: PhD, Computing, 2012, Georgia Tech
URL: http://hdl.handle.net/1853/45758
► All market-leading processor vendors have started to pursue multicore processors as an alternative to high-frequency single-core processors for better energy and power efficiency. This transition…
(more)
▼ All market-leading processor vendors have started to pursue multicore processors as an alternative to high-frequency single-core processors for better energy and power efficiency. This transition to multicore processors no longer gives programmers the free performance gains once delivered by increasing clock frequencies. Parallelization of existing serial programs has become the most powerful approach to improving application performance. Not surprisingly, parallel programming is still extremely difficult for many programmers, mainly because thinking in parallel is simply beyond human intuition. However, we believe that software tools based on advanced analyses can significantly reduce this parallelization burden.
Much active research and many tools exist for already-parallelized programs, for example to find concurrency bugs. Instead, we focus on program analysis algorithms that assist the actual parallelization steps: (1) finding parallelization candidates, (2) understanding the parallelizability and profits of the candidates, and (3) writing parallel code. A few commercial tools address these steps, and a number of researchers have proposed various methodologies and techniques to assist parallelization. However, many weaknesses and limitations still exist.
In order to assist the parallelization steps more effectively and efficiently, this dissertation proposes Prospector, which consists of several new and enhanced program analysis algorithms.
First, an efficient loop profiling algorithm is implemented. Frequently executed loops are candidates for profitable parallelization, and detailed execution profiles of these loops guide the selection of initial parallelization targets.
Second, an efficient and rich data-dependence profiling algorithm is presented. Data dependence is the most essential factor that determines parallelizability. Prospector exploits dynamic data-dependence profiling, which is an alternative and complementary approach to traditional static-only analyses. However, even state-of-the-art dynamic dependence analysis algorithms can only successfully profile a program with a small memory footprint. Prospector introduces an efficient data-dependence profiling algorithm to support large programs and inputs as well as provides highly detailed profiling information.
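The essence of dynamic data-dependence profiling can be sketched as follows (a deliberately naive version, nothing like Prospector's memory-efficient algorithm; the trace format and `profile_loop` helper are invented): record which iteration last wrote each address, and flag any read served by an earlier iteration's write as a loop-carried dependence, the key obstacle to parallelizing the loop.

```python
# Hypothetical dependence profiler over a per-iteration memory trace.
def profile_loop(iterations):
    last_writer = {}   # address -> index of the iteration that last wrote it
    carried = set()    # (source iteration, sink iteration, address)
    for it, (reads, writes) in enumerate(iterations):
        for addr in reads:
            # A read of a value produced by an earlier iteration is a
            # loop-carried (cross-iteration) flow dependence.
            if addr in last_writer and last_writer[addr] < it:
                carried.add((last_writer[addr], it, addr))
        for addr in writes:
            last_writer[addr] = it
    return carried

# Memory trace of `for i in 1..3: a[i] = a[i-1] + 1` (addresses stand for a[i]):
# every iteration reads the previous iteration's write, so the loop is serial.
trace = [([], [0]), ([0], [1]), ([1], [2]), ([2], [3])]
```

A real profiler must do this for every load and store of a large program, which is why memory-efficient shadow structures are the core contribution here.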
Third, a new speedup prediction algorithm is proposed. Although the loop profiling can give a qualitative estimate of the expected profit, obtaining accurate speedup estimates needs more sophisticated analysis. Prospector introduces a new dynamic emulation method to predict parallel speedups from annotated serial code. Prospector also provides a memory performance model to predict speedup saturation due to increased memory traffic. Compared to the latest related work, Prospector significantly improves both prediction accuracy and coverage.
Finally, Prospector provides algorithms that extract hidden parallelism and advice on writing parallel code. We present a number of case studies showing how Prospector…
Advisors/Committee Members: Kim, Hyesoon (Committee Chair), Lee, Hsien-Hsin (Committee Member), Luk, Chi-Keung (Committee Member), Pande, Santosh (Committee Member), Vuduc, Richard (Committee Member).
Subjects/Keywords: Parallel programming; Parallelization; Multi-core; Profiling; Program analysis; Compilers; Parallel programs (Computer programs); Computer programs; Software engineering; Computer science; Algorithms
APA (6th Edition):
Kim, M. (2012). Dynamic program analysis algorithms to assist parallelization. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/45758
Chicago Manual of Style (16th Edition):
Kim, Minjang. “Dynamic program analysis algorithms to assist parallelization.” 2012. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021.
http://hdl.handle.net/1853/45758.
MLA Handbook (7th Edition):
Kim, Minjang. “Dynamic program analysis algorithms to assist parallelization.” 2012. Web. 13 Apr 2021.
Vancouver:
Kim M. Dynamic program analysis algorithms to assist parallelization. [Internet] [Doctoral dissertation]. Georgia Tech; 2012. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/45758.
Council of Science Editors:
Kim M. Dynamic program analysis algorithms to assist parallelization. [Doctoral Dissertation]. Georgia Tech; 2012. Available from: http://hdl.handle.net/1853/45758