You searched for +publisher:"University of Texas – Austin" +contributor:("Lin, Calvin")
Showing records 1 – 17 of 17 total matches.
No search limiters apply to these results.
1.
-2178-1988.
Program analysis techniques for algorithmic complexity and relational properties.
Degree: PhD, Computer Science, 2019, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/2181
Analyzing standard safety properties of a given program has traditionally been the primary focus of the program analysis community. Unfortunately, there are still many interesting analysis tasks that cannot be effectively expressed as standard safety properties. One such example is deriving the asymptotic complexity of a given program. Another example is verifying relational properties, i.e., properties that must be satisfied jointly by multiple programs or by multiple runs of one program. Existing program analysis techniques for standard safety properties are usually not immediately applicable to asymptotic complexity analysis problems and relational verification problems. New approaches are therefore needed to solve these unconventional problems.
This thesis studies techniques for algorithmic complexity analysis as well as
relational verification. To that end, we present three case studies: (1) We
propose a new fuzzing technique for automatically finding inputs that trigger a
program's worst-case resource usage. (2) We show how to build a scalable,
end-to-end side channel detection tool by combining static taint analysis and a
program logic designed for verifying non-interference of a given program. (3) We
propose a general and effective relational verification algorithm that combines
reinforcement learning with backtracking search. A common theme among all these solutions is to identify problem-specific structure and to adapt existing techniques to exploit it.
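The worst-case-input fuzzing idea in case study (1) can be conveyed with a minimal sketch (an illustration of the general approach, not the dissertation's actual algorithm or tool): mutate inputs and keep any mutant that increases an instrumented cost counter. The instrumented target, the mutation operators, and the hill-climbing loop below are all illustrative assumptions.

```python
import random

def insertion_sort_cost(xs):
    """Instrumented target: counts comparisons made by insertion sort."""
    xs = list(xs)
    cost = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            cost += 1
            if xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
            else:
                break
    return cost

def mutate(xs):
    """Randomly swap two positions or replace one element."""
    xs = list(xs)
    i = random.randrange(len(xs))
    if random.random() < 0.5:
        j = random.randrange(len(xs))
        xs[i], xs[j] = xs[j], xs[i]
    else:
        xs[i] = random.randrange(100)
    return xs

def worst_case_fuzz(n=20, iterations=2000, seed=0):
    """Hill-climbing fuzzer: keep mutants that increase the observed cost."""
    random.seed(seed)
    best = [random.randrange(100) for _ in range(n)]
    best_cost = insertion_sort_cost(best)
    for _ in range(iterations):
        cand = mutate(best)
        cost = insertion_sort_cost(cand)
        if cost > best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

On this toy target the search gravitates toward nearly reverse-sorted inputs, whose cost approaches the true n(n-1)/2 worst case.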
Advisors/Committee Members: Dillig, Isil (advisor), Lin, Calvin (committee member), Chidambaram, Vijay (committee member), Tiwari, Mohit (committee member).
Subjects/Keywords: Complexity testing; Optimal program synthesis; Fuzzing; Genetic
programming; Performance bug; Vulnerability detection; Side channel; Static analysis; Relational verification; Reinforcement learning; Policy gradient

University of Texas – Austin
2.
Zheng, Tianhao, Ph. D.
Efficient fine-grained virtual memory.
Degree: PhD, Electrical and Computer Engineering, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/68079
Virtual memory in modern computer systems provides a single abstraction of the memory hierarchy.
By hiding fragmentation and overlays of physical memory, virtual memory frees applications from managing physical memory and improves programmability.
However, virtual memory often introduces noticeable overhead.
State-of-the-art systems use paged virtual memory, which maps virtual addresses to physical addresses at page granularity (typically 4 KiB). This mapping is stored in a page table. Before physically addressed memory is accessed, the page table is consulted to translate virtual addresses to physical addresses. Research shows that the overhead of accessing the page table can even exceed the execution time of some important applications.
In addition, this fine-grained mapping changes the access patterns between virtual and physical address spaces, creating difficulties for many architectural techniques, such as caches and prefetchers.
In this dissertation, I propose architecture mechanisms to reduce the overhead of accessing and managing fine-grained virtual memory without compromising existing benefits.
There are three main contributions in this dissertation.
First, I investigate the impact of address translation on caches. I examine the restriction that fine-grained paging places on virtually indexed, physically tagged (VIPT) caches and conclude that this restriction may lead to sub-optimal cache designs.
I introduce a novel cache strategy, speculatively indexed, physically tagged (SIPT), to enable flexible cache indexing under fine-grained page mapping.
SIPT speculates on the value of a few additional index bits (1–3 in our experiments) to access the cache before translation, and then verifies that the physical tag matches after translation.
Exploiting the fact that a simple relation generally exists between virtual and physical addresses, because memory allocators often exhibit contiguity, I also propose low-cost mechanisms to predict and correct potential mis-speculations.
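The SIPT idea can be sketched in a toy model (an illustration, not the dissertation's hardware design): with 64 B lines and 256 sets, the index needs address bits [6, 14), which overrun the 12-bit page offset by two bits, so the lookup speculates that those bits match between the virtual and physical address and replays only on a mismatch. The geometry and the dict-based cache are hypothetical.

```python
LINE_BITS = 6      # 64-byte cache lines
SET_BITS = 8       # 256 sets -> index uses address bits [6, 14)
PAGE_BITS = 12     # 4 KiB pages: only bits [0, 12) survive translation unchanged

def split(addr):
    """Decompose an address into (cache set index, tag)."""
    index = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (LINE_BITS + SET_BITS)
    return index, tag

def sipt_lookup(vaddr, translate, cache):
    """Speculate that index bits above the page offset match in VA and PA.

    cache is modeled as a dict: set index -> stored physical tag.
    Returns (hit?, mis-speculated?).
    """
    spec_index, _ = split(vaddr)        # speculative index from the VA
    line = cache.get(spec_index)        # start the access before translation
    paddr = translate(vaddr)            # TLB lookup (in hardware, in parallel)
    real_index, real_tag = split(paddr)
    if real_index != spec_index:
        line = cache.get(real_index)    # mis-speculation: replay with the real index
    hit = line is not None and line == real_tag
    return hit, real_index != spec_index
```

When the allocator maps pages contiguously, the speculated bits are almost always right and the access completes without waiting for translation.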
Next, I focus on reducing the overhead of address translation for fine-grained virtual memory. I propose a novel architecture mechanism, Embedded Page Translation Information (EMPTI),
to provide general fine-grained page translation information on top of coarse-grained virtual memory.
EMPTI does so by speculating that a virtual address is mapped to a pre-determined physical location and then verifying the translation with a very-low-cost access to metadata embedded with data.
Coarse-grained virtual memory mechanisms (e.g., segmentation) are used to suggest the pre-determined physical location for each virtual page.
Overall, EMPTI achieves the benefits of low overhead translation while keeping the flexibility and programmability of fine-grained paging.
Finally, I improve the efficiency of metadata caching based on the fact that memory mapping contiguity generally exists beyond a page boundary.
In state-of-the-art architectures, caches treat PTEs (page table entries) as regular data. Although this is simple and straightforward,
it…
Advisors/Committee Members: Erez, Mattan (advisor), Reddi, Vijay Janapa (committee member), Tiwari, Mohit (committee member), Lin, Calvin (committee member), Peter, Simon (committee member).
Subjects/Keywords: Memory; Cache; Metadata
3.
Diamond, Jeffrey Robert.
Designing on-chip memory systems for throughput architectures.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33306
Driven by the high arithmetic intensity and embarrassingly parallel nature of real-time computer graphics, GPUs became the first widespread throughput architecture. With the end of Dennard scaling and the plateau of single-thread performance, nearly all computer chips at all scales have now become explicitly parallel, containing a hierarchy of cores and threads. Initially, these individual cores were imagined to be no different from traditional uniprocessors, and parallel programs no different from traditional parallel programs. Like GPUs, these modern chips share finite on-chip resources between threads. This results in novel performance and optimization issues at every granularity of parallelism, from cell phones to GPUs.

Unfortunately, the performance characteristics of these systems tend to be non-linear and counter-intuitive. The programmer’s software stack has been slow in adapting to this paradigm shift. Compilers still focus primarily on optimizing single thread performance at the expense of throughput. Existing parallel applications are not a perfect match for modern multicore, multithreaded processors. And existing methodologies for performance analysis and simulation are not aligned with multicore issues.
This dissertation begins with a mathematical analysis of throughput performance in the presence of shared on-chip resources. When cache hit rates begin to fall, there is a steep drop off in throughput performance. An optimistic view of this regime is that even small improvements to cache efficiency offer significant benefits. This motivates the exploration of general throughput optimizations in both hardware and software that apply to both coarse-grained and fine-grained parallel architectures, requiring no programmer intervention or tuning. This dissertation provides two such solutions.
The first solution is a compiler optimization called “loop microfission” that can boost throughput performance by up to 50%. In the context of the intrachip scalability of supercomputing applications, we demonstrate the failings of conventional performance tuning software and compiler algorithms in the presence of shared resources. We introduce a new approach to throughput optimization, including a memory-friendly performance analysis tool, and show that techniques for throughput optimization are similar to traditional optimizations but require new priorities.
The second solution is a hardware optimization called Arbitrary Modulus Indexing (AMI), a technique that generalizes efficient implementations of the DIV/MOD operation from Mersenne primes to all integers. We show that the primary performance bottlenecks in modern GPUs for regular, memory-intensive applications are bank and set conflicts in the shared on-chip memory system. AMI completely eliminates conflicts in all facets of the memory system at negligible hardware cost, and has even broader potential for optimizations throughout computer architecture.
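AMI itself is the dissertation's contribution and is not reproduced here, but the Mersenne-number trick it generalizes is standard: reduction modulo 2^k − 1 needs no divider, because 2^k ≡ 1 (mod 2^k − 1) lets hardware fold the high bits onto the low bits with shifts and adds. A sketch of that special case:

```python
def mod_mersenne(x, k):
    """x mod (2**k - 1) using only shifts, masks, and adds (no division)."""
    m = (1 << k) - 1             # the Mersenne modulus 2**k - 1
    while x > m:
        x = (x & m) + (x >> k)   # fold the high bits down: 2**k ≡ 1 (mod m)
    return 0 if x == m else x    # m itself is congruent to 0
```

This is why bank or set counts of the form 2^k − 1 are cheap to index; per the abstract, AMI extends such efficient DIV/MOD to arbitrary integer moduli.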
Advisors/Committee Members: Fussell, Donald S., 1951- (advisor), Keckler, Stephen W. (advisor), van de Geijn, Robert (committee member), Lin, Calvin (committee member), Eijkhout, Victor (committee member).
Subjects/Keywords: Computer architecture; Caching; Throughput; Arbitrary modulus indexing
4.
Gong, Seong-Lyong.
Memory protection techniques for DRAM scaling-induced errors.
Degree: PhD, Electrical and Computer Engineering, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/68922
Continued scaling of DRAM technologies induces more faulty DRAM cells than before. These inherent faults increase significantly at sub-20nm technology, and hence traditional remapping schemes such as row/column sparing become very inefficient. Because the inherent faults manifest as single-bit errors, DRAM vendors are proposing to embed single-bit error correctable (SEC) ECC modules inside each DRAM chip, called In-DRAM ECC (IECC). However, IECC can achieve only limited reliability improvement due to its weak correction capability. Specifically, at high scaling error rates, multi-bit scaling errors will easily occur in practice and escape IECC protection. Because of the escaped scaling errors, overall reliability may be degraded despite the increased overheads. For highly reliable systems that apply a strong ECC at the rank level (i.e., across the DRAM chips that are accessed simultaneously), guarantees such as Chipkill can no longer be maintained once escaped errors occur.
In this dissertation, I address this scaling-induced error problem as follows. First, I propose a more sophisticated fault-error model that includes intermittent scaling errors. In general, the effectiveness of proposed solutions strongly relies on the evaluation methodology. Prior related work evaluated their solutions against scaling errors only with a simple model and concluded efficient remapping schemes effectively cope with scaling errors. However, intermittent scaling errors cannot be easily detected and remapped. This implies that rather than the proposed remapping schemes, forward error correction may be the only solution to the scaling error problem. Using the new evaluation model, the proposed solutions to scaling errors can be evaluated in a more comprehensive way than before.
Secondly, I propose two alternatives to In-DRAM ECC, Dual Use of On-chip redundancy (DUO) and Why-Pay-More (YPM), for highly reliable systems. DUO achieves higher reliability than In-DRAM ECC-based solutions by transferring on-chip redundancy to the rank level. Then, using the transferred redundancy together with original rank-level redundancy, a stronger rank-level ECC is applied. YPM is the first rank-level-only ECC protection against scaling errors. For this cost-saving design, YPM optimizes the correction capability by exploiting erasure Reed-Solomon (RS) decoding and iterative bit-flipping search. Each alternative is industry-changing in that DUO achieves much higher reliability than current rank-level ECC and YPM does not require In-DRAM ECC at all. Both alternatives are practical in that they require only small changes to DRAM designs.
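The weakness of SEC codes described above can be seen with a toy Hamming(7,4) code (real In-DRAM ECC uses much wider codewords; this example is purely illustrative): the syndrome pinpoints any single flipped bit, but a double-bit error yields a syndrome pointing at the wrong bit, so the error escapes and is even "corrected" into a third wrong value.

```python
def hamming_encode(d):
    """Encode 4 data bits (list of 0/1) into a Hamming(7,4) codeword."""
    p1 = d[0] ^ d[1] ^ d[3]               # covers codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]               # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]               # covers positions 4,5,6,7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming_correct(cw):
    """Return (corrected codeword, syndrome). Syndrome 0 means 'no error seen'."""
    c = list(cw)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3       # 1-based position of the suspect bit
    if syndrome:
        c[syndrome - 1] ^= 1              # flip the bit the syndrome points at
    return c, syndrome
```

Flipping one codeword bit is always repaired; flipping two produces a nonzero syndrome that miscorrects, which is exactly the escape scenario the abstract is concerned with.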
Advisors/Committee Members: Erez, Mattan (advisor), Swartzlander, Earl (committee member), Touba, Nur (committee member), Dimakis, Alex (committee member), Lin, Calvin (committee member), Sullivan, Mike (committee member).
Subjects/Keywords: DRAM; Memory; ECC; Scaling errors
5.
-1587-0677.
Strong, thorough, and efficient memory protection against existing and emerging DRAM errors.
Degree: PhD, Electrical and Computer engineering, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/46770
Memory protection is necessary to ensure the correctness of data in the presence of unavoidable faults. As such, large-scale systems typically employ Error Correcting Codes (ECC) to trade redundant storage and bandwidth for increased reliability. Single Device Data Correction (SDDC) ECC mechanisms are required to meet the reliability demands of servers and large-scale systems by tolerating even severe faults that disable an entire memory chip. In the future, however, stronger memory protection will be required due to increasing levels of system integration, shrinking process technology, and growing transfer rates. The energy efficiency of memory protection is also important, as DRAM already consumes a significant fraction of the system energy budget. This dissertation develops a novel set of ECC schemes to provide strong, safe, flexible, and thorough protection against existing and emerging types of DRAM errors. This research also reduces the energy consumption of such protection while only marginally impacting performance.
First, this dissertation develops Bamboo ECC, a technique with stronger-than-SDDC correction and very safe detection capabilities (≥ 99.999994% of data errors of any severity are detected). Bamboo ECC changes the ECC layout based on frequent DRAM error patterns; it can correct concurrent errors from multiple devices and all but eliminates the risk of silent data corruption. Bamboo ECC also provides flexible configurations that enable more adaptive graceful-downgrade schemes, in which the system continues to operate correctly after even severe chip faults, albeit at a reduced capacity to protect against future faults. These strength, safety, and flexibility advantages translate to a significantly more reliable memory sub-system for future exascale computing. This dissertation then focuses on emerging error types from scaling process technology and increasing data bandwidth.
As DRAM process technology scales down below 10nm, DRAM cells are becoming more vulnerable to errors from an imperfect manufacturing process. At the same time, DRAM signal transfers are becoming more susceptible to timing and electrical noise as DRAM interfaces keep increasing signal transfer rates and decreasing I/O voltage levels. With individual DRAM chips growing more vulnerable to errors, industry and academia have proposed mechanisms to tolerate these emerging types of errors; yet they are inefficient because they rely on multiple levels of redundancy in the case of cell errors and on ad-hoc schemes with suboptimal protection coverage for transmission errors. Active Guardband ECC and All-Inclusive ECC make systematic use of ECC and existing mechanisms to provide thorough end-to-end protection without requiring redundancy beyond what is common today. Finally, this dissertation targets the energy efficiency of memory protection. Frugal ECC combines ECC with fine-grained compression to provide versatile and energy-efficient protection. Frugal ECC compresses main memory at cache-block granularity, using any leftover space to store…
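The truncated final sentence describes Frugal ECC's compress-then-protect idea; a toy sketch of one plausible reading follows. The block size, the zlib compressor, and the stand-in checksum are all assumptions for illustration; the real scheme uses a proper ECC, not a checksum.

```python
import zlib

BLOCK_BYTES = 64   # one cache block
ECC_BYTES = 8      # redundancy we would like to fit inline

def frugal_write(block):
    """Compress a cache block; if the redundancy fits in the freed space,
    store data and ECC together, otherwise fall back to separate storage."""
    assert len(block) == BLOCK_BYTES
    comp = zlib.compress(block, 9)
    if len(comp) + ECC_BYTES <= BLOCK_BYTES:
        ecc = bytes([sum(comp) % 256]) * ECC_BYTES   # stand-in for a real code
        return ("inline", comp + ecc)
    return ("separate", block)   # incompressible: ECC must live elsewhere
```

Compressible blocks thus carry their own protection for free, and only incompressible blocks pay for extra redundant storage.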
Advisors/Committee Members: Erez, Mattan (advisor), Patt, Yale (committee member), Touba, Nur (committee member), Lin, Calvin (committee member), Alameldeen, Alaa (committee member).
Subjects/Keywords: DRAM; Reliability; ECC; Error correcting codes; Compression
6.
-8037-0201.
Formal verification of application and system programs based on a validated x86 ISA model.
Degree: PhD, Computer science, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/46437
Two main kinds of tools available for formal software verification are point tools and general-purpose tools. Point tools are targeted towards bug-hunting or proving a fixed set of properties, such as establishing the absence of buffer overflows. These tools have become a practical choice in the development and analysis of serious software systems, largely because they are easy to use. However, point tools are limited in their scope because they are pre-programmed to reason about a fixed set of behaviors. In contrast, general-purpose tools, like theorem provers, have a wider scope. Unfortunately, they also have a higher user overhead. These tools often use incomplete and/or unrealistic software models, in part to reduce this overhead. Formal verification based on such a model can be restrictive because it does not account for program behaviors that rely on features missing from the model. The results of such formal verification undertakings may be unreliable; consequently, they can offer a false sense of security. This dissertation demonstrates that formal verification of complex program properties can be made practical, without any loss of accuracy or expressiveness, by employing a machine-code analysis framework implemented using a mechanical theorem prover. To this end, we constructed a formal and executable model of the x86 Instruction-Set Architecture using the ACL2 theorem-proving system. This model includes a specification of 400+ x86 opcodes and architectural features like segmentation and paging. The model's high execution speed allows it to be validated routinely by performing co-simulations against a physical x86 processor; thus, formal analysis based on this model is reliable. We also developed a general framework for x86 machine-code analysis that can lower the overhead associated with the verification of a broad range of program properties, including correctness with respect to behavior, security, and resource requirements.
We illustrate the capabilities of our framework by describing the verification of two application programs, population count and word count, and one system program, zero copy.
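The style of model the abstract describes, an executable interpreter that can be both reasoned about and co-simulated against real hardware, can be conveyed with a toy two-instruction sketch (the actual model is written in ACL2 and covers 400+ opcodes; the state layout and opcodes here are invented for illustration):

```python
MASK64 = (1 << 64) - 1

def step(regs, insn):
    """One instruction -> next register state (tiny interpreter-style model)."""
    regs = dict(regs)                    # states are values, never mutated in place
    op, dst, src = insn
    val = regs[src] if isinstance(src, str) else src   # register name or immediate
    if op == "mov":
        regs[dst] = val & MASK64
    elif op == "add":
        regs[dst] = (regs[dst] + val) & MASK64         # 64-bit wraparound
    else:
        raise ValueError(f"unmodeled opcode: {op}")
    return regs

def run(regs, program):
    """Execute a straight-line program by iterating the step function."""
    for insn in program:
        regs = step(regs, insn)
    return regs
```

Co-simulation then amounts to comparing `run`'s final state against the registers of a real CPU executing the same instructions, and properties are proved by induction over `step`.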
Advisors/Committee Members: Hunt, Warren A., 1958- (advisor), Alvisi, Lorenzo (committee member), Kaufmann, Matt (committee member), Lin, Calvin (committee member), Moore, J S (committee member), Watson, Robert (committee member).
Subjects/Keywords: Formal verification; x86 Machine-code analysis; x86 ISA; ACL2; Theorem proving; Program verification
7.
Hur, Ibrahim.
Enhancing memory controllers to improve DRAM power and performance.
Degree: PhD, Electrical and Computer Engineering, 2006, University of Texas – Austin
URL: http://hdl.handle.net/2152/2886
Technological advances and new architectural techniques have enabled processor performance to double almost every two years. However, these performance improvements have not resulted in comparable speedups for all applications, because memory system performance has not kept pace with processor performance in modern systems. In this dissertation, by concentrating on the interface between the processors and memory, the memory controller, we propose novel solutions to all three aspects of the memory problem: bandwidth, latency, and power.
To increase available bandwidth between the memory controller and DRAM, we introduce a new scheduling approach. To hide memory latency, we introduce a new hardware prefetching technique that is useful for applications with regular or irregular memory accesses. And finally, we show how memory controllers can be used to improve DRAM power consumption.
We evaluate our techniques in the context of the memory controller of a highly tuned modern processor, the IBM Power5+. Our evaluation for both technical and commercial benchmarks in single-threaded and simultaneous multi-threaded environments shows that our techniques for bandwidth increase, latency hiding, and power reduction achieve significant improvements. For example, for single-threaded applications, when our scheduling approach and prefetching method are implemented together, they improve the performance of the SPEC2006fp, NAS, and a set of commercial benchmarks by 14.3%, 13.7%, and 11.2%, respectively.
In addition to providing substantial performance and power improvements, our techniques are superior to previously proposed methods in terms of cost as well. For example, a version of our scheduling approach has been implemented in the Power5+, and it has increased the transistor count of the chip by only 0.02%.
This dissertation shows that, without increasing the complexity of either the processor or the memory organization, all three aspects of memory systems can be significantly improved with low-cost enhancements to the memory controller.
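The bandwidth benefit of scheduling at the memory controller can be illustrated with a baseline first-ready policy (the dissertation's own adaptive history-based scheduler is not reproduced here; this sketch only shows why reordering helps): requests that hit an already-open DRAM row are served before older requests that would force a row activation.

```python
def pick_next(queue, open_rows):
    """FR-FCFS-style choice: oldest row-buffer hit, else oldest request.

    queue: list of (arrival, bank, row) tuples; open_rows: bank -> open row.
    """
    hits = [req for req in queue if open_rows.get(req[1]) == req[2]]
    req = min(hits) if hits else min(queue)   # tuples order by arrival time
    queue.remove(req)
    open_rows[req[1]] = req[2]                # serving the request opens its row
    return req
```

Row-buffer hits cost one column access while misses pay a precharge plus activate, so promoting hits raises sustained bandwidth without starving any request class in this toy setting.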
Advisors/Committee Members: Lin, Calvin (advisor).
Subjects/Keywords: Computer storage devices; Memory management (Computer science)
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Hur, I. (2006). Enhancing memory controllers to improve DRAM power and performance. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/2886
8.
Subramanian, Suriya.
Dynamic software updates : a VM-centric approach.
Degree: PhD, Computer Sciences, 2010, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2010-05-1436
► Because software systems are imperfect, developers are forced to fix bugs and add new features. The common way of applying changes to a running system…
(more)
▼ Because software systems are imperfect, developers are forced to fix bugs
and add new features. The common way of applying changes to a running
system is to stop the application or machine and restart with the new
version. Stopping and restarting causes a disruption in service that is at
best inconvenient and at worst causes revenue loss and compromises safety.
Dynamic software updating (DSU) addresses these problems by updating
programs while they execute. Prior DSU systems for managed languages like
Java and C# lack necessary functionality: they are inefficient and do not
support updates that occur commonly in practice.
This dissertation presents the design and implementation of Jvolve, a DSU
system for Java. Jvolve's combination of flexibility, safety, and
efficiency is a significant advance over prior approaches. Our key
contribution is the extension and integration of existing Virtual Machine
services with safe, flexible, and efficient dynamic updating
functionality. Our approach is flexible enough to support a large class of
updates, guarantees type-safety, and imposes no space or time overheads on
steady-state execution.
Jvolve supports many common updates. Users can add, delete, and change
existing classes. Changes may add or remove fields and methods, replace
existing ones, and change type signatures. Changes may occur at any level
of the class hierarchy. To initialize new fields and update existing ones,
Jvolve applies class and object transformer functions, the former for
static fields and the latter for object instance fields. These features
cover many updates seen in practice. Jvolve supports 20 of 22
updates to three open-source programs – Jetty web server, JavaEmailServer,
and CrossFTP server – based on actual releases occurring over a one to two
year period. This support is substantially more flexible than prior
systems.
Jvolve is safe. It relies on bytecode verification to statically type-check
updated classes. To avoid dynamic type errors due to the timing of an
update, Jvolve stops the executing threads at a DSU safe point and then
applies the update. DSU safe points are a subset of VM safe points, where
it is safe to perform garbage collection and thread scheduling. DSU safe
points further restrict the methods that may be on each thread's stack,
depending on the update. Restricted methods include updated methods for
code consistency and safety, and user-specified methods for semantic
safety. Jvolve installs return barriers and uses on-stack replacement to
speed up reaching a safe point when necessary. While Jvolve does not
guarantee that it will reach a DSU safe point, in our multithreaded
benchmarks it almost always does.
Jvolve includes a tool that automatically generates default object
transformers which initialize new and changed fields to default values and
retain values of unchanged fields in heap objects. If needed, programmers
may customize the default transformers. Jvolve is the first dynamic
updating system to extend the…
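The class and object transformer functions described above can be sketched in miniature. The snippet below is a hypothetical Python analogue of a default object transformer (Jvolve itself extends a Java virtual machine): it retains the values of unchanged fields and initializes an added field to a default value. All class names, fields, and defaults are invented for illustration.

```python
class OldAccount:
    """Pre-update version of a class (hypothetical example)."""
    def __init__(self):
        self.owner = "alice"
        self.balance = 100

class NewAccount:
    """Updated version of the class: adds a 'currency' field."""
    fields = ("owner", "balance", "currency")
    defaults = {"currency": "USD"}

def default_transformer(old, new_cls):
    """Map an old-version object onto the new class layout."""
    new = object.__new__(new_cls)                 # allocate without running __init__
    for f in new_cls.fields:
        if hasattr(old, f):
            setattr(new, f, getattr(old, f))      # unchanged field: retain value
        else:
            setattr(new, f, new_cls.defaults[f])  # added field: default value
    return new

upgraded = default_transformer(OldAccount(), NewAccount)
print(upgraded.owner, upgraded.balance, upgraded.currency)   # alice 100 USD
```

In the VM setting, such a transformer would be applied to every live heap instance of the updated class at a DSU safe point; programmers could override the generated default when a new field needs a computed value.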
Advisors/Committee Members: McKinley, Kathryn S. (advisor), Blackburn, Steve (committee member), Hicks, Michael (committee member), Lin, Calvin (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Programming languages; Object-oriented programming languages; Virtual Machines; Java Virtual Machines; Java; Dynamic software updating; Jvolve; Safe points; Updating code
APA (6th Edition):
Subramanian, S. (2010). Dynamic software updates : a VM-centric approach. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2010-05-1436
9.
Jibaja, Ivan.
Exploiting hardware heterogeneity and parallelism for performance and energy efficiency of managed languages.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33304
► On the software side, managed languages and their workloads are ubiquitous, executing on mobile, desktop, and server hardware. Managed languages boost the productivity of programmers…
(more)
▼ On the software side, managed languages and their workloads are ubiquitous, executing on mobile, desktop, and server hardware. Managed languages boost the productivity of programmers by abstracting away the hardware using virtual machine technology. On the hardware side, modern hardware increasingly exploits parallelism to boost energy efficiency and performance with homogeneous cores, heterogeneous cores, graphics processing units (GPUs), and vector instructions. Two major forms of parallelism are: task parallelism on different cores and vector instructions for data parallelism. With task parallelism, the hardware allows simultaneous execution of multiple instruction pipelines through multiple cores. With data parallelism, one core can perform the same instruction on multiple pieces of data. Furthermore, we expect hardware parallelism to continue to evolve and provide more heterogeneity. Existing programming language runtimes must continuously evolve so programmers and their workloads may efficiently utilize this evolving hardware for better performance and energy efficiency. However, efficiently exploiting hardware parallelism is at odds with programmer productivity, which seeks to abstract hardware details.
My thesis is that managed language systems should and can abstract hardware parallelism with modest to no burden on developers to achieve high performance, energy efficiency, and portability on ever-evolving parallel hardware. In particular, this thesis explores how the runtime can optimize and abstract heterogeneous parallel hardware and how the compiler can exploit data parallelism with new high-level language abstractions with a minimal burden on developers.
We explore solutions from multiple levels of abstraction for different types of hardware parallelism. (1) For asymmetric multicore processors (AMP) which have been recently introduced, we design and implement an application scheduler in the Java virtual machine (JVM) that requires no changes to existing Java applications. The scheduler uses feedback from dynamic analyses that automatically identify critical threads and classifies application parallelism. Our scheduler automatically accelerates critical threads, honors thread priorities, considers core availability and thread sensitivity, and load balances scalable parallel threads on big and small cores to improve the average performance by 20% and energy efficiency by 9% on frequency-scaled AMP hardware for scalable, non-scalable, and sequential workloads over prior research and existing schedulers. (2) To exploit vector instructions, we design SIMD.js, a portable single instruction multiple data (SIMD) language extension for JavaScript (JS), and implement its compiler support that together add fine-grain data parallelism to JS. Our design principles seek portability, scalable performance across various SIMD hardware implementations, performance neutral without SIMD hardware, and compiler simplicity to ease vendor adoption on multiple browsers. We introduce type speculation, compiler optimizations, and…
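The fine-grain data parallelism that SIMD.js exposes can be illustrated with a scalar emulation. The sketch below models a four-lane add in Python purely as a conceptual example; it is not the SIMD.js API, whose real types (such as float32x4) are compiled by the browser's JIT to hardware vector instructions.

```python
def float32x4_add(a, b):
    """One 'SIMD' add across four lanes (scalar emulation of the concept).
    On real SIMD hardware, all four lane additions issue as a single instruction."""
    assert len(a) == len(b) == 4
    return tuple(x + y for x, y in zip(a, b))

print(float32x4_add((1.0, 2.0, 3.0, 4.0), (10.0, 20.0, 30.0, 40.0)))
# (11.0, 22.0, 33.0, 44.0)
```

The performance argument is that a loop over N floats becomes N/4 vector operations when the compiler can map such lane-wise operations onto SIMD units, while falling back to scalar code (as above) stays performance neutral on hardware without SIMD.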
Advisors/Committee Members: Witchel, Emmett (advisor), McKinley, Kathryn S. (advisor), Blackburn, Stephen M (committee member), Batory, Don (committee member), Lin, Calvin (committee member).
Subjects/Keywords: Scheduling; Asymmetric; Multicore; Heterogeneous; Managed software; Languages; Performance; Energy; Parallelism; JavaScript; Java; SIMD
APA (6th Edition):
Jibaja, I. (2015). Exploiting hardware heterogeneity and parallelism for performance and energy efficiency of managed languages. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33304
10.
Robatmili, Behnam.
Efficient execution of sequential applications on multicore systems.
Degree: PhD, Computer Science, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-3987
► Conventional CMOS scaling has been the engine of the technology revolution in most application domains. This trend has changed as in each technology generation, transistor…
(more)
▼ Conventional CMOS scaling has been the engine of the technology revolution in most application domains. This trend has changed: in each technology generation, transistor densities continue to increase, while, due to the limits on threshold voltage scaling, per-transistor energy consumption decreases much more slowly than in the past. These power scaling issues will restrict the adaptability of designs to operate in different power and performance regimes. Consequently, future systems must employ more efficient architectures for optimizing every thread in the program across different power and performance regimes, rather than architectures that simply utilize more transistors. One solution is composable or dynamic multicore architectures that can span a wide range of energy/performance operating points by enabling multiple simple cores to compose to form a larger and more powerful core.
Explicit Data Graph Execution (EDGE) architectures represent a highly scalable class of composable processors that exploit predicated dataflow block execution and distributed microarchitectures. However, prior EDGE architectures suffer from several energy and performance bottlenecks including expensive intra-block operand communication due to fine-grain instruction distribution among cores,
the compiler-generated fanout trees built for high-fanout operand delivery, poor next-block prediction accuracy, and low speculation rates due to predicates and expensive refills after pipeline flushes. To design an energy-efficient and flexible dynamic multicore, this dissertation employs a systematic methodology that detects inefficiencies and then designs and evaluates solutions that
maximize power and performance efficiency across different power and performance regimes. Some innovations and optimization techniques include:
(a) Deep Block Mapping extracts more coarse-grained parallelism and reduces cross-core operand network traffic by mapping each block of instructions into the instruction queue of one core instead of distributing blocks across all composed cores as done in previous EDGE designs,
(b) Iterative Path Predictor (IPP) reduces branch and predication overheads by unifying multi-exit block target prediction and predicate path prediction while providing improved accuracy for each,
(c) Register Bypassing reduces cross-core register communication delays by bypassing register values predicted to be critical directly from producing to consuming cores,
(d) Block Reissue reduces pipeline flush penalties by reissuing instructions in previously executed instances of blocks while they are still in the instruction queue, and
(e) Exposed Operand Broadcasts (EOBs) reduce wide-fanout instruction overheads by extending the ISA to employ architecturally exposed low-overhead broadcasts combined with dataflow for efficient operand delivery for both high- and low-fanout instructions.
These components form the basis for a third-generation EDGE microarchitecture called T3. T3 improves energy efficiency by about 2x and performance by 47% compared…
Advisors/Committee Members: McKinley, Kathryn S. (advisor), Burger, Douglas C., Ph. D. (advisor), Keckler, Stephen W. (committee member), Lin, Calvin (committee member), Reinhardt, Steve (committee member).
Subjects/Keywords: Microarchitecture; EDGE; Multicore; Single-thread performance; Dataflow; Block-atomic execution; Power efficiency; Composable cores
APA (6th Edition):
Robatmili, B. (2011). Efficient execution of sequential applications on multicore systems. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-08-3987
11.
Abraham, Sarah Anne.
Fluid-based brush systems for novel digital media.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/32931
► Digital media allows artists to create a wealth of visually-interesting effects that are impossible in traditional media. This includes temporal effects, such as cinemagraph animations…
(more)
▼ Digital media allows artists to create a wealth of visually-interesting effects that are impossible in traditional media. This includes temporal effects, such as cinemagraph animations and expressive fluid effects. Yet these flexible and novel media often require highly technical expertise that is outside a traditional artist's skill with paintbrush or pen. Fluid Brush acts as a form of novel digital media, which retains the brush-based interactions of traditional media while expressing the movement of turbulent and laminar flow. As a digital media controlled through a non-technical interface, Fluid Brush functions like a painting system to make fluid effects accessible to a wider range of artists. To provide an informal demonstration of the medium's effects, applications, and accessibility, we asked designers, traditional artists, and digital artists to experiment with Fluid Brush. They produced a variety of works reflective of their artistic interests and backgrounds. We also consider the development of Fluid Brush to identify practical guidelines in the design of accessible digital media.
Advisors/Committee Members: Fussell, Donald S., 1951- (advisor), Lin, Calvin (committee member), Batory, Don (committee member), Perzynski, Bogdan (committee member), Pennycook, Bruce (committee member).
Subjects/Keywords: Painting systems; Non-photorealistic rendering; Digital brushes
APA (6th Edition):
Abraham, S. A. (2015). Fluid-based brush systems for novel digital media. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/32931
12.
Smith, Aaron Lee, 1977-.
Explicit data graph compilation.
Degree: PhD, Computer Sciences, 2009, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2009-12-626
► Technology trends such as growing wire delays, power consumption limits, and diminishing clock rate improvements, present conventional instruction set architectures such as RISC, CISC, and…
(more)
▼ Technology trends such as growing wire delays, power consumption limits, and diminishing clock rate improvements, present conventional instruction set architectures such as RISC, CISC, and VLIW with difficult challenges. To show continued performance growth, future microprocessors must exploit concurrency power efficiently. An important question for any future system is the division of responsibilities between programmer, compiler, and hardware to discover and exploit concurrency.
In this research we develop the first compiler for an Explicit Data Graph Execution (EDGE) architecture and show how to solve the new challenge of compiling to a block-based architecture. In EDGE architectures, the compiler is responsible for partitioning the program into a sequence of structured blocks that logically execute atomically. The EDGE ISA defines the structure of, and the restrictions on, these blocks. The TRIPS prototype processor is an EDGE architecture that employs four restrictions on blocks intended to strike a balance between software and hardware complexity. They are: (1) fixed block sizes (maximum of 128 instructions), (2) restricted number of loads and stores (no more than 32 may issue per block), (3) restricted register accesses (no more than eight reads and eight writes to each of four banks per block), and (4) constant number of block outputs (each block must always generate a constant number of register writes and stores, plus exactly one branch).
The challenges addressed in this thesis are twofold. First, we develop the algorithms and internal representations necessary to support the new structural constraints imposed by the block-based EDGE execution model. This first step provides correct execution and demonstrates the feasibility of EDGE compilers.
Next, we show how to optimize blocks using a dataflow predication model and provide results showing how the compiler is meeting this challenge on the SPEC2000 benchmarks. Using basic blocks as the baseline performance, we show that optimizations utilizing the dataflow predication model achieve up to 64% speedup on SPEC2000 with an average speedup of 31%.
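The TRIPS block restrictions enumerated above lend themselves to a simple validity check. The sketch below encodes restriction (1), restriction (2), and the one-branch part of restriction (4); the per-bank register access limit (3) and the constant-output rule are omitted for brevity, and the instruction encoding (a list of opcode strings) is invented for illustration.

```python
def valid_trips_block(instrs):
    """Check a candidate block against (a subset of) the TRIPS block rules.
    instrs: list of opcode strings, e.g. ['add', 'load', 'branch']."""
    if len(instrs) > 128:                                    # (1) max 128 instructions
        return False
    if sum(op in ('load', 'store') for op in instrs) > 32:   # (2) max 32 loads/stores
        return False
    # (3) per-bank register access limits and constant block outputs omitted here
    return instrs.count('branch') == 1                       # (4) exactly one branch

print(valid_trips_block(['add', 'load', 'branch']))   # True
```

A compiler targeting such an ISA must partition the program so every generated block passes checks like these, splitting or merging candidate blocks until the constraints hold.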
Advisors/Committee Members: Burger, Douglas C., Ph. D. (advisor), John, Lizy K. (committee member), Keckler, Stephen W. (committee member), Lin, Calvin (committee member), McKinley, Kathryn S. (committee member).
Subjects/Keywords: EDGE; Computer architecture; Compilers
APA (6th Edition):
Smith, A. L. (2009). Explicit data graph compilation. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2009-12-626
13.
Gebhart, Mark Alan.
Energy-efficient mechanisms for managing on-chip storage in throughput processors.
Degree: PhD, Computer Science, 2012, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2012-05-5141
► Modern computer systems are power or energy limited. While the number of transistors per chip continues to increase, classic Dennard voltage scaling has come to…
(more)
▼ Modern computer systems are power or energy limited. While the number of transistors per chip continues to increase, classic Dennard voltage scaling has come to an end. Therefore, architects must improve a design's energy efficiency to continue to increase performance at historical rates, while staying within a system's power limit. Throughput processors, which use a large number of threads to tolerate
memory latency, have emerged as an energy-efficient platform for
achieving high performance on diverse workloads and are found in
systems ranging from cell phones to supercomputers. This work focuses
on graphics processing units (GPUs), which contain thousands of
threads per chip.
In this dissertation, I redesign the on-chip storage system of a
modern GPU to improve energy efficiency. Modern GPUs contain very large register files that consume between 15% and 20% of the
processor's dynamic energy. Most values written into the register
file are only read a single time, often within a few instructions of
being produced. To optimize for these patterns, we explore various
designs for register file hierarchies. We study both a
hardware-managed register file cache and a software-managed operand register file. We evaluate the energy tradeoffs in varying the number of levels and the capacity of each level in the hierarchy. Our most efficient design reduces register file energy by 54%.
Beyond the register file, GPUs also contain on-chip scratchpad
memories and caches. Traditional systems have a fixed partitioning
between these three structures. Applications have diverse
requirements and often a single resource is most critical to
performance. We propose to unify the register file, primary data
cache, and scratchpad memory into a single structure that is
dynamically partitioned on a per-kernel basis to match the
application's needs.
The techniques proposed in this dissertation improve the utilization of on-chip memory, a scarce resource for systems with a large number of hardware threads. Making more efficient use of on-chip memory both improves performance and reduces energy. Future efficient systems will be achieved by the combination of several such techniques which
improve energy efficiency.
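The write-to-read patterns motivating the register file hierarchy can be modeled with a toy two-level structure. The sketch below is an illustrative LRU register file cache, not the dissertation's hardware design; the capacity, write-back policy, and register names are invented for the example.

```python
from collections import OrderedDict

class RegFileCache:
    """Small LRU buffer in front of a large main register file.
    Because most register values are read soon after being written, even a
    tiny upper level can capture most reads cheaply."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # upper level: small and cheap to access
        self.main = {}               # main register file: large and expensive
        self.hits = self.misses = 0

    def write(self, reg, val):
        self.cache[reg] = val
        self.cache.move_to_end(reg)                      # mark most recent
        if len(self.cache) > self.capacity:              # evict LRU entry
            old_reg, old_val = self.cache.popitem(last=False)
            self.main[old_reg] = old_val                 # write back to main

    def read(self, reg):
        if reg in self.cache:
            self.hits += 1           # recently written: served by the cache
            return self.cache[reg]
        self.misses += 1             # fall through to the main register file
        return self.main[reg]

rf = RegFileCache(capacity=2)
rf.write('r1', 10); rf.write('r2', 20); rf.write('r3', 30)   # r1 evicted
print(rf.read('r3'), rf.read('r1'), rf.hits, rf.misses)      # 30 10 1 1
```

Energy falls when most reads hit the small structure, since accessing a few entries costs far less than reading a multi-thousand-entry register file; the hit/miss counters make that ratio observable in the toy model.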
Advisors/Committee Members: Keckler, Stephen W. (advisor), Burger, Douglas C. (committee member), Erez, Mattan (committee member), Fussell, Donald S. (committee member), Lin, Calvin (committee member), McKinley, Kathryn S. (committee member).
Subjects/Keywords: Energy efficiency; Multi-threading; Register file organization; Throughput computing
APA (6th Edition):
Gebhart, M. A. (2012). Energy-efficient mechanisms for managing on-chip storage in throughput processors. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2012-05-5141
14.
Lee, Byeongcheol.
Language and tool support for multilingual programs.
Degree: PhD, Computer Science, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-4084
► Programmers compose programs in multiple languages to combine the advantages of innovations in new high-level programming languages with decades of engineering effort in legacy libraries…
(more)
▼ Programmers compose programs in multiple languages to combine the
advantages of innovations in new high-level programming languages with
decades of engineering effort in legacy libraries and systems. For
language inter-operation, language designers provide two classes of
multilingual programming interfaces: (1) foreign function interfaces
and (2) code generation interfaces. These interfaces embody the
semantic mismatch for developers and multilingual systems
builders. Their programming rules are difficult or impossible to
verify. As a direct consequence, multilingual programs are full of
bugs at interface boundaries, and debuggers cannot assist developers
across these lines.
This dissertation shows how to use composition of single language
systems and interposition to improve the safety of multilingual
programs. Our compositional approach is scalable by construction
because it does not require any changes to single-language systems,
and it leverages their engineering efforts. We show it is effective by
composing a variety of multilingual tools that help programmers
eliminate bugs. We present the first concise taxonomy and formal
description of multilingual programming interfaces and their
programming rules. We next compose three classes of multilingual
tools: (1) Dynamic bug checkers for foreign function interfaces. We
demonstrate a new approach for automatically generating a dynamic bug
checker by interposing on foreign function interfaces, and we show
that it finds bugs in real-world applications including Eclipse,
Subversion, and Java Gnome. (2) Multilingual debuggers for foreign
function interfaces. We introduce an intermediate agent that wraps all
the methods and functions at language boundaries. This intermediate
agent is sufficient to build all the essential debugging features used
in single-language debuggers. (3) Safe macros for code generation
interfaces. We design a safe macro language, called Marco, that
generates programs in any language and demonstrate it by implementing
checkers for SQL and C++ generators. To check the correctness of the
generated programs, Marco queries single-language compilers and
interpreters through code generation interfaces. Using their error
messages, Marco points out the errors in program generators.
In summary, this dissertation presents the first concise taxonomy and
formal specification of multilingual interfaces and, based on this
taxonomy, shows how to compose multilingual tools to improve safety
in multilingual programs. Our results show that our compositional
approach is scalable and effective for improving safety in real-world
multilingual programs.
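The interposition approach can be sketched in miniature: wrap every function at a language boundary so its programming rules are checked dynamically at call time. The rule checked below (no null arguments) is a stand-in invented for illustration; real foreign-function-interface rules, such as JNI's, are far richer, and this decorator is not part of the tools the abstract describes.

```python
def interpose(fn):
    """Wrap a 'foreign' function so calling rules are checked at the boundary."""
    def checked(*args):
        for a in args:
            if a is None:     # hypothetical rule: no null crosses the boundary
                raise ValueError(
                    f"FFI rule violated calling {fn.__name__}: None argument")
        return fn(*args)
    return checked

@interpose
def native_strlen(s):         # stands in for a call across the language boundary
    return len(s)

print(native_strlen("hello"))   # 5
```

The key property, mirrored from the dissertation's compositional argument, is that the wrapped function itself is untouched: all checking lives in the interposed layer, so single-language systems need no changes.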
Advisors/Committee Members: McKinley, Kathryn S. (advisor), Cook, William R. (committee member), Grimm, Robert (committee member), Hirzel, Martin (committee member), Kim, Miryung (committee member), Lin, Calvin (committee member).
Subjects/Keywords: Multilingual programs; Composition; Interposition; Foreign function interface (FFI); Java native interface (JNI); Python/C; Dynamic analysis; FFI bugs; Specification; Specification generation; Type checking; Separate checking; Macros; Error messages; C
APA (6th Edition):
Lee, B. (2011). Language and tool support for multilingual programs. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-08-4084
15.
Jeong, Min Kyu.
Core-characteristic-aware off-chip memory management in a multicore system-on-chip.
Degree: PhD, Electrical and Computer Engineering, 2012, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2012-12-6765
Future processors will integrate an increasing number of cores because the scaling of single-thread performance is limited and because smaller cores are more power efficient. Off-chip memory bandwidth that is shared between those many cores, however, scales slower than the transistor (and core) count does. As a result, in many future systems, off-chip bandwidth will become the bottleneck of heavy demand from multiple cores. Therefore, optimally managing the limited off-chip bandwidth is critical to achieving high performance and efficiency in future systems.
In this dissertation, I will develop techniques to optimize the shared use of limited off-chip memory bandwidth in chip-multiprocessors. I focus on issues that arise from the sharing and exploit the differences in memory access characteristics, such as locality, bandwidth requirement, and latency sensitivity, between the applications running in parallel and competing for the bandwidth.
First, I investigate how the shared use of memory by many cores can result in reduced spatial locality in memory accesses. I propose a technique that partitions the internal memory banks between cores in order to isolate their access streams and eliminate locality interference. The technique compensates for the reduced bank-level parallelism of each thread by employing
memory sub-ranking to effectively increase the number of independent banks. For three different workload groups that consist of benchmarks with high spatial locality, low spatial locality, and mixes of the two, the average system efficiency improves by 10%, 7%, 9% for 2-rank systems, and 18%, 25%, 20% for 1-rank systems, respectively, over the baseline shared-bank system.
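The bank-partitioning idea above can be sketched as a simple address-mapping function. This is an illustrative model, not the dissertation's implementation; the function name, the cache-line granularity, and the way sub-ranking is modeled (as a multiplier on independent banks) are all assumptions:

```python
# Illustrative model of bank partitioning with sub-ranking: each core is
# confined to a private slice of the DRAM banks, so another core's access
# stream can never close its open rows.

def bank_for_address(addr, core_id, num_cores, banks_per_rank, subranks=1):
    """Map an address to a bank inside the core's private partition."""
    total_banks = banks_per_rank * subranks   # sub-ranking multiplies the
                                              # number of independent banks
    banks_per_core = total_banks // num_cores
    base = core_id * banks_per_core           # start of this core's partition
    # Hash the address (at 64-byte cache-line granularity) into the partition.
    return base + (addr >> 6) % banks_per_core
```

With 4 cores and 8 banks per rank, each core owns only 2 banks; enabling 2 sub-ranks doubles that to 4, which is how sub-ranking compensates for the lost bank-level parallelism.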
Next, I improve the performance of a heterogeneous system-on-chip (SoC) in which cores have distinct memory access characteristics. I develop a deadline-aware shared memory bandwidth management scheme for SoCs that have both CPU and GPU cores. I show that statically prioritizing the CPU can severely constrict GPU performance, and propose to dynamically adapt
the priority of CPU and GPU memory requests based on the progress of GPU workload. The proposed dynamic bandwidth management scheme provides the target GPU performance while prioritizing CPU performance as much as possible, for any CPU-GPU workload combination with different complexities.
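The dynamic scheme described above can be reduced to a small progress comparison. This is a hedged sketch; the function name, its inputs, and the linear progress model are mine, not the dissertation's:

```python
# Sketch of deadline-aware prioritization: favor latency-sensitive CPU
# requests while the GPU is on pace for its frame deadline, and raise GPU
# priority only when the GPU falls behind schedule.

def memory_priority(gpu_work_done, gpu_work_total, time_elapsed, frame_deadline):
    """Return 'cpu' or 'gpu': which requests the memory controller favors."""
    expected_progress = time_elapsed / frame_deadline  # fraction of frame time used
    actual_progress = gpu_work_done / gpu_work_total   # fraction of GPU work done
    # GPU ahead of (or on) its deadline pace -> CPU requests go first.
    return 'cpu' if actual_progress >= expected_progress else 'gpu'
```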
Advisors/Committee Members: Erez, Mattan (advisor), John, Lizy K. (committee member), Chiou, Derek (committee member), Lin, Calvin (committee member), Schulte, Michael J. (committee member).
Subjects/Keywords: Memory; CMP; Locality; Parallelism; SoC; GPU; QoS
APA (6th Edition):
Jeong, M. K. (2012). Core-characteristic-aware off-chip memory management in a multicore system-on-chip. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2012-12-6765
Chicago Manual of Style (16th Edition):
Jeong, Min Kyu. “Core-characteristic-aware off-chip memory management in a multicore system-on-chip.” 2012. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021. http://hdl.handle.net/2152/ETD-UT-2012-12-6765.
MLA Handbook (7th Edition):
Jeong, Min Kyu. “Core-characteristic-aware off-chip memory management in a multicore system-on-chip.” 2012. Web. 24 Jan 2021.
Vancouver:
Jeong MK. Core-characteristic-aware off-chip memory management in a multicore system-on-chip. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2012. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/2152/ETD-UT-2012-12-6765.
Council of Science Editors:
Jeong MK. Core-characteristic-aware off-chip memory management in a multicore system-on-chip. [Doctoral Dissertation]. University of Texas – Austin; 2012. Available from: http://hdl.handle.net/2152/ETD-UT-2012-12-6765

University of Texas – Austin
16.
Wiedermann, Benjamin Alan.
Integrating programming languages and databases via program analysis and language design.
Degree: PhD, Computer Sciences, 2009, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2009-12-687
Researchers and practitioners alike have long sought to integrate programming
languages and databases. Today's integration solutions focus on the data-types of
the two domains, but today's programs lack transparency. A
transparently persistent program operates over all objects
in a uniform manner, regardless of whether those objects reside in memory or in a
database. Transparency increases modularity and lowers the barrier of adoption in
industry. Unfortunately, fully transparent programs perform so poorly that no one
writes them. The goal of this dissertation is to increase the performance of
these programs to make transparent persistence a viable programming paradigm.
This dissertation contributes two novel techniques that integrate programming
languages and databases. Our first contribution – called query
extraction – is based purely on program analysis. Query extraction analyzes a
transparent, object-oriented program that retrieves and filters collections of
objects. Some of these objects may be persistent, in which case the program
contains implicit queries of persistent data. Our interprocedural program
analysis extracts these queries from the program, translates them to explicit
queries, and transforms the transparent program into an equivalent one that
contains the explicit queries. Query extraction enables programmers to write
programs in a familiar, modular style and to rely on the compiler to transform
their program into one that performs well.
Our second contribution – called RBI-DB+ – is an extension
of a new programming language construct called a batch block. A batch
block provides a syntactic barrier around transparent code. It also provides a
latency guarantee: If the batch block compiles, then the code that appears in it
requires only one client-server communication trip. Researchers previously have
proposed batch blocks for databases. However, batch blocks cannot be modularized
or composed, and database batch blocks do not permit programmers to modify
persistent data. We extend database batch blocks to address these concerns and
formalize the results.
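The one-round-trip guarantee of a batch block can be mimicked in miniature. RBI-DB+ is a language construct with a compile-time guarantee; this runtime Python class is only an analogy, and all names in it are mine:

```python
# Analogy for a batch block: operations inside the block are recorded
# locally, then shipped to the server in exactly one communication trip
# when the block ends.

class Batch:
    def __init__(self, send):
        self.send = send      # function that performs one server round trip
        self.ops = []

    def __enter__(self):
        return self

    def add(self, op):
        self.ops.append(op)   # recorded locally; no communication yet

    def __exit__(self, *exc):
        self.send(self.ops)   # one round trip for the whole block
```

The difference, of course, is that here the single trip is a runtime convention, whereas a batch block that compiles is guaranteed to need only one.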
Today's technologies integrate the data-types of programming languages and
databases, but they discourage programmers from using procedural abstraction.
Our contributions restore procedural abstraction's use in enterprise
applications, without sacrificing performance. We argue that industry should
combine our contributions with data-type integration. The result would be a
robust, practical integration of programming languages and databases.
Advisors/Committee Members: Cook, William Randall (advisor), Batory, Don (committee member), Blackburn, Stephen M. (committee member), Lin, Calvin (committee member), McKinley, Kathryn S. (committee member).
Subjects/Keywords: Programming languages; Databases; Transparent persistence; Impedance mismatch; Program analysis; Language design
APA (6th Edition):
Wiedermann, B. A. (2009). Integrating programming languages and databases via program analysis and language design. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2009-12-687
Chicago Manual of Style (16th Edition):
Wiedermann, Benjamin Alan. “Integrating programming languages and databases via program analysis and language design.” 2009. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021. http://hdl.handle.net/2152/ETD-UT-2009-12-687.
MLA Handbook (7th Edition):
Wiedermann, Benjamin Alan. “Integrating programming languages and databases via program analysis and language design.” 2009. Web. 24 Jan 2021.
Vancouver:
Wiedermann BA. Integrating programming languages and databases via program analysis and language design. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2009. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/2152/ETD-UT-2009-12-687.
Council of Science Editors:
Wiedermann BA. Integrating programming languages and databases via program analysis and language design. [Doctoral Dissertation]. University of Texas – Austin; 2009. Available from: http://hdl.handle.net/2152/ETD-UT-2009-12-687

University of Texas – Austin
17.
Chan, Ernie W., 1982-.
Application of dependence analysis and runtime data flow graph scheduling to matrix computations.
Degree: PhD, Computer Sciences, 2010, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2010-08-1563
We present a methodology for exploiting shared-memory parallelism within matrix computations by expressing linear algebra algorithms as directed acyclic graphs. Our solution involves a separation of concerns that completely hides the exploitation of parallelism from the code that implements the linear algebra algorithms. This approach to the problem is fundamentally different since we also address the issue of programmability instead of strictly focusing on parallelization. Using the separation of concerns, we present a framework for analyzing and developing scheduling algorithms and heuristics for this problem domain. As such, we develop a theory and practice of scheduling concepts for matrix computations in this dissertation.
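The separation of concerns described above can be sketched as a generic scheduler over block tasks: the algorithm code only records tasks and their dependences, and a runtime executes any task whose predecessors have finished. All names here are assumptions, and where this sketch runs ready tasks one at a time, the dissertation's runtime would dispatch them to worker threads in parallel:

```python
from collections import defaultdict, deque

def run_dag(tasks, deps):
    """Execute block tasks in dependence order.

    tasks: {name: thunk to run}; deps: {name: [predecessor names]}.
    Returns the order in which tasks ran.
    """
    indeg = {t: len(deps.get(t, [])) for t in tasks}
    succs = defaultdict(list)
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while ready:                # every task in 'ready' could run in parallel
        t = ready.popleft()
        tasks[t]()              # execute the block operation
        order.append(t)
        for s in succs[t]:      # retire t: its successors lose a dependence
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order
```

The linear-algebra code never mentions threads or ordering; it only names tasks and edges, which is the separation of concerns the abstract describes.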
Advisors/Committee Members: Van de Geijn, Robert A. (advisor), Browne, James C. (committee member), Lin, Calvin (committee member), Pingali, Keshav (committee member), Plaxton, Charles G. (committee member), Quintana-Orti, Enrique S. (committee member).
Subjects/Keywords: Matrix computation; Directed acyclic graph; Algorithm-by-blocks
APA (6th Edition):
Chan, E. W. (2010). Application of dependence analysis and runtime data flow graph scheduling to matrix computations. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2010-08-1563
Chicago Manual of Style (16th Edition):
Chan, Ernie W., 1982-. “Application of dependence analysis and runtime data flow graph scheduling to matrix computations.” 2010. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021. http://hdl.handle.net/2152/ETD-UT-2010-08-1563.
MLA Handbook (7th Edition):
Chan, Ernie W., 1982-. “Application of dependence analysis and runtime data flow graph scheduling to matrix computations.” 2010. Web. 24 Jan 2021.
Vancouver:
Chan EW. Application of dependence analysis and runtime data flow graph scheduling to matrix computations. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2010. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/2152/ETD-UT-2010-08-1563.
Council of Science Editors:
Chan EW. Application of dependence analysis and runtime data flow graph scheduling to matrix computations. [Doctoral Dissertation]. University of Texas – Austin; 2010. Available from: http://hdl.handle.net/2152/ETD-UT-2010-08-1563