You searched for +publisher:"University of Michigan" +contributor:("Tyson, Gary S.")
Showing records 1 – 6 of 6 total matches.
No search limiters apply to these results.

University of Michigan
1.
Lee, Hsien-Hsin Sean.
Improving energy and performance of data cache architectures by exploiting memory reference characteristics.
Degree: PhD, Electrical engineering, 2001, University of Michigan
URL: http://hdl.handle.net/2027.42/127956
Minimizing power, increasing performance, and delivering effective memory bandwidth are today's primary microprocessor design goals for the embedded, high-end, and multimedia workstation markets. In this dissertation, I will discuss three major data cache architecture design optimization techniques, each of which exploits the data memory reference characteristics of applications written in high-level languages. Through a better understanding of memory reference behavior, we can design a system that executes at higher performance while consuming less energy and delivering more effective memory bandwidth. The first part of this dissertation presents an in-depth characterization of data memory references, including analysis of semantic region accesses and the behavior of data stores. This analysis leads to a new organization of the data cache hierarchy called Region-based Cachelets, which improve the memory performance of embedded applications while significantly reducing dynamic energy consumption, resulting in a 50% to 70% improvement in energy-delay product efficiency. Following this, I will discuss a new cache-like structure, the Stack Value File (or SVF), which boosts the performance of general-purpose applications by routing stack data references to a separate storage structure optimized for the unique characteristics of the stack reference substream. By utilizing a custom structure for stack references, we are able to increase memory-level parallelism, reduce memory latency, and reduce off-chip memory activity; implementing an 8KB SVF for a processor with a dual-ported L1 cache improves performance by 24%. Finally, I will address memory bandwidth issues by proposing a new write policy called Eager Writeback, which effectively improves overall system performance by shifting the writing of dirty cache lines from on-demand to times when the memory bus is less congested. It lessens the criticality of on-demand misses and improves performance by 6% to 16% for the 3D graphics geometry pipeline.
Advisors/Committee Members: Tyson, Gary S. (advisor).
Subjects/Keywords: Architectures; Characteristics; Data Cache; Energy; Exploiting; Improving; Memory Reference; Microprocessor Caches; Performance; Writeback Policy
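The core of the Eager Writeback policy described in this abstract can be sketched in a few lines: flush dirty lines during idle bus cycles so that later evictions find them clean. The toy model below (fully associative, LRU, invented class and method names) is only an illustration of the policy's flavor, not the dissertation's actual design.

```python
from collections import OrderedDict

class EagerWritebackCache:
    """Toy fully-associative write-back cache with LRU replacement.
    Eager writeback cleans dirty LRU lines whenever the memory bus is
    idle, so evictions rarely need an on-demand (critical-path) writeback."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # addr -> dirty flag, LRU-first order
        self.demand_writebacks = 0   # writebacks forced by an eviction
        self.eager_writebacks = 0    # writebacks done while the bus was idle

    def access(self, addr, is_store):
        if addr in self.lines:
            dirty = self.lines.pop(addr) or is_store  # hit: move to MRU
        else:
            if len(self.lines) >= self.num_lines:
                _, victim_dirty = self.lines.popitem(last=False)  # evict LRU
                if victim_dirty:
                    self.demand_writebacks += 1  # dirty eviction on demand
            dirty = is_store
        self.lines[addr] = dirty

    def bus_idle_cycle(self):
        # Eagerly write back the least-recently-used dirty line, if any.
        for addr in self.lines:          # OrderedDict iterates LRU-first
            if self.lines[addr]:
                self.lines[addr] = False
                self.eager_writebacks += 1
                return
```

With two stores followed by two idle bus cycles, subsequent evictions are clean, so no writeback lands on the miss path.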

2.
Srinivasan, Vijayalakshmi.
Hardware solutions to reduce effective memory access time.
Degree: PhD, Electrical engineering, 2001, University of Michigan
URL: http://hdl.handle.net/2027.42/124069
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarchy, thereby reducing the effective memory access time. Specifically, we focus on two approaches: prefetching and novel cache designs. In the first part of this dissertation we show that traditional metrics like coverage and accuracy may be inadequate for evaluating the effectiveness of a prefetch algorithm. Our main contribution is the development of a prefetch traffic and miss taxonomy (PTMT) that provides a complete classification of all prefetches; in particular, the PTMT classification precisely quantifies the direct and indirect effect of each prefetch on traffic and misses. We show that while most instruction prefetch algorithms do achieve a substantial reduction in misses, they fail to issue the prefetches in a timely fashion. Our branch history guided hardware prefetch algorithm (BHGP) improves the timeliness of instruction prefetches. Our results show that BHGP on average eliminates 66% of the I-cache misses for some important commercial and Windows-NT applications, as well as applications from the CPU2000 suite that have high I-cache miss rates. In addition, BHGP improves IPC by 14 to 18% for the CPU2000 applications studied. In the second part of this dissertation, we explore novel cache designs to reduce the effective L1 cache access time in light of current technology trends. We show that the straightforward approach of adding one more level of memory hierarchy, an L0 cache between the processor and the L1 cache, does not always reduce the effective cache access time, because of the high miss rate of the L0 cache and the small difference in access latency between L0 and L1. We develop a split latency cache system (SpliCS), an enhanced version of the traditional (L0 + L1) system that uses two primary caches: a small, fast cache (A) and a larger, slower cache (B). Our experiments show that, relative to a similarly configured L1 cache alone, SpliCS achieves an 8% or 18% reduction in CPI with a cache B latency of 3 or 5 cycles, respectively. Moreover, SpliCS achieves an average 15% improvement in CPI relative to a traditional (L0 + L1) hierarchy. (Abstract shortened by UMI.)
Advisors/Committee Members: Davidson, Edward S. (advisor), Tyson, Gary S. (advisor).
Subjects/Keywords: Branch History Guided; Branch History-guided; Effective; Hardware Prefetch; Memory Access Time; Reduce; Solutions; Split Latency Cache
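The L0-versus-L1 tradeoff the abstract describes follows from the standard average-memory-access-time formula. The numbers below are illustrative assumptions, not measurements from the dissertation; they show how a tiny L0 with a high miss rate and a small latency gap to L1 can make the hierarchy slower than L1 alone.

```python
def effective_access_time(hit_time, miss_rate, miss_penalty):
    # Textbook average-memory-access-time model: hit cost plus the
    # expected cost of going to the next level on a miss.
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: an L1 alone hits in 3 cycles and misses 5% of
# the time with a 20-cycle penalty.
l1_alone = effective_access_time(hit_time=3, miss_rate=0.05, miss_penalty=20)

# A tiny L0 in front hits in 1 cycle, but if it misses often (80% here)
# and every miss then pays the full L1 access time, the L0 fails to pay
# for itself.
l0_plus_l1 = effective_access_time(hit_time=1, miss_rate=0.80,
                                   miss_penalty=l1_alone)
```

With these assumed rates, `l1_alone` is 4.0 cycles while `l0_plus_l1` is 4.2 cycles, i.e., the extra level hurts, which is the scenario SpliCS is designed to avoid.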

3.
Rivers, Jude A.
Performance aspects of high-bandwidth multi-lateral cache organizations.
Degree: PhD, Computer science, 1998, University of Michigan
URL: http://hdl.handle.net/2027.42/131086
As the issue widths of processors continue to increase, efficient data supply will become ever more critical. Unfortunately, with processor speeds increasing faster than memory speeds, supplying data efficiently will continue to become more difficult. Attempts to address this issue have included reducing the effective latency of memory accesses and increasing the available access bandwidth to the primary data cache. However, each of these two techniques is often proposed and evaluated in the absence of the other. This dissertation proposes and evaluates solutions for both the latency and bandwidth aspects of data supply, and a cache structure that incorporates both solutions. To solve the latency problem, we use the multi-lateral caching paradigm for active data placement and management in the L1 cache. The multi-lateral paradigm, which advocates partitioning the L1 cache into multiple subcaches, emerges from the fundamental belief that several classes of data elements have distinct reference or behavior characteristics that should be exploited differentially. However, some criterion must be found upon which to partition the reference stream into appropriate classes for selective caching. We demonstrate the value of temporality-based caching as an appropriate criterion with the introduction of the Non-Temporal Streaming (NTS) Cache, which performs better than larger single-structure caches. The NTS Cache utilizes data reuse information for intelligent data placement and active management of its 2-unit multi-lateral structure. To solve the bandwidth problem, we analyze the scalability of current multiporting approaches as data access parallelism increases. Currently, multibanking and multiporting through replication are the two popular and implementable approaches to providing multiple ports. However, neither approach appears to be scalable with increasing data access parallelism. Whereas the multibanking technique suffers performance degradation because of bank conflicts, the replication technique is limited by die area and degraded by the need to broadcast stores for coherence. Multibanking is economical in terms of die area requirements and design complexity. Analysis of the SPEC95 memory reference streams reveals that the majority of all bank conflicts are due to nearby references that map into the same cache line of the same cache bank. Our solution to the bank conflict problem is the Locality-Based Interleaved Cache (LBIC), which is built on traditional multibanking but exploits same-line spatial locality to obtain performance similar to true multiporting at far less cost. Finally, this dissertation explores the effectiveness of reducing the average data access time via active management of a cache space in conjunction with high-bandwidth techniques. Experimental results show that adding multi-lateral caching to an LBIC results in a cache structure capable of performing as well as or better than traditionally managed single-structure LBIC caches of nearly twice the size.
Advisors/Committee Members: Davidson, Edward S. (advisor), Tyson, Gary S. (advisor).
Subjects/Keywords: Aspects; Bandwidth; Cache; High; Lateral; Management; Multi; Organizations; Performance
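The observation driving the LBIC, that most bank conflicts come from nearby references to the same cache line, can be checked with a simple model. The function below classifies pairwise conflicts among same-cycle accesses to a line-interleaved banked cache; it is a sketch of the analysis, not of the LBIC hardware itself, and the parameters are illustrative.

```python
def classify_bank_conflicts(addrs, line_size=32, num_banks=4):
    """Classify pairwise conflicts among addresses issued in one cycle
    to a line-interleaved banked cache (simplified model). Returns
    (same_line, different_line) conflict counts: same-line conflicts are
    the ones an LBIC-style wide line access could service together."""
    lines = [a // line_size for a in addrs]
    banks = [ln % num_banks for ln in lines]
    same_line = diff_line = 0
    for i in range(len(addrs)):
        for j in range(i + 1, len(addrs)):
            if banks[i] == banks[j]:          # both target the same bank
                if lines[i] == lines[j]:
                    same_line += 1            # spatial locality: servable together
                else:
                    diff_line += 1            # a true bank conflict
    return same_line, diff_line
```

For example, the accesses `[0, 8, 128, 40]` produce one same-line conflict (0 and 8 share line 0) and two true bank conflicts (line 4 also maps to bank 0).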

4.
Tam, Edward S.
Improving cache performance via active management.
Degree: PhD, Electrical engineering, 1999, University of Michigan
URL: http://hdl.handle.net/2027.42/132017
This dissertation analyzes a way to improve cache performance via active management of a target cache space. As microprocessor speeds continue to grow faster than memory subsystem speeds, minimizing the average data access time grows in importance. Because current data caches are often poorly and inefficiently managed, a good management technique can improve the average data access time. Cache management involves two main processes: block allocation decisions and block replacement decisions. Active block allocation can be performed most efficiently in multilateral caches (several distinct data stores with disjoint contents placed in parallel within L1), where blocks exhibiting particular characteristics can be placed in the appropriate store. To aid in our evaluation of different active block management schemes, we have developed a multilateral cache simulator, mlcache, which provides a platform whereby different cache schemes can easily be specified, and which produces evaluation statistics that help explain their performance. Using mlcache, we have been able to evaluate the performance of proposed multilateral cache schemes and to derive new, better-performing schemes. Our results show that multilateral schemes outperform traditional caches of similar size and often rival the performance of traditional caches nearly twice as large. However, the performance difference between previously proposed implementable schemes and a multilateral configuration that uses a non-implementable, near-optimal replacement policy is large. This disparity is due mainly to the simple prediction strategies presently used in the implementable schemes, along with their limited management of blocks while resident in the L1 cache structure. We introduce a new multilateral allocation scheme, Allocation By Conflict (ABC), which outperforms all previously proposed reuse-based multilateral configurations and performs comparably to multilateral schemes with significantly greater hardware requirements (particularly Victim, which requires a data path between its A and B caches). The ABC scheme incurs the lowest hardware cost of any of the proposed multilateral schemes, yet performs best and is the most easily implementable: it requires only a single additional bit per block in cache A and a very simple logic circuit for making allocation decisions. The ABC scheme's performance advantage also scales well as the sizes of the caches are increased and as the associativity of the A cache is increased.
Advisors/Committee Members: Davidson, Edward S. (advisor), Tyson, Gary S. (advisor).
Subjects/Keywords: Active Management; Allocation By Conflict; Cache; Improving; Performance; Via
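To make the "one bit per block" idea concrete, here is a very loose sketch of a two-store L1 with a conflict-bit allocation heuristic. The mechanics below are one plausible reading of the flavor of such a scheme, not the published definition of ABC: the bit remembers a recent displacement in a cache A set and steers the next conflicting allocation into cache B instead.

```python
class ABCSketch:
    """Hypothetical two-store L1: a direct-mapped store A with one
    conflict bit per block, plus a small FIFO store B. Invented for
    illustration; not the dissertation's exact allocation logic."""

    def __init__(self, a_sets, b_size):
        self.A = [None] * a_sets          # direct-mapped store A
        self.conflict = [False] * a_sets  # one bit per A block
        self.B = []                       # small FIFO store B
        self.a_sets = a_sets
        self.b_size = b_size

    def access(self, line):
        s = line % self.a_sets
        if self.A[s] == line or line in self.B:
            return "hit"
        if self.A[s] is None or not self.conflict[s]:
            if self.A[s] is not None:
                self.conflict[s] = True   # remember this displacement
            self.A[s] = line              # allocate into A as usual
        else:
            self.B.append(line)           # repeated conflict: steer to B
            if len(self.B) > self.b_size:
                self.B.pop(0)
            self.conflict[s] = False
        return "miss"
```

With this heuristic, two lines that map to the same A set (e.g., lines 0 and 4 with four sets) end up co-resident after three misses, where a plain direct-mapped cache would ping-pong them forever.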

5.
Geiger, Michael J.
Improving performance and energy consumption in region-based caching architectures.
Degree: PhD, Electrical engineering, 2006, University of Michigan
URL: http://hdl.handle.net/2027.42/126157
Embedded systems must simultaneously deliver high performance and low energy consumption. Meeting these goals requires customized designs that fit the requirements of the targeted applications. This philosophy of tailoring the implementation to the domain applies to all subsystems in the embedded architecture. For the memory system, which is a key performance bottleneck and a significant source of energy consumption, generic caching strategies are insufficient. The system requires specialized cache structures that match the manner in which programmers use data. Since different data subsets exhibit varying degrees of locality, a partitioned cache offers the best opportunity to optimize performance and energy consumption for all memory accesses. In this dissertation, I explore several methods that utilize partitioning for an energy-efficient, high-performance data cache. Region-based caching, which replaces a unified data cache with multiple caches optimized for stack, global, and heap references, serves as the starting point for this research. I begin by addressing energy consumption in the level-one data cache. Drowsy region-based caches combine static and dynamic energy reduction techniques to simultaneously lower both sources of energy consumption; this combination performs better than the sum of its parts because the separate region caches allow us to use different degrees of drowsy caching. I then show how additional cache partitioning can further reduce energy consumption, presenting a scheme to identify highly local data in the heap region and route their accesses to a smaller cache. I also study methods for improving memory system performance: the effectiveness of data prefetching can be increased by partitioning the cache in a manner that isolates data that prefetch well. Finally, I discuss how to reallocate data within region-based caches to eliminate unnecessary cache misses.
Advisors/Committee Members: Mudge, Trevor N. (advisor), Tyson, Gary S. (advisor).
Subjects/Keywords: Architectures; Based; Drowsy Caching; Energy Consumption; Improving; Performance; Prefetching; Region
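The routing step at the heart of region-based caching is just an address-range check that steers each reference to the stack, global, or heap cache. The region boundaries below are invented for illustration; a real implementation would use the actual memory map (stack pointer range, static data segment, and so on).

```python
def route_reference(addr, regions):
    """Pick the region cache for an address. `regions` maps a region
    name to an illustrative [lo, hi) address range; anything unmatched
    falls through to the heap cache."""
    for name, (lo, hi) in regions.items():
        if lo <= addr < hi:
            return name
    return "heap"   # default region

# Hypothetical memory map for a 32-bit address space.
regions = {
    "stack":  (0xF000_0000, 0xFFFF_FFFF),
    "global": (0x1000_0000, 0x2000_0000),
}
```

Each region cache can then be sized and drowsied independently, which is what lets the partitioned design beat a unified cache on both energy and performance.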

6.
Cheng, Allen Chao-Hung.
Application-specific architecture framework for high-performance low-power embedded computing.
Degree: PhD, Electrical engineering, 2006, University of Michigan
URL: http://hdl.handle.net/2027.42/125632
The design space of embedded systems is enormously large. Embedded applications have strict requirements on power consumption, performance, cost, and time to market, and it is critical to fully address these constraints when designing microprocessors for embedded systems. This dissertation proposes Framework-based Instruction-set Tuning Synthesis (FITS), an architectural and microarchitectural innovation that effectively tackles all of the above requirements. FITS reduces power consumption by running applications with half the code size and much improved locality, which allows the use of a smaller instruction cache that achieves higher hit rates while requiring less power to operate. FITS improves performance through a custom-tailored, application-specific instruction set architecture (ISA) and microarchitectural enhancements. The application-specific instruction set tailoring is achieved by synthesizing an ISA to match precisely the requirements of the targeted application. The microarchitecture is enhanced by integrating a Versatile Integrated Processing (VIP) unit and a Zero-Overhead Loop Execution (ZOLE) unit. The VIP unit is a universal data-crunching engine that delivers superior data computing and data streaming performance. The ZOLE unit streamlines program control flow by removing expensive loop control overhead from both nested and non-nested loops. Both the architectural and microarchitectural innovations are accomplished by replacing the fixed instruction decoder of general-purpose embedded processors with a programmable decoder. Using a programmable decoder decouples the microarchitecture from the ISA, so designers can add new capabilities to the microarchitecture without being restricted by the limited instruction space. A general-purpose, fully capable microarchitecture reduces design cost and shortens time to market by leveraging the fabrication advantages of a single-chip solution that can amortize high non-recurring engineering cost and long design turnaround through mass production. Through the use of a programmable decoder and an enhanced general-purpose microarchitecture equipped with VIP and ZOLE, FITS pioneers a new genre of embedded microprocessors that achieve application-specific processor performance and low energy consumption while maintaining the fabrication advantages of a mass-produced single-chip solution, yielding low production cost and fast time to market.
Advisors/Committee Members: Tyson, Gary S. (advisor), Mudge, Trevor N. (advisor).
Subjects/Keywords: Application; Computer Architecture; Embedded Computing; Framework; High; Instruction Sets; Low-power; Microprocessors; Performance; Specific
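The programmable-decoder idea reduces, at its core, to replacing a hard-wired opcode-to-operation map with a writable table that can be loaded per application. The sketch below illustrates only that idea; the opcode values and operation names are invented, not taken from FITS.

```python
def make_decoder(decode_table):
    """Return a decode function backed by a writable table, standing in
    for a programmable hardware decoder. A fixed decoder would hard-wire
    this mapping instead."""
    def decode(opcode):
        return decode_table[opcode]
    return decode

# Hypothetical per-application ISA, loaded at boot: each opcode maps to
# a microarchitectural operation (names invented for illustration).
app_specific_table = {
    0x0: "vip.mac",          # multiply-accumulate on the VIP unit
    0x1: "zole.loop_begin",  # hardware loop start on the ZOLE unit
    0x2: "load.stack",       # ordinary memory operation
}
decode = make_decoder(app_specific_table)
```

Swapping in a different table retargets the same microarchitecture to a different application-specific ISA without any hardware change, which is the decoupling the abstract describes.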