University of Michigan
Hardware solutions to reduce effective memory access time.
Degree: PhD, Electrical Engineering, 2001, University of Michigan
In this dissertation, we provide hardware solutions that increase the efficiency of the cache hierarchy and thereby reduce the effective memory access time. Specifically, we focus on two approaches: prefetching and novel cache designs.

In the first part of this dissertation, we show that traditional metrics such as coverage and accuracy may be inadequate for evaluating the effectiveness of a prefetch algorithm. Our main contribution is the development of a prefetch traffic and miss taxonomy (PTMT) that provides a complete classification of all prefetches; in particular, the PTMT classification precisely quantifies the direct and indirect effect of each prefetch on traffic and misses. We show that while most instruction prefetch algorithms do achieve a substantial reduction in misses, they fail to issue the prefetches in a timely fashion. Our branch history guided hardware prefetch algorithm (BHGP) improves the timeliness of instruction prefetches. Our results show that BHGP on average eliminates 66% of the I-cache misses for several important commercial and Windows NT applications, as well as for applications from the CPU2000 suite that have high I-cache miss rates. In addition, BHGP improves IPC by 14 to 18% for the CPU2000 applications studied.

In the second part of this dissertation, we explore novel cache designs to reduce the effective L1 cache access time in light of current technology trends. We show that the straightforward approach of adding one more level to the memory hierarchy, an L0 cache between the processor and the L1 cache, does not always reduce the effective cache access time, because of the high miss rate of the L0 cache and the small difference in access latency between L0 and L1. We therefore develop a split latency cache system (SpliCS), an enhanced version of the traditional (L0 + L1) system that uses two primary caches: a small, fast cache (A) and a larger, slower cache (B).
Our experiments show that, relative to a similarly configured L1 cache alone, SpliCS achieves an 8% or 18% reduction in CPI with a cache B latency of 3 or 5 cycles, respectively. Moreover, SpliCS achieves an average 15% improvement in CPI relative to a traditional (L0 + L1) hierarchy. (Abstract shortened by UMI.)
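As background to the first part of the abstract, the traditional coverage and accuracy metrics that the PTMT taxonomy is argued to go beyond have standard textbook definitions. The sketch below is illustrative only, with made-up counts, and is not drawn from the dissertation's results:

```python
# Illustrative sketch (not from the dissertation): the standard prefetch
# metrics whose inadequacy motivates the PTMT classification.

def coverage(misses_eliminated, baseline_misses):
    """Fraction of baseline cache misses removed by prefetching."""
    return misses_eliminated / baseline_misses

def accuracy(useful_prefetches, total_prefetches):
    """Fraction of issued prefetches that were actually used."""
    return useful_prefetches / total_prefetches

# Hypothetical numbers: a prefetcher that removes 660 of 1000 misses has
# 66% coverage, yet may still hurt performance if prefetches arrive late
# or add traffic -- the timeliness and traffic effects that PTMT captures.
print(coverage(660, 1000))    # 0.66
print(accuracy(800, 1200))    # roughly 0.667
```

Note that neither metric says *when* a prefetched line arrives, which is exactly the timeliness gap the abstract attributes to most instruction prefetch algorithms.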
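The second part's argument about the L0 approach can be illustrated with the standard average memory access time (AMAT) model. This is a minimal sketch with hypothetical latencies and miss rates, not data from the dissertation:

```python
# Illustrative sketch (not from the dissertation): a simple AMAT model of
# why inserting an L0 cache between the processor and L1 does not always
# help. All latencies and miss rates below are hypothetical.

def amat(hit_time, miss_rate, miss_penalty):
    """Average access time of one cache level, in cycles."""
    return hit_time + miss_rate * miss_penalty

# Baseline: L1 alone -- 2-cycle hits, 5% miss rate, 20-cycle miss penalty.
l1_alone = amat(2, 0.05, 20)       # 3.0 cycles

# With an L0 in front: 1-cycle hits, but a high L0 miss rate; each L0
# miss then pays the full L1 access time. Here the 1-cycle latency gap
# between L0 and L1 is too small to offset the frequent L0 misses.
with_l0 = amat(1, 0.80, l1_alone)  # 3.4 cycles -- worse than L1 alone

print(l1_alone, with_l0)
```

This is the effect the abstract describes: an L0 only pays off when its miss rate is low enough relative to the L0/L1 latency gap, which motivates the SpliCS design instead.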
Advisors/Committee Members: Davidson, Edward S. (advisor), Tyson, Gary S. (advisor).
Subjects/Keywords: Branch History-Guided Hardware Prefetch; Effective Memory Access Time; Split Latency Cache
APA (6th Edition):
Srinivasan, V. (2001). Hardware solutions to reduce effective memory access time. (Doctoral Dissertation). University of Michigan. Retrieved from http://hdl.handle.net/2027.42/124069
Chicago Manual of Style (16th Edition):
Srinivasan, Vijayalakshmi. “Hardware solutions to reduce effective memory access time.” 2001. Doctoral Dissertation, University of Michigan. Accessed January 22, 2021. http://hdl.handle.net/2027.42/124069.
MLA Handbook (7th Edition):
Srinivasan, Vijayalakshmi. “Hardware solutions to reduce effective memory access time.” 2001. Web. 22 Jan 2021.
Vancouver:
Srinivasan V. Hardware solutions to reduce effective memory access time. [Internet] [Doctoral dissertation]. University of Michigan; 2001. [cited 2021 Jan 22]. Available from: http://hdl.handle.net/2027.42/124069.
Council of Science Editors:
Srinivasan V. Hardware solutions to reduce effective memory access time. [Doctoral Dissertation]. University of Michigan; 2001. Available from: http://hdl.handle.net/2027.42/124069