You searched for subject:(Caches)
Showing records 1 – 30 of 60 total matches.

Penn State University
1.
Muralidhara, Sai Prashanth.
Reducing Interference in Memory Hierarchy Resources Using Application Aware Management.
Degree: 2011, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/12150
Aggressive technology scaling has resulted in an increase in the number of cores integrated on-chip. While on-chip core counts are growing at a fast rate, the memory hierarchy resources are scaling at a much slower pace. The memory resources, such as the different levels of on-chip cache and the off-chip memory bandwidth, are costly and are often shared across multiple on-chip cores.
This leads to multiple applications contending for access to these common resources. In the process, these applications can harmfully interfere with one another, and this interference can result in significant degradation of both system throughput and individual application performance. Therefore, intelligently managing the shared memory resources by mitigating inter-application interference is vitally important in emerging multicore systems.
This dissertation makes three key contributions towards addressing the above problem of interference in shared memory resources. First, it considers the last-level shared cache and the off-chip memory as instances of shared memory resources and studies the causes and the different ways in which applications interfere with one another while contending for a resource.
Second, it studies the negative impact of resource contention on application and system performance. Third, it proposes novel schemes to mitigate inter-application interference and thereby improve system and application performance. These schemes aim to efficiently manage the resources in an application-aware manner with the goal of mitigating the overall inter-application interference. An application-aware resource management scheme considers the memory access characteristics of all the contending applications and uses this information to manage the shared resource. The resource management decisions are based on two key principles: (1) isolating applications/threads that harmfully interfere with each other by partitioning the resources between the interfering applications, and (2) deciding the size of the resource partition that an application gets based on its memory access characteristics and requirements.
The trend of integrating an increasing number of cores on a single chip is projected to continue into the future. This continued scaling is propelling the parallel computation capability of emerging multicore systems. Efficient management of shared memory hierarchy resources will become ever more important if applications are to extract the maximum possible parallelism from these multicore systems. This dissertation takes an important step towards addressing this problem by proposing novel schemes to efficiently manage multiple memory hierarchy resources. These schemes are very effective in practice, improving both system performance and individual application performance.
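To make the two principles above concrete, here is a minimal Python sketch of utility-based cache way partitioning, one well-known instance of application-aware resource management. It is not the dissertation's actual scheme, and the miss curves are invented for illustration: each way is granted to whichever application's miss curve shows the largest marginal benefit.

    # Illustrative sketch of application-aware way partitioning (not the
    # dissertation's actual scheme): give each application the cache ways
    # where its marginal miss reduction ("utility") is highest.

    def partition_ways(miss_curves, total_ways):
        """miss_curves[app][w] = misses if app gets w ways (w = 0..total_ways).
        Greedily hand out one way at a time to the app that benefits most."""
        alloc = {app: 0 for app in miss_curves}
        for _ in range(total_ways):
            best_app = max(
                alloc,
                key=lambda a: miss_curves[a][alloc[a]] - miss_curves[a][alloc[a] + 1],
            )
            alloc[best_app] += 1
        return alloc

    # Hypothetical miss curves for two co-running applications on an 8-way cache:
    # a streaming app gains little from extra ways; a reuse-heavy app gains a lot.
    curves = {
        "stream": [100, 99, 98, 98, 97, 97, 97, 97, 97],
        "reuse":  [100, 60, 35, 20, 12, 10,  9,  9,  9],
    }
    print(partition_ways(curves, 8))  # most ways go to "reuse"

Greedy marginal-utility allocation captures principle (2); principle (1) is captured by each application's ways being private once assigned.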
Advisors/Committee Members: Mahmut Taylan Kandemir, Dissertation Advisor/Co-Advisor, Mahmut Taylan Kandemir, Committee Chair/Co-Chair, Padma Raghavan, Committee Member, Mary Jane Irwin, Committee Member, Qian Wang, Committee Member.
Subjects/Keywords: Multicores; memory hierarchy; caches; DRAM

Penn State University
2.
Kim, Soontae.
ENERGY-EFFICIENT HIGH PERFORMANCE CACHE ARCHITECTURES.
Degree: 2008, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/6183
The demand for high-performance architectures and powerful battery-operated mobile devices has accentuated the need for low-power systems. In many media and embedded applications, the memory system can consume more than 50% of the overall system energy, making it a ripe candidate for optimization. Caches also play an important role in performance by filtering a large percentage of the main memory accesses that would otherwise incur long latencies. To address this increasingly important problem, this thesis studies energy-efficient, high-performance cache architectures that can have a significant impact on overall system energy consumption and performance. This thesis makes four contributions to this end.
The first contribution focuses on partitioning the cache resources architecturally for energy and performance optimizations. Specifically, this thesis investigates splitting the cache into several smaller units, each of which is a cache by itself (called a subcache). The proposed subcache architecture employs page-based placement, dynamic page re-mapping, and subcache prediction policies in order to improve memory system energy and performance, especially for instruction accesses.
As technology scales down into the deep-submicron regime, leakage energy is becoming a dominant source of energy consumption. Leakage energy is generally proportional to the number of transistors in a circuit, and caches constitute a large portion of the die transistor count. Most prior techniques have targeted cell leakage energy minimization, but bitline leakage is critical as well. To this end, this thesis proposes a predictive precharging scheme for minimizing bitline leakage as its second contribution.
Many of the recently proposed techniques to reduce power consumption in caches introduce an additional level of non-determinism in cache access latency. Due to this additional latency, instructions that depend on a non-deterministic load and are speculatively issued must be re-executed, as they will not have the correct data in time. This penalty can potentially offset the claimed power benefits of using such low-power caches. To address this problem, this thesis proposes an early cache set resolution scheme as its third contribution.
Increasing clock frequencies and issue rates aggravate the memory latency problem, imposing higher memory bandwidth requirements. While caches can be multi-ported to provide high memory bandwidth, the increase in their access latency with more ports limits their potential. As the fourth contribution, this thesis proposes a novel temporal cache architecture for improving performance and reducing energy consumption by satisfying a large percentage of loads from a small, power-efficient temporal cache early in the pipeline.
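As a rough illustration of the first contribution's page-based placement, the following Python sketch maps each page to a single subcache on first touch, so an access only needs to activate one small unit. The round-robin assignment and constants are stand-ins for the thesis's actual placement, re-mapping, and prediction policies.

    # Minimal sketch of page-based placement across subcaches (illustrative
    # only; the thesis's placement/remapping policies are more involved).

    NUM_SUBCACHES = 4
    PAGE_SHIFT = 12  # assume 4 KiB pages

    page_map = {}       # page number -> subcache index
    next_subcache = 0   # simple round-robin assignment for new pages

    def subcache_for(addr):
        """Pick the subcache for an access: all lines of a page live in one
        small subcache, so only that unit needs to be activated."""
        global next_subcache
        page = addr >> PAGE_SHIFT
        if page not in page_map:
            page_map[page] = next_subcache          # first-touch placement
            next_subcache = (next_subcache + 1) % NUM_SUBCACHES
        return page_map[page]

    for a in (0x1000, 0x1040, 0x2000, 0x2f00, 0x9000):
        print(hex(a), "->", subcache_for(a))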
Advisors/Committee Members: Vijaykrishnan Narayanan, Committee Chair/Co-Chair, Mahmut Taylan Kandemir, Committee Member, Mary Jane Irwin, Committee Member, Richard Brooks, Committee Member.
Subjects/Keywords: energy; performance; caches

University of Toronto
3.
Soares, Livio.
Operating System Techniques for Reducing Processor State Pollution.
Degree: 2012, University of Toronto
URL: http://hdl.handle.net/1807/32895
Application performance on modern processors has become increasingly dictated by the use of on-chip structures, such as caches and look-aside buffers. The hierarchical (multi-level) design of processor structures, the ubiquity of multicore processor architectures, and the increasing relative cost of accessing memory have all contributed to this condition. Our thesis is that operating systems should provide services and mechanisms for applications to more efficiently utilize on-chip processor structures. As such, this dissertation demonstrates how the operating system can improve the processor efficiency of applications through specific techniques.
Two operating system services are investigated: (1) improving secondary and last-level cache utilization through a run-time cache filtering technique, and (2) improving the processor efficiency of system-intensive applications through a new exception-less system call mechanism. With the first mechanism, we introduce the concept of a software pollute buffer and show that it can be used effectively at run-time, with assistance from commodity hardware performance counters, to reduce pollution of secondary on-chip caches.
With the second mechanism, we are able to decouple application and operating system execution, showing the benefits of reduced interference in various processor components such as the first-level instruction and data caches, second-level caches, and the branch predictor. We show that exception-less system calls are particularly effective on modern multicore processors. We explore two ways for applications to use exception-less system calls. The first, which is completely transparent to the application, uses multi-threading to hide the asynchronous communication between the operating system kernel and the application. In the second, we propose that applications directly use the exception-less system call interface by designing programs that follow an event-driven architecture.
PhD
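The exception-less system call mechanism described above (published as FlexSC) replaces the synchronous trap with requests posted to shared memory that kernel-side threads service asynchronously. The following pure-userspace Python simulation sketches only the control flow; the entry layout and names are illustrative, not the real kernel interface.

    # Minimal sketch of the exception-less system call idea: instead of
    # trapping per call, the application posts requests that a separate
    # "kernel" thread services, and collects results later.

    import threading, queue, os

    requests = queue.Queue()

    def syscall_thread():
        """Stands in for the kernel-side thread that drains syscall entries."""
        while True:
            entry = requests.get()
            if entry is None:
                break
            func, args, done = entry
            done["ret"] = func(*args)   # execute the "system call"
            done["event"].set()         # mark the entry as completed

    worker = threading.Thread(target=syscall_thread, daemon=True)
    worker.start()

    def exceptionless_call(func, *args):
        """Post a request and return a completion handle instead of trapping."""
        done = {"event": threading.Event(), "ret": None}
        requests.put((func, args, done))
        return done

    # The application keeps doing useful work, then collects the result.
    handle = exceptionless_call(os.getpid)
    # ... overlap other computation here ...
    handle["event"].wait()
    print("getpid returned", handle["ret"])
    requests.put(None)  # shut the worker down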
Advisors/Committee Members: Stumm, Michael, Electrical and Computer Engineering.
Subjects/Keywords: Operating Systems; Processors; Processor Caches; Multicore; 0984

University of Arizona
4.
Gajaria, Dhruv Mayur.
DVFS-Aware Asymmetric-Retention STT-RAM Caches for Energy-Efficient Multicore Processors.
Degree: 2019, University of Arizona
URL: http://hdl.handle.net/10150/633246
Spin-transfer torque RAMs (STT-RAMs) have been studied as a promising alternative to SRAMs in emerging caches and main memories due to their low leakage power and high density. However, STT-RAMs also have the drawbacks of high dynamic write energy and long write latency. Relaxing the retention time of the non-volatile STT-RAM has been widely studied as a way to reduce STT-RAM's write energy and latency. However, since different applications may require different retention times, STT-RAM retention times must be carefully explored to satisfy various applications' needs. This process can be challenging due to exploration overhead, and it is exacerbated by the fact that STT-RAM caches are emerging and are not readily available for design-time exploration. This work explores using known statistics (e.g., SRAM statistics) to predict the appropriate STT-RAM retention times, in order to minimize exploration overhead. We propose an STT-RAM Cache Retention Time (SCART) model, which utilizes machine learning to enable design-time or runtime prediction of the best STT-RAM retention times for latency or energy optimization.
Furthermore, we analyze the impact of dynamic voltage and frequency scaling (DVFS), a common optimization in modern processors, on STT-RAM L1 cache design. Our analysis reveals that, apart from the fact that different applications may require different retention times, the clock frequency, which is typically ignored in most STT-RAM studies, may also significantly impact applications' retention time needs. Based on our findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model to determine the best core on which to run an application, based on the application's characteristics, its retention time requirements, and the available DVFS settings. Results reveal that the proposed approach can reduce average cache energy by 39.21% and overall processor energy by 13.66% compared to an SRAM-based system, and by 20.19% and 7.66%, respectively, compared to a homogeneous STT-RAM cache design.
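A hedged sketch of the SCART idea as described: predict a suitable retention time from statistics collected on an SRAM baseline. The features, profile data, and the 1-nearest-neighbor model below are hypothetical stand-ins for the machine-learning model actually used in the thesis.

    # Illustrative sketch only: map SRAM-side statistics to the best
    # retention time of the closest previously profiled application.

    import math

    # (miss rate, writes per kilo-instruction) -> best retention time in µs
    # (invented profile points)
    training = [
        ((0.02, 1.0), 10),
        ((0.10, 20.0), 100),
        ((0.05, 8.0), 50),
        ((0.01, 0.5), 10),
    ]

    def predict_retention(miss_rate, wpki):
        """Return the retention time of the closest profiled application."""
        def dist(sample):
            (m, w), _ = sample
            return math.hypot(m - miss_rate, (w - wpki) / 20.0)  # crude scaling
        return min(training, key=dist)[1]

    print(predict_retention(0.04, 6.0))  # -> 50 (closest to the third profile)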
Advisors/Committee Members: Adegbija, Tosiron (advisor), Lysecky, Roman (committeemember), Akoglu, Ali (committeemember).
Subjects/Keywords: Caches; DVFS; Multi-Core Processors; STT-RAM

University of Texas – Austin
5.
-9048-1017.
Exploiting long-term behavior for improved memory system performance.
Degree: PhD, Computer science, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/42015
Memory latency is a key bottleneck for many programs. Caching and prefetching are two popular hardware mechanisms to alleviate the impact of long memory latencies, but despite decades of research, significant headroom remains. In this thesis, we show how we can significantly improve caching and prefetching by exploiting a long history of the program's behavior. Towards this end, we define new learning goals that fully exploit long-term information, and we propose history representations that make it feasible to track and manipulate long histories. Based on these insights, we advance the state-of-the-art for three important memory system optimizations. For cache replacement, where existing solutions have relied on simplistic heuristics, our solution pursues the new goal of learning from the optimal solution for past references to predict caching decisions for future references. For irregular prefetching, where previous solutions are limited in scope due to their inefficient management of long histories, our goal is to realize the previously unattainable combination of two popular learning techniques, namely address correlation and PC-localization. Finally, for regular prefetching, where recent solutions learn increasingly complex patterns, we leverage long histories to simplify the learning goal and to produce more timely and accurate prefetches. Our results are significant. For cache replacement, our solution reduces misses for memory-intensive SPEC 2006 benchmarks by 17.4% compared to 11.4% for the previous best. For irregular prefetching, our prefetcher obtains 23.1% speedup (vs. 14.1% for the previous best) with 93.7% accuracy, and it comes close to the performance of an idealized prefetcher with no resource constraints. Finally, for regular prefetching, our prefetcher improves performance by 102.3% over a baseline with no prefetching compared to the 90% speedup for the previous state-of-the-art prefetcher; our solution also incurs 10% less traffic than the previous best regular prefetcher.
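The cache replacement contribution learns from the optimal decisions for past references. On a known trace, the optimal policy is Belady's MIN: evict the block reused farthest in the future. The Python sketch below computes those decisions over a toy trace, which is the kind of oracle signal such a predictor can be trained against; the trace and capacity are illustrative, and the dissertation's actual predictor is not reproduced here.

    # Simulate Belady's MIN on a recorded access trace, evicting the block
    # whose next use is farthest in the future (or never comes).

    def belady_min(trace, capacity):
        hits = 0
        cache = set()
        for i, block in enumerate(trace):
            if block in cache:
                hits += 1
                continue
            if len(cache) >= capacity:
                def next_use(b):
                    for j in range(i + 1, len(trace)):
                        if trace[j] == b:
                            return j
                    return float("inf")   # never reused again
                cache.remove(max(cache, key=next_use))
            cache.add(block)
        return hits

    trace = ["A", "B", "C", "A", "B", "D", "A", "B", "C", "D"]
    print(belady_min(trace, 3), "hits out of", len(trace))  # 5 hits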
Advisors/Committee Members: Lin, Yun Calvin (advisor), Burger, Doug (committee member), Fussell, Donald S (committee member), Patt, Yale N (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Caches; Replacement policy; Prefetching; Memory system
6.
Oliveira, Jacqueline Augusto de.
"Uso de caches na Web - Influência das políticas de substituição de objetos".
Degree: Mestrado, Ciências de Computação e Matemática Computacional, 2004, University of São Paulo
URL: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26072004-121404/
This work analyzes the influence of object replacement policies on Web caches. This is done by investigating the policies found in the literature through a workload characterization study, a performance evaluation, and a comparison of the policies' behavior. The policies are evaluated using a Web cache simulator. During the research, a new policy, called MeMoExP, was developed. It uses a moving exponential average to optimize the HR and BHR metrics. The simulation studies showed that MeMoExP follows the same trend as the FBR policy, which is regarded as efficient in the literature. Finally, some considerations on the ideas presented in the dissertation are given, along with the contributions of this research and some proposals for future work.
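For context, HR and BHR are the standard web-caching metrics: the fraction of requests served from cache and the fraction of bytes served from cache. The Python sketch below replays a toy request stream through a small cache and reports both; LRU stands in for the policies compared in the dissertation, since MeMoExP's exact rule is not reproduced here.

    # Minimal web-cache simulator reporting HR (hit ratio) and BHR (byte
    # hit ratio) under LRU replacement. Requests and sizes are invented.

    from collections import OrderedDict

    def simulate(requests, capacity_bytes):
        cache = OrderedDict()            # url -> size, in LRU order
        used = 0
        hits = hit_bytes = total_bytes = 0
        for url, size in requests:
            total_bytes += size
            if url in cache:
                hits += 1
                hit_bytes += size
                cache.move_to_end(url)   # refresh recency
                continue
            while used + size > capacity_bytes and cache:
                _, evicted = cache.popitem(last=False)   # evict LRU object
                used -= evicted
            if size <= capacity_bytes:
                cache[url] = size
                used += size
        return hits / len(requests), hit_bytes / total_bytes

    reqs = [("/a", 100), ("/b", 400), ("/a", 100), ("/c", 600), ("/a", 100)]
    hr, bhr = simulate(reqs, 1000)
    print(f"HR={hr:.2f} BHR={bhr:.2f}")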
Advisors/Committee Members: Santana, Marcos José.
Subjects/Keywords: caches web; políticas de substituição; replacement policies; web caches

Universitat Politècnica de València
7.
Valero Bresó, Alejandro.
Hybrid caches: design and data management.
Degree: 2013, Universitat Politècnica de València
URL: http://hdl.handle.net/10251/32663
Cache memories have usually been implemented with Static Random-Access Memory (SRAM) technology, since it is the fastest electronic memory technology. However, this technology consumes a large amount of leakage current, which is a major design concern because leakage energy consumption increases as the transistor size shrinks. Alternative technologies are being considered to reduce this consumption. Among them, embedded Dynamic RAM (eDRAM) technology provides minimal area and leakage by design, but reads are destructive and it is not as fast as SRAM.
In this thesis, the SRAM and eDRAM technologies are combined to take advantage of what each of them offers. First, they are combined at the cell level to implement an n-bit macrocell consisting of one SRAM cell and n-1 eDRAM cells. The macrocell is used to build n-way set-associative hybrid first-level (L1) data caches having one SRAM way and n-1 eDRAM ways. A single SRAM way is enough to achieve good performance given the high data locality of L1 caches. Architectural mechanisms such as way-prediction, swaps, and scrub operations are considered to avoid unnecessary eDRAM reads, to maintain the Most Recently Used (MRU) data in the fast SRAM way, and to completely avoid refresh logic. Experimental results show that, compared to a conventional SRAM cache, leakage and area are largely reduced with a scarce impact on performance.
The benefits of hybrid caches have also been studied in second-level (L2) caches acting as Last-Level Caches (LLCs). In this case, the technologies are combined at the bank level, and the optimal ratio of SRAM and eDRAM banks that achieves the best trade-off among performance, energy, and area is identified. As in L1 caches, the MRU blocks are kept in the SRAM banks, and these banks are accessed first to avoid unnecessary destructive reads. Nevertheless, the refresh logic is not removed, since data locality differs widely at this cache level. Experimental results show that a hybrid LLC with an eighth of its banks built with SRAM technology is enough to achieve the best target trade-off.
This dissertation also deals with the performance of replacement policies in heterogeneous LLCs, mainly focusing on the energy overhead incurred by refresh operations. The thesis defines a new concept, the MRU-Tour (MRUT), that helps estimate the reuse of cache blocks. Based on this concept, a family of MRUT-based replacement algorithms is proposed that randomly selects the victim block among those having a single MRUT. These policies are enhanced to leverage recency information for a few blocks and to adapt to changes in the working set of the benchmarks. Results show that the proposed MRUT policies, with simpler hardware complexity, outperform the Least Recently Used (LRU) policy and a set of the most representative state-of-the-art replacement policies for LLCs.
Refresh operations represent an important fraction of the overall dynamic energy consumption of eDRAM LLCs. This fraction increases with the cache capacity, since…
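A minimal Python sketch of the MRUT idea as stated above: count each block's MRU-Tours (the number of times it becomes the MRU block) and choose a victim at random among blocks with a single tour. Everything beyond that rule, including the fallback when no single-tour block exists, is a simplification, not the thesis's exact policy.

    # One cache set under a simplified MRUT-based replacement policy.

    import random

    class MrutSet:
        def __init__(self, ways):
            self.ways = ways
            self.blocks = {}   # tag -> number of MRU-Tours
            self.mru = None

        def access(self, tag):
            if tag not in self.blocks:
                if len(self.blocks) >= self.ways:
                    single = [t for t, n in self.blocks.items() if n == 1]
                    victim = random.choice(single or list(self.blocks))
                    del self.blocks[victim]
                self.blocks[tag] = 0
            if self.mru != tag:
                self.blocks[tag] += 1   # block starts a new MRU-Tour
                self.mru = tag

    s = MrutSet(ways=4)
    for t in ["A", "B", "A", "C", "D", "A", "E"]:
        s.access(t)
    print(s.blocks)   # "A" has multiple tours, so it tends to survive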
Advisors/Committee Members: Petit Martí, Salvador Vicente (advisor), Sahuquillo Borrás, Julio (advisor).
Subjects/Keywords: Cache memories; EDRAM technology; Energy consumption; Hybrid caches; Last-Level Caches; Macrocell; MRU-Tour; Performance; Replacement algorithms; Selective refresh; SRAM technology
8.
Péneau, Pierre-Yves.
Intégration de technologies de mémoires non volatiles émergentes dans la hiérarchie de caches pour améliorer l'efficacité énergétique : Integration of emerging non volatile memory technologies in cache hierarchy for improving energy-efficiency.
Degree: Docteur es, Systèmes automatiques et micro-électroniques, 2018, Montpellier
URL: http://www.theses.fr/2018MONTS108
Today, major efforts are under way to design high-performance, energy-efficient systems-on-chip. The slowdown of Moore's law in the early twenty-first century pushed designers to increase the number of cores per processor to keep improving performance. As a consequence, the silicon area occupied by cache memories has grown. Ever-finer technology nodes have also increased the leakage current of CMOS transistors. The energy consumed by memories therefore accounts for an increasingly large share of overall chip consumption. To reduce this consumption, new memory technologies have been emerging over the past decade: non-volatile memories (NVMs). These memories have the distinctive property of a very low leakage current compared with classical CMOS technologies, so their use in an architecture would reduce the overall consumption of the cache hierarchy. However, these technologies suffer from higher access latencies than SRAM, higher energy costs per access, and limited endurance. Integrating them into systems-on-chip calls for continued research. This thesis evaluates the impact of a change of technology in the cache hierarchy. More specifically, it focuses on the last-level cache (LLC), and the non-volatile technology considered is STT-MRAM. Our work takes an architectural point of view in which modifying the technology itself is not an option; instead, we account for the distinctive characteristics of STT-MRAM when designing the memory hierarchy. A first study established an architectural exploration framework for systems containing emerging memories. A second study of architectural optimizations at the LLC level identified opportunities for integrating STT-MRAM, with the goal of improving energy efficiency while mitigating the access penalties caused by this technology's high latencies.
Advisors/Committee Members: Gamatié, Abdoulaye (thesis director), Sassatelli, Gilles (thesis director).
Subjects/Keywords: Efficacité énergétique; Stt-Mram; Hiérarchie mémoire; Caches; Llc; Energy-Efficiency; Stt-Mram; Memory hierarchy; Caches; Llc

Penn State University
9.
Kotra, Jagadish Babu.
HARDWARE SOFTWARE CO-DESIGN FOR OPTIMIZING MEMORY HIERARCHY IN MANY-CORE AND MULTI-SOCKET SYSTEMS.
Degree: 2017, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/14830jbk5155
Thanks to Moore's law, the number of transistors on a chip has been increasing over time without increasing the area of the processing die. The additional transistors are being invested in separate cores, rather than in further optimizing the already complex out-of-order cores, to ensure that the power density, i.e., the heat dissipated per unit area, does not grow too high. Hence, complex uni-core systems have given way to multi- and many-core systems on a processor die of essentially the same size, resulting in an increased amount of processing per unit area. Similarly, the number of transistors on the memory side has also increased (though not at the same rate), resulting in increased memory (DRAM) capacity over the years.
This increased number of transistors at the processor and memory ends has enabled significant scaling of computation and memory capacity over time in the same area. However, the speed-ups observed from the increased processing power were not linear, because the number of pins connecting the processor and memory has not grown accordingly, as that would make the die bigger. As a result, with the increased number of cores, the effective memory bandwidth per computation core decreased over time. Apart from the reduced memory bandwidth per core, the increased memory density (capacity per unit area) has interesting performance and power ramifications in DRAM. Because DRAM is volatile, the increased density requires more rows to be refreshed within effectively the same retention time. As a result, certain sections of DRAM are periodically unavailable to feed data to the processing elements, reducing the overall memory bandwidth as well.
Consequently, the performance gap between the processor and memory has widened significantly over time. This gap is widely referred to as the "memory wall".
In my thesis, I have proposed various techniques to bridge this performance gap. They can broadly be classified into:
1. Entirely hardware-based proposals,
2. Entirely software-based proposals, and
3. Hardware-software co-design proposals.
Advisors/Committee Members: Mahmut Taylan Kandemir, Dissertation Advisor/Co-Advisor, Mahmut Taylan Kandemir, Committee Chair/Co-Chair, Mary Jane Irwin, Committee Member, Kamesh Madduri, Committee Member, Dinghao Wu, Outside Member.
Subjects/Keywords: Hardware-software co-design; memory hierarchy; manycore processors; memory; caches
10.
Bhatia, Eshan.
Perceptron Learning in Cache Management and Prediction Techniques.
Degree: MS, Computer Engineering, 2019, Texas A&M University
URL: http://hdl.handle.net/1969.1/185028
Hardware prefetching is an effective technique for hiding cache miss latencies in modern processor designs. An efficient prefetcher should identify complex memory access patterns during program execution. This ability enables the prefetcher to read a block ahead of its demand access, potentially preventing a cache miss. Accurately identifying the right blocks to prefetch is essential to achieving high performance from the prefetcher.
Prefetcher performance can be characterized by two main metrics that are generally at odds with one another: coverage, the fraction of baseline cache misses which the prefetcher brings into the cache; and accuracy, the fraction of prefetches which are ultimately used. An overly aggressive prefetcher may improve coverage at the cost of reduced accuracy. Thus, performance may be harmed by this over-aggressiveness because many resources are wasted, including cache capacity and bandwidth. An ideal prefetcher would have both high coverage and accuracy.
In this thesis, I propose Perceptron-based Prefetch Filtering (PPF) as a way to increase the coverage of the prefetches generated by a baseline prefetcher without negatively impacting accuracy. PPF enables more aggressive tuning of a given baseline prefetcher, leading to increased coverage by filtering out the growing numbers of inaccurate prefetches such an aggressive tuning implies. I also explore a range of features to use to train PPF’s perceptron layer to identify inaccurate prefetches. PPF improves performance on a memory-intensive subset of the SPEC CPU 2017 benchmarks by 3.78% for a single-core configuration, and by 11.4% for a 4-core configuration, compared to the baseline prefetcher alone.
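A hedged sketch of the perceptron-filter structure PPF describes: hashed features of a candidate prefetch index small weight tables, the weights are summed, and the candidate is issued only if the sum clears a threshold; training nudges the weights once the prefetch's usefulness is known. The feature set, table size, and weight bounds below are illustrative, not the thesis's configuration.

    # Toy perceptron prefetch filter in the spirit of PPF.

    TABLE_SIZE = 1024
    THRESHOLD = 0
    weights = {f: [0] * TABLE_SIZE for f in ("pc", "addr", "pc_xor_delta")}

    def features(pc, addr, delta):
        return {"pc": pc, "addr": addr >> 6, "pc_xor_delta": pc ^ delta}

    def predict(pc, addr, delta):
        """Sum the indexed weights; issue the prefetch only above threshold."""
        idx = features(pc, addr, delta)
        total = sum(weights[f][idx[f] % TABLE_SIZE] for f in weights)
        return total >= THRESHOLD, total

    def train(pc, addr, delta, useful):
        """Called later, once we learn whether the prefetch was used."""
        idx = features(pc, addr, delta)
        step = 1 if useful else -1
        for f in weights:
            w = weights[f][idx[f] % TABLE_SIZE]
            weights[f][idx[f] % TABLE_SIZE] = max(-32, min(31, w + step))

    # Reject a candidate after it repeatedly proves useless:
    for _ in range(3):
        train(pc=0x400badc, addr=0xdead0000, delta=2, useful=False)
    print(predict(pc=0x400badc, addr=0xdead0000, delta=2))  # (False, -9)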
Advisors/Committee Members: Gratz, Paul V (advisor), Jimenez, Daniel A (advisor), Khatri, Sunil P (committee member).
Subjects/Keywords: Computer Architecture; Caches
11.
Panda, Reena.
A Branch-Directed Data Cache Prefetching Technique for Inorder Processors.
Degree: MS, Computer Engineering, 2012, Texas A&M University
URL: http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10287
The increasing gap between processor and main memory speeds has become a serious bottleneck towards further improvement in system performance. Data prefetching techniques have been proposed to hide the performance impact of such long memory latencies. But most of the currently proposed data prefetchers predict future memory accesses based on current memory misses. This limits the opportunity that can be exploited to guide prefetching.
In this thesis, we propose a branch-directed data prefetcher that uses the high prediction accuracies of current-generation branch predictors to predict a future basic block trace that the program will execute and issues prefetches for all the identified memory instructions contained therein. We also propose a novel technique to generate prefetch addresses by exploiting the correlation between the addresses generated by memory instructions and the values of the corresponding source registers at prior branch instances. We evaluate the impact of our prefetcher by using a cycle-accurate simulation of an inorder processor on the M5 simulator. The results of the evaluation show that the branch-directed prefetcher improves the performance on a set of 18 SPEC CPU2006 benchmarks by an average of 38.789% over a no-prefetching implementation and 2.148% over a system that employs a Spatial Memory Streaming prefetcher.
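A minimal sketch of the branch-directed idea: follow the branch predictor's predicted path a few basic blocks ahead and collect the loads along it as prefetch candidates. The toy control-flow graph and predictor below are invented for illustration, and the thesis's register-correlation address generation is omitted.

    # basic block -> (loads in the block, taken successor, fall-through)
    cfg = {
        "B0": (["ld r1, [r2]"], "B2", "B1"),
        "B1": (["ld r3, [r4]"], "B3", "B3"),
        "B2": ([], "B3", "B3"),
        "B3": (["ld r5, [r6]"], "B0", "B0"),
    }
    predict_taken = {"B0": True, "B1": False, "B2": True, "B3": True}

    def prefetch_candidates(start, depth):
        """Walk the predicted path `depth` blocks ahead, gathering loads."""
        block, loads = start, []
        for _ in range(depth):
            insts, taken, fallthru = cfg[block]
            loads.extend(insts)
            block = taken if predict_taken[block] else fallthru
        return loads

    print(prefetch_candidates("B0", 3))  # loads on the path B0 -> B2 -> B3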
Advisors/Committee Members: Hu, Jiang (advisor), Gratz, Paul V. (advisor), Kim, Eun J. (committee member), Kundur, Deepa (committee member).
Subjects/Keywords: Caches; Prefetching; Inorder Processors

Brigham Young University
12.
Ostler, Michaela Ann.
Chiapa de Corzo Mound 3 Revisited: Burials, Caches, and Architecture.
Degree: MA, 2017, Brigham Young University
URL: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=9540&context=etd
Chiapa de Corzo Mound 3 was excavated by Tim Tucker under the direction of the New World Archaeological Foundation in July 1965. Mound 3 is located in the ritual center of Chiapa de Corzo, the southwest quadrant. Significant Preclassic and Protoclassic architecture, burials, and caches were discovered there but were never fully analyzed or published. A complete analysis of this mound is necessary to better understand the role of Chiapa de Corzo as a whole and as a regional power. This thesis completes the analysis and accomplishes the following goals: (1) completes the ceramic analysis and classification started by Tucker, (2) produces a catalog of all the burials and caches and their furniture found in Mound 3, and (3) describes changes in the architecture of this mound for each construction phase to determine the general function of Mound 3 throughout its occupation.
Subjects/Keywords: Chiapa de Corzo; burials; caches; architecture; function; NWAF; ceramics

Brigham Young University
13.
Ostler, Michaela Ann.
Chiapa de Corzo Mound 3 Revisited: Burials, Caches, and Architecture.
Degree: MA, 2017, Brigham Young University
URL: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=9538&context=etd
Chiapa de Corzo Mound 3 was excavated by Tim Tucker under the direction of the New World Archaeological Foundation in July 1965. Mound 3 is located in the ritual center of Chiapa de Corzo, the southwest quadrant. Significant Preclassic and Protoclassic architecture, burials, and caches were discovered there but were never fully analyzed or published. A complete analysis of this mound is necessary to better understand the role of Chiapa de Corzo as a whole and as a regional power. This thesis completes the analysis and accomplishes the following goals: (1) completes the ceramic analysis and classification started by Tucker, (2) produces a catalog of all the burials and caches and their furniture found in Mound 3, and (3) describes changes in the architecture of this mound for each construction phase to determine the general function of Mound 3 throughout its occupation.
Subjects/Keywords: Chiapa de Corzo; burials; caches; architecture; function; NWAF; ceramics; Anthropology

University of Toronto
14.
Nacouzi, Michel El.
On Optimizing Die-stacked DRAM Caches.
Degree: 2013, University of Toronto
URL: http://hdl.handle.net/1807/42831
Die-stacking is a new technology that allows multiple integrated circuits to be stacked on top of each other while connected with a high-bandwidth and high-speed interconnect. In particular, die-stacking can be useful in boosting the effective bandwidth and speed of DRAM systems. Die-stacked DRAM caches have recently emerged as one of the top applications of die-stacking. They provide higher capacity than their SRAM counterparts and are faster than off-chip DRAMs. In addition, DRAM caches can provide almost eight times the bandwidth of off-chip DRAMs. They, however, come with their own challenges. Since they are only twice as fast as main memory, they considerably increase latency for misses and incur significant energy overhead for remote lookups in snoop-based multi-socket systems. In this thesis, we present a Dual-Grain Filter for avoiding unnecessary accesses to the DRAM cache at reduced hardware cost, and we compare it to recent works on die-stacked DRAM caches.
MAST
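The Dual-Grain Filter is described only at a high level above, so the sketch below is an assumption-laden illustration of a generic two-granularity filter: a coarse page-grain table cheaply rules out addresses whose whole page is absent, and a fine block-grain table confirms actual presence, so many misses skip the slow DRAM-cache lookup. Real filters use compact, possibly lossy structures rather than the exact sets used here.

    PAGE_SHIFT, BLOCK_SHIFT = 12, 6

    present_blocks = set()   # fine grain: blocks cached in die-stacked DRAM
    cached_pages = set()     # coarse grain: pages with at least one block

    def insert(addr):
        present_blocks.add(addr >> BLOCK_SHIFT)
        cached_pages.add(addr >> PAGE_SHIFT)

    def must_lookup(addr):
        """Only probe the DRAM cache when the filter cannot rule it out."""
        if (addr >> PAGE_SHIFT) not in cached_pages:
            return False                 # whole page absent: skip the lookup
        return (addr >> BLOCK_SHIFT) in present_blocks

    insert(0x1040)
    print(must_lookup(0x1040), must_lookup(0x1080), must_lookup(0x9000))
    # True (block present), False (page cached, block absent), False (page absent)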
Advisors/Committee Members: Moshovos, Andreas, Electrical and Computer Engineering.
Subjects/Keywords: Die-Stacked DRAM caches; Computer Architecture; 0984; 0544; 0537

Delft University of Technology
15.
De Langen, P.J.
Energy reduction techniques for caches and multiprocessors.
Degree: 2009, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:4060494b-3725-4abc-aefb-8235fd05871c ; urn:NBN:nl:ui:24-uuid:4060494b-3725-4abc-aefb-8235fd05871c
Subjects/Keywords: energy reduction; caches; multiprocessor scheduling
APA (6th Edition):
De Langen, P. J. (2009). Energy reduction techniques for caches and multiprocessors. (Doctoral Dissertation). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:4060494b-3725-4abc-aefb-8235fd05871c
Chicago Manual of Style (16th Edition):
De Langen, P J. “Energy reduction techniques for caches and multiprocessors.” 2009. Doctoral Dissertation, Delft University of Technology. Accessed March 06, 2021.
http://resolver.tudelft.nl/uuid:4060494b-3725-4abc-aefb-8235fd05871c.
MLA Handbook (7th Edition):
De Langen, P J. “Energy reduction techniques for caches and multiprocessors.” 2009. Web. 06 Mar 2021.
Vancouver:
De Langen PJ. Energy reduction techniques for caches and multiprocessors. [Internet] [Doctoral dissertation]. Delft University of Technology; 2009. [cited 2021 Mar 06].
Available from: http://resolver.tudelft.nl/uuid:4060494b-3725-4abc-aefb-8235fd05871c.
Council of Science Editors:
De Langen PJ. Energy reduction techniques for caches and multiprocessors. [Doctoral Dissertation]. Delft University of Technology; 2009. Available from: http://resolver.tudelft.nl/uuid:4060494b-3725-4abc-aefb-8235fd05871c
16.
Hamelin, Claire.
Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications.
Degree: Docteur es, Nanophysique, 2016, Université Grenoble Alpes (ComUE)
URL: http://www.theses.fr/2016GREAY061
► Replacing the DRAM and SRAM technologies used in cache memories is a challenge for the microelectronics industry, which faces demands for miniaturization,…
(more)
▼ Replacing the DRAM and SRAM technologies used in cache memories is a challenge for the microelectronics industry, which faces demands for miniaturization and for reducing the amplitude and duration of the currents used to write and read data. Magnetic random access memories (MRAM) are candidates for a future generation of memories, and the discovery of spin-orbit torques (SOT) has opened the way to a combination of the two technologies called SOT-MRAM. These memories are very promising because they combine non-volatility and good reliability, but many technical and theoretical challenges remain. The goal of this thesis is to study spin-orbit-torque magnetization switching with sub-nanosecond current pulses and to reduce the spin-orbit-torque write currents. This work is a preliminary step toward the proof of concept of a SOT-MRAM written with ultra-short electrical current pulses of relatively low amplitude. To this end, we studied Ta-CoFeB-MgO-based memory cells. We verified how the critical current depends on pulse duration and on an external magnetic field. We then demonstrated, on a SOT-MRAM-type cell, ultrafast writing with current pulses shorter than a nanosecond. Next, we investigated reducing the SOT-MRAM write current with the help of an electric field. We demonstrated that the electric field modulates the magnetic anisotropy; reducing the anisotropy during a current pulse in the tantalum track shows that the critical current density for SOT switching of the CoFeB magnetization is reduced. These results are very encouraging for the development of SOT-MRAM and motivate further study. The magnetization reversal mechanism appears to be nucleation followed by propagation of magnetic domain walls, a hypothesis based on physical trends observed in the experiments as well as on numerical simulations.
Advisors/Committee Members: Gaudin, Gilles (thesis director), Boulle, Olivier (thesis director).
Subjects/Keywords: Spintronique; Mémoires caches; Nanomagnétisme; Spintronic; Cache memory; Nanomagnetism; 530
APA (6th Edition):
Hamelin, C. (2016). Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications. (Doctoral Dissertation). Université Grenoble Alpes (ComUE). Retrieved from http://www.theses.fr/2016GREAY061
Chicago Manual of Style (16th Edition):
Hamelin, Claire. “Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications.” 2016. Doctoral Dissertation, Université Grenoble Alpes (ComUE). Accessed March 06, 2021.
http://www.theses.fr/2016GREAY061.
MLA Handbook (7th Edition):
Hamelin, Claire. “Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications.” 2016. Web. 06 Mar 2021.
Vancouver:
Hamelin C. Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications. [Internet] [Doctoral dissertation]. Université Grenoble Alpes (ComUE); 2016. [cited 2021 Mar 06].
Available from: http://www.theses.fr/2016GREAY061.
Council of Science Editors:
Hamelin C. Couples de spin-orbite en vue d'applications aux mémoires cache : Spin orbit torques for cache memory applications. [Doctoral Dissertation]. Université Grenoble Alpes (ComUE); 2016. Available from: http://www.theses.fr/2016GREAY061
17.
Díaz, Pedro.
Mechanisms to improve the efficiency of hardware data prefetchers.
Degree: PhD, 2011, University of Edinburgh
URL: http://hdl.handle.net/1842/5650
► A well known performance bottleneck in computer architecture is the so-called memory wall. This term refers to the huge disparity between on-chip and off-chip access…
(more)
▼ A well known performance bottleneck in computer architecture is the so-called memory wall. This term refers to the huge disparity between on-chip and off-chip access latencies. Historically speaking, the operating frequency of processors has increased at a steady pace, while most past advances in memory technology have been in density, not speed. Nowadays, the trend of ever increasing processor operating frequencies has been replaced by an increasing number of CPU cores per chip. This will continue to exacerbate the memory wall problem, as several cores now have to compete for off-chip data access. As multi-core systems pack more and more cores, it is expected that the access latency as observed by each core will continue to increase. Although the causes of the memory wall have changed, it is, and will continue to be in the near future, a very significant challenge in computer architecture design. Prefetching has been an important technique to amortize the effect of the memory wall. With prefetching, data or instructions that are expected to be used in the near future are speculatively moved up in the memory hierarchy, where the access latency is smaller. This dissertation focuses on hardware data prefetching at the last cache level before memory (last level cache, LLC). Prefetching at the LLC usually offers the best performance increase, as this is where the disparity between hit and miss latencies is the largest. Hardware prefetchers operate by examining the miss address stream generated by the cache and identifying patterns and correlations between the misses. Most prefetchers divide the global miss stream into several sub-streams, according to some pre-specified criteria. This process is known as localization. The benefits of localization are well established: it increases the accuracy of the predictions and helps filter out spurious, non-predictable misses. However, localization has one important drawback: since the misses are classified into different sub-streams, important chronological information is lost. A consequence of this is that most localizing prefetchers issue prefetches in an untimely manner, fetching data too far in advance. This behavior promotes data pollution in the cache. The first part of this thesis proposes a new class of prefetchers based on the novel concept of Stream Chaining. With Stream Chaining, the prefetcher tries to reconstruct the chronological information lost in the process of localization, while at the same time keeping its benefits. We describe two novel Stream Chaining prefetching algorithms based on two state-of-the-art localizing prefetchers: PC/DC and C/DC. We show how both prefetchers issue prefetches in a more timely manner than their non-chaining counterparts, increasing performance by as much as 55% (10% on average) on a suite of sequential benchmarks, while consuming roughly the same amount of memory bandwidth. In order to hide the effects of the memory wall, hardware prefetchers are usually configured to aggressively prefetch as much data as possible. However,…
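To make localization concrete, here is a minimal sketch of a PC-localized stride predictor: the global miss stream is split into per-instruction sub-streams, and a prefetch is issued only once a sub-stream's stride is stable. This is not PC/DC, C/DC, or the proposed Stream Chaining prefetchers (those correlate histories of deltas); all table sizes and names below are illustrative.

#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 256

/* One entry per localized sub-stream, indexed by a hash of the load PC. */
struct stream_entry {
    uint64_t last_addr;   /* last miss address seen for this PC */
    int64_t  last_delta;  /* last observed stride */
    int      confidence;  /* consecutive times the stride repeated */
};

static struct stream_entry table[TABLE_SIZE];

/* Called on every LLC miss; returns a prefetch address, or 0 for none. */
uint64_t on_miss(uint64_t pc, uint64_t miss_addr) {
    struct stream_entry *e = &table[(pc >> 2) % TABLE_SIZE];
    int64_t delta = (int64_t)(miss_addr - e->last_addr);

    if (e->last_addr != 0 && delta == e->last_delta)
        e->confidence++;           /* stride confirmed again */
    else
        e->confidence = 0;         /* new or irregular pattern */

    e->last_delta = delta;
    e->last_addr  = miss_addr;

    /* Only issue a prefetch once the localized stride is stable. */
    return (e->confidence >= 2) ? miss_addr + (uint64_t)delta : 0;
}

int main(void) {
    uint64_t addrs[] = {0x1000, 0x1040, 0x1080, 0x10c0};
    for (int i = 0; i < 4; i++) {
        uint64_t p = on_miss(0x400123, addrs[i]);  /* hypothetical load PC */
        printf("miss %#llx -> prefetch %#llx\n",
               (unsigned long long)addrs[i], (unsigned long long)p);
    }
    return 0;
}

The drawback the thesis targets is visible here: each entry knows its own sub-stream, but nothing records which sub-stream's miss tends to follow which; that lost chronology is exactly what Stream Chaining reconstructs.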
Subjects/Keywords: 004; prefetching; caches; hardware
APA (6th Edition):
Díaz, P. (2011). Mechanisms to improve the efficiency of hardware data prefetchers. (Doctoral Dissertation). University of Edinburgh. Retrieved from http://hdl.handle.net/1842/5650
Chicago Manual of Style (16th Edition):
Díaz, Pedro. “Mechanisms to improve the efficiency of hardware data prefetchers.” 2011. Doctoral Dissertation, University of Edinburgh. Accessed March 06, 2021.
http://hdl.handle.net/1842/5650.
MLA Handbook (7th Edition):
Díaz, Pedro. “Mechanisms to improve the efficiency of hardware data prefetchers.” 2011. Web. 06 Mar 2021.
Vancouver:
Díaz P. Mechanisms to improve the efficiency of hardware data prefetchers. [Internet] [Doctoral dissertation]. University of Edinburgh; 2011. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1842/5650.
Council of Science Editors:
Díaz P. Mechanisms to improve the efficiency of hardware data prefetchers. [Doctoral Dissertation]. University of Edinburgh; 2011. Available from: http://hdl.handle.net/1842/5650
18.
Tian, Yingying.
Reducing Waste in Memory Hierarchies.
Degree: PhD, Computer Science, 2015, Texas A&M University
URL: http://hdl.handle.net/1969.1/155075
► Memory hierarchies play an important role in microarchitectural design to bridge the performance gap between modern microprocessors and main memory. However, memory hierarchies are inefficient…
(more)
▼ Memory hierarchies play an important role in microarchitectural design to bridge the performance gap between modern microprocessors and main memory. However, memory hierarchies are inefficient because they store waste. This dissertation quantifies two types of waste, dead blocks and data redundancy. It studies waste in diverse memory hierarchies and proposes techniques to reduce waste and improve performance with limited overhead.
This dissertation observes that the waste of dead blocks in an inclusive last-level cache consists of two kinds of blocks: blocks that are highly accessed in core caches and blocks that have low temporal locality in both the core caches and the last-level cache. Blindly replacing all dead blocks in an inclusive last-level cache may degrade performance. This dissertation proposes temporal-based multilevel correlating cache replacement to improve the performance of inclusive cache hierarchies.
This dissertation observes that waste exists in the private caches of graphics processing units (GPUs) as zero-reuse blocks, defined as blocks that are dead after being inserted into the cache. This dissertation proposes an adaptive GPU cache bypassing technique that improves performance and reduces power consumption by dynamically bypassing zero-reuse blocks.
This dissertation exploits the waste of data redundancy at the block-level granularity and finds that conventional cache design wastes capacity because it stores duplicate data. This dissertation quantifies the percentage of data duplication and analyzes its causes. It proposes a practical cache deduplication technique to increase the effectiveness of the cache with limited area and power consumption.
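The dissertation's exact bypassing mechanism is not described in the abstract; as a hedged sketch of how such adaptive bypassing is commonly built, a table of saturating counters indexed by the PC of the filling instruction can learn which instructions tend to insert zero-reuse blocks. The sizes, threshold, and names below are assumptions.

#include <stdint.h>
#include <stdio.h>

#define PRED_SIZE 1024
#define THRESHOLD 2

/* 2-bit saturating counters; high values mean "this PC's fills are rarely reused". */
static uint8_t zero_reuse[PRED_SIZE];

static unsigned idx(uint64_t pc) { return (unsigned)(pc >> 2) % PRED_SIZE; }

/* On a fill, decide whether to bypass the cache entirely. */
int should_bypass(uint64_t fill_pc) {
    return zero_reuse[idx(fill_pc)] >= THRESHOLD;
}

/* On eviction, train with whether the block was ever hit after insertion. */
void on_evict(uint64_t fill_pc, int was_reused) {
    uint8_t *c = &zero_reuse[idx(fill_pc)];
    if (was_reused) {
        if (*c > 0) (*c)--;        /* block proved useful: trust this PC more */
    } else {
        if (*c < 3) (*c)++;        /* zero-reuse block: lean toward bypassing */
    }
}

int main(void) {
    /* Hypothetical PC 0x400 keeps filling blocks that are never reused. */
    for (int i = 0; i < 4; i++) on_evict(0x400, 0);
    printf("bypass fills from PC 0x400? %d\n", should_bypass(0x400));
    return 0;
}

Training requires the cache to remember, per block, the fill PC and whether the block was ever re-referenced, which is a modest amount of per-line metadata.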
Advisors/Committee Members: Jimenez, Daniel A. (advisor), Amato, Nancy M. (committee member), Kim, Eun Jung (committee member), Gratz, Paul V. (committee member).
Subjects/Keywords: Caches; cache management
APA (6th Edition):
Tian, Y. (2015). Reducing Waste in Memory Hierarchies. (Doctoral Dissertation). Texas A&M University. Retrieved from http://hdl.handle.net/1969.1/155075
Chicago Manual of Style (16th Edition):
Tian, Yingying. “Reducing Waste in Memory Hierarchies.” 2015. Doctoral Dissertation, Texas A&M University. Accessed March 06, 2021.
http://hdl.handle.net/1969.1/155075.
MLA Handbook (7th Edition):
Tian, Yingying. “Reducing Waste in Memory Hierarchies.” 2015. Web. 06 Mar 2021.
Vancouver:
Tian Y. Reducing Waste in Memory Hierarchies. [Internet] [Doctoral dissertation]. Texas A&M University; 2015. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1969.1/155075.
Council of Science Editors:
Tian Y. Reducing Waste in Memory Hierarchies. [Doctoral Dissertation]. Texas A&M University; 2015. Available from: http://hdl.handle.net/1969.1/155075

Virginia Tech
19.
Sharma, Niti.
Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs.
Degree: MS, Computer Science and Applications, 2019, Virginia Tech
URL: http://hdl.handle.net/10919/89914
► Low level programming languages like C and C++ are primary choices to write low-level systems software such as operating systems, virtual machines, embedded software, and…
(more)
▼ Low-level programming languages like C and C++ are primary choices for writing low-level systems software such as operating systems, virtual machines, embedded software, and performance-critical applications. But these languages are considered unsafe and prone to memory safety errors. Intel introduced a new technique, Memory Protection Extensions (MPX), to protect against these memory errors. But prior research found that applications supported with MPX have increased runtimes (slowdowns). In our research, we analyze these slowdowns for different input sizes (medium and large) in 15 benchmark applications. Based on the input sizes, the average slowdowns range from 140% to 144%. We then examine whether there is a correlation between the increase in cache misses under MPX and the slowdowns. A hardware cache is a component that stores data so that future requests for that data can be served faster. Hence, a cache miss is a state where the data requested for processing by a component or application is not found in the cache. Whenever a cache miss happens, the processor waits for the data to be fetched from the next cache level or from main memory before it can continue to execute. This wait influences the runtime performance of the application. Our evaluations find that 10 out of 15 applications with increased runtimes also show an increase in cache misses, a positive correlation between the two parameters. Along with that, we found that the increase in instruction size in MPX-protected applications also has a direct correlation with the runtime degradation. We quantify these relationships with a statistical measure called the correlation coefficient.
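The statistical measure referred to is the standard Pearson correlation coefficient, r = cov(x, y) / (sigma_x * sigma_y). A self-contained computation follows; the five data points are made-up placeholders, not the thesis's measurements.

#include <math.h>
#include <stdio.h>

/* Pearson correlation coefficient of two equally sized samples. */
double pearson(const double *x, const double *y, int n) {
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    double cov = sxy - sx * sy / n;    /* n * covariance */
    double vx  = sxx - sx * sx / n;    /* n * variance of x */
    double vy  = syy - sy * sy / n;    /* n * variance of y */
    return cov / sqrt(vx * vy);        /* scale factors cancel */
}

int main(void) {
    /* Placeholder data: per-benchmark slowdown (%) and cache-miss increase (%). */
    double slowdown[]  = {140, 152, 133, 160, 148};
    double miss_incr[] = { 35,  48,  22,  61,  44};
    printf("r = %.3f\n", pearson(slowdown, miss_incr, 5));
    return 0;
}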
Advisors/Committee Members: Jian, Xun (committeechair), Jung, Changhee (committee member), Lee, Dongyoon (committee member).
Subjects/Keywords: Spatial Security; Memory Protection Extensions; Caches; Benchmarks; Runtime; Overheads; TLB
APA (6th Edition):
Sharma, N. (2019). Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/89914
Chicago Manual of Style (16th Edition):
Sharma, Niti. “Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs.” 2019. Masters Thesis, Virginia Tech. Accessed March 06, 2021.
http://hdl.handle.net/10919/89914.
MLA Handbook (7th Edition):
Sharma, Niti. “Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs.” 2019. Web. 06 Mar 2021.
Vancouver:
Sharma N. Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs. [Internet] [Masters thesis]. Virginia Tech; 2019. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/10919/89914.
Council of Science Editors:
Sharma N. Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs. [Masters Thesis]. Virginia Tech; 2019. Available from: http://hdl.handle.net/10919/89914
20.
Kommanaboina, Kishor Yadav.
Auto-Determination of Cache/TLB parameters.
Degree: MS, Computer Science and Engineering, 2013, The Ohio State University
URL: http://rave.ohiolink.edu/etdc/view?acc_num=osu1367581773
► Modern optimizing compilers are adept at performing transformations like loop tiling, fusion, etc. The improvement in performance achieved by compilers due to these transformations relies…
(more)
▼ Modern optimizing compilers are adept at performing transformations like loop tiling, fusion, etc. The improvement in performance achieved by compilers due to these transformations relies upon the assumptions made by the compilers about the properties of the different levels of caches and TLBs. Having accurate information about these hardware components can go a long way in helping the compiler generate efficient code. In this paper, we present a novel algorithm which relies on micro-benchmarks and repeated accesses to memory in a fixed-stride fashion to deduce the number of sets, the associativity, and the line size of the different levels of caches and TLBs across different architectures. We compare our framework against other existing techniques and show the effectiveness of our approach across architectures. We show that the approaches previously developed are unable to differentiate between all levels of cache and in some cases are unable to detect the properties of different levels of TLB.
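As a rough sketch of the fixed-stride idea (the thesis's actual algorithm deduces sets, associativity, and line size; this toy only exposes capacities): walk buffers of growing size at a fixed stride and watch the average access time jump as each cache level overflows. Buffer sizes, the stride, and iteration counts below are arbitrary choices.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Average ns per access when sweeping `size` bytes at a fixed `stride`. */
static double time_walk(volatile unsigned char *buf, size_t size, size_t stride) {
    const long iters = 50 * 1000 * 1000;
    struct timespec t0, t1;
    size_t i = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long n = 0; n < iters; n++) {
        buf[i]++;                       /* touch one byte per access */
        i += stride;
        if (i >= size) i = 0;           /* wrap around the working set */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return sec / iters * 1e9;
}

int main(void) {
    const size_t max = 64u << 20;       /* sweep working sets up to 64 MiB */
    const size_t stride = 64;           /* assumed line size for this sweep */
    volatile unsigned char *buf = malloc(max);
    if (!buf) return 1;
    for (size_t size = 4u << 10; size <= max; size *= 2)
        printf("%8zu KB: %.2f ns/access\n", size >> 10, time_walk(buf, size, stride));
    free((void *)buf);
    return 0;
}

On real hardware a sequential fixed stride is partly hidden by hardware prefetchers, which is why careful microbenchmarks also use dependent (pointer-chasing) accesses and repeat the sweep over multiple strides.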
Advisors/Committee Members: Ponnuswamy, Sadayappan (Advisor).
Subjects/Keywords: Computer Science; Cache; TLB; Microbenchmarks; inclusive caches; exclusive caches
APA (6th Edition):
Kommanaboina, K. Y. (2013). Auto-Determination of Cache/TLB parameters. (Masters Thesis). The Ohio State University. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=osu1367581773
Chicago Manual of Style (16th Edition):
Kommanaboina, Kishor Yadav. “Auto-Determination of Cache/TLB parameters.” 2013. Masters Thesis, The Ohio State University. Accessed March 06, 2021.
http://rave.ohiolink.edu/etdc/view?acc_num=osu1367581773.
MLA Handbook (7th Edition):
Kommanaboina, Kishor Yadav. “Auto-Determination of Cache/TLB parameters.” 2013. Web. 06 Mar 2021.
Vancouver:
Kommanaboina KY. Auto-Determination of Cache/TLB parameters. [Internet] [Masters thesis]. The Ohio State University; 2013. [cited 2021 Mar 06].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1367581773.
Council of Science Editors:
Kommanaboina KY. Auto-Determination of Cache/TLB parameters. [Masters Thesis]. The Ohio State University; 2013. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1367581773

University of Rochester
21.
Shriraman, Arrvindh.
Architectural techniques for memory oversight in multiprocessors.
Degree: PhD, 2011, University of Rochester
URL: http://hdl.handle.net/1802/14180
► Computer architects have exploited the transistors afforded by Moore’s law to provide software developers with high performance computing resources. Software has translated this growth in…
(more)
▼ Computer architects have exploited the transistors afforded by Moore’s law to provide software developers with high performance computing resources. Software has translated this growth in hardware resources into improved features and applications. Unfortunately, applications have become increasingly complex and are prone to a variety of bugs when multiple software modules interact. The advent of multicore processors introduces a new challenge, parallel programming, which requires programmers to coordinate multiple tasks.
This dissertation develops general-purpose hardware mechanisms that address the dual challenges of parallel programming and software reliability. We have devised hardware mechanisms in the memory hierarchy that shed light on the memory system and control the visibility of data among the multiple threads. The key novelty is the use of cache coherence protocols to implement hardware mechanisms that enable software to track and regulate memory accesses at cache-line granularity. We demonstrate that exposing the events in the memory hierarchy provides useful information that was either previously invisible to software or would have required heavyweight instrumentation.
Focusing on the challenge of parallel programming, our mechanisms aid implementations of Transactional Memory (TM), a programming construct that seeks to simplify synchronization of shared state. We develop two mechanisms, Alert-On-Update (AOU) and Programmable Data Isolation (PDI), to accelerate common TM tasks. AOU selectively exposes cache events, including those that are triggered by remote accesses, to software in the form of events. TM runtimes use it to detect accesses that overlap between transactions (i.e., conflicts), and to track a transaction’s status. Programmable Data Isolation (PDI) allows multiple threads to temporarily hide their speculative writes from concurrent threads in their private caches until software decides to make them visible. We have used PDI and AOU to implement two TM run-time systems, RTM and FlexTM. Both RTM and FlexTM are flexible runtimes that permit software control of the timing of conflict resolution and the policy used for conflict management.
To address the challenge of software reliability, we propose Sentry, a lightweight, flexible access-control mechanism. Sentry allows software to regulate the reads and writes to memory regions at cache-line granularity based on the context in the program. Sentry coordinates the coherence states in a novel manner to eliminate the need for permission checks entirely for a large majority of the program’s accesses (all cache hits), thereby improving efficiency. Sentry improves application reliability by regulating data visibility and movement among the multiple software modules present in the application. We use a real-world webserver, Apache, as a case study to illustrate Sentry’s ability to guard the core application from vulnerabilities in the application’s modules.
Subjects/Keywords: Memory hierarchy; Caches; Cache coherence; Monitoring; Isolation; Protection; Transactional memory; RTM; FlexTM; Sentry
APA (6th Edition):
Shriraman, A. (2011). Architectural techniques for memory oversight in multiprocessors. (Doctoral Dissertation). University of Rochester. Retrieved from http://hdl.handle.net/1802/14180
Chicago Manual of Style (16th Edition):
Shriraman, Arrvindh. “Architectural techniques for memory oversight in multiprocessors.” 2011. Doctoral Dissertation, University of Rochester. Accessed March 06, 2021.
http://hdl.handle.net/1802/14180.
MLA Handbook (7th Edition):
Shriraman, Arrvindh. “Architectural techniques for memory oversight in multiprocessors.” 2011. Web. 06 Mar 2021.
Vancouver:
Shriraman A. Architectural techniques for memory oversight in multiprocessors. [Internet] [Doctoral dissertation]. University of Rochester; 2011. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1802/14180.
Council of Science Editors:
Shriraman A. Architectural techniques for memory oversight in multiprocessors. [Doctoral Dissertation]. University of Rochester; 2011. Available from: http://hdl.handle.net/1802/14180

Cornell University
22.
Ismail, Mohamed.
Hardware-Software Co-optimization for Dynamic Languages.
Degree: PhD, Electrical and Computer Engineering, 2019, Cornell University
URL: http://hdl.handle.net/1813/67721
► As software becomes more complex and the costs of developing and maintaining code increase, dynamic programming languages such as Python are becoming more desirable alternatives…
(more)
▼ As software becomes more complex and the costs of developing and maintaining code increase, dynamic programming languages such as Python are becoming more desirable alternatives to traditional static languages such as C. Programmers can express more functionality with fewer lines of code and can spend less time debugging low-level bugs such as buffer overflows and memory leaks. Unfortunately, programs written in a dynamic language often execute significantly slower than an equivalent program written in a static language, sometimes by orders of magnitude. This dissertation investigates the following question: How can dynamic languages achieve high performance through HW/SW co-optimization? The first part of the dissertation studies inefficiencies in dynamic languages through a detailed quantitative analysis of the overhead in Python. The study identifies a new major source of overhead, C function calls, for the Python interpreter. Additionally, studying the interaction of the runtime with the underlying processor hardware shows that the performance of Python with JIT depends heavily on the cache hierarchy and memory system. Proper nursery sizing is necessary for each application to optimize the trade-off between cache performance and garbage collection overhead. Based on insights from the study, the software and hardware are co-optimized to improve the memory management performance.
In the second part of the dissertation, a cache-aware optimization for single-application memory management is presented. The performance and memory bandwidth usage is improved by co-optimizing garbage collection overhead and cache performance for newly-initialized and dead objects. Further study shows that less frequent garbage collection results in a large number of cache misses for initial stores to new objects. The problem is solved by directly placing uninitialized objects into on-chip caches without off-chip memory accesses. Cache performance is further optimized by reducing unnecessary cache pollution and write-backs through a partial tracing algorithm that invalidates dead objects between full garbage collections.
The dissertation then focuses on the case of multiple applications running concurrently on a multi-core processor with shared caches. It is shown that the performance of dynamic languages can degrade significantly due to cache contention among multiple concurrent applications that share a cache. To address this problem, program memory access patterns are reshaped by adjusting the nursery size. Both a static and a dynamic scheme are presented that determine good nursery sizes for multiple programs running concurrently.
Advisors/Committee Members: Suh, Gookwon Edward (chair), Martinez, Jose F. (committee member), Weatherspoon, Hakim (committee member).
Subjects/Keywords: Computer engineering; caches; dynamic languages; JavaScript; memory management; Python; Computer science; Engineering
APA (6th Edition):
Ismail, M. (2019). Hardware-Software Co-optimization for Dynamic Languages. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/67721
Chicago Manual of Style (16th Edition):
Ismail, Mohamed. “Hardware-Software Co-optimization for Dynamic Languages.” 2019. Doctoral Dissertation, Cornell University. Accessed March 06, 2021.
http://hdl.handle.net/1813/67721.
MLA Handbook (7th Edition):
Ismail, Mohamed. “Hardware-Software Co-optimization for Dynamic Languages.” 2019. Web. 06 Mar 2021.
Vancouver:
Ismail M. Hardware-Software Co-optimization for Dynamic Languages. [Internet] [Doctoral dissertation]. Cornell University; 2019. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1813/67721.
Council of Science Editors:
Ismail M. Hardware-Software Co-optimization for Dynamic Languages. [Doctoral Dissertation]. Cornell University; 2019. Available from: http://hdl.handle.net/1813/67721

Penn State University
23.
Shah, Shail Paragbhai.
CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS.
Degree: 2019, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/16496sks5698
► Heterogeneous applications often share memory resources such as last-level caches and memory bandwidth within the many-core system deployed in the IaaS model. The performance of…
(more)
▼ Heterogeneous applications often share memory resources such as last-level caches and memory bandwidth within the many-core systems deployed in the IaaS model. The performance of applications in such an environment depends highly upon the contention caused in the shared resources by the co-runners. Resource providers want to dynamically allocate these physical resources among the various applications running in the system; they aim to improve the hardware utilization of the system while maximizing the benefit to the clients. Different applications derive varying utility as a function of the number of resources allocated to them, which makes allocation difficult for resource providers. We present a way to evaluate the performance metric of an application under varying resource constraints alongside co-runners. We present a market-driven mechanism which uses an auction to allocate the LLC and memory bandwidth among the different applications in the system. We used benchmarks from the SPEC CPU2006 suite to evaluate our mechanism. Experimental results show an improvement of 1.5x-2x as compared to the state-of-the-art algorithms.
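The auction mechanism itself is not reproduced here; as a baseline illustration of utility-driven partitioning, the sketch below greedily hands out LLC ways one at a time to whichever application currently reports the largest marginal utility, which loosely mimics repeated bidding rounds. The utility curves are fabricated for the example.

#include <stdio.h>

#define APPS 3
#define WAYS 16

int main(void) {
    /* utility[a][w]: hypothetical performance of app a when given w ways. */
    double utility[APPS][WAYS + 1];
    int    alloc[APPS] = {0};

    for (int a = 0; a < APPS; a++)      /* fabricate concave utility curves */
        for (int w = 0; w <= WAYS; w++)
            utility[a][w] = (a + 1) * (1.0 - 1.0 / (w + 1));

    /* Greedy: each remaining way goes to the app with the largest
     * marginal gain from one more way. */
    for (int given = 0; given < WAYS; given++) {
        int best = 0;
        double best_gain = -1;
        for (int a = 0; a < APPS; a++) {
            double gain = utility[a][alloc[a] + 1] - utility[a][alloc[a]];
            if (gain > best_gain) { best_gain = gain; best = a; }
        }
        alloc[best]++;
    }
    for (int a = 0; a < APPS; a++)
        printf("app %d: %d ways\n", a, alloc[a]);
    return 0;
}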
Advisors/Committee Members: Mahmut Taylan Kandemir, Thesis Advisor/Co-Advisor, John Morgan Sampson, Thesis Advisor/Co-Advisor.
Subjects/Keywords: Resource Allocation; last level caches; characterization; memory bandwidth; auction mechanism; game theory
APA (6th Edition):
Shah, S. P. (2019). CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS. (Thesis). Penn State University. Retrieved from https://submit-etda.libraries.psu.edu/catalog/16496sks5698
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Shah, Shail Paragbhai. “CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS.” 2019. Thesis, Penn State University. Accessed March 06, 2021.
https://submit-etda.libraries.psu.edu/catalog/16496sks5698.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Shah, Shail Paragbhai. “CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS.” 2019. Web. 06 Mar 2021.
Vancouver:
Shah SP. CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS. [Internet] [Thesis]. Penn State University; 2019. [cited 2021 Mar 06].
Available from: https://submit-etda.libraries.psu.edu/catalog/16496sks5698.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Shah SP. CHARACTERIZING AND OPTIMIZING ON-CHIP SHARED MEMORY RESOURCES USING MARKET-DRIVEN MECHANISMS. [Thesis]. Penn State University; 2019. Available from: https://submit-etda.libraries.psu.edu/catalog/16496sks5698
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Penn State University
24.
Patrick, Christina M.
Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches.
Degree: 2011, Penn State University
URL: https://submit-etda.libraries.psu.edu/catalog/11776
► Service providers and IT solutions generally favor shared resources to avoid excessive cost and over-provisioning. Most applications access shared storage servers through a multi-level buffer…
(more)
▼ Service providers and IT solutions generally favor shared resources to avoid excessive cost and over-provisioning. Most applications access shared storage servers through a multi-level buffer cache hierarchy. Multiple I/O applications accessing these shared resources give rise to inter-application interference at the disk level as well as in the shared buffer caches throughout the hierarchy. Consequently, the disk spends most of its time positioning the head for the next I/O operation (IOP), and numerous destructive interferences are observed throughout all the shared buffer caches accessed by the compute nodes and the shared storage servers. This application interference reduces the effective capacity of the caches by displacing useful data before it is accessed, while at the same time forcing the disk to spend most of its time seeking due to the interleaving of the I/O streams arising from the different applications.
This thesis presents an end-to-end interference-minimizing, uniquely designed high performance I/O stack that spans multi-level shared buffer cache hierarchies accessing shared I/O servers. In this thesis, I show that I can build a superior I/O stack which minimizes inter-application interference, and consequently increases application performance, based on our understanding of application characteristics such as reuse and locality, application data access patterns, application execution history, and disk characteristics such as spin time and seek time. This thesis offers an unparalleled substantiation of designing high performance I/O stack components that minimize inter-application interference to obtain the best application performance. I propose several novel modules that work at different layers in the I/O stack to increase application performance dramatically.
This thesis has four main contributions:
(1) I propose a novel client-side prefetching module called APP (Aggressive Pipelined Prefetching), which automatically detects and configures prefetching parameters, selectively offloads data to idle remote buffer caches, and transfers data in the background to minimize the inter-application interference seen on the shared disks, ultimately increasing disk throughput and decreasing application execution time.
(2) I propose an application scheduler called AppMap (Application Mapper), which schedules applications based on application characteristics such as reuse and locality on the nodes of the multi-level buffer cache hierarchy to increase application hit rates throughout the shared buffer caches in the hierarchy and decrease application execution time.
(3) I propose an end-to-end hint-exploiting I/O stack called Mnemosyne, which makes use of user-specified data access pattern hints to increase the effectiveness of multi-tiered buffer caches by eliminating duplication and caching the hotter data in the upper-level caches. At the same time, it uses these hints to predict the next access and prefetch the data, thus reducing the disk interference.…
Advisors/Committee Members: Mahmut Taylan Kandemir, Dissertation Advisor/Co-Advisor, Mahmut Taylan Kandemir, Committee Chair/Co-Chair, Mary Jane Irwin, Committee Member, Padma Raghavan, Committee Member, Brian Cameron, Committee Member, Raj Acharya, Committee Member.
Subjects/Keywords: storage cache; interference; Multi-level buffer caches; performance; I/O; disk; cache
APA (6th Edition):
Patrick, C. M. (2011). Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches. (Thesis). Penn State University. Retrieved from https://submit-etda.libraries.psu.edu/catalog/11776
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Patrick, Christina M. “Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches.” 2011. Thesis, Penn State University. Accessed March 06, 2021.
https://submit-etda.libraries.psu.edu/catalog/11776.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Patrick, Christina M. “Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches.” 2011. Web. 06 Mar 2021.
Vancouver:
Patrick CM. Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches. [Internet] [Thesis]. Penn State University; 2011. [cited 2021 Mar 06].
Available from: https://submit-etda.libraries.psu.edu/catalog/11776.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Patrick CM. Minimizing End-To-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches. [Thesis]. Penn State University; 2011. Available from: https://submit-etda.libraries.psu.edu/catalog/11776
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Michigan Technological University
25.
Byrne, Daniel.
mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores.
Degree: MS, Department of Computer Science, 2018, Michigan Technological University
URL: https://digitalcommons.mtu.edu/etdr/653
► Web applications employ key-value stores to cache the data that is most commonly accessed. The cache improves a web application’s performance by serving its…
(more)
▼ Web applications employ key-value stores to cache the data that is most commonly accessed. The cache improves a web application’s performance by serving its requests from memory, avoiding fetching them from the backend database. Since the memory space is limited, maximizing memory utilization is key to delivering the best performance possible. This has led to the use of multi-tenant systems, allowing applications to share cache space. In addition, application data access patterns change over time, so the system should be adaptive in its memory allocation. In this thesis, we address both multi-tenancy (where a single cache is used for multiple applications) and dynamic workloads (changing access patterns) using a model that relates the cache size to the application miss ratio, known as a miss ratio curve. Intuitively, the larger the cache, the less likely the system will need to fetch the data from the database. Our efficient, online construction of the miss ratio curve allows us to determine a near-optimal memory allocation given the available system memory, while adapting to changing data access patterns. We show that our model outperforms an existing state-of-the-art sharing model, Memshare, in terms of cache hit ratio, and does so at a lower time cost. We show that the average hit ratio is consistently 1 percentage point greater and 99.9th percentile latency is reduced by as much as 2.9% under standard web application workloads containing millions of requests.
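A miss ratio curve can be built offline from a reference trace via LRU stack distances: a request at stack distance d hits in any LRU cache of capacity greater than d, so one histogram of distances yields the miss ratio at every size at once. The naive O(N·M) sketch below only makes the concept concrete; the thesis's contribution is an efficient online construction.

#include <stdio.h>

#define MAX_KEYS 4096   /* toy limit: assumes fewer distinct keys than this */

/* Naive LRU stack: keys[0] is most recently used. */
static int keys[MAX_KEYS];
static int depth = 0;

/* Returns the stack distance of `key` (or -1 on a cold miss) and moves
 * the key to the top of the stack. */
static int reference(int key) {
    int pos = -1;
    for (int i = 0; i < depth; i++)
        if (keys[i] == key) { pos = i; break; }
    int stop = (pos == -1) ? depth++ : pos;
    for (int i = stop; i > 0; i--) keys[i] = keys[i - 1];
    keys[0] = key;
    return pos;
}

int main(void) {
    int trace[] = {1, 2, 3, 1, 2, 3, 4, 1, 2, 4, 3, 1};
    int n = sizeof trace / sizeof trace[0];
    long hist[MAX_KEYS] = {0};

    for (int i = 0; i < n; i++) {
        int d = reference(trace[i]);
        if (d >= 0) hist[d]++;          /* cold misses miss at every size */
    }
    /* An LRU cache of capacity c hits iff stack distance < c. */
    long hits = 0;
    for (int c = 1; c <= 4; c++) {
        hits += hist[c - 1];
        printf("cache size %d: miss ratio %.2f\n", c, 1.0 - (double)hits / n);
    }
    return 0;
}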
Advisors/Committee Members: Zhenlin Wang, Nilufer Onder.
Subjects/Keywords: Cloud Computing; Caches; Multi-tenant Architectures; Key-Value Stores; Miss Ratio Curves; Data Storage Systems
APA (6th Edition):
Byrne, D. (2018). mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores. (Masters Thesis). Michigan Technological University. Retrieved from https://digitalcommons.mtu.edu/etdr/653
Chicago Manual of Style (16th Edition):
Byrne, Daniel. “mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores.” 2018. Masters Thesis, Michigan Technological University. Accessed March 06, 2021.
https://digitalcommons.mtu.edu/etdr/653.
MLA Handbook (7th Edition):
Byrne, Daniel. “mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores.” 2018. Web. 06 Mar 2021.
Vancouver:
Byrne D. mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores. [Internet] [Masters thesis]. Michigan Technological University; 2018. [cited 2021 Mar 06].
Available from: https://digitalcommons.mtu.edu/etdr/653.
Council of Science Editors:
Byrne D. mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores. [Masters Thesis]. Michigan Technological University; 2018. Available from: https://digitalcommons.mtu.edu/etdr/653
26.
Ramaswamy, Subramanian.
Active management of Cache resources.
Degree: PhD, Electrical and Computer Engineering, 2008, Georgia Tech
URL: http://hdl.handle.net/1853/24663
► This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of…
(more)
▼ This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of challenges relates to the memory bottleneck. As the focus shifts from scaling processor frequency to scaling the number of cores, performance growth demands increasing die area. Scaling the number of cores also places a concurrent area demand in the form of larger caches. While on-chip caches occupy 50-60% of die area and consume 20-30% of the energy expended on-chip, their performance and energy efficiencies are less than 15% and 1% respectively for a range of benchmarks! The second set of challenges is posed by transistor leakage and process variation (inter-die and intra-die) at future technology nodes. Leakage power is anticipated to increase exponentially and sharply lower defect-free yield with successive technology generations. For performance scaling to continue, cache efficiencies have to improve significantly. This thesis proposes and evaluates a broad family of such improvements.
This dissertation first contributes a model for cache efficiencies and finds them to be extremely low: performance efficiencies less than 15% and energy efficiencies on the order of 1%. Studying the sources of inefficiency leads to a framework for efficiency improvement based on two interrelated strategies. The approach for improving energy efficiency primarily relies on sizing the cache to match the application memory footprint during a program phase while powering down all remaining cache sets. Importantly, the sized cache is fully functional, with no references to inactive sets. Improving performance efficiency primarily relies on cache shaping, i.e., changing the placement function and thereby the manner in which memory shares the cache.
Sizing and shaping are applied at different phases of the design cycle: i) post-manufacturing and offline, ii) at compile time, and iii) at run time. This thesis proposes and explores techniques at each phase, collectively realizing a repertoire of techniques for future memory system designers. The techniques use a combination of hardware and software and are demonstrated to provide substantive improvements with modest overheads.
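Shaping, i.e. changing the placement function, has a standard concrete instance worth showing even though the dissertation's own placement functions may differ: XOR-based index hashing, which folds higher address bits into the set index so that addresses that collide under the usual modulo placement get spread across sets.

#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6          /* 64-byte lines */
#define SET_BITS  10         /* 1024 sets */
#define SET_MASK  ((1u << SET_BITS) - 1)

/* Conventional placement: the index bits just above the line offset. */
uint32_t index_modulo(uint64_t addr) {
    return (uint32_t)(addr >> LINE_BITS) & SET_MASK;
}

/* Shaped placement: XOR a higher-order bit slice into the index so that
 * addresses that collide under the modulo function are spread out. */
uint32_t index_xor(uint64_t addr) {
    uint32_t lo = (uint32_t)(addr >> LINE_BITS) & SET_MASK;
    uint32_t hi = (uint32_t)(addr >> (LINE_BITS + SET_BITS)) & SET_MASK;
    return lo ^ hi;
}

int main(void) {
    /* Two addresses 64 KiB apart conflict under modulo placement
     * but map to different sets under the XOR hash. */
    uint64_t a = 0x100000, b = a + (1u << 16);
    printf("modulo: %u %u   xor: %u %u\n",
           index_modulo(a), index_modulo(b), index_xor(a), index_xor(b));
    return 0;
}

With 1024 sets of 64-byte lines, addresses 64 KiB apart all land in one set under index_modulo but in distinct sets under index_xor.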
Advisors/Committee Members: Yalamanchili, Sudhakar (Committee Chair), Davis, Jeffrey (Committee Member), Ramachandran, Umakishore (Committee Member), Schimmel, David (Committee Member), Wardi, Yorai (Committee Member).
Subjects/Keywords: Efficiency; Customized placement; Reconfigurable caches; Customized caches; Efficient caches; Cache memory; Memory management (Computer science); Energy conservation
APA (6th Edition):
Ramaswamy, S. (2008). Active management of Cache resources. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/24663
Chicago Manual of Style (16th Edition):
Ramaswamy, Subramanian. “Active management of Cache resources.” 2008. Doctoral Dissertation, Georgia Tech. Accessed March 06, 2021.
http://hdl.handle.net/1853/24663.
MLA Handbook (7th Edition):
Ramaswamy, Subramanian. “Active management of Cache resources.” 2008. Web. 06 Mar 2021.
Vancouver:
Ramaswamy S. Active management of Cache resources. [Internet] [Doctoral dissertation]. Georgia Tech; 2008. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/1853/24663.
Council of Science Editors:
Ramaswamy S. Active management of Cache resources. [Doctoral Dissertation]. Georgia Tech; 2008. Available from: http://hdl.handle.net/1853/24663

University of Michigan
27.
Ansari, Amin.
Overcoming Hard-Faults in High-Performance Microprocessors.
Degree: PhD, Computer Science & Engineering, 2011, University of Michigan
URL: http://hdl.handle.net/2027.42/86517
► As device density grows, each transistor gets smaller and more fragile leading to an overall higher susceptibility to hard-faults. These hard-faults result in permanent silicon…
(more)
▼ As device density grows, each transistor gets smaller and more fragile, leading to an overall higher susceptibility to hard-faults. These hard-faults result in permanent silicon defects and impact the manufacturing yield, performance, and lifetime of semiconductor devices. In this thesis, we propose comprehensive, low-cost solutions to tackle reliability problems in high-performance microprocessors. These microprocessors mainly consist of on-chip caches and the core pipeline. We first present two flexible cache architectures, ZerehCache and Archipelago, to protect regular SRAM structures against high failure rates. ZerehCache virtually reorganizes the cache data array using a permutation network to provide higher degrees of freedom for spare allocation. In order to study the impact of fault patterns on the redundancy requirements in a cache, we propose a methodology to model the collision patterns in caches as a graph problem. Given this model, a graph coloring scheme is employed to minimize the amount of additional redundancy required for protecting the cache. Archipelago targets failures in the near-threshold region. It resizes the cache to provide redundancy for repairing faulty cells. Furthermore, a near-optimal minimum clique covering configuration algorithm is introduced to minimize the cache capacity loss.
With proper solutions in place for caches, a robust and heterogeneous core coupling execution scheme, Necromancer, is presented to protect the general core area against hard-faults. Although a faulty core cannot be trusted, we observe that for most defects, execution traces on a defective core coarsely resemble those of fault-free executions. Necromancer exploits a functionally dead core to improve system throughput by supplying hints regarding high-level program behavior. We partition the cores into multiple groups. Each group shares a lightweight core that can be substantially accelerated. However, due to the presence of defects, a perfect data or instruction stream cannot be provided by the dead core. This necessitates employing a low-cost recovery mechanism and generic hints that are more resilient to local abnormalities.
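To make the graph-coloring step concrete: treat each faulty line as a node, connect two nodes when their faults collide (cannot be covered by the same spare), and color the graph so that each color class can share one spare resource. Below is a first-fit greedy coloring sketch over a toy collision graph; the thesis's actual modeling and coloring scheme are not reproduced here.

#include <stdio.h>

#define N 6   /* number of faulty lines in this toy example */

int main(void) {
    /* adj[i][j] = 1 if faulty lines i and j collide (cannot share a spare). */
    int adj[N][N] = {
        {0,1,1,0,0,0},
        {1,0,1,0,0,0},
        {1,1,0,1,0,0},
        {0,0,1,0,1,0},
        {0,0,0,1,0,1},
        {0,0,0,0,1,0},
    };
    int color[N];

    /* First-fit greedy: give each node the smallest color not used by any
     * already-colored neighbor. The number of colors used approximates the
     * number of spare groups needed. */
    for (int v = 0; v < N; v++) {
        int used[N] = {0};
        for (int u = 0; u < v; u++)
            if (adj[v][u]) used[color[u]] = 1;
        int c = 0;
        while (used[c]) c++;
        color[v] = c;
    }
    for (int v = 0; v < N; v++)
        printf("line %d -> spare group %d\n", v, color[v]);
    return 0;
}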
Advisors/Committee Members: Mahlke, Scott (committee member), Austin, Todd M. (committee member), Wenisch, Thomas F. (committee member), Zhang, Zhengya (committee member).
Subjects/Keywords: On-chip Caches, Wearout, Manufacturing Defects, Process Variation, Yield, Heterogeneous Coupled Core Execution; Computer Science; Engineering
APA (6th Edition):
Ansari, A. (2011). Overcoming Hard-Faults in High-Performance Microprocessors. (Doctoral Dissertation). University of Michigan. Retrieved from http://hdl.handle.net/2027.42/86517
Chicago Manual of Style (16th Edition):
Ansari, Amin. “Overcoming Hard-Faults in High-Performance Microprocessors.” 2011. Doctoral Dissertation, University of Michigan. Accessed March 06, 2021.
http://hdl.handle.net/2027.42/86517.
MLA Handbook (7th Edition):
Ansari, Amin. “Overcoming Hard-Faults in High-Performance Microprocessors.” 2011. Web. 06 Mar 2021.
Vancouver:
Ansari A. Overcoming Hard-Faults in High-Performance Microprocessors. [Internet] [Doctoral dissertation]. University of Michigan; 2011. [cited 2021 Mar 06].
Available from: http://hdl.handle.net/2027.42/86517.
Council of Science Editors:
Ansari A. Overcoming Hard-Faults in High-Performance Microprocessors. [Doctoral Dissertation]. University of Michigan; 2011. Available from: http://hdl.handle.net/2027.42/86517
28.
Koller, Ricardo.
Improving Caches in Consolidated Environments.
Degree: PhD, Computer Science, 2012, Florida International University
URL: https://digitalcommons.fiu.edu/etd/708 ; 10.25148/etd.FI12080801 ; FI12080801
► Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer’s processor. In order to maximize performance, the speeds…
(more)
▼ Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer’s processor. To maximize performance, the speeds of the memory and the processor should be equal. However, using memory that always matches the speed of the processor is prohibitively expensive. Computer hardware designers have managed to drastically lower the cost of the system with the use of memory caches, at the price of some performance. A cache is a small piece of fast memory that stores popular data so it can be accessed faster. Modern computers have evolved into a hierarchy of caches, where each memory level is the cache for a larger and slower memory level immediately below it. Thus, by using caches, manufacturers are able to store terabytes of data at the cost of the cheapest memory while achieving speeds close to that of the fastest one.
The most important decision in managing a cache is what data to store in it. Failing to make good decisions can lead to performance overheads and over-provisioning. Surprisingly, caches choose the data to store based on policies that have not changed in principle for decades. However, computing paradigms have changed radically, leading to two noticeably different trends. First, caches are now consolidated across hundreds or even thousands of processes. Second, caching is being employed at new levels of the storage hierarchy due to the availability of high-performance flash-based persistent media. This brings four problems. First, as the number of workloads sharing a cache grows, it becomes more likely that they contain duplicated data. Second, consolidation creates contention for caches, and if not managed carefully, it translates into wasted space and sub-optimal performance. Third, as contended caches are shared by more workloads, administrators need to carefully estimate specific per-workload requirements across the entire memory hierarchy in order to meet per-workload performance goals. Finally, current cache write policies are unable to simultaneously provide performance and consistency guarantees for the new levels of the storage hierarchy.
We addressed these problems by modeling their impact and by proposing solutions for each of them. First, we measured and modeled the amount of duplication at the buffer-cache level and contention in real production systems. Second, we created a unified model of workload cache usage under contention, to be used by administrators for provisioning or by process schedulers to decide which processes to run together. Third, we proposed methods for removing cache duplication and for eliminating the space wasted due to contention. Finally, we proposed a technique that improves the consistency guarantees of write-back caches while preserving their performance benefits.
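The deduplication fix this abstract proposes can be made concrete with a toy model: a content-addressed cache keys pages by a hash of their bytes rather than by (workload, block), so identical pages cached by different consolidated workloads occupy a single slot. The sketch below is illustrative only, with invented names, and assumes whole-page hashing; the dissertation's actual mechanisms are more elaborate.

```python
import hashlib
from collections import OrderedDict

class ContentAddressedCache:
    """Toy deduplicating buffer cache: each unique page content is stored
    once, while a per-workload index maps (workload, block) -> digest."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()   # digest -> page bytes, kept in LRU order
        self.index = {}              # (workload, block) -> digest

    def put(self, workload, block, data):
        digest = hashlib.sha256(data).hexdigest()
        self.index[(workload, block)] = digest
        if digest in self.pages:              # duplicate content: reuse the slot
            self.pages.move_to_end(digest)
            return
        if len(self.pages) >= self.capacity:  # evict the least-recently-used page
            self.pages.popitem(last=False)    # (stale index entries are caught
        self.pages[digest] = data             #  lazily on lookup)

    def get(self, workload, block):
        digest = self.index.get((workload, block))
        if digest is None or digest not in self.pages:
            return None                       # miss: caller reads from disk
        self.pages.move_to_end(digest)
        return self.pages[digest]

cache = ContentAddressedCache(capacity_pages=2)
cache.put("vm1", 7, b"shared library page")
cache.put("vm2", 42, b"shared library page")  # same bytes: no second slot used
```

Two workloads caching the same shared-library or OS-image page consume one slot instead of two, which is the kind of duplication the dissertation measures at the buffer-cache level.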
Advisors/Committee Members: Raju Rangaswami, Murali Vilayannur, Chen Liu, Ming Zhao, Giri Narasimhan.
Subjects/Keywords: Operating systems; storage systems; memory caches; consolidation
…PARTITIONING FOR EXTERNAL MEMORY CACHES — 5.1 Background on cache partitioning… …6. CONSISTENCY IN WRITE-BACK CACHES — 6.1… …ratio for content- and sector-addressed caches for read operations. The total number of pages… …3.6 Comparison of ARC and LRU content-addressed caches for pages read only (top… …CHAPTER 1 INTRODUCTION: Memory (CPU caches, DRAM, and disk) has become…
APA (6th Edition):
Koller, R. (2012). Improving Caches in Consolidated Environments. (Doctoral Dissertation). Florida International University. Retrieved from https://digitalcommons.fiu.edu/etd/708 ; 10.25148/etd.FI12080801 ; FI12080801
Chicago Manual of Style (16th Edition):
Koller, Ricardo. “Improving Caches in Consolidated Environments.” 2012. Doctoral Dissertation, Florida International University. Accessed March 06, 2021.
https://digitalcommons.fiu.edu/etd/708 ; 10.25148/etd.FI12080801 ; FI12080801.
MLA Handbook (7th Edition):
Koller, Ricardo. “Improving Caches in Consolidated Environments.” 2012. Web. 06 Mar 2021.
Vancouver:
Koller R. Improving Caches in Consolidated Environments. [Internet] [Doctoral dissertation]. Florida International University; 2012. [cited 2021 Mar 06].
Available from: https://digitalcommons.fiu.edu/etd/708 ; 10.25148/etd.FI12080801 ; FI12080801.
Council of Science Editors:
Koller R. Improving Caches in Consolidated Environments. [Doctoral Dissertation]. Florida International University; 2012. Available from: https://digitalcommons.fiu.edu/etd/708 ; 10.25148/etd.FI12080801 ; FI12080801

University of Notre Dame
29.
Moore, Branden James.
Exploiting Large Shared On-Chip Caches for Chip Multiprocessors.
Degree: Computer Science and Engineering, 2005, University of Notre Dame
URL: https://curate.nd.edu/show/tb09j388f4p
► Chip multiprocessors are one of several emerging architectures that address the growing processor-memory performance gap. At the same time, advances in chip manufacturing…
(more)
▼ Chip multiprocessors are one of several emerging architectures that address the growing processor-memory performance gap. At the same time, advances in chip manufacturing enable the integration of processing logic and dense DRAM on the same die. This thesis analyzes the use of such merged DRAM as a shared cache for a large-scale chip multiprocessor. Simulation results reveal that maximizing concurrency in the cache is of paramount importance, greater even than cache hit rate. Concurrency is achieved through ports gained from creating a multi-banked cache, and through multiple paths to main memory. Results demonstrate that maximizing the number of cache banks is the most important design goal, followed by providing adequate associativity to minimize miss rates. Furthermore, the optimal cache block size is heavily dependent on the workload. The off-chip memory organization impacts performance to a lesser degree, with multiple paths to memory outperforming a wider memory bus.
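The conclusion that bank count matters more than hit rate rests on a simple conflict argument: same-cycle accesses proceed in parallel only when they fall in different banks, so accesses that collide on a bank serialize. A back-of-the-envelope sketch of that bottleneck (illustrative only; the addresses and the single-ported-bank assumption are mine, not the thesis's):

```python
def cycles_for_batch(block_addresses, num_banks):
    """Cycles to service one batch of simultaneous cache accesses,
    assuming single-ported banks: colliding accesses serialize."""
    per_bank = {}
    for addr in block_addresses:
        bank = addr % num_banks      # low-order block-address bits pick the bank
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return max(per_bank.values(), default=0)

# Eight cores hitting the shared cache in the same cycle:
batch = [3, 7, 12, 19, 24, 35, 42, 57]
for banks in (1, 4, 16):
    print(banks, "banks ->", cycles_for_batch(batch, banks), "cycles")
# One bank serializes all 8 accesses; 16 banks get this batch down to 3 cycles.
```

Hit rate changes the cost of individual accesses; banking changes how many can be in flight at once, which is why the thesis treats it as the first-order design knob.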
Advisors/Committee Members: Douglas Thain, Committee Member, Lambert Schaelicke, Committee Chair, Peter Kogge, Committee Member.
Subjects/Keywords: chip multiprocessors; shared caches; merged logic and DRAM
APA (6th Edition):
Moore, B. J. (2005). Exploiting Large Shared On-Chip Caches for Chip Multiprocessors. (Thesis). University of Notre Dame. Retrieved from https://curate.nd.edu/show/tb09j388f4p
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Moore, Branden James. “Exploiting Large Shared On-Chip Caches for Chip Multiprocessors.” 2005. Thesis, University of Notre Dame. Accessed March 06, 2021.
https://curate.nd.edu/show/tb09j388f4p.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Moore, Branden James. “Exploiting Large Shared On-Chip Caches for Chip Multiprocessors.” 2005. Web. 06 Mar 2021.
Vancouver:
Moore BJ. Exploiting Large Shared On-Chip Caches for Chip Multiprocessors. [Internet] [Thesis]. University of Notre Dame; 2005. [cited 2021 Mar 06].
Available from: https://curate.nd.edu/show/tb09j388f4p.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Moore BJ. Exploiting Large Shared On-Chip Caches for Chip Multiprocessors. [Thesis]. University of Notre Dame; 2005. Available from: https://curate.nd.edu/show/tb09j388f4p
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
30.
Lorrillere, Maxime.
Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments.
Degree: Docteur es, Informatique, 2016, Université Pierre et Marie Curie – Paris VI
URL: http://www.theses.fr/2016PA066036
► With the advent of cloud architectures, virtualization has become a key mechanism for ensuring isolation and flexibility. However, a drawback of using virtual machines (VMs) is the fragmentation of physical resources, and…
(more)
▼ With the advent of cloud architectures, virtualization has become a key mechanism for ensuring isolation and flexibility. However, a drawback of using virtual machines (VMs) is the fragmentation of physical resources, notably of memory. As operating systems leverage free memory for I/O caching, memory fragmentation is particularly problematic for I/O-intensive applications, which suffer a significant performance drop. In this context, providing the ability to dynamically adjust the resources allocated among the VMs is a primary concern. To address this issue, this thesis proposes a distributed cache mechanism called Puma. Puma pools together the free memory left unused by VMs: it enables a VM to entrust clean page-cache pages to other VMs. Puma extends the Linux kernel page cache and thus remains transparent to both applications and the rest of the operating system. Puma evaluates the caching activity of a VM by means of metrics derived from existing Linux kernel memory-management mechanisms, and uses them to automatically set each node's level of contribution to the distributed cache. Our experiments show that Puma significantly improves the performance of I/O-intensive applications and that it adapts well to dynamically changing conditions, so as not to degrade their performance.
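Puma's central idea, pooling the free memory of co-located VMs, can be caricatured as a two-level page cache: clean pages evicted locally are entrusted to a peer with spare memory and recalled on a miss before falling back to disk. A minimal sketch under those assumptions follows; the class and method names are invented for illustration and are not Puma's kernel API.

```python
from collections import OrderedDict

class PeerCache:
    """Spare memory a neighboring VM lends out for clean pages."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def offer(self, key, page):
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # the peer may drop lent pages at will
        self.pages[key] = page

    def recall(self, key):
        return self.pages.pop(key, None)

class CooperativeCache:
    """Local LRU page cache that spills clean evictions to a peer VM."""
    def __init__(self, capacity, peer, read_from_disk):
        self.capacity = capacity
        self.local = OrderedDict()           # key -> page, kept in LRU order
        self.peer = peer
        self.read_from_disk = read_from_disk

    def get(self, key):
        if key in self.local:                # local hit
            self.local.move_to_end(key)
            return self.local[key]
        page = self.peer.recall(key)         # remote hit: faster than disk
        if page is None:
            page = self.read_from_disk(key)  # miss: fall back to disk
        self._insert(key, page)
        return page

    def _insert(self, key, page):
        if len(self.local) >= self.capacity:
            victim, data = self.local.popitem(last=False)
            self.peer.offer(victim, data)    # clean page: safe to entrust remotely
        self.local[key] = page

peer = PeerCache(capacity=1024)              # free memory lent by another VM
cache = CooperativeCache(capacity=256, peer=peer,
                         read_from_disk=lambda k: b"page %d from disk" % k)
```

Because only clean pages are lent out, losing the peer (or having it reclaim its memory) costs extra disk reads rather than correctness, which is what lets a mechanism of this kind stay transparent to applications and file systems.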
Advisors/Committee Members: Sens, Pierre (thesis director), Monnet, Sébastien (thesis director).
Subjects/Keywords: Operating systems; Distributed caches; Virtualization; Memory; File systems; Linux; Cooperative caching; 004
APA (6th Edition):
Lorrillere, M. (2016). Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments. (Doctoral Dissertation). Université Pierre et Marie Curie – Paris VI. Retrieved from http://www.theses.fr/2016PA066036
Chicago Manual of Style (16th Edition):
Lorrillere, Maxime. “Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments.” 2016. Doctoral Dissertation, Université Pierre et Marie Curie – Paris VI. Accessed March 06, 2021.
http://www.theses.fr/2016PA066036.
MLA Handbook (7th Edition):
Lorrillere, Maxime. “Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments.” 2016. Web. 06 Mar 2021.
Vancouver:
Lorrillere M. Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments. [Internet] [Doctoral dissertation]. Université Pierre et Marie Curie – Paris VI; 2016. [cited 2021 Mar 06].
Available from: http://www.theses.fr/2016PA066036.
Council of Science Editors:
Lorrillere M. Caches collaboratifs noyau adaptés aux environnements virtualisés : A kernel cooperative cache for virtualized environments. [Doctoral Dissertation]. Université Pierre et Marie Curie – Paris VI; 2016. Available from: http://www.theses.fr/2016PA066036
◁ [1] [2] ▶