You searched for +publisher:"Georgia Tech" +contributor:("Krishna, Tushar").
Showing records 1 – 30 of 37 total matches.
No search limiters apply to these results.

Georgia Tech
1.
Mannan, Parth.
Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware.
Degree: MS, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/62255
Recent advancements in machine learning algorithms, especially the development of Deep Neural Networks (DNNs), have transformed the landscape of Artificial Intelligence (AI). With every passing day, deep learning based methods are applied to solve new problems with exceptional results. However, the true impact of AI can only be fully realized if it interacts with the real world and solves everyday problems. The everyday problem, however, is new every day and subject to constantly changing requirements. The Deep Learning (DL) landscape today is incapable of solving these dynamic problems, as the performance of DL is heavily tied to the network topology, which is often task-specific and hand-tuned by experts. Rigidity of the solution is not the only problem: the high memory and compute requirements of training DNNs on terabytes of data also act as a huge barrier to bringing true intelligence to the edge, which is the true portal to the 'real world'. NeuroEvolution (NE) algorithms can circumvent this problem by 'learning on the fly'. These algorithms continuously interact with the environment and update their models based on how fruitful their last interaction proved. This way, the solution is not tied to a topology, and these algorithms do not need to perform memory- and compute-intensive backpropagation (BP) operations, making them ideal for solving dynamic problems in a robust manner on the edge. However, the barrier to deploying NE today is the lack of widespread adoption and understanding of its compute behavior. This thesis attempts to lift that barrier by characterizing the compute and communication behavior of a NE algorithm, NEAT (NeuroEvolution of Augmenting Topologies), in an attempt to propel further research in this direction. This thesis also attempts to bring intelligence to the edge using a distributed-system solution. It demonstrates CLAN, Collaborative Learning using Asynchronous Neuro-evolution: techniques for enabling adaptive intelligence on the edge by running NE algorithms collaboratively on Raspberry Pis, showing that CLAN can match the performance of higher-end computing devices with better energy efficiency at scale. Further, this thesis proposes algorithmic modifications to improve scalability. The study performed in this work aims to deliver key insights to both computer architects and distributed-system engineers to enable efforts in deploying NE on modern compute platforms.
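The gradient-free evolutionary loop the abstract describes can be sketched compactly. The snippet below is a minimal illustration with a placeholder fitness function and mutation scheme of my choosing, not the NEAT implementation characterized in the thesis (NEAT additionally evolves the network topology itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):
    # Placeholder for episodes of agent-environment interaction.
    target = np.linspace(-1.0, 1.0, genome.size)
    return -np.sum((genome - target) ** 2)

POP, GENES, GENERATIONS, ELITE = 32, 8, 50, 8
population = rng.normal(size=(POP, GENES))

for gen in range(GENERATIONS):
    scores = np.array([fitness(g) for g in population])
    elite = population[np.argsort(scores)[-ELITE:]]              # selection
    parents = elite[rng.integers(0, ELITE, size=POP)]            # reproduction
    population = parents + 0.1 * rng.normal(size=(POP, GENES))   # mutation
    # No backpropagation anywhere: models improve purely through selection
    # on environment feedback, which is what makes NE attractive for
    # memory- and compute-constrained edge devices.

best = max(population, key=fitness)
print("best fitness:", fitness(best))
```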
Advisors/Committee Members: Krishna, Tushar (advisor), Kim, Hyesoon (committee member), Mukhopadhyay, Saibal (committee member).
Subjects/Keywords: Deep Learning; NeuroEvolution; Architecture; Evolutionary Algorithms; Hardware; Accelerators; Scalability; Distributed System; Collaborative; learning
APA (6th Edition):
Mannan, P. (2018). Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62255
Chicago Manual of Style (16th Edition):
Mannan, Parth. “Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware.” 2018. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62255.
MLA Handbook (7th Edition):
Mannan, Parth. “Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware.” 2018. Web. 13 Apr 2021.
Vancouver:
Mannan P. Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware. [Internet] [Masters thesis]. Georgia Tech; 2018. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62255.
Council of Science Editors:
Mannan P. Exploring opportunities and challenges in enabling neuro-evolutionary algorithms in hardware. [Masters Thesis]. Georgia Tech; 2018. Available from: http://hdl.handle.net/1853/62255

Georgia Tech
2.
Ko, Sho.
Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference.
Degree: MS, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/62806
This research work presents the design of an analog ReRAM-based PIM (processing-in-memory) architecture for fast and efficient CNN (convolutional neural network) inference. The overall architecture uses a basic hardware hierarchy of node, tile, core, and subarray. On top of that, we design intra-layer pipelining, inter-layer pipelining, and batch pipelining to further exploit parallelism in the architecture and increase overall throughput for inference over an input image stream. Our simulator also optimizes the performance of the NoC (network-on-chip) using SMART (single-cycle multi-hop asynchronous repeated traversal) flow control. Finally, we experiment with weight replication for different CNN layers and report throughput, energy efficiency, and speedup of VGG (A–E) on the large-scale ImageNet data set.
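At the subarray level, the core PIM operation is an analog vector-matrix multiply: weights are programmed as crossbar conductances, inputs are applied as wordline voltages, and bitline currents accumulate the dot products. Below is a minimal numerical sketch of tiling a layer's weights across fixed-size subarrays; the dimensions and tile size are illustrative, not the thesis's configuration:

```python
import numpy as np

SUB = 4                              # rows/cols one subarray can hold
rng = np.random.default_rng(1)
W = rng.random((8, 8))               # layer weights -> conductances G
v = rng.random(8)                    # input activations -> voltages V

out = np.zeros(8)
for r in range(0, 8, SUB):
    for c in range(0, 8, SUB):
        tile = W[r:r+SUB, c:c+SUB]          # one ReRAM subarray
        out[r:r+SUB] += tile @ v[c:c+SUB]   # analog MAC: I = G*V, summed digitally

assert np.allclose(out, W @ v)       # tiled partial sums equal the full VMM
```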
Advisors/Committee Members: Yu, Shimeng (advisor), Raychowdhury, Arijit (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: hardware accelerator; ReRAM (resistive random access memory); PIM (processing-in-memory); CNN (convolutional neural network); NoC (network-on-chip); SMART flow control
APA (6th Edition):
Ko, S. (2020). Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62806
Chicago Manual of Style (16th Edition):
Ko, Sho. “Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference.” 2020. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62806.
MLA Handbook (7th Edition):
Ko, Sho. “Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference.” 2020. Web. 13 Apr 2021.
Vancouver:
Ko S. Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference. [Internet] [Masters thesis]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62806.
Council of Science Editors:
Ko S. Efficient Pipelined ReRAM-Based Processing-In-Memory Architecture for Convolutional Neural Network Inference. [Masters Thesis]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/62806

Georgia Tech
3.
Bharadwaj, Vedula Venkata.
Scaling address translation in multi-core architectures using low-latency interconnects.
Degree: MS, Electrical and Computer Engineering, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/59300
Modern systems employ structures known as Translation Lookaside Buffers (TLBs) to accelerate the address translation mechanism. As workloads use ever-increasing memory footprints, TLBs are becoming critical to overall system performance. Modern designs use private multi-level TLB hierarchies to balance latency and effective capacity. Unfortunately, private TLB hierarchies have drawbacks, a major one being the replication of translations across multiple cores, yielding lower hit rates than shared alternatives. But designing scalable shared TLBs remains a challenge, since the benefit of higher capacity is often outweighed by the latency overhead of accessing a large monolithic structure.
To counter the access latencies of large TLBs, physically distributed TLBs akin to NUCA caches can be explored. While a physically distributed last-level TLB reduces bank access latency, the on-chip latency of reaching remote banks and returning continues to hamper performance and energy. Such problems hinder the practical adoption of large shared TLBs on modern many-core systems, where higher core counts exacerbate latency and energy problems.
By utilizing a lightweight single-cycle interconnect based on a recently demonstrated technique called SMART, this thesis demonstrates NUTRA, a Non-Uniform TRanslation Access architecture, to tackle the scaling challenges of shared, distributed last-level TLBs. NUTRA achieves latencies close to those of private L2 TLBs, with the hit rates of the shared last-level TLBs proposed in previous work. The combination of tight latencies and high hit rates means that NUTRA outperforms not only monolithic shared last-level (SLL) TLB implementations but also distributed ones. Further, this thesis shows that a distributed organization coupled with low-latency interconnects delivers a scalable solution for last-level TLBs in multi-core architectures.
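To make the latency argument concrete, here is a back-of-the-envelope model of a remote lookup in a banked last-level TLB on a mesh, where a SMART-style network lets a request traverse several hops in one cycle. All constants (mesh size, hops per cycle, bank latency) and the interleaving hash are illustrative assumptions, not NUTRA's actual parameters:

```python
import math

MESH = 4          # 4x4 mesh, one TLB bank per tile
HPC_MAX = 8       # SMART: max hops traversable in a single cycle
BANK_CYCLES = 2   # TLB bank array access latency

def bank_of(vpn):
    return vpn % (MESH * MESH)       # interleave translations across banks

def lookup_latency(core, vpn):
    b = bank_of(vpn)
    hops = abs(core % MESH - b % MESH) + abs(core // MESH - b // MESH)
    # Conventional NoC: ~1 cycle per hop. SMART: ceil(hops / HPC_MAX)
    # cycles per direction, flattening the distance penalty.
    wire = math.ceil(hops / HPC_MAX)
    return BANK_CYCLES + 2 * wire

print(lookup_latency(core=0, vpn=0x12345))  # close to a private-L2-TLB hit
```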
Advisors/Committee Members: Krishna, Tushar (advisor), Yalamanchili, Sudhakar (committee member), Kim, Hyesoon (committee member).
Subjects/Keywords: TLB; NUTRA; SMART; Interconnect; Distributed; NUCA
APA (6th Edition):
Bharadwaj, V. V. (2017). Scaling address translation in multi-core architectures using low-latency interconnects. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/59300
Chicago Manual of Style (16th Edition):
Bharadwaj, Vedula Venkata. “Scaling address translation in multi-core architectures using low-latency interconnects.” 2017. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/59300.
MLA Handbook (7th Edition):
Bharadwaj, Vedula Venkata. “Scaling address translation in multi-core architectures using low-latency interconnects.” 2017. Web. 13 Apr 2021.
Vancouver:
Bharadwaj VV. Scaling address translation in multi-core architectures using low-latency interconnects. [Internet] [Masters thesis]. Georgia Tech; 2017. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/59300.
Council of Science Editors:
Bharadwaj VV. Scaling address translation in multi-core architectures using low-latency interconnects. [Masters Thesis]. Georgia Tech; 2017. Available from: http://hdl.handle.net/1853/59300

Georgia Tech
4.
She, Xueyuan.
Fast and low-precision learning in GPU-accelerated spiking neural network.
Degree: MS, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/63679
Spiking neural networks (SNNs) use biologically inspired neuron models coupled with spike-timing-dependent plasticity (STDP) to enable unsupervised continuous learning in artificial intelligence (AI) platforms. However, current SNN algorithms show low accuracy on complex problems and are hard to operate at reduced precision. This paper demonstrates a GPU-accelerated SNN architecture that uses stochasticity in STDP coupled with higher-frequency input spike trains. The simulation results demonstrate 2 to 3 times faster learning compared to deterministic SNN architectures while maintaining high accuracy on the MNIST (simple) and Fashion-MNIST (complex) data sets. Further, we show that stochastic STDP enables learning even with 2 bits of operation, where deterministic STDP fails.
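A pair-based STDP rule with the stochasticity the abstract describes can be sketched in a few lines. The gating probability, time constant, and learning rate below are illustrative placeholders, not the thesis's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_stdp(w, dt, lr=0.01, tau=20.0, p_update=0.5):
    """One pair-based STDP update; dt = t_post - t_pre in ms.

    The update fires only with probability p_update; the randomness
    dithers weight changes, which is what lets learning survive very
    coarse (e.g., 2-bit) weight precision."""
    if rng.random() > p_update:
        return w                          # update stochastically skipped
    dw = lr * np.exp(-abs(dt) / tau) * (1.0 if dt > 0 else -1.0)
    return float(np.clip(w + dw, 0.0, 1.0))

w = 0.5
for dt in (4.0, -7.0, 2.0):               # pre/post spike-time differences
    w = stochastic_stdp(w, dt)
print(w)
```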
Advisors/Committee Members: Mukhopadhyay, Saibal (advisor), Kim, Hyesoon (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Spiking neural network; GPU acceleration; Computer vision
APA (6th Edition):
She, X. (2020). Fast and low-precision learning in GPU-accelerated spiking neural network. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/63679
Chicago Manual of Style (16th Edition):
She, Xueyuan. “Fast and low-precision learning in GPU-accelerated spiking neural network.” 2020. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/63679.
MLA Handbook (7th Edition):
She, Xueyuan. “Fast and low-precision learning in GPU-accelerated spiking neural network.” 2020. Web. 13 Apr 2021.
Vancouver:
She X. Fast and low-precision learning in GPU-accelerated spiking neural network. [Internet] [Masters thesis]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/63679.
Council of Science Editors:
She X. Fast and low-precision learning in GPU-accelerated spiking neural network. [Masters Thesis]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/63679

Georgia Tech
5.
Dasari, Nihar.
Modeling of Integrated Voltage Regulator Power delivery systems.
Degree: MS, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64139
Distributed power delivery poses new design challenges in modern ICs, requiring circuit-level techniques to convert and regulate power at points-of-load (POL), methodological solutions for distributing on-chip power supplies, and automated design techniques to co-design distributed power supplies and decoupling capacitors. Integration of on-chip inductive DC-DC voltage regulators has become a popular way to design SoCs with improved power efficiency and performance. Such distributed power systems are highly complex because of their multi-parametric, interactive behavior. The parameters encompass the voltage sources (input and reference voltages), loads, power semiconductors, and control circuits. Behavioral analysis of such complex systems prior to prototyping is possible only through suitable simulation. This thesis studies the design and construction of a combined IVR and LDO system model using Simulink and MATLAB.
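As a flavor of the kind of behavioral model involved, here is a forward-Euler simulation of one averaged buck-type IVR phase driving a resistive load. The thesis builds its models in Simulink/MATLAB; all component values below are illustrative placeholders:

```python
# Averaged buck converter: L di/dt = d*Vin - Vout, C dVout/dt = i - Vout/R
VIN, L, C, R = 1.8, 10e-9, 100e-9, 1.0    # volts, henries, farads, ohms
DT, DUTY = 1e-10, 0.5                      # Euler step (s), duty cycle

i_l = v_out = 0.0
for _ in range(200_000):                   # simulate 20 us of settling
    i_l += DT * (DUTY * VIN - v_out) / L   # inductor current
    v_out += DT * (i_l - v_out / R) / C    # output (load) voltage

print(f"steady-state Vout ~ {v_out:.3f} V (ideal {DUTY * VIN:.3f} V)")
```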
Advisors/Committee Members: Mukhopadhyay, Saibal (advisor), Krishna, Tushar (committee member), Kim, Hyesoon (committee member).
Subjects/Keywords: IVR; LDO; Power Delivery
APA (6th Edition):
Dasari, N. (2020). Modeling of Integrated Voltage Regulator Power delivery systems. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/64139
Chicago Manual of Style (16th Edition):
Dasari, Nihar. “Modeling of Integrated Voltage Regulator Power delivery systems.” 2020. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/64139.
MLA Handbook (7th Edition):
Dasari, Nihar. “Modeling of Integrated Voltage Regulator Power delivery systems.” 2020. Web. 13 Apr 2021.
Vancouver:
Dasari N. Modeling of Integrated Voltage Regulator Power delivery systems. [Internet] [Masters thesis]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/64139.
Council of Science Editors:
Dasari N. Modeling of Integrated Voltage Regulator Power delivery systems. [Masters Thesis]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/64139

Georgia Tech
6.
Immanuel, Yehowshua U.
PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION.
Degree: MS, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64210
ML accelerators are a fairly new research area, and it is important that the architecture community be able to iterate quickly on architectural exploration. Although a number of commercial Deep Neural Network (DNN) accelerators are available on the market and a plethora of creative ML architectures have been proposed in academia, there exist only a few end-to-end DNN accelerator implementations which academics can readily study and use to inform future DNN accelerator developments.
A number of tools have recently surfaced to help address this need. Some of these include advanced RTL design tools and compilers that consume ML framework output and emit instructions for custom accelerators. However, creating an end-to-end accelerator is still quite difficult. There are a number of hurdles to overcome, including writing drivers, achieving high transfer speeds between host and accelerator, modifying compilers to support custom hardware, choosing the correct bus for connecting the accelerator fabric to the chosen memory system, and even choosing the right RTL.
This thesis documents the process of building an end-to-end accelerator, complete with a custom compiler, in the hope that highlighting the most difficult parts of creating complete accelerator systems informs the techniques used by future architects and system designers.
Advisors/Committee Members: Krishna, Tushar (advisor), Mukhopadhyay, Saibal (committee member), Kim, Hyesoon (committee member).
Subjects/Keywords: DNN Accelerator; AI
APA (6th Edition):
Immanuel, Y. U. (2020). PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/64210
Chicago Manual of Style (16th Edition):
Immanuel, Yehowshua U. “PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION.” 2020. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/64210.
MLA Handbook (7th Edition):
Immanuel, Yehowshua U. “PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION.” 2020. Web. 13 Apr 2021.
Vancouver:
Immanuel YU. PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION. [Internet] [Masters thesis]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/64210.
Council of Science Editors:
Immanuel YU. PLUG-AND-PLAY FOSS ML ACCELERATOR : FROM CONCEPT TO CONCEPTION. [Masters Thesis]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/64210

Georgia Tech
7.
Hill, Brennan.
Malware capability reverse engineering via coordination with symbolic analysis.
Degree: MS, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/62254
A key part of cyber attack investigations is quickly understanding the capabilities and payloads of malware so that proper countermeasures can be adopted. Unfortunately, due to a lack of execution insight, current techniques for exposing these capabilities are prohibitively limited. Enter FORSEE, a tool developed by CyFI Lab researchers that leverages memory image forensics and symbolic analysis to quickly and efficiently discover capabilities in malware. FORSEE uses the concrete execution state extracted from a malware sample's memory to explore potential execution paths starting from the point of capture. By coordinating their analysis with FORSEE, malware analysts can simplify and accelerate their reverse engineering efforts. In line with this use case, the work presented in this thesis coordinates FORSEE's symbolic analysis with reverse engineering to assess FORSEE's effectiveness and assist in future development.
Advisors/Committee Members: Saltaformaggio, Brendan D. (advisor), Beyah, Raheem A. (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Malware analysis; Symbolic execution; Memory forensics
APA (6th Edition):
Hill, B. (2018). Malware capability reverse engineering via coordination with symbolic analysis. (Masters Thesis). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62254
Chicago Manual of Style (16th Edition):
Hill, Brennan. “Malware capability reverse engineering via coordination with symbolic analysis.” 2018. Masters Thesis, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62254.
MLA Handbook (7th Edition):
Hill, Brennan. “Malware capability reverse engineering via coordination with symbolic analysis.” 2018. Web. 13 Apr 2021.
Vancouver:
Hill B. Malware capability reverse engineering via coordination with symbolic analysis. [Internet] [Masters thesis]. Georgia Tech; 2018. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62254.
Council of Science Editors:
Hill B. Malware capability reverse engineering via coordination with symbolic analysis. [Masters Thesis]. Georgia Tech; 2018. Available from: http://hdl.handle.net/1853/62254

Georgia Tech
8.
Na, Taesik.
Energy efficient, secure and noise robust deep learning for the internet of things.
Degree: PhD, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/60293
The objective of this research is to design an energy-efficient, secure, and noise-robust deep learning system for the Internet of Things (IoT). The research particularly focuses on energy-efficient training of deep learning, adversarial machine learning, and noise-robust deep learning. To enable energy-efficient training, the research studies the impact of limited-precision training on various types of neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). For CNNs, the work proposes a dynamic precision scaling algorithm and a precision-flexible computing unit to accelerate CNN training. For RNNs, the work studies the impact of various hyper-parameters to enable low-precision training and proposes a low-precision computing unit with stochastic rounding. To enhance the security of deep learning, the research proposes cascade adversarial machine learning and additional regularization using a unified embedding for image classification and low-level (pixel-level) similarity learning. Noise-robust and resolution-invariant image classification is also achieved by adding this low-level similarity learning. A mixture-of-preprocessing-experts model is proposed for a noise-robust object detection network that does not sacrifice accuracy on clean images.
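Stochastic rounding, which the abstract pairs with the low-precision computing unit, rounds a value up or down with probability proportional to proximity, so quantization error is zero in expectation. A minimal sketch, with a fixed-point grid and bit-width chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, frac_bits):
    # Round onto a 2**-frac_bits grid; the fractional remainder becomes
    # the probability of rounding up, so tiny gradient updates that
    # round-to-nearest would always flush to zero still land on average.
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x) * scale
    floor = np.floor(scaled)
    return (floor + (rng.random(scaled.shape) < (scaled - floor))) / scale

grads = np.full(10_000, 1e-3)             # far below the 1/16 grid step
print(stochastic_round(grads, 4).mean())  # ~1e-3 on average, not 0.0
```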
Advisors/Committee Members: Yalamanchili, Sudhakar (committee member), Krishna, Tushar (committee member), Burger, Doug (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Deep learning; Adversarial machine learning; Energy efficient training; Noise robust machine learning; IoT
APA (6th Edition):
Na, T. (2018). Energy efficient, secure and noise robust deep learning for the internet of things. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/60293
Chicago Manual of Style (16th Edition):
Na, Taesik. “Energy efficient, secure and noise robust deep learning for the internet of things.” 2018. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/60293.
MLA Handbook (7th Edition):
Na, Taesik. “Energy efficient, secure and noise robust deep learning for the internet of things.” 2018. Web. 13 Apr 2021.
Vancouver:
Na T. Energy efficient, secure and noise robust deep learning for the internet of things. [Internet] [Doctoral dissertation]. Georgia Tech; 2018. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/60293.
Council of Science Editors:
Na T. Energy efficient, secure and noise robust deep learning for the internet of things. [Doctoral Dissertation]. Georgia Tech; 2018. Available from: http://hdl.handle.net/1853/60293

Georgia Tech
9.
Sharma, Hardik.
Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/61267
Advances in high-performance computer architecture design have been a major driver of the rapid evolution of Deep Neural Networks (DNNs). Due to their insatiable demand for compute power, both the research community and industry have naturally turned to accelerators to accommodate modern DNN computation. Furthermore, DNNs are gaining prevalence and have found applications across a wide spectrum of devices, from commodity smartphones to enterprise cloud platforms. However, there is no one-size-fits-all solution for this continuum of devices that can meet the strict energy/power/chip-area budgets of edge devices while also meeting the high performance requirements of enterprise-grade servers. To this end, this thesis designs a specialized compute stack for DNN acceleration across the edge-to-cloud continuum that flexibly matches the varying constraints of different devices and simultaneously exploits algorithmic properties to maximize the benefits of acceleration. The thesis first explores a tight integration of Neural Network (NN) accelerators within massively parallel GPUs with minimal area overhead. We show that a tight coupling of NN accelerators and GPUs can provide a significant gain in performance and energy efficiency across a diverse set of applications through neural acceleration, by approximating regions of approximation-amenable code using neural networks. Next, this thesis develops a full stack for accelerating DNN inference on FPGAs that aims to provide programmability, performance, and efficiency. We call our specialized compute stack DNNWEAVER; it encompasses (1) high-level algorithmic abstractions, (2) a flexible template accelerator architecture, and (3) a compiler that automatically and efficiently optimizes the template architecture to maximize DNN performance using the limited resources available on the FPGA die. The third thrust of this thesis explores scale-out acceleration of training using cloud-scale FPGAs for a wide range of machine learning algorithms, including neural networks. The challenge here is to design an accelerator architecture that can scale up to efficiently use the large pool of compute resources available on modern cloud-grade FPGAs. To tackle this challenge, this thesis explores multi-threading to maximize efficiency from FPGA acceleration by running multiple parallel threads of training. The final thrust builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs, and it may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits, to accommodate the worst-case bitwidth requirements, or inevitably lead to a degradation in final accuracy. To alleviate these deficiencies, the final thrust of this thesis introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. The…
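The bit-level decomposition idea is that a wide multiply can be computed exactly as a shifted sum of narrow multiplies, so an array of small (e.g., 2-bit) multiplier units can be fused to serve whatever per-layer bitwidth a DNN needs. A sketch with illustrative widths:

```python
def bit_decomposed_mul(a, b, chunk=2, n_bits=8):
    # Multiply two unsigned n_bits values using only chunk x chunk-bit
    # multiplies, shifted and summed -- the fusion/decomposition identity.
    mask = (1 << chunk) - 1
    acc = 0
    for i in range(0, n_bits, chunk):
        for j in range(0, n_bits, chunk):
            acc += (((a >> i) & mask) * ((b >> j) & mask)) << (i + j)
    return acc

assert bit_decomposed_mul(173, 201) == 173 * 201   # exact for any operands
```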
Advisors/Committee Members: Esmaeilzadeh, Hadi (advisor), Kim, Hyesoon (advisor), Prvulovic, Milos (advisor), Krishna, Tushar (advisor), Chandra, Vikas (advisor).
Subjects/Keywords: Bit level composability; Dynamic composability; Deep neural networks; Accelerators; DNN; Convolutional neural networks; CNN; Long short-term memory; LSTM; Recurrent neural networks; RNN; Quantization; Bit fusion; DnnWeaver; FPGA; ASIC
APA (6th Edition):
Sharma, H. (2019). Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/61267
Chicago Manual of Style (16th Edition):
Sharma, Hardik. “Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms.” 2019. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/61267.
MLA Handbook (7th Edition):
Sharma, Hardik. “Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms.” 2019. Web. 13 Apr 2021.
Vancouver:
Sharma H. Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms. [Internet] [Doctoral dissertation]. Georgia Tech; 2019. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/61267.
Council of Science Editors:
Sharma H. Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms. [Doctoral Dissertation]. Georgia Tech; 2019. Available from: http://hdl.handle.net/1853/61267

Georgia Tech
10.
Jo, Paul K.
Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62301
This research proposes and demonstrates 1) a new compliant interconnect that offers a cost-effective, simple fabrication process and allows a high degree of design freedom, and 2) an advanced heterogeneous multi-die integration platform enabled by this new compliant interconnect. Interconnects play a critical role in virtually all microelectronic applications. They are key in influencing microsystem form factor, electrical performance, power consumption, and signal integrity. Of particular importance are first-level interconnects, which are used to electrically interconnect and mechanically bond a die to a package substrate. The density, electrical attributes, and mechanical properties of first-level interconnects impact the overall mechanical integrity, signaling bandwidth density, and power supply noise of microsystems. While solder bumps have become a key technology for first-level interconnects, the technology unfortunately leaves a number of attributes to be desired in modern microsystems. Compliant interconnects can circumvent many of the challenges of solder bumps, as they can compensate for surface non-uniformity on the attaching substrate and CTE-mismatch-induced warpage, and they provide non-permanent contact. To this end, novel compliant interconnects for emerging electronic devices and a new heterogeneous multi-die integration platform enabled by these compliant interconnects are explored.
Advisors/Committee Members: Bakir, Muhannad S. (advisor), Brand, Oliver (committee member), Krishna, Tushar (committee member), Cardoso, Adilson (committee member), Sitaraman, Suresh (committee member).
Subjects/Keywords: Compliant interconnect; Heterogeneous integration; Package; 2.5D; 3D; System-level integration; System-in-package
APA (6th Edition):
Jo, P. K. (2019). Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62301
Chicago Manual of Style (16th Edition):
Jo, Paul K. “Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects.” 2019. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62301.
MLA Handbook (7th Edition):
Jo, Paul K. “Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects.” 2019. Web. 13 Apr 2021.
Vancouver:
Jo PK. Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects. [Internet] [Doctoral dissertation]. Georgia Tech; 2019. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62301.
Council of Science Editors:
Jo PK. Polylithic integration of heterogeneous multi-die enabled by compressible microinterconnects. [Doctoral Dissertation]. Georgia Tech; 2019. Available from: http://hdl.handle.net/1853/62301

Georgia Tech
11.
Nazari, Alireza.
Software profiling via electromagnetic side-channel signal.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/62720
This thesis develops general methods to exploit information leaked in electromagnetic (EM) emanations for profiling software applications. A broad range of computing devices and software applications can benefit from these methods. Computers radiate EM emanations when voltage and current flows change as a result of software program activity. EM emanations can be intercepted and analyzed to extract information about the corresponding computation. Traditionally, the EM side-channel has been leveraged to gather critical information about cryptographic algorithms. This information is used by cryptography researchers to extract secret cryptographic keys from computing devices as the devices perform encryption operations. The design and implementation of this analysis is usually done ad hoc, for a specific implementation of a cryptographic algorithm on a particular machine. The wide range of information that can be gathered from EM emanation signals suggests that they are useful for more purposes than cryptographic analysis. Moreover, there are two major benefits to using these signals. First, they can be received remotely, with no contact with the device needed. This especially benefits embedded devices where access to the device is difficult or even impossible. Second, the EM signal can be received and processed on a physically separate machine. This also benefits real-time and cyber-physical devices, which have very limited computation and memory resources. Until now, only a few bodies of work have tried to explore the complex relationship between EM emanations, the underlying architecture, and the software application. It is viable to use EM emanations as a tool for profiling applications and to infer various levels of information from them, spanning from detailed statistics of an event in the underlying machine to coarse-grained timing information about the software program's code. However, profiling this information requires a general approach that can be automatically applied to diverse programs and machines. Toward this goal, this thesis has developed (1) a new approach for profiling software programs that leverages the unintentional EM side-channel and allows highly accurate profiling of loops and other repetitive activity, without perturbing the profiled system, (2) a new method for anomaly detection in program execution that monitors an application's repetitive behavior, (3) an external memory profiler that infers last-level cache misses from the EM side-channel signal, and (4) a technique that extends the other proposed methods to multi-core systems by blind separation of EM emanation sources.
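Repetitive program activity such as a loop modulates the emanations periodically, so its execution rate shows up as a spectral peak; this is the intuition behind profiling loops without perturbing the profiled system. A toy stand-in (the trace below is synthetic; a real setup would digitize an antenna signal):

```python
import numpy as np

fs, f_loop = 1_000_000.0, 12_345.0      # sample rate and loop rate (Hz)
t = np.arange(65536) / fs
rng = np.random.default_rng(0)
# Square-wave modulation from a loop body, buried in measurement noise.
trace = np.sign(np.sin(2 * np.pi * f_loop * t)) + rng.normal(0, 2, t.size)

spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print(f"strongest periodic activity near {freqs[spectrum.argmax()]:.0f} Hz")
```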
Advisors/Committee Members: Prvulovic, Milos (advisor), Zajic, Alenka (advisor), Orso, Alessandro (committee member), Krishna, Tushar (committee member), Qureshi, Moinuddin (committee member).
Subjects/Keywords: Electromagnetic side-channel; Profiling; Memory profiler; Blind source separation; Malware detection
APA (6th Edition):
Nazari, A. (2020). Software profiling via electromagnetic side-channel signal. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62720
Chicago Manual of Style (16th Edition):
Nazari, Alireza. “Software profiling via electromagnetic side-channel signal.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62720.
MLA Handbook (7th Edition):
Nazari, Alireza. “Software profiling via electromagnetic side-channel signal.” 2020. Web. 13 Apr 2021.
Vancouver:
Nazari A. Software profiling via electromagnetic side-channel signal. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62720.
Council of Science Editors:
Nazari A. Software profiling via electromagnetic side-channel signal. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/62720

Georgia Tech
12.
Srikanth, Sriseshan.
Energy efficient architectures for irregular data streams.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/62757
An increasing prevalence of data irregularity is seen in applications today, particularly in machine learning, graph analytics, high-performance computing, and cybersecurity. Faced with fundamental technology constraints, architectures that have been designed around conventional assumptions of spatio-temporal locality are inefficient for these important domain areas. This PhD thesis finds that the energy efficiency and performance of such data-irregular applications can be improved via near-memory and near-processor sparse data stream acceleration and address remapping. In particular, this thesis proposes computer architectures that improve energy efficiency and performance by intelligently reducing data movement through the memory hierarchy for applications that exhibit data irregularity due to sparse accesses or due to computationally error-tolerant post-Moore processing.
Advisors/Committee Members: Conte, Thomas M. (advisor), Kim, Hyesoon (committee member), Krishna, Tushar (committee member), Sarkar, Vivek (committee member), DeBenedictis, Erik P. (committee member).
Subjects/Keywords: Computer architecture; Sparse; Near data processing; Post-Moore computing; Cache; Memory
APA (6th Edition):
Srikanth, S. (2020). Energy efficient architectures for irregular data streams. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62757
Chicago Manual of Style (16th Edition):
Srikanth, Sriseshan. “Energy efficient architectures for irregular data streams.” 2020. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62757.
MLA Handbook (7th Edition):
Srikanth, Sriseshan. “Energy efficient architectures for irregular data streams.” 2020. Web. 13 Apr 2021.
Vancouver:
Srikanth S. Energy efficient architectures for irregular data streams. [Internet] [Doctoral dissertation]. Georgia Tech; 2020. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62757.
Council of Science Editors:
Srikanth S. Energy efficient architectures for irregular data streams. [Doctoral Dissertation]. Georgia Tech; 2020. Available from: http://hdl.handle.net/1853/62757

Georgia Tech
13.
Long, Yun.
Energy efficient processing in memory architecture for deep learning computing acceleration.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62311
The major objective of this research is to make processing-in-memory (PIM) based deep learning accelerators more practical and more computationally efficient. This research particularly focuses on novel architecture design based on emerging non-volatile memory (NVM) and leverages software-hardware co-optimization to achieve optimal computing efficiency without compromising accuracy. From the emerging-memory perspective, this research mainly explores resistive RAM (ReRAM) and ferroelectric FET (FeFET). A dedicated recurrent neural network (RNN) accelerator is proposed which utilizes ReRAM as the basic computation cell for vector-matrix multiplication (VMM). The execution pipeline is specifically optimized to ensure efficiency for RNN computation. Given the challenges stemming from ReRAM, this research also explores FeFET as a replacement for ReRAM as the basic memory cell in the PIM architecture. A dedicated data communication network, named hierarchical network-on-chip (H-NoC), is presented to enhance data transmission efficiency. To eliminate the power- and area-hungry analog-to-digital and digital-to-analog conversion (ADC and DAC) in existing PIM architectures and further enhance efficiency, this research proposes an all-digital, flexible-precision PIM design where computation is performed with dynamic bit-precision. Besides the circuit and architecture optimization, algorithms are developed to fully utilize the hardware's potential. This research proposes a genetic algorithm (GA) based evolutionary method for layer-wise DNN quantization. DNN models can be dynamically quantized and deployed on the developed hardware platforms, which support flexible bit-precision, to achieve the best computing efficiency without compromising accuracy. To alleviate the accuracy drop caused by device (ReRAAM and FeFET) variation, this research proposes a hardware-noise-aware training algorithm, leading to a reliable PIM engine built from unreliable devices.
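A GA for layer-wise bitwidth selection encodes one bit-precision per layer as a chromosome and evolves the population under an accuracy/cost objective. The sketch below substitutes a made-up accuracy proxy for the real quantized-model evaluation the thesis would run; all constants are illustrative:

```python
import random

random.seed(0)
LAYERS, POP, GENS, BITS = 6, 20, 30, [2, 4, 8]

def accuracy(cfg):
    # Hypothetical proxy: deeper layers are penalized more for fewer bits.
    return 0.95 - sum(0.02 * (8 - b) / 6 * (i + 1) / LAYERS
                      for i, b in enumerate(cfg))

def score(cfg):
    return accuracy(cfg) - 0.002 * sum(cfg)   # accuracy minus hardware cost

pop = [[random.choice(BITS) for _ in range(LAYERS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=score, reverse=True)
    elite, children = pop[:POP // 4], []
    while len(children) < POP - POP // 4:
        a, b = random.sample(elite, 2)
        cut = random.randrange(1, LAYERS)
        child = a[:cut] + b[cut:]                        # crossover
        if random.random() < 0.3:                        # mutation
            child[random.randrange(LAYERS)] = random.choice(BITS)
        children.append(child)
    pop = elite + children

print("best per-layer bitwidths:", pop[0])
```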
Advisors/Committee Members: Mukhopadhyay, Saibal (advisor), Khan, Asif Islam (committee member), Kim, Hyesoon (committee member), Krishna, Tushar (committee member), Yu, Shimeng (committee member).
Subjects/Keywords: Processing-in-memory; Deep learning; Resistive ram; Ferroelectric FET; Machine learning computing acceleration
APA (6th Edition):
Long, Y. (2019). Energy efficient processing in memory architecture for deep learning computing acceleration. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62311
Chicago Manual of Style (16th Edition):
Long, Yun. “Energy efficient processing in memory architecture for deep learning computing acceleration.” 2019. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62311.
MLA Handbook (7th Edition):
Long, Yun. “Energy efficient processing in memory architecture for deep learning computing acceleration.” 2019. Web. 13 Apr 2021.
Vancouver:
Long Y. Energy efficient processing in memory architecture for deep learning computing acceleration. [Internet] [Doctoral dissertation]. Georgia Tech; 2019. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62311.
Council of Science Editors:
Long Y. Energy efficient processing in memory architecture for deep learning computing acceleration. [Doctoral Dissertation]. Georgia Tech; 2019. Available from: http://hdl.handle.net/1853/62311

Georgia Tech
14.
Kang, Suk Chan.
Optimizing high locality memory references in cache coherent shared memory multi-core processors.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62641
Optimizing memory references has been a primary research area of computer systems ever since the advent of stored-program computers. The objective of this thesis research is to identify and optimize two classes of high-locality data memory reference streams in cache-coherent shared-memory multi-processors. More specifically, this thesis classifies such memory objects into spatially and temporally false shared memory objects. The underlying hypothesis is that the policy of treating all memory objects as permanently shared significantly hinders the optimization of high-locality memory objects in modern cache-coherent shared-memory multi-processor systems: the policy forces the systems to unconditionally prepare to incur shared-memory-related overheads for every memory reference. To verify the hypothesis, this thesis explores two different schemes to minimize the shared-memory abstraction overheads associated with memory reference streams of spatially and temporally false shared memory objects, respectively. The schemes implement exception rules which enable isolating false shared memory objects from the shared-memory domain, in a spatial and temporal manner. However, the exception rules require special consideration in cache-coherent shared-memory multi-processors regarding data consistency, cache coherence, and the memory consistency model. Thus, this thesis not only implements the schemes based on such consideration but also breaks from a widespread faulty assumption in prior academic work. This high-level approach ultimately aims at upgrading the scalability of large-scale systems, such as multi-socket cache-coherent shared-memory multi-processors, by improving performance and reducing energy/power consumption. This thesis demonstrates the efficacy and efficiency of the schemes in terms of performance improvement and energy/power reduction.
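The spatial flavor of false sharing is easy to picture with addresses: two logically private, per-core objects that land in the same cache line drag coherence traffic into references that share nothing. A tiny illustration (the addresses and the 64-byte line size are hypothetical):

```python
LINE = 64                      # cache line size in bytes

def line_of(addr):
    return addr // LINE

counter_a = 0x1000             # per-thread counter of core 0
counter_b = 0x1008             # per-thread counter of core 1, 8 bytes away
counter_c = 0x1040             # core 1's counter after padding to a new line

# Same line: every write by one core invalidates the other's copy, even
# though no byte is actually shared -- spatial false sharing.
print(line_of(counter_a) == line_of(counter_b))   # True
print(line_of(counter_a) == line_of(counter_c))   # False: padding fixes it
```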
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Wills, Linda (committee member), Gavrilovska, Ada (committee member), Krishna, Tushar (committee member), Pande, Santosh (committee member).
Subjects/Keywords: Shared memory system; Cache coherence; Memory consistency; Synchronization
APA (6th Edition):
Kang, S. C. (2019). Optimizing high locality memory references in cache coherent shared memory multi-core processors. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/62641
Chicago Manual of Style (16th Edition):
Kang, Suk Chan. “Optimizing high locality memory references in cache coherent shared memory multi-core processors.” 2019. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/62641.
MLA Handbook (7th Edition):
Kang, Suk Chan. “Optimizing high locality memory references in cache coherent shared memory multi-core processors.” 2019. Web. 13 Apr 2021.
Vancouver:
Kang SC. Optimizing high locality memory references in cache coherent shared memory multi-core processors. [Internet] [Doctoral dissertation]. Georgia Tech; 2019. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/62641.
Council of Science Editors:
Kang SC. Optimizing high locality memory references in cache coherent shared memory multi-core processors. [Doctoral Dissertation]. Georgia Tech; 2019. Available from: http://hdl.handle.net/1853/62641

Georgia Tech
15.
Amir, Mohammad Faisal.
Design methodology for 3d-stacked imaging systems with integrated deep learning.
Degree: PhD, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/61609
The Internet of Things (IoT) revolution has brought with it billions of always-on, always-connected devices and sensors, associated with which are huge amounts of data that must be transmitted to an off-chip host for classification. However, sending these large volumes of unprocessed data incurs large latency and energy penalties, which impairs the energy efficiency of resource-constrained IoT systems. Moving computation to the sensor offers the potential to improve the performance and energy efficiency of the end application. The objective of the presented research is to explore sensor-integrated computing, which allows the deployment of smart sensors capable of performing computations in the field. Initially, we introduce the design of a 3D-stacked image sensor with integrated deep learning, which uses the advantages of 3D integration to increase sensor fill factor, simplify routing, increase parallelism, and enhance memory capacity. Through an exploration of the design space, we investigate how the system architecture and resource constraints can dictate system metrics such as the optimum energy-efficiency configuration and accuracy-throughput tradeoffs. Next, we examine technology-based solutions to further enhance system performance through the use of 3D-stacked digital sensors with in-pixel ADCs, and we explore how emerging device-based processing-in-memory neural accelerators can offer superior energy efficiency. Furthermore, the various circuit issues involved in the design of these sensor-based systems are investigated through a discussion of post-silicon results from an image sensor SoC with integrated energy harvesting. The dissertation concludes with a discussion of how energy-harvesting sensors can be used to achieve energy-neutral, self-powered systems capable of operating solely on harvested energy.
Advisors/Committee Members: Mukhopadhyay, Saibal (advisor), Yalamanchili, Sudhakar (committee member), Khan, Asif (committee member), Krishna, Tushar (committee member), Kohl, Paul (committee member).
Subjects/Keywords: Neural networks; Image sensor; Energy harvesting; Deep learning; 3D integration
APA (6th Edition):
Amir, M. F. (2018). Design methodology for 3d-stacked imaging systems with integrated deep learning. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/61609
Chicago Manual of Style (16th Edition):
Amir, Mohammad Faisal. “Design methodology for 3d-stacked imaging systems with integrated deep learning.” 2018. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/61609.
MLA Handbook (7th Edition):
Amir, Mohammad Faisal. “Design methodology for 3d-stacked imaging systems with integrated deep learning.” 2018. Web. 13 Apr 2021.
Vancouver:
Amir MF. Design methodology for 3d-stacked imaging systems with integrated deep learning. [Internet] [Doctoral dissertation]. Georgia Tech; 2018. [cited 2021 Apr 13]. Available from: http://hdl.handle.net/1853/61609.
Council of Science Editors:
Amir MF. Design methodology for 3d-stacked imaging systems with integrated deep learning. [Doctoral Dissertation]. Georgia Tech; 2018. Available from: http://hdl.handle.net/1853/61609

Georgia Tech
16.
Maass, Steffen Alexander.
Systems abstractions for big data processing on a single machine.
Degree: PhD, Computer Science, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/61679
Large-scale internet services, such as Facebook or Google, use clusters of many servers for problems such as search, machine learning, and social networks. However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this dissertation presents approaches to large-scale data processing on only a single machine. This approach has obvious cost benefits and lowers the barrier to entry for large-scale data processing. The dissertation approaches this problem by redesigning applications to enable trillion-scale graph processing on a single machine while also enabling the processing of evolving, billion-scale graphs. First, this dissertation presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine. Mosaic makes use of many-core processors and fast I/O devices coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine. Mosaic also employs a locality-preserving space-filling curve to achieve high compression and high locality when storing graphs and executing algorithms. Our evaluation shows that for smaller graphs, Mosaic consistently outperforms other state-of-the-art out-of-core engines by 3.2x–58.6x and shows comparable performance to distributed graph engines. Furthermore, Mosaic can complete one iteration of the Pagerank algorithm on a trillion-edge graph in 21 minutes, outperforming a distributed disk-based engine by 9.2x. Second, while Mosaic addresses the setting of processing static graphs, this dissertation presents Cytom, a new engine for processing billion-scale evolving graphs, based on insights about achieving high compression and locality while improving load balancing when processing a graph that changes rapidly. Cytom introduces a novel programming model that takes advantage of its subgraph-centric approach coupled with the setting of evolving graphs. This is an important enabling step for emerging workloads that process graphs changing over time. Cytom’s programming model allows algorithm developers to quickly react to graph updates, discarding uninteresting ones while focusing on updates that, in fact, change the algorithmic result. We show that Cytom is effective in scaling to billion-edge graphs, as well as providing higher throughput when updating the graph structure (2.0x–192x) and higher throughput (1.5x–200x) when additionally processing an algorithm.
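The locality-preserving space-filling curve mentioned above (a Hilbert curve, in Mosaic's case) maps 2D tile coordinates to a 1D storage order in which nearby tiles stay nearby. Below is the standard Hilbert xy-to-index routine plus a toy use of it to order edge tiles; the grid size and sample tiles are illustrative, not Mosaic's encoding:

```python
def hilbert_index(order, x, y):
    # Standard Hilbert xy->d conversion on a 2^order x 2^order grid.
    n = 1 << order
    d, s = 0, n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                    # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Store (src_tile, dst_tile) pairs in curve order so tiles adjacent in the
# graph's coordinate space are also adjacent on disk.
tiles = [(0, 0), (7, 7), (1, 0), (0, 1)]
print(sorted(tiles, key=lambda t: hilbert_index(3, *t)))
```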
Advisors/Committee Members: Kim, Taesoo (advisor), Gavrilovska, Ada (committee member), Ramachandran, Umakishore (committee member), Krishna, Tushar (committee member), Zwaenepoel, Willy (committee member).
Subjects/Keywords: Runtime system; Big data; Graph analytics; Performance optimization; Incremental processing; Heterogeneous computing
APA (6th Edition):
Maass, S. A. (2019). Systems abstractions for big data processing on a single machine. (Doctoral Dissertation). Georgia Tech. Retrieved from http://hdl.handle.net/1853/61679
Chicago Manual of Style (16th Edition):
Maass, Steffen Alexander. “Systems abstractions for big data processing on a single machine.” 2019. Doctoral Dissertation, Georgia Tech. Accessed April 13, 2021. http://hdl.handle.net/1853/61679.
MLA Handbook (7th Edition):
Maass, Steffen Alexander. “Systems abstractions for big data processing on a single machine.” 2019. Web. 13 Apr 2021.
Vancouver:
Maass SA. Systems abstractions for big data processing on a single machine. [Internet] [Doctoral dissertation]. Georgia Tech; 2019. [cited 2021 Apr 13].
Available from: http://hdl.handle.net/1853/61679.
Council of Science Editors:
Maass SA. Systems abstractions for big data processing on a single machine. [Doctoral Dissertation]. Georgia Tech; 2019. Available from: http://hdl.handle.net/1853/61679

Georgia Tech
17.
Kersey, Chad Daniel.
A multi-paradigm C++-based hardware description language.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62342
This dissertation presents CHDL (the CHDL Hardware Design Library), a generative hardware description library for C++, along with a body of supporting libraries and a description of a core design implemented using it. The supporting libraries extend the level of abstraction covered by CHDL from the purely constructive and generative to a range of hardware description paradigms, including the register transfer level (RTL), an implementation of Bluespec-like guarded atomic actions (GAA), and a novel pipeline-oriented HDL providing a high-level synthesis flow from algorithmic descriptions of pipelined hardware. Design input in all of these paradigms is converted by CHDL into an in-memory gate-level netlist that may be simulated, emitted as synthesizable Verilog, or technology-mapped to a standard cell library for area and energy estimation. Access to this netlist, dubbed "netlist introspection", is provided by the CHDL API, allowing the designer to perform novel optimizations and transformations.
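CHDL itself is a C++ library, so the following Python miniature is only an analogy for two of the ideas above, generative design and netlist introspection; all names here are invented for the sketch and are not CHDL's API:

```python
class Netlist:
    """A gate-level netlist built by running ordinary host code."""
    def __init__(self):
        self.gates = []                  # (op, out, in_a, in_b) tuples
        self._n = 0

    def _wire(self):
        self._n += 1
        return f"w{self._n}"

    def nand(self, a, b):
        out = self._wire()
        self.gates.append(("NAND", out, a, b))
        return out

    def inv(self, a):                    # a generator built from a generator
        return self.nand(a, a)

    def and_(self, a, b):
        return self.inv(self.nand(a, b))

nl = Netlist()
y = nl.and_("a", "b")
# "Netlist introspection": the design is plain data that host-language
# passes can walk, optimize, or emit in another format.
for gate in nl.gates:
    print(gate)
```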
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Kim, Hyesoon (advisor), Mukhopadhyay, Saibal (committee member), Conte, Thomas (committee member), Vuduc, Richard (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Hardware description language; HDL; Domain-specific language; High-level synthesis

Georgia Tech
18.
Hossen, Md Obaidul.
Power delivery and thermal considerations for 2.5-D and 3-D integration technologies.
Degree: PhD, Electrical and Computer Engineering, 2019, Georgia Tech
URL: http://hdl.handle.net/1853/62346
Owing to advanced technologies, the total power density in a high-performance computing system is expected to increase beyond 100 W/cm²; power delivery becomes a critical challenge, and advanced cooling solutions are becoming a necessity. Moreover, the reduced noise margins that accompany technology scaling make power delivery to the chip ever more challenging. Placing dice side by side poses thermal coupling issues, where heat flows from the high-power die to the low-power die. There are also inter-dependencies among these different domains. Therefore, in this research effort, we investigate and benchmark different 2.5-D and 3-D heterogeneous integration technologies in terms of their thermal and electrical performance and their inter-dependencies. We develop a thermally aware power delivery network (PDN) design framework to investigate power supply noise for emerging 2.5-D and 3-D integration technologies. We also present a novel backside-PDN configuration in which the PDN is separated from the signaling network of the die. These research tasks feed into one another to enable a more comprehensive pre-design analysis of heterogeneous integration systems.
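For a sense of scale, a one-line IR-drop estimate shows why delivery at such power densities is hard; the numbers below are hypothetical and are not taken from the dissertation:

```python
# Static IR drop through an effective PDN resistance (all values assumed).
die_power = 100.0        # W
vdd = 0.9                # V, nominal supply
r_pdn = 0.4e-3           # ohm, lumped PDN resistance (hypothetical)

i_supply = die_power / vdd            # ~111 A drawn from the supply
ir_drop = i_supply * r_pdn            # ~44 mV of static droop
print(f"{ir_drop * 1e3:.1f} mV drop = {100 * ir_drop / vdd:.1f}% of VDD")
```

At sub-volt supplies, even tens of millivolts consume a large share of the available noise margin, which is what motivates the thermally aware PDN analysis above.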
Advisors/Committee Members: Bakir, Muhannad S. (advisor), Naeemi, Azad (committee member), Raychowdhury, Arijit (committee member), Krishna, Tushar (committee member), Joshi, Yogendra (committee member).
Subjects/Keywords: Power delivery network; Heterogeneous Integration

Georgia Tech
19.
Hassan, Syed Minhaj.
Exploiting on-chip memory concurrency in 3d manycore architectures.
Degree: PhD, Electrical and Computer Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56251
The objective of this thesis is to optimize the uncore of 3D many-core architectures. More specifically, we note that technology trends point to large increases in memory-level concurrency, which in turn affects the design of the multi-core interconnect and the organization of the memory hierarchy. This work addresses the need for re-optimization in the presence of this increase in memory-system concurrency. First, we observe that 2D network latency and inefficient parallelism management in current 3D designs are the main bottlenecks to fully exploiting the potential of 3D. To that end, we propose an extremely low-latency, low-power, high-radix router and present its variants for different network topologies and configurations. We also explore optimizations and techniques to reduce traffic in the network. Second, we propose a reorganization of the memory hierarchy and use simple address-space translations to regulate locality, bandwidth, and energy trade-offs in highly concurrent 3D memory systems. Third, we analyze the rise in temperature of 3D memories and propose variable-rate, per-bank refresh management that exploits variability in temperature to reduce 3D DRAM's refresh power and extend its operating range to higher temperatures.
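The third idea can be sketched with the common DRAM rule of thumb that the refresh interval is roughly halved for every 10 °C above 85 °C (the dissertation's actual per-bank policy is more refined; the temperatures below are invented):

```python
BASE_INTERVAL_MS = 64.0        # standard DRAM refresh window

def refresh_interval_ms(temp_c):
    # Halve the interval per 10 C above 85 C (rule-of-thumb derating).
    if temp_c <= 85.0:
        return BASE_INTERVAL_MS
    return BASE_INTERVAL_MS / (2 ** ((temp_c - 85.0) / 10.0))

bank_temps_c = [78, 84, 91, 103]   # hypothetical per-bank temperatures
for bank, t in enumerate(bank_temps_c):
    print(f"bank {bank}: {t:>3} C -> refresh every "
          f"{refresh_interval_ms(t):5.1f} ms")
```

Cooler banks refresh far less often than the hottest bank, which is where the refresh-power saving of a per-bank scheme comes from.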
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Mukhopadhyay, Saibal (committee member), Krishna, Tushar (committee member), Kim, Hyesoon (committee member), Pande, Santosh (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: 3D memory systems; Network-on-chip; 3D system thermal analysis; Memory-level parallelism; DRAM

Georgia Tech
20.
Wang, Jin.
Acceleration and optimization of dynamic parallelism for irregular applications on GPUs.
Degree: PhD, Electrical and Computer Engineering, 2016, Georgia Tech
URL: http://hdl.handle.net/1853/56294
The objective of this thesis is the development, implementation, and optimization of a GPU execution model extension that efficiently supports the time-varying, nested, fine-grained dynamic parallelism occurring in irregular, data-intensive applications. These dynamically formed pockets of structured parallelism can utilize the recently introduced device-side nested kernel launch capabilities on GPUs. However, the low utilization of GPU resources and the high cost of device-side kernel launches still make it difficult to harness dynamic parallelism on GPUs. This thesis therefore presents an extension to the common Bulk Synchronous Parallel (BSP) GPU execution model, Dynamic Thread Block Launch (DTBL), which provides the capability of spawning lightweight thread blocks from GPU threads on demand and coalescing them onto existing, natively executing kernels. The finer granularity of a thread block provides effective and efficient control of smaller-scale, dynamically occurring nested pockets of structured parallelism during the computation. Evaluations of DTBL show an average speedup of 1.21x over the baseline implementations. The thesis proposes two classes of optimizations of this model. The first is a thread block scheduling strategy that exploits spatial and temporal reference locality between parent kernels and dynamically launched child kernels; the locality-aware thread block scheduler achieves a further 27% increase in overall performance. The second is an energy-efficiency optimization that detects SMX occupancy bubbles during the execution of a DTBL application and converts them into SMX idle periods in which a flexible DVFS technique can be applied to reduce dynamic and leakage power. By presenting the implementations, measurements, and key insights, this thesis takes a step toward addressing the challenges and issues in emerging irregular applications.
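DTBL is a hardware/execution-model mechanism, but its core scheduling move can be caricatured in a few lines: dynamically spawned thread blocks are appended to the block queue of an already-running kernel instead of paying a device-side kernel launch each time. This is a toy model of the idea, not the real mechanism:

```python
from collections import deque

class RunningKernel:
    def __init__(self, blocks):
        self.block_queue = deque(blocks)

    def coalesce(self, child_blocks):
        # DTBL's key move: new blocks join the native kernel's queue.
        self.block_queue.extend(child_blocks)

    def drain(self):
        while self.block_queue:
            block = self.block_queue.popleft()
            self.coalesce(block())       # a block may spawn nested blocks

def leaf():
    return []                            # no nested work

def irregular_parent():
    return [leaf, leaf, leaf]            # data-dependent pocket of work

RunningKernel([irregular_parent, leaf]).drain()
```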
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Kim, Hyesoon (committee member), Vuduc, Richard (committee member), Krishna, Tushar (committee member), Pande, Santosh (committee member).
Subjects/Keywords: General-purpose GPU; Dynamic parallelism; Irregular applications

Georgia Tech
21.
Wahby, William.
Theoretical and experimental investigations of connectivity in three-dimensional integrated circuits.
Degree: PhD, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/61184
Three-dimensional integration, in which integrated circuits (ICs) are stacked directly atop one another to reduce interconnect length, is an attractive method for achieving continued performance and efficiency improvements as conventional 2D scaling slows. Significant uncertainty remains, however, about the best methods and designs to adopt for optimal system performance and energy efficiency. To address these questions, we present improved interconnect and system-level models for 3D IC performance, as well as experimental investigations into potential thermal limits to the stacking of high-performance components. We also discuss improved modeling methods for ultra-densely interconnected systems, in order to quantify the performance impact of restricted connectivity on complex systems.
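The dissertation's own models are not reproduced here, but Rent's rule, T = t * g^p, is the classic connectivity relation underlying this kind of interconnect prediction; a brief illustration with made-up parameters:

```python
def rent_terminals(gates, t=4.0, p=0.6):
    # Rent's rule: external terminals grow as a power of block size.
    return t * gates ** p

for g in (1e3, 1e6, 1e9):
    print(f"{g:.0e} gates -> ~{rent_terminals(g):,.0f} terminals")
```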
Advisors/Committee Members: Bakir, Muhannad (advisor), Naeemi, Azad (committee member), Davis, Jeffrey (committee member), Joshi, Yogendra (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: 3DIC; Interconnect; Thermal

Georgia Tech
22.
Parasar, Mayank.
Subactive techniques for guaranteeing routing and protocol deadlock freedom in interconnection networks.
Degree: PhD, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/63654
Interconnection networks are the communication backbone of any system. They occur at various scales: from on-chip networks between processing cores, to supercomputer interconnects between compute nodes, to data-center networks between high-end servers. One of the most fundamental challenges in an interconnection network is that of deadlock. Deadlocks come in two types: routing-level deadlocks and protocol-level deadlocks. Routing-level deadlocks occur because of cyclic dependences between packets trying to acquire buffers, whereas protocol-level deadlocks occur because a response message is stuck indefinitely behind a queue of request messages. Both kinds of deadlock render the forward movement of packets impossible, leading to complete system failure. Prior work either restricts the paths that packets take in the network or provisions an extra set of buffers to resolve routing-level deadlocks; for protocol-level deadlocks, separate sets of buffers are reserved at every router for each message class. Naturally, these solutions either restrict packet movement, resulting in lower performance, or require additional area and power. In this thesis, we propose a new set of efficient techniques for providing both routing- and protocol-level deadlock freedom. Our techniques apply periodic forced movement to the packets in the network, which breaks any cyclic dependence among packets and thereby resolves routing-level deadlocks. Moreover, because of the periodic forced movement, a response message is never stuck indefinitely behind a queue of request messages, so our techniques also resolve protocol-level deadlocks. We use the term 'subactive' for this new class of techniques.
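The effect of forced movement is easiest to see on a toy ring in which every buffer is full and every packet waits on the next router, the canonical routing deadlock; one forced rotation makes every packet advance, so the cyclic wait cannot persist. This is a caricature of the idea, not the actual microarchitecture:

```python
buffers = ["pkt0", "pkt1", "pkt2", "pkt3"]   # one full buffer per router

def normal_routing(bufs):
    # Each packet needs the next buffer to be free; with all buffers
    # full, no packet can move: a cyclic routing deadlock.
    return False

def forced_rotation(bufs):
    # Periodic forced movement: all packets advance one hop at once.
    bufs.insert(0, bufs.pop())

if not normal_routing(buffers):
    forced_rotation(buffers)
print(buffers)    # every packet has made forward progress
```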
Advisors/Committee Members: Krishna, Tushar (advisor), Kim, Hyesoon (committee member), Daglis, Alexandros (committee member), Gratz, Paul V. (committee member), Qureshi, Moinuddin K. (committee member).
Subjects/Keywords: Interconnection network; Routing deadlock; Protocol deadlock; Proactive; Reactive; Subactive; Network on chip; Computer architecture

Georgia Tech
23.
Kwon, Hyouk Jun.
Data- and communication-centric approaches to model and design flexible deep neural network accelerators.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/63663
Deep neural network (DNN) accelerators, which are specialized hardware for DNN inference, have enabled energy-efficient, low-latency inference. To maximize the efficiency (energy efficiency, latency, and throughput) of a DNN accelerator, designers optimize both the accelerator and the mapping of target DNN models onto it. However, designing DNN accelerators for recent DNN models, which contain layers of diverse operations and sizes, is challenging: optimizing the accelerator and mapping for the average case of the layers in target DNN workloads often leads to uniformly inefficient design points. This thesis therefore proposes flexible-mapping DNN accelerators that can run multiple mappings to adapt to the diverse layers in DNN workloads. The thesis first quantifies the costs and benefits of mappings using a data-centric approach. Based on the observation that no single mapping is ideal for all layers, it explores two approaches to designing flexible-mapping accelerators: reconfigurability and heterogeneity. Reconfigurable accelerators follow a communication-centric approach, implementing a flexible network-on-chip (NoC) so that the accelerator can be configured at runtime for any mapping style. Heterogeneous accelerators employ multiple sub-accelerators with fixed but diverse mapping styles within one accelerator chip, providing coarser-grained flexibility at lower area and power cost than reconfigurability. Case studies show that both approaches provide Pareto-optimal design points with different strengths.
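The heterogeneity option reduces, per layer, to picking the sub-accelerator whose fixed mapping style is cheapest for that layer; the sketch below uses an invented two-term cost model purely to make the selection loop concrete (none of the numbers or names come from the thesis):

```python
# Hypothetical per-layer reuse figures.
layers = [
    {"name": "early_conv", "macs": 1e9, "weight_reuse": 8,  "output_reuse": 64},
    {"name": "late_conv",  "macs": 5e8, "weight_reuse": 64, "output_reuse": 8},
]

# Each sub-accelerator's fixed mapping style amortizes a different reuse.
subaccels = {
    "weight-stationary": lambda l: l["macs"] / l["weight_reuse"],
    "output-stationary": lambda l: l["macs"] / l["output_reuse"],
}

for layer in layers:
    best = min(subaccels, key=lambda name: subaccels[name](layer))
    print(f"{layer['name']}: run on the {best} sub-accelerator")
```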
Advisors/Committee Members: Krishna, Tushar (advisor), Pellauer, Michael (committee member), Sarkar, Vivek (committee member), Kim, Hyesoon (committee member), Tumanov, Alexey (committee member).
Subjects/Keywords: DNN accelerator; DNN dataflow; DNN mapping; Flexible mapping accelerator

Georgia Tech
24.
Mururu, Girish.
Compiler Guided Scheduling : A Cross-Stack Approach For Performance Elicitation.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64096
Modern software executes on multi-core systems that share resources such as several levels of the memory hierarchy (caches, main memory, secondary storage), I/O devices, and network interfaces. In such a co-execution environment, performance is critically affected by conflicts over these shared resources. Resource requirements vary not only across processes but also during the execution of a single process. Current resource management techniques involving OS schedulers have evolved from, and mainly rely on, the principles of fairness (achieved through time-multiplexing) and load balancing, and are oblivious to the dynamic resource requirements of individual processes. Compiler research, on the other hand, has traditionally revolved around optimizing single- and multi-threaded programs limited to one process; compilers, however, can analyze a process's resource requirements. This thesis contends that significant performance enhancement can be achieved by having the compiler guide the scheduler with dynamic program characteristics and resource needs.
Toward compiler-guided scheduling, we first look at the problem of process migration. For load-balancing purposes, OS schedulers such as CFS can migrate threads while they are in the middle of an intense memory-reuse region, destroying warmed-up caches and TLBs. To solve this problem while retaining enough flexibility for load balancing, we propose PinIt, which determines the regions of a program in which the process should be pinned to a core so that adverse migrations causing excessive cache and TLB misses are avoided. The thesis proposes new measures, such as unique memory reuse and memory-reuse density, that capture the performance penalties incurred due to migration. Regions with high penalties are bracketed by the compiler with pin/unpin calls that prevent migration. In an overloaded environment, compared to priority-cfs, PinIt speeds up high-priority applications in mediabench workloads by 1.16x and 2.12x and in computer-vision workloads by 1.35x and 1.23x on 8 and 16 cores, respectively, with almost the same or better throughput for low-priority applications.
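On Linux, the inserted calls can be as thin as an affinity-mask change; a minimal sketch of what a compiler-inserted pin/unpin pair amounts to (the real PinIt derives the region boundaries from compiler analysis rather than asking the programmer to place these calls):

```python
import os

saved_mask = os.sched_getaffinity(0)      # remember the original mask

def pin():
    # Restrict the process to the CPU it is currently on: no migrations
    # from here, so warmed-up caches and TLBs survive the reuse region.
    os.sched_setaffinity(0, {os.sched_getcpu()})

def unpin():
    os.sched_setaffinity(0, saved_mask)   # let the scheduler balance again

pin()
# ... high-memory-reuse region identified by the compiler ...
unpin()
```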
For efficiency in a co-execution environment, the problem of co-scheduling and co-locating processes that share resources must also be solved. Several approaches proposed in the literature rely on static profile data or dynamic performance-counter information, which inherently cannot be used in an anticipatory (proactive) manner, leading to suboptimal scheduling. This thesis proposes Beacons, a generic framework that instruments programs with generated models or equations of specific program characteristics and provides a runtime counterpart that delivers the dynamically generated information to the scheduler. We develop a novel timing analysis for loop durations that is, on average, 84% accurate on the Polybench and Rodinia benchmarks and embed it, along with memory footprint, and locality…
Advisors/Committee Members: Pande, Santosh (advisor), Gavrilovska, Ada (committee member), Ramachandran, Umakishore (committee member), Sarkar, Vivek (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Compilers; Scheduling; Co-location

Georgia Tech
25.
Chatarasi, Prasanth.
ADVANCING COMPILER OPTIMIZATIONS FOR GENERAL-PURPOSE & DOMAIN-SPECIFIC PARALLEL ARCHITECTURES.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64099
Computer hardware is undergoing a major disruption as we approach the end of Moore's law, in the form of new advancements in general-purpose and domain-specific parallel architectures. Contemporaneously, the demand for higher performance is broadening across multiple application domains, ranging from scientific computing to deep learning and graph analytics. These trends raise a plethora of challenges for the de facto approach to achieving higher performance, namely application development using high-performance libraries; the challenges include porting and adapting to multiple parallel architectures, keeping pace with rapidly advancing domains, and the inhibition of optimizations across library calls. Hence, there is a renewed focus from industry and academia on advancing optimizing compilers, but doing so requires enabling compilers to work effectively on a wide range of applications and to better exploit current and future parallel architectures. As summarized below, this thesis focuses on compiler advancements for current and future hardware trends.
First, we observe that software with explicit parallelism for general-purpose multi-core CPUs and GPUs is on the rise, while the foundation of current compiler frameworks is based on optimizing sequential code. Our approach uses the explicit parallelism specified by the programmer as logical parallelism to refine the conservative dependence analysis inherent in compilers (arising from program constructs such as pointer aliasing, unknown function calls, non-affine subscript expressions, recursion, and unstructured control flow). This makes it possible to combine user-specified parallelism and compiler-generated parallelism in a new unified polyhedral compilation framework (PoPP).
Second, although compiler technologies for automatic vectorization targeting general-purpose vector-processing (SIMD) units have been under development for over four decades, there are still considerable gaps in the ability of modern compilers to vectorize automatically. One such gap is the handling of loops with dependence cycles involving memory-based anti (write-after-read) and output (write-after-write) dependences. A significant limitation of past work is the lack of a unified formulation that synergistically integrates multiple storage transformations to break such cycles and further combines them with loop transformations to enable vectorization. To address this limitation, we propose the PolySIMD approach.
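One storage transformation of the kind PolySIMD integrates is scalar expansion; the sketch below shows how privatizing a reused scalar removes the anti and output dependences that block vectorization (an illustration of the transformation itself, not of PolySIMD's formulation):

```python
import numpy as np

n = 8
a = np.arange(n, dtype=float)
b = np.empty(n)

# Original loop: the single scalar t carries write-after-read and
# write-after-write dependences from one iteration to the next:
#     for i in range(n):
#         t = a[i] + 1.0
#         b[i] = t * t

t = a + 1.0        # scalar expansion: one t per iteration, cycle broken
b[:] = t * t       # the whole loop is now a single vector operation
print(b)
```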
Third, the efficiency of domain-specific spatial accelerators for Deep Learning (DL) depends heavily on the compiler's ability to generate optimized mappings, or code, for various DL operators (the building blocks of DL models, e.g., CONV2D, GEMM) onto the accelerator's compute and memory resources. However, the rapid emergence of new operators and new accelerators poses two key challenges/requirements for existing compilers: 1) the ability to perform fine-grained reasoning of…
Advisors/Committee Members: Sarkar, Vivek (advisor), Shirako, Jun (advisor), Pande, Santosh (committee member), Krishna, Tushar (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: Compiler Optimizations; General-Purpose Architectures; Domain-Specific Architectures; Deep Learning; Graph Analytics; Accelerators; Polyhedral Model

Georgia Tech
26.
Tannu, Swamit.
Software Techniques to Mitigate Errors on Noisy Quantum Computers.
Degree: PhD, Electrical and Computer Engineering, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64123
Quantum computers are domain-specific accelerators that can provide large speedups for important problems. Quantum computers with a few tens of qubits have already been demonstrated, and machines with 100+ qubits are expected soon. These machines face significant reliability and scalability challenges: high hardware error rates limit what they can compute, so mitigating hardware errors is essential to enabling quantum speedup. Our first work exploits the variability in the error rates of qubits to steer more operations toward qubits with lower error rates and away from error-prone qubits. Our second work executes different versions of a program, tuned to make diverse mistakes, so that the machine is less vulnerable to correlated errors and it becomes easier to infer the correct answer. Our third work exploits the state-dependent bias in measurement errors (state 1 is more error-prone than state 0) and dynamically flips the state of a qubit so as to measure the stronger state. We perform our evaluations on real quantum machines from IBM and demonstrate significant improvements in overall system reliability.
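The third technique is simple enough to simulate end to end: flip a qubit expected to read 1 before measurement, then invert the recorded bit, so readout happens in the less error-prone 0 state. The error rates below are invented for illustration:

```python
import random

P_READ_ERR = {0: 0.02, 1: 0.08}     # hypothetical biased readout errors

def measure(state):
    flip = random.random() < P_READ_ERR[state]
    return state ^ flip

def measure_with_invert(state):
    # Apply an X gate first, measure in the stronger state, invert the bit.
    return 1 - measure(1 - state)

trials = 100_000
plain = sum(measure(1) != 1 for _ in range(trials)) / trials
inverted = sum(measure_with_invert(1) != 1 for _ in range(trials)) / trials
print(f"error reading |1>: {plain:.3f} plain vs {inverted:.3f} inverted")
```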
Advisors/Committee Members: Qureshi, Moinuddin (advisor), Sarkar, Vivek (committee member), Khan, Asif (committee member), Krishna, Tushar (committee member), Brown, Kenneth (committee member).
Subjects/Keywords: Quantum Computing; Quantum Software; Quantum Compilation; Noisy Intermediate Scale Quantum Computers

Georgia Tech
27.
Bak, Seonmyeong.
Runtime Approaches to Improve the Efficiency of Hybrid and Irregular Applications.
Degree: PhD, Computer Science, 2020, Georgia Tech
URL: http://hdl.handle.net/1853/64198
On-node parallelism has increased significantly in high-performance computing systems. This large amount of parallelism can be exploited relatively easily by regular parallel applications, because straightforward approaches usually suffice to map their computation patterns and data layouts onto the available on-node parallelism. Irregular parallel applications, however, require considerable effort to run well on modern processors with massive amounts of intra-node parallelism. Parallel programming models and runtime approaches have been proposed to help programmers write such applications quickly, but it is still not easy to write efficient irregular parallel applications. Two key challenges in mapping irregular applications onto on-node parallelism are load balance and computation-communication overlap. In this thesis, we address these challenges through new runtime approaches and new APIs that let users provide minimal information for application-aware scheduling.
First, we introduce new algorithms to improve the scheduling of irregular task graphs containing a mix of communication and computation tasks with data parallelism and blocking operations. We combine gang scheduling with work stealing for data-parallel tasks with frequent inter- and intra-node communication in the task graphs, so as to reduce interference and expensive context-switching operations. We also propose improved victim-selection policies for work stealing to improve the load balance and the overlap of ready tasks that have child tasks.
Next, we propose an efficient integrated runtime system to handle load balancing of irregular applications written in hybrid parallel programming models. We introduce a unified runtime system that integrates distributed and shared-memory programming, as exemplified by the combination of Charm++ and OpenMP. In this approach, all processing resources (cores) can be used flexibly across both the distributed and shared-memory levels, enabling more efficient load balancing at the intra-node level and reduced waiting times for global synchronization at the inter-node level.
Finally, we propose a set of APIs that let users specify functions to decompose a target loop into subspaces and to create chunks within each subspace for application-specific load balancing. Our runtime leverages the information provided through the APIs to create user-defined chunks and stores balanced groups of chunks in a shared data structure indexed by static loop constructs. In this way, information stored during one invocation of a loop can be reused in subsequent invocations for an improved initial load balance, as illustrated in the sketch below.
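A minimal sketch of the chunk-reuse idea, with invented names: the runtime keys the balanced grouping on the loop's static identity, so a later invocation of the same loop skips the packing step entirely:

```python
_chunk_cache = {}      # static loop construct -> balanced chunk groups

def balanced_chunks(loop_id, n_iters, n_workers, cost):
    if loop_id in _chunk_cache:            # reuse from a prior invocation
        return _chunk_cache[loop_id]
    items = sorted(range(n_iters), key=cost, reverse=True)
    groups = [[] for _ in range(n_workers)]
    loads = [0.0] * n_workers
    for i in items:                        # greedy longest-first packing
        w = loads.index(min(loads))
        groups[w].append(i)
        loads[w] += cost(i)
    _chunk_cache[loop_id] = groups
    return groups

# A triangular loop: iteration i costs ~i, a classic imbalance source.
print(balanced_chunks("file.c:42", 10, 3, cost=lambda i: float(i)))
```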
Advisors/Committee Members: Sarkar, Vivek (advisor), Çatalyürek, Ümit (committee member), Gavrilovska, Ada (committee member), Krishna, Tushar (committee member), Tumanov, Alexey (committee member).
Subjects/Keywords: High Performance Computing; Runtime Systems; Load Balancing; Task-based programming models
28.
Ko, Jong Hwan.
Resource-aware and robust image processing for intelligent sensor systems.
Degree: PhD, Electrical and Computer Engineering, 2018, Georgia Tech
URL: http://hdl.handle.net/1853/60198
The objective of this research is to design resource-aware and robust image processing algorithms, system architectures, and hardware implementations for intelligent image sensor systems in the Internet-of-Things (IoT) environment. The research explores the design of a wireless image sensor system with low-overhead pre-processing, integrated with a reconfigurable, energy-harvesting image sensor array to implement a self-powered image sensor system. For reliable delivery of regions of interest (ROI) in dynamic environments, the research designs low-power moving-object detection with enhanced noise robustness. System energy is further optimized by a low-power ROI-based coding scheme whose parameters are dynamically controlled by a low-power rate controller to minimize the required buffer size with minimal computational overhead. To enable machine-learning-based intelligent image processing on IoT edge devices, the research proposes resource-efficient neural networks: the storage demand is reduced by compressing the neural network weights with an adaptive image encoding algorithm, and the computation demand is optimized by mapping all network parameters and operations into the frequency domain. To further improve the energy efficiency and throughput of the edge device, the research explores partitioning DNN inference between the edge and host platforms.
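Mapping operations into the frequency domain leans on the convolution theorem: a (circular) convolution of a feature map with a kernel becomes an elementwise product of their FFTs. A sketch of just that identity (the thesis's actual network transformation is more involved):

```python
import numpy as np

x = np.random.rand(64, 64)            # input feature map
w = np.zeros((64, 64))
w[:3, :3] = np.random.rand(3, 3)      # 3x3 kernel, zero-padded to 64x64

# Circular convolution via the convolution theorem: roughly
# O(n^2 log n) work instead of O(n^2 k^2) multiply-accumulates.
y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(w)))
print(y.shape)
```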
Advisors/Committee Members: Yalamanchili, Sudhakar (committee member), Raychowdhury, Arijit (committee member), Philipose, Matthai (committee member), Krishna, Tushar (committee member).
Subjects/Keywords: Image processing; Deep learning; Sensor systems
29.
Garg, Kartikay.
Near-memory primitive support and infratructure for sparse algorithm.
Degree: MS, Electrical and Computer Engineering, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/58343
This thesis introduces an approach to solving the problem of memory-latency performance penalties in traditional accelerators. By introducing simple near-data-processing (NDP) accelerators for primitives such as SpMV (sparse matrix-vector multiplication) and DGEMM (double-precision dense matrix multiplication) kernels, applications can achieve a considerable performance boost. We evaluate our work on the SuperLU application for the HPC community.
Thesis statement: Reevaluating core primitives such as DGEMM, SCATTER, and GATHER for 3D-stacked PIM architectures that incorporate reconfigurable fabrics can deliver multi-fold performance improvements for SuperLU and other sparse algorithms.
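For reference, the SpMV primitive named above is the textbook CSR loop below; its irregular, low-reuse gathers through col_idx into x are exactly what makes it attractive to execute near memory:

```python
import numpy as np

def spmv_csr(row_ptr, col_idx, vals, x):
    # y = A @ x with A in compressed sparse row (CSR) form.
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[col_idx[k]]   # irregular gather from x
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
print(spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], np.ones(3)))
```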
Advisors/Committee Members: Yalamanchili, Sudhakar (advisor), Young, Jeffrey (advisor), Krishna, Tushar (committee member), Vuduc, Richard (committee member).
Subjects/Keywords: Processing in memory (PIM); Near data processing (NDP); 3D-stacked memory; HMC; FPGA; SuperLU
30.
Ramrakhyani, Aniruddh.
Aniruddh Ramrakhyani Thesis.
Degree: MS, Electrical and Computer Engineering, 2017, Georgia Tech
URL: http://hdl.handle.net/1853/58331
The demise of Dennard scaling and the continuance of Moore's law have provided shrinking chip dimensions and higher on-chip transistor density at the cost of increasing power density. Chips today are highly power-constrained and often operate close to their thermal-meltdown thresholds. To avert thermal meltdown, designers use intelligent power-gating techniques, powering up only a subset of IP blocks at a time. In addition to the power-density problem, decreasing transistor size has led to decreasing silicon reliability, which in turn has led to increasing instances of on-chip faults. Both of these effects produce irregular on-chip topologies that change at runtime. Chip designers and architects today thus face the problem of routing packets over a dynamically changing irregular topology without sacrificing performance and, more importantly, without running into routing deadlocks.
Another trend in the semiconductor industry that has contributed to the significance of this problem is the increasing use of heterogeneous Systems-on-Chip (SoCs). SoCs are in most instances tailored to application needs; to maximize performance, they employ custom-built irregular topologies to connect IP blocks. SoC designers have to run a large number of simulations to understand the network traffic flows of the target application and to ensure the absence of routing deadlocks. This increases design time and consequently time to market, raising costs and reducing profits.
Prior work in the power-gating, resiliency, and SoC design domains has addressed the routing-deadlock problem by constructing a spanning tree over the irregular topology and using it either as a deadlock-avoidance mechanism (spanning-tree-based routing) or as a deadlock-recovery mechanism (escape VC) to route packets. However, as shown in this work, these spanning-tree-based solutions lead to a significant loss in throughput and performance. In addition, a new spanning tree must be constructed every time the topology changes due to a fault in, or power-gating of, a network element.
In this work, a new deadlock-recovery framework called Static Bubble is proposed that achieves deadlock freedom in static or dynamically changing irregular on-chip topologies without any tree construction, thus eliminating the overheads and limitations of the spanning-tree-based solutions. Compared to state-of-the-art approaches, Static Bubble provides up to 30% lower latency, 4x higher throughput, and 50% lower network EDP at less than 1% hardware overhead.
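A toy of the recovery step on a four-router ring (all buffers full, cyclic wait): a spare "bubble" buffer absorbs one packet, the freed slot ripples around the cycle, and the packet is re-injected with every packet one hop further along. This caricatures only the recovery move, not Static Bubble's detection logic or its choice of where to place bubbles:

```python
def recover(buffers):
    # buffers: router -> packet, all four slots full (deadlocked ring).
    bubble = buffers.pop(0)                 # bubble absorbs router 0's packet
    for r in (3, 2, 1):                     # the freed slot ripples backwards
        buffers[(r + 1) % 4] = buffers.pop(r)
    buffers[1] = bubble                     # re-inject the absorbed packet

ring = {0: "p0", 1: "p1", 2: "p2", 3: "p3"}
recover(ring)
print(ring)    # every packet has advanced one hop: deadlock broken
```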
Advisors/Committee Members: Krishna, Tushar (advisor), Gavrilovska, Ada (committee member), Yalamanchili, Sudhakar (committee member).
Subjects/Keywords: Deadlocks; NoC; Routing; Computer architecture; Dark Silicon; Power Gating; Resiliency; Topology