You searched for +publisher:"University of Texas – Austin" +contributor:("Pingali, Keshav").
Showing records 1 – 30 of 41 total matches. (Page 1 of 2)
1.
Dhanapal, Manoj.
Design and implementation of distributed Galois.
Degree: MS in Computer Sciences, Computer Science, 2013, University of Texas – Austin
URL: http://hdl.handle.net/2152/21643
The Galois system provides a solution to the hard problem of parallelizing irregular algorithms by exploiting amorphous data-parallelism. The present system targets the shared-memory programming model, which limits the memory and processing power available to an application. A scalable distributed parallelization tool would give applications access to a very large amount of memory and processing power by interconnecting computers through a network.
This thesis presents the design of a distributed execution programming model for the Galois system. The distributed Galois system is capable of executing irregular graph-based algorithms in a distributed environment. The API and programming model of the new distributed system were designed to mirror those of the existing shared-memory Galois, so that existing shared-memory applications can run on distributed Galois with minimal porting effort. Finally, two existing test cases have been implemented on distributed Galois and shown to scale with an increasing number of hosts and threads.
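To make the worklist formulation concrete: the sketch below shows the operator-plus-worklist pattern that Galois parallelizes, written as a plain sequential Python loop. It is an illustration of the programming pattern only; the Galois systems themselves are C++, and `sssp_worklist` and the graph encoding are invented for this example.

```python
from collections import deque

def sssp_worklist(graph, source):
    """Single-source shortest paths in worklist style: an operator is
    applied to active nodes, and applications that change state push
    new active nodes. Galois-style systems execute such loops in
    parallel with speculative conflict handling; here it is sequential."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    worklist = deque([source])
    while worklist:
        u = worklist.popleft()          # grab an active node
        for v, w in graph[u]:           # the operator: relax edges
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                worklist.append(v)      # activity spreads irregularly
    return dist

# toy graph: adjacency lists of (neighbor, weight)
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(sssp_worklist(g, "a"))            # {'a': 0, 'b': 1, 'c': 2}
```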
Advisors/Committee Members: Pingali, Keshav (advisor).
Subjects/Keywords: Galois; Parallel computing; Distributed computing
2.
Xie, Chao, Ph. D.
High-performance transactional storage.
Degree: PhD, Computer science, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/42014
Developers face a fundamental tension between performance and ease of programming when building complex applications. On the one hand, by freeing developers from having to worry about concurrency and failures, ACID transactions can greatly simplify the task of building applications. These benefits, however, have a cost: in a distributed setting, ACID transactions can limit performance and make systems hard to scale in the presence of contention. On the other hand, the BASE approach can achieve higher performance by providing weakened semantics. Embracing the BASE paradigm, however, exacts its own heavy price: once one renounces consistency guarantees, it is up to developers to explicitly code into their applications the logic necessary to ensure consistency in the presence of concurrency and faults, making this task complex and error-prone.
This dissertation aims to resolve this tension by building database systems that provide both the ease of programming of the ACID approach and the performance that the BASE approach can provide. Our approach depends on the observation that different transactions affect overall performance of applications in different ways.
Traditional ACID databases ignore this diversity: they provide a single implementation of the same uniform abstraction across all transactions. This dissertation explores two different ways to leverage this key observation to combine performance and ease of programming, and presents two systems, Salt and Callas, inspired by them.
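For readers unfamiliar with the ACID side of this trade-off, here is a minimal sketch of optimistic transactions over a versioned in-memory key-value store. It illustrates the generic validate-then-commit pattern only, not Salt or Callas; the `Store` class and its methods are hypothetical.

```python
class Conflict(Exception):
    pass

class Store:
    """Tiny versioned key-value store with optimistic transactions:
    reads record the version seen; commit validates that nothing read
    has changed in the meantime, else the transaction retries."""
    def __init__(self):
        self.data, self.version = {}, {}

    def transaction(self, fn, retries=10):
        for _ in range(retries):
            read_set, writes = {}, {}
            def read(k):
                read_set[k] = self.version.get(k, 0)
                return writes.get(k, self.data.get(k))
            def write(k, v):
                writes[k] = v
            fn(read, write)
            # validate: commit only if every key read is still current
            if all(self.version.get(k, 0) == v for k, v in read_set.items()):
                for k, v in writes.items():
                    self.data[k] = v
                    self.version[k] = self.version.get(k, 0) + 1
                return
        raise Conflict("transaction kept aborting")

s = Store()
s.transaction(lambda read, write: write("x", (read("x") or 0) + 1))
print(s.data)  # {'x': 1}
```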
Advisors/Committee Members: Alvisi, Lorenzo (advisor), Witchel, Emmett (committee member), Pingali, Keshav (committee member), Aguilera, Marcos (committee member).
Subjects/Keywords: Transaction; High-performance; Distributed database
3.
Celik, Ahmet.
Proof engineering for large-scale verification projects.
Degree: PhD, Computer Science, 2019, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/5348
Software controls many aspects of our daily lives; thus, software correctness is of utmost importance. One way to develop correct-by-construction software is to use proof assistants, i.e., to write machine-checked proofs of correctness at the level of executable code. Although the guarantees obtained through such development are highly desirable, proof assistants are not currently well adapted to large-scale software development and are expensive to use in terms of both time and expertise. In particular, the productivity of proof engineers is lowered by inadequate interfaces, processes, and tool support, which may not hinder lone expert users but become serious problems in large-scale projects with many contributors.
This dissertation shows that research from traditional software engineering can considerably improve productivity in large-scale verification projects that use proof assistants and can facilitate the proliferation of formally verified software. Specifically, inspired by software engineering research on regression testing, software evolution, and mutation testing, this dissertation presents three main bodies of research aimed at speeding up proof checking and helping proof engineers evaluate the quality of their verification projects.
First, this dissertation introduces regression proof selection, a technique that tracks fine-grained dependencies between Coq definitions, propositions, and proofs, and only checks those proofs affected by changes between two revisions. We instantiate the technique in a tool dubbed iCoq. We applied iCoq to track dependencies across many revisions in several large Coq projects and measured the time savings compared to proof checking from scratch and when using Coq’s timestamp-based toolchain for incremental proof checking. Our results show that proof checking with iCoq is up to 10 times faster than the former and up to 3 times faster than the latter.
Second, this dissertation describes the design and implementation of piCoq, a set of techniques that blend the power of parallel proof checking and proof selection. piCoq can track dependencies between files, definitions, and lemmas and perform parallel checking of only those files or proofs affected by changes between two project revisions. We applied piCoq to perform regression proving over many revisions of several large open-source projects. Our results indicate that combining proof-level parallelism with proof selection is consistently much faster than both sequential checking from scratch and sequential checking with proof selection. In particular, 4-way parallelization is up to 28.6 times faster than the former and up to 2.8 times faster than the latter.
Third, this dissertation introduces mutation proving, a technique for analyzing quality of verification projects that use proof assistants. We implemented our technique for the Coq proof assistant in a tool dubbed mCoq. mCoq applies a set of mutation operators to Coq functions and datatypes, and then checks proofs of lemmas affected by operator application. We…
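The core of regression proof selection, computing which proofs transitively depend on changed definitions, can be sketched as reverse reachability over a dependency graph. This is a toy model of the idea, not iCoq's implementation; the `affected` function and the dictionary encoding are invented here.

```python
def affected(deps, changed):
    """deps maps each item (definition, lemma, proof) to the items it
    depends on. Return everything that transitively depends on a
    changed item: only these proofs need re-checking."""
    rev = {}
    for item, uses in deps.items():
        for u in uses:
            rev.setdefault(u, set()).add(item)
    frontier, out = list(changed), set(changed)
    while frontier:
        for dependent in rev.get(frontier.pop(), ()):
            if dependent not in out:
                out.add(dependent)
                frontier.append(dependent)
    return out

deps = {"lemma_A": {"def_x"}, "lemma_B": {"lemma_A"}, "lemma_C": {"def_y"}}
print(affected(deps, {"def_x"}))   # {'def_x', 'lemma_A', 'lemma_B'}
```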
Advisors/Committee Members: Gligoric, Milos (advisor), Pingali, Keshav (committee member), Rossbach, Christopher J (committee member), Khurshid, Sarfraz (committee member).
Subjects/Keywords: Proof assistants; Verification; Parallelism; Proof engineering; Proof selection; Regression testing; Mutation testing; Coq
4.
Yaghmazadeh, Navid.
Automated synthesis of data extraction and transformation programs.
Degree: PhD, Computer Science, 2017, University of Texas – Austin
URL: http://hdl.handle.net/2152/68138
Due to the abundance of data in today’s data-rich world, end-users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed programmatically, most end-users lack the programming expertise required to automate them and end up spending their valuable time performing various data-related tasks manually. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language.
This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully-structured data formats like relational tables and on semi-structured formats such as XML and JSON documents.
First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples. A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees.
Next, we present a new methodology for learning programs that migrate tree-structured documents to relational table representations from input-output examples. Our approach decomposes the synthesis task into two subproblems: (A) learning the column extraction logic, and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning that combines integer linear programming and logic minimization.
Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method we describe is fully automatic and database-agnostic (i.e., it does not require customization for each database). Our method combines semantic parsing techniques from the NLP community with novel programming-language ideas involving probabilistic type inhabitation and automated sketch repair.
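As a minimal illustration of synthesis from input-output examples, the sketch below enumerates compositions of string primitives until one is consistent with all examples. It shows the generic programming-by-example loop, not this dissertation's algorithms; the primitive set and `synthesize` are made up for the example.

```python
# A toy DSL: compositions of primitive string functions.
PRIMS = {
    "lower": str.lower,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else "",
}

def synthesize(examples, max_depth=3):
    """Enumerate compositions of primitives, shortest first, and return
    the first program consistent with all input-output examples."""
    programs = [[]]  # each program is a list of primitive names
    for depth in range(max_depth + 1):
        next_layer = []
        for prog in programs:
            def run(s, prog=prog):
                for name in prog:
                    s = PRIMS[name](s)
                return s
            if all(run(i) == o for i, o in examples):
                return prog
            next_layer += [prog + [p] for p in PRIMS]
        programs = next_layer
    return None

print(synthesize([("  Hello World ", "hello"), (" Foo bar", "foo")]))
# ['lower', 'first_word']  (a consistent composition)
```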
Advisors/Committee Members: Dillig, Isil (advisor), Pingali, Keshav (committee member), Mooney, Raymond (committee member), Solar-Lezama, Armando (committee member).
Subjects/Keywords: Program synthesis; Programming-by-examples; Programming-by-natural-language; Databases
5.
Dathathri, Roshan.
Compiler and runtime systems for homomorphic encryption and graph processing on distributed and heterogeneous architectures.
Degree: PhD, Computer Science, 2020, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/10165
Distributed and heterogeneous architectures are tedious to program because devices such as CPUs, GPUs, and FPGAs provide different programming abstractions and may have disjoint memories, even if they are on the same machine. In this thesis, I present compiler and runtime systems that make it easier to develop efficient programs for privacy-preserving computation and graph processing applications on such architectures.

Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations on encrypted data without requiring a secret key. Recent cryptographic advances have pushed FHE into the realm of practical applications. However, programming these applications remains a huge challenge, as it requires cryptographic domain expertise to ensure correctness, security, and performance. This thesis introduces a domain-specific compiler for fully-homomorphic deep neural network (DNN) inferencing as well as a general-purpose language and compiler for fully-homomorphic computation:

1. I present CHET, a domain-specific optimizing compiler designed to make the task of programming DNN inference applications using FHE easier. CHET automates many laborious and error-prone programming tasks, including selecting encryption parameters to guarantee the security and accuracy of the computation, determining efficient data layouts, and performing scheme-specific optimizations. Our evaluation of CHET on a collection of popular DNNs shows that CHET-generated programs outperform expert-tuned ones by an order of magnitude.

2. I present a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. EVA is also designed to work as an intermediate representation that can be a target for compiling higher-level domain-specific languages; to demonstrate this, we have re-targeted CHET onto EVA. Due to the novel optimizations in EVA, its programs are on average ~5.3x faster than those generated by the unmodified version of CHET.

These languages and compilers enable a wider adoption of FHE. Applications in several areas like machine learning, bioinformatics, and security need to process and analyze very large graphs, and distributed clusters are essential for processing such graphs in reasonable time. I present a novel approach to building distributed graph analytics systems that exploits heterogeneity in processor types, partitioning policies, and programming models. The key to this approach is Gluon, a domain-specific communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters in the bulk-synchronous parallel (BSP) model and optimizes communication in a novel way by…
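The reduce-then-broadcast synchronization that a substrate like Gluon performs for replicated graph nodes can be sketched in a few lines. This is a conceptual model only, assuming a min-reduction over node labels (as in shortest paths); the `sync` function and data encoding are hypothetical, not Gluon's API.

```python
def sync(partitions, masters):
    """Bulk-synchronous reduce-then-broadcast of node labels across
    partitions. `masters` maps each node to its owning partition."""
    # reduce: owners gather the minimum label proposed by any replica
    for node, owner in masters.items():
        best = min(p[node] for p in partitions if node in p)
        partitions[owner][node] = best
    # broadcast: replicas adopt the owner's value
    for node, owner in masters.items():
        for p in partitions:
            if node in p:
                p[node] = partitions[owner][node]

# two hosts replicate node "b"; host 0 owns it
parts = [{"a": 0, "b": 7}, {"b": 3, "c": 2}]
sync(parts, {"a": 0, "b": 0, "c": 1})
print(parts)  # [{'a': 0, 'b': 3}, {'b': 3, 'c': 2}]
```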
Advisors/Committee Members: Pingali, Keshav (advisor), Musuvathi, Madanlal (committee member), Ramachandran, Vijaya (committee member), Rossbach, Christopher (committee member), Snir, Marc (committee member).
Subjects/Keywords: Compiler; Runtime system; Big data; Graph analytics; Distributed-memory; Heterogeneous architectures; GPUs; Communication optimizations; Homomorphic encryption; Neural networks; Privacy-preserving machine learning
6.
-2502-2946.
Theory and practice of classical matrix-matrix multiplication for hierarchical memory architectures.
Degree: PhD, Computer Science, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/63352
Matrix-matrix multiplication is perhaps the most important operation used as a basic building block in dense linear algebra. A computer with a hierarchical memory architecture has memory organized in layers, with small, fast memories close to the processor and large, slow memories further away from it. Classical matrix-matrix multiplication is particularly well suited to such architectures because it exhibits a large degree of data reuse, so expensive data movements can be amortized over a lot of computation. This dissertation advances the theory of how to optimally reuse data during matrix-matrix multiplication on hierarchical memory architectures, and it uses this understanding to develop new practical matrix-matrix multiplication algorithms with improved data-movement properties.
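The data-reuse argument can be made concrete with a classic blocked (tiled) matrix multiplication, sketched below in Python for readability. Blocking is the textbook technique the abstract builds on; `blocked_matmul` and the block-size parameter are illustrative choices, not this dissertation's algorithms.

```python
def blocked_matmul(A, B, n, bs):
    """n x n matrix multiply with bs x bs blocking. Working on blocks
    that fit in a fast memory layer lets each loaded element be reused
    ~bs times, amortizing data movement over computation."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_matmul(A, B, 2, 1))  # [[19.0, 22.0], [43.0, 50.0]]
```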
Advisors/Committee Members: van de Geijn, Robert A. (advisor), Quintana-Ortí, Enrique S. (advisor), Fussell, Donald S (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Linear algebra; Scientific computing; High-performance; Matrix; I/O complexity
7.
Wang, Xinyu.
An efficient programming-by-example framework.
Degree: PhD, Computer Science, 2019, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/5854
Due to the ubiquity of computing, programming has started to become an essential skill for an increasing number of people, including data scientists, financial analysts, and spreadsheet users. While it is well known that building complex and reliable software is difficult, writing even simple scripts is challenging for novices with no formal programming background. Therefore, there is an increasing need for technology that can provide basic programming support to non-expert computer end-users.
Program synthesis, as a technique for generating programs from high-level specifications such as input-output examples, has been used to automate many real-world programming tasks in a number of application domains such as spreadsheet programming and data science. However, developing specialized synthesizers for these application domains is notoriously hard.
This dissertation aims to make the development of program synthesizers easier so that we can expand the applicability of program synthesis to more application domains. In particular, this dissertation describes a programming-by-example framework that is both generic and efficient. The framework can be applied broadly to automating tasks across different application domains, and it achieves orders-of-magnitude improvements in synthesis speed over existing state-of-the-art techniques.
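One standard source of the efficiency such frameworks pursue is pruning programs that behave identically on the given examples (observational equivalence). The sketch below shows that generic idea on a toy integer DSL; it is not this dissertation's framework, and `enumerate_pruned` is a hypothetical name.

```python
def enumerate_pruned(prims, inputs, outputs, max_depth=4):
    """Bottom-up enumeration that keeps only one program per behavior
    on the given inputs ("observational equivalence"), a standard way
    PBE engines keep the search space tractable."""
    seen = {}
    layer = {(): tuple(inputs)}          # program -> outputs on inputs
    for _ in range(max_depth):
        new_layer = {}
        for prog, vals in layer.items():
            for name, fn in prims.items():
                cand = prog + (name,)
                behav = tuple(fn(v) for v in vals)
                if behav == tuple(outputs):
                    return cand
                if behav not in seen:    # prune equivalent programs
                    seen[behav] = cand
                    new_layer[cand] = behav
        layer = new_layer
    return None

prims = {"inc": lambda x: x + 1, "dbl": lambda x: 2 * x}
print(enumerate_pruned(prims, [1, 3], [4, 8]))  # ('inc', 'dbl')
```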
Advisors/Committee Members: Dillig, Isil (advisor), Durrett, Gregory (committee member), Pingali, Keshav (committee member), Jhala, Ranjit (committee member), Naik, Mayur (committee member).
Subjects/Keywords: Programming languages; Program synthesis
8.
Ma, Xiaoyu, Ph. D.
Large-scale transactional execution of FPGA-accelerated irregular applications.
Degree: PhD, Electrical and Computer Engineering, 2017, University of Texas – Austin
URL: http://hdl.handle.net/2152/63014
Irregular workloads are programs organized around pointer-based data structures such as graphs. They are widely used in many fields such as computer/human network analysis, machine learning, data mining, graphics, and electronic design automation. Many irregular applications have massive data-level parallelism because they iterate over a large number of graph nodes or edges using the same operators. This dissertation proposes large-scale transactional execution, as well as an architecture to achieve it. We apply this approach to irregular applications by executing a large number of graph operations concurrently, as transactions, to deal with potential conflicts. Before this work, large-scale transactional execution was generally considered impractical because it might incur too many conflicts, negating the potential benefits of parallelization. We propose a set of techniques to address the high-conflict issue and argue that, given the large size and topology of many modern graphs, large-scale, multi-threaded, transactional execution can provide performance, energy-efficiency, and programmability benefits. We present challenges in realizing such an architecture, including the need for scalable conflict detection, livelock avoidance, and transactional-state overflow handling, and propose solutions. While the proposed techniques are also applicable to CPUs and ASICs, we focus on FPGAs and implement the proposed architecture as a synthesizable FPGA RTL design. We compare our implementation in performance and energy efficiency against an Intel Haswell-based baseline platform. In addition, we perform an extensive study of various micro-architectural design choices in the large-scale transactional execution architecture and evaluate their impact on performance. Finally, we explore the use of fine-grained locks in place of transactions to further improve performance.
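The fine-grained-locking alternative mentioned at the end of the abstract can be sketched in software: acquire per-node locks over an operator's neighborhood in a global order, so overlapping operations serialize without deadlock. This is a minimal software model with invented names; the dissertation's actual mechanisms are hardware conflict detection on an FPGA.

```python
import threading

def with_neighborhood(graph, locks, node, fn):
    """Run fn after acquiring locks on a node and its neighbors in a
    global (sorted) order, so concurrent operators on overlapping
    neighborhoods serialize instead of deadlocking."""
    region = sorted({node, *graph[node]})
    for n in region:
        locks[n].acquire()
    try:
        return fn(region)
    finally:
        for n in reversed(region):
            locks[n].release()

graph = {0: [1, 2], 1: [0], 2: [0]}
locks = {n: threading.Lock() for n in graph}
with_neighborhood(graph, locks, 0, lambda region: print("locked", region))
```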
Advisors/Committee Members: Chiou, Derek (advisor), Pingali, Keshav (committee member), Gerstlauer, Andreas (committee member), Erez, Mattan (committee member), Hofstee, Harm Peter (committee member).
Subjects/Keywords: Transactional memory; Irregular application; Graph; FPGA; High-throughput compute
9.
Lavasani, Maysam.
Generating irregular data-stream accelerators : methodology and applications.
Degree: PhD, Electrical and Computer Engineering, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/31409
This thesis presents Gorilla++, a language and compiler for generating customized hardware accelerators that process input streams of data. Gorilla++ uses a hierarchical programming model in which sequential engines run in parallel and communicate through FIFO interfaces. The language also incorporates offload and lock constructs to support safe accesses to global resources. Besides conventional compiler optimizations for regular streaming, the programming model opens up new optimization opportunities, including (i) multi-threading to share computation resources among different execution contexts inside an engine, (ii) offload-sharing to share resources between different engines, and (iii) pipe-offloading to pipeline part of a computation that is not efficiently pipelinable as a whole. Due to the dynamic nature of Gorilla++'s target applications, closed-form formulations are not sufficient for exploring the design space of accelerators. Instead, the design space is explored iteratively using a rule-based refinement process: in each iteration, the rules capture inefficiencies in the design, either bottlenecks or under-utilized resources, and change the design to eliminate them. Gorilla++ is evaluated by generating a set of FPGA-based networking and big-data accelerators. The experimental results demonstrate (i) the expressiveness and generality of the Gorilla++ language, (ii) the effectiveness of the Gorilla++ compiler optimizations, and (iii) the improvement in design space exploration (DSE) using the rule-based refinement process.
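The engines-plus-FIFOs model can be mimicked in a few lines of Python threads and queues, shown below. This is a conceptual sketch of the model only, not Gorilla++ syntax (which generates hardware); `engine` and the end-of-stream convention are invented for the example.

```python
import queue
import threading

def engine(fn, inbox, outbox):
    """A sequential engine in the sense above: loop on an input FIFO,
    apply a function, push results to the output FIFO."""
    def loop():
        while True:
            item = inbox.get()
            if item is None:           # end-of-stream marker
                outbox.put(None)
                return
            outbox.put(fn(item))
    threading.Thread(target=loop, daemon=True).start()

src, mid, sink = queue.Queue(), queue.Queue(), queue.Queue()
engine(lambda x: x * 2, src, mid)      # two engines run in parallel,
engine(lambda x: x + 1, mid, sink)     # linked by FIFO interfaces
for x in [1, 2, 3, None]:
    src.put(x)
while (y := sink.get()) is not None:
    print(y)                           # 3, 5, 7
```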
Advisors/Committee Members: Chiou, Derek (advisor), Abraham, Jacob (committee member), Chung, Eric (committee member), Gerstlauer, Andreas (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Hardware accelerators; High-level synthesis; Auto-refinement; Big-data
10.
-3217-4051.
Fine-grained containment domains for throughput processors.
Degree: PhD, Electrical and Computer Engineering, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/46990
Continued scaling of semiconductor technology has made modern processors rely on large design margins to guarantee correct operation under worst-case conditions. Design margins appear in the form of higher supply voltage or lower clock frequency, leading to inefficiency. In practice, such worst-case conditions are rare, and the processor can run at a reduced voltage or higher frequency while experiencing only a few infrequent errors. Recent proposals have used hardware error detectors and recovery mechanisms to detect and recover from these rare errors, a technique known as timing speculation. While this is effective for out-of-order processors with an inherent capability to recover from misspeculation, implementing similar hardware for throughput processors such as Graphics Processing Units (GPUs) is prohibitively costly due to the massive amount of thread context that must be preserved. Furthermore, recovery overhead is much higher, since the SIMD (Single Instruction Multiple Data) execution model of GPUs requires multiple threads to roll back together when an error occurs.

In this dissertation, I develop a hardware/software co-design approach that enables reduced-margin operation on GPUs and overcomes the limitations of existing techniques. The proposed scheme leverages the hierarchical programming model of GPUs to provide hierarchical and uncoordinated local checkpoint-recovery. By decomposing a program into a hierarchically nested tree of code blocks, which I refer to as containment domains (CDs), the program becomes amenable to automatic analysis and tuning, and an optimal trade-off can be made between preservation and recovery overhead. To aid this optimization process, an analytical model is developed to estimate the performance efficiency of a given application setting at a given error rate; with this model, an exhaustive search can find the optimal solution. The tunability also allows the proposed scheme to adapt easily to a wide range of error rates, making it future-proof against emerging uncertainties in semiconductor design.

The proposed scheme combines software and hardware components to achieve the highest efficiency in preservation, restoration, and recovery. The software components include: 1) an API and runtime that let programmers describe the hierarchy of containment domains within an application and preserve the state required for rollback recovery, and 2) a compiler analysis that automatically inserts preservation routines for register variables. The hardware components include: 1) a stack structure to keep track of recovery program counters (PCs), 2) a set of error containment mechanisms to guarantee that no erroneous data is propagated outside of a containment domain, and 3) an error reporting architecture that keeps track of affected threads and initiates their recovery.
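The preserve/execute/recover structure of a containment domain can be modeled in software as a scoped checkpoint with local re-execution. The sketch below is a conceptual model under that assumption, not the actual CD API or its GPU hardware support; `contain` and `flaky_update` are hypothetical.

```python
import copy
import random

def contain(state, keys, body, retries=3):
    """Conceptual containment domain: preserve only the state the body
    may damage, run it, and on a contained error restore and re-execute
    locally instead of rolling back the whole program."""
    saved = {k: copy.deepcopy(state[k]) for k in keys}   # preservation
    for _ in range(retries):
        try:
            return body(state)
        except RuntimeError:                             # contained error
            for k in keys:
                state[k] = copy.deepcopy(saved[k])       # local recovery
    raise RuntimeError("error escaped this domain")

def flaky_update(state):
    state["acc"] += 1
    if random.random() < 0.5:
        raise RuntimeError("transient fault")
    return state["acc"]

random.seed(1)                         # deterministic demo
state = {"acc": 0, "other": "untouched"}
print(contain(state, ["acc"], flaky_update))   # 1, after one local retry
```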
Advisors/Committee Members: Erez, Mattan (advisor), Parker, Mike (committee member), Pingali, Keshav (committee member), Reddi, Vijay J (committee member), Touba, Nur A (committee member).
Subjects/Keywords: Resilience; Containment domains; Computer architecture; Timing speculation; GPU; Design margin; Voltage droop; Checkpoint-recovery
11.
-8499-7554.
Galois : a system for parallel execution of irregular algorithms.
Degree: PhD, Computer Sciences, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/30536
A programming model which allows users to program with high productivity and which produces high-performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that the scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve the best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling are also described. An evaluation on a well-studied benchmark suite reveals that factoring programs into operators, schedulers, and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often by orders of magnitude, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.
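The decoupling the abstract describes, the same operator running under different scheduling policies, can be illustrated by parameterizing a driver loop over the worklist implementation. This is a toy model in Python, not the Galois API or its synthesized schedulers; all names below are invented.

```python
import heapq
from collections import deque

class FIFO:
    def __init__(self): self.q = deque()
    def push(self, x, prio=0): self.q.append(x)
    def pop(self): return self.q.popleft()
    def __bool__(self): return bool(self.q)

class Priority:
    def __init__(self): self.q = []
    def push(self, x, prio=0): heapq.heappush(self.q, (prio, x))
    def pop(self): return heapq.heappop(self.q)[1]
    def __bool__(self): return bool(self.q)

def run(operator, initial, worklist):
    """The operator and data stay fixed; the scheduling policy is just
    the worklist object passed in."""
    for x in initial:
        worklist.push(x)
    while worklist:
        for item, prio in operator(worklist.pop()):
            worklist.push(item, prio)

def countdown(x):
    print(x)
    return [(x - 1, x - 1)] if x > 0 else []

run(countdown, [3], FIFO())   # 3 2 1 0; swap in Priority() to reorder
                              # multi-source workloads by priority
```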
Advisors/Committee Members: Pingali, Keshav (advisor), Alvisi, Lorenzo (committee member), Lethin, Richard (committee member), Padua, David (committee member), Witchel, Emmett (committee member).
Subjects/Keywords: Parallel programming; Multicore
12.
Nedunuri, Srinivas.
Theory and techniques for synthesizing efficient breadth-first search algorithms.
Degree: PhD, Computer Science, 2012, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2012-08-6242
The development of efficient algorithms to solve a wide variety of combinatorial and planning problems is a significant achievement in computer science. Traditionally, each algorithm is developed individually, based on flashes of insight or experience, and then (optionally) verified for correctness. While computer science has formalized the analysis and verification of algorithms, the process of algorithm development remains largely ad hoc. This ad-hoc nature is especially limiting when developing algorithms for a family of related problems. Guided program synthesis is an existing methodology for the systematic development of algorithms. Specific algorithms are viewed as instances of very general algorithm schemas. For example, the Global Search schema generalizes traditional branch-and-bound search and includes both depth-first and breadth-first strategies. Algorithm development involves systematic specialization of the algorithm schema based on problem-specific constraints to create efficient algorithms that are correct by construction, obviating the need for a separate verification step. Guided program synthesis has been applied to a wide range of algorithms, but there is still no systematic process for the synthesis of large search programs such as AI planners.

Our first contribution is the specialization of Global Search to a class we call Efficient Breadth-First Search (EBFS), by incorporating dominance relations to constrain the size of the search frontier to be polynomially bounded. Dominance relations allow two search spaces to be compared to determine whether one dominates the other, so that the dominated space can be eliminated from the search. We further show that EBFS is an effective characterization of greedy algorithms when the breadth bound is set to one. Surprisingly, the resulting characterization is more general than the well-known characterization of greedy algorithms, namely the Greedy Algorithm parametrized over algebraic structures called greedoids.

Our second contribution is a methodology for systematically deriving dominance relations, not just for individual problems but for families of related problems. The techniques are illustrated on numerous well-known problems. Combining this with the program schema for EBFS results in efficient greedy algorithms.

Our third contribution is the application of the theory and methodology to the practical problem of synthesizing fast planners. Nearly all state-of-the-art planners in the planning literature are heuristic domain-independent planners. They generally do not scale well, and their space requirements can be prohibitive. Planners such as TLPlan that incorporate domain-specific information in the form of control rules are orders of magnitude faster. However, devising the control rules is a labor-intensive task that requires domain expertise and insight, and the correctness of the rules is not guaranteed. We introduce a method by which domain-specific…
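A small worked example of dominance pruning: in a breadth-first exploration of 0/1 knapsack choices, any partial state with no less weight and no more value than another can be discarded, keeping the frontier small. This is a generic illustration of dominance relations, not this dissertation's derivation methodology; `knapsack_bfs` is invented.

```python
def knapsack_bfs(items, capacity):
    """Breadth-first search over item choices where dominated partial
    states are pruned: if state A has no more weight and no less value
    than B, then B can never win, so B is dropped from the frontier."""
    frontier = {(0, 0)}                  # (weight, value) pairs
    for w, v in items:
        expanded = frontier | {(pw + w, pv + v)
                               for pw, pv in frontier if pw + w <= capacity}
        # keep only undominated states
        frontier = {s for s in expanded
                    if not any(o != s and o[0] <= s[0] and o[1] >= s[1]
                               for o in expanded)}
    return max(v for _, v in frontier)

print(knapsack_bfs([(2, 3), (3, 4), (4, 5)], 5))  # 7: take (2,3) and (3,4)
```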
Advisors/Committee Members: Cook, William Randall (advisor), Batory, Don (committee member), Baxter, Ira (committee member), Pingali, Keshav (committee member), Smith, Douglas R. (committee member).
Subjects/Keywords: Program synthesis; Algorithms; Formal methods; AI; Search algorithms
13.
-9048-1017.
Exploiting long-term behavior for improved memory system performance.
Degree: PhD, Computer science, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/42015
Memory latency is a key bottleneck for many programs. Caching and prefetching are two popular hardware mechanisms to alleviate the impact of long memory latencies, but despite decades of research, significant headroom remains. In this thesis, we show how we can significantly improve caching and prefetching by exploiting a long history of the program's behavior. Towards this end, we define new learning goals that fully exploit long-term information, and we propose history representations that make it feasible to track and manipulate long histories.

Based on these insights, we advance the state of the art for three important memory system optimizations. For cache replacement, where existing solutions have relied on simplistic heuristics, our solution pursues the new goal of learning from the optimal solution for past references to predict caching decisions for future references. For irregular prefetching, where previous solutions are limited in scope due to their inefficient management of long histories, our goal is to realize the previously unattainable combination of two popular learning techniques, namely address correlation and PC-localization. Finally, for regular prefetching, where recent solutions learn increasingly complex patterns, we leverage long histories to simplify the learning goal and to produce more timely and accurate prefetches.

Our results are significant. For cache replacement, our solution reduces misses for memory-intensive SPEC 2006 benchmarks by 17.4%, compared to 11.4% for the previous best. For irregular prefetching, our prefetcher obtains a 23.1% speedup (vs. 14.1% for the previous best) with 93.7% accuracy, and it comes close to the performance of an idealized prefetcher with no resource constraints. Finally, for regular prefetching, our prefetcher improves performance by 102.3% over a baseline with no prefetching, compared to 90% for the previous state-of-the-art prefetcher; our solution also incurs 10% less traffic than the previous best regular prefetcher.
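The cache-replacement goal of "learning from the optimal solution for past references" presupposes computing Belady's optimal (OPT) decisions on an observed trace, which the sketch below does by evicting the line reused furthest in the future. This is a simulator-style illustration only; the thesis's predictor and hardware design are not shown, and `belady` is an invented name.

```python
def belady(trace, ways):
    """Replay a reference trace under Belady's optimal policy: on a
    miss with a full cache, evict the address whose next use lies
    furthest in the future (or never). Returns the miss count."""
    cache, misses = set(), 0
    for i, addr in enumerate(trace):
        if addr in cache:
            continue
        misses += 1
        if len(cache) >= ways:
            def next_use(a):
                fut = [j for j in range(i + 1, len(trace)) if trace[j] == a]
                return fut[0] if fut else float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(addr)
    return misses

print(belady("abcadbe", 2))  # 6 misses (one hit: the second 'a')
```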
Advisors/Committee Members: Lin, Yun Calvin (advisor), Burger, Doug (committee member), Fussell, Donald S (committee member), Patt, Yale N (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Caches; Replacement policy; Prefetching; Memory system
14.
Lee, Dongwook.
Learning-based system-level power modeling of hardware IPs.
Degree: PhD, Electrical and Computer Engineering, 2017, University of Texas – Austin
URL: http://hdl.handle.net/2152/63013
Accurate power models of hardware components at high levels of abstraction are critical to enable system-level power analysis and optimization. Virtual platform prototypes are widely utilized to support early system-level design space exploration; however, accurate and fast power models of hardware components at such high levels of abstraction are lacking.
In this dissertation, we present novel learning-based approaches for extending fast functional simulation models of white-, gray-, and black-box custom hardware intellectual property components (IPs) with accurate power estimates. Depending on observability, we extend high-level functional models with the capability to capture data-dependent resource, block, or I/O activity without a significant loss in simulation speed. We further leverage state-of-the-art machine learning techniques to synthesize abstract power models that can predict cycle-, block-, and invocation-level power from low-level hardware implementations, where we introduce novel structural decomposition techniques to reduce model complexity and increase estimation accuracy.

Our white-box approach integrates with existing high-level synthesis (HLS) tools to automatically extract resource-mapping information, which is used to trace data-dependent resource-level activity and drive a cycle-accurate online power-performance model during functional simulation. Our gray-box approach supports power estimation at coarser basic-block granularity. It uses only limited information about block inputs and outputs to extract lightweight block-level activity from a functional simulation and drive a basic-block-level power model that utilizes a control-flow decomposition to improve accuracy and speed. It is faster than cycle-level models while providing finer granularity than invocation-level models, which allows further navigation of accuracy and speed trade-offs. Finally, we propose a novel approach for extending behavioral models of black-box hardware IPs with invocation-level power estimates. Our black-box model uses only input and output history to track data-dependent pipeline behavior, where we introduce a specialized ensemble learner composed of individually selected cycle-by-cycle models with reduced complexity and increased accuracy. The proposed approaches are fully automated by integrating with existing, commercial HLS tools for custom hardware synthesized by HLS. Results of applying our approaches to various industrial-strength design examples show that our power models can predict cycle-, basic-block-, and invocation-level power consumption to within 10%, 9%, and 3% of a commercial gate-level power estimation tool, respectively, all while running at several orders of magnitude faster speeds of 1-10 Mcycles/sec.
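At its core, a learning-based power model regresses observed activity features against measured power. The sketch below fits a least-squares linear model to made-up invocation-level activity counts; the feature choice and all numbers are illustrative assumptions, not this dissertation's models or data.

```python
import numpy as np

# Toy learning-based power model: per-invocation activity counts
# (e.g., add/multiply/memory operations observed in functional
# simulation) regressed against power measured at gate level.
activity = np.array([[100, 20, 5],     # rows: invocations
                     [80, 25, 9],      # cols: add/mul/mem counts
                     [120, 10, 2],
                     [60, 30, 12]], dtype=float)
measured_mw = np.array([14.5, 14.1, 13.0, 13.8])   # made-up measurements

X = np.hstack([activity, np.ones((len(activity), 1))])  # + static power
coef, *_ = np.linalg.lstsq(X, measured_mw, rcond=None)

predict = lambda a: np.append(a, 1.0) @ coef
print(round(predict([90, 22, 7]), 2))   # estimated power for a new run
```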
Advisors/Committee Members: Gerstlauer, Andreas, 1970- (advisor), Abraham, Jacob (committee member), John, Lizy K. (committee member), Pingali, Keshav (committee member), Kim, Taemin (committee member).
Subjects/Keywords: Power estimation; High-level synthesis
APA (6th Edition):
Lee, D. (2017). Learning-based system-level power modeling of hardware IPs. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/63013
Chicago Manual of Style (16th Edition):
Lee, Dongwook. “Learning-based system-level power modeling of hardware IPs.” 2017. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/63013.
MLA Handbook (7th Edition):
Lee, Dongwook. “Learning-based system-level power modeling of hardware IPs.” 2017. Web. 24 Jan 2021.
Vancouver:
Lee D. Learning-based system-level power modeling of hardware IPs. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2017. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/63013.
Council of Science Editors:
Lee D. Learning-based system-level power modeling of hardware IPs. [Doctoral Dissertation]. University of Texas – Austin; 2017. Available from: http://hdl.handle.net/2152/63013

University of Texas – Austin
15.
Whang, Joyce Jiyoung.
Overlapping community detection in massive social networks.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33272
Massive social networks have become increasingly popular in recent years. Community detection is one of the most important techniques for the analysis of such complex networks. A community is a set of cohesive vertices that has more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this thesis, we propose scalable overlapping community detection algorithms that effectively identify high quality overlapping communities in various real-world networks.
We first develop an efficient overlapping community detection algorithm using a seed set expansion approach. The key idea of this algorithm is to find good seeds and then greedily expand these seeds using a personalized PageRank clustering scheme. Experimental results show that our algorithm significantly outperforms other state-of-the-art overlapping community detection methods in terms of run time, cohesiveness of communities, and ground-truth accuracy.
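A minimal sketch of the seed-expansion step, using the standard push-style approximation of personalized PageRank and a conductance sweep (textbook building blocks of such algorithms; all names and parameter choices here are ours, not the dissertation's):

from collections import defaultdict, deque

def approx_ppr(adj, seeds, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank via the push method.
    adj maps every node to a list of its neighbors."""
    p = defaultdict(float)                 # PPR estimate
    r = defaultdict(float)                 # residual mass
    for s in seeds:
        r[s] = 1.0 / len(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        if deg == 0 or r[u] < eps * deg:
            continue
        mass = r[u]
        p[u] += alpha * mass
        r[u] = 0.0
        for v in adj[u]:                   # spread residual to neighbors
            r[v] += (1 - alpha) * mass / deg
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return p

def sweep_cut(adj, p):
    """Return the prefix of nodes (ordered by degree-normalized PPR)
    with the lowest conductance: the expanded community."""
    order = sorted(p, key=lambda u: p[u] / max(len(adj[u]), 1), reverse=True)
    vol_total = sum(len(adj[u]) for u in adj)
    in_set, vol, cut = set(), 0, 0
    best, best_set = float("inf"), []
    for u in order:
        in_set.add(u)
        vol += len(adj[u])
        cut += sum(1 if v not in in_set else -1 for v in adj[u])
        cond = cut / max(min(vol, vol_total - vol), 1)
        if cond < best:
            best, best_set = cond, list(in_set)
    return best_set

Each good seed would be expanded this way, and the resulting low-conductance sets kept as (possibly overlapping) communities.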
To develop more principled methods, we formulate the overlapping community detection problem as a non-exhaustive, overlapping graph clustering problem where clusters are allowed to overlap with each other, and some nodes are allowed to be outside of any cluster. To tackle this non-exhaustive, overlapping clustering problem, we propose a simple and intuitive objective function that captures the issues of overlap and non-exhaustiveness in a unified manner. To optimize the objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using a low-rank semidefinite programming technique. Our experimental results show that the new objective and the algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness.
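To indicate what such a unified objective can look like, here is one plausible k-means-style instantiation written in our own notation, where α bounds the total amount of overlap and β the number of outliers left unassigned (the dissertation's exact formulation may differ):

\min_{U \in \{0,1\}^{n \times k}} \; \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}\, \lVert x_i - m_j \rVert^2,
\qquad m_j = \frac{\sum_i u_{ij}\, x_i}{\sum_i u_{ij}},

subject to

\sum_{i,j} u_{ij} = (1+\alpha)\, n, \qquad
\bigl|\{\, i : \textstyle\sum_j u_{ij} = 0 \,\}\bigr| \le \beta\, n.

Setting α = β = 0 recovers ordinary exhaustive, disjoint k-means, which is why a single objective of this shape can capture overlap and non-exhaustiveness in a unified manner.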
We extend our non-exhaustive, overlapping clustering techniques to co-clustering where the goal is to simultaneously identify a clustering of the rows as well as the columns of a data matrix. As an example application, consider recommender systems where users have ratings on items. This can be represented by a bipartite graph where users and items are denoted by two different types of nodes, and the ratings are denoted by weighted edges between the users and the items. In this case, co-clustering would be a simultaneous clustering of users and items. We propose a new co-clustering objective function and an efficient co-clustering algorithm that is able to identify overlapping clusters as well as outliers on both types of the nodes in the bipartite graph. We show that our co-clustering algorithm is able to effectively capture the underlying co-clustering structure of the data, which results in boosting the performance of a standard one-dimensional clustering.
Finally, we study the design of parallel data-driven algorithms, which enables us to further increase the scalability of our overlapping community detection algorithms. Using…
Advisors/Committee Members: Dhillon, Inderjit S. (advisor), Grauman, Kristen (committee member), Mooney, Raymond J (committee member), Pingali, Keshav (committee member), Gleich, David F (committee member).
Subjects/Keywords: Community detection; Clustering; Social networks; Overlapping communities; Overlapping clusters; Non-exhaustive clustering; Seed expansion; K-means; Semidefinite programming; Co-clustering; PageRank; Data-driven algorithm; Scalable computing
APA (6th Edition):
Whang, J. J. (2015). Overlapping community detection in massive social networks. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33272
Chicago Manual of Style (16th Edition):
Whang, Joyce Jiyoung. “Overlapping community detection in massive social networks.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/33272.
MLA Handbook (7th Edition):
Whang, Joyce Jiyoung. “Overlapping community detection in massive social networks.” 2015. Web. 24 Jan 2021.
Vancouver:
Whang JJ. Overlapping community detection in massive social networks. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/33272.
Council of Science Editors:
Whang JJ. Overlapping community detection in massive social networks. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/33272

University of Texas – Austin
16.
Sui, Xin, Ph.D.
Principled control of approximate programs.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/43722
In conventional computing, most programs are treated as implementations of mathematical functions for which there is an exact output that must be computed from a given input. However, in many problem domains, it is sufficient to produce some approximation of this output. For example, when rendering a scene in graphics, it is acceptable to take computational shortcuts if human beings cannot tell the difference in the rendered scene. In other problem domains like machine learning, programs are often implementations of heuristic approaches to solving problems and therefore already compute approximate solutions to the original problem.
This is the key insight for the new research area, approximate computing, which attempts to trade off such approximations against the cost of computational resources such as program execution time, energy consumption, and memory usage. We believe that approximate computing is an important step towards a more fundamental and comprehensive goal that we call information-efficiency. Current applications compute more information (bits) than is needed to produce their outputs, and since producing and transporting bits of information inside a computer requires energy, computation time, and memory, information-inefficient computing leads directly to resource inefficiency.
Although there is now a fairly large literature on approximate computing, systems researchers have focused mostly on what we can call the forward problem; that is, they have explored different ways in both hardware and software to introduce approximations in a program and have demonstrated that these approximations can enable significant execution speedups and energy savings with some quality degradation of the result. However, these efforts do not provide any guarantee on the amount of quality degradation. Since the acceptable amount of degradation usually depends on the scenario in which the application is deployed, it is very important to be able to control the degree of approximation. In this dissertation, we refer to this problem as the inverse problem. Relatively little is known about how to solve the inverse problem in a disciplined way.
This dissertation makes two contributions towards solving the inverse problem. First, we investigate a large set of approximate algorithms from a variety of domains in order to understand how approximation is used in real-world applications. From this investigation, we determine that many approximate programs are tunable approximate programs. Tunable approximate programs have one or more parameters called knobs that can be changed to vary the quality of the output of the approximate computation as well as the corresponding cost. For example, an iterative linear equation solver can vary the number of iterations to trade quality of the solution versus the execution time, a Monte Carlo path tracer can change the number of sampling light paths to trade the quality of the resulting image against execution time, etc. Tunable approximate programs provide many opportunities for trading…
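The iterative-solver example translates directly into code: the iteration count is a knob trading output quality (here, the residual) against cost (execution time). A minimal sketch, with the solver and knob values chosen purely for illustration:

import time
import numpy as np

def jacobi_solve(A, b, iterations):
    """Jacobi iteration for Ax = b; `iterations` is the quality/cost knob."""
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b)
    for _ in range(iterations):
        x = (b - R @ x) / D
    return x

rng = np.random.default_rng(1)
n = 200
A = rng.random((n, n)) + n * np.eye(n)   # diagonally dominant => converges
b = rng.random(n)

for knob in (5, 20, 80):                 # turning the knob: more iterations,
    t0 = time.perf_counter()             # better quality, higher cost
    x = jacobi_solve(A, b, knob)
    residual = np.linalg.norm(A @ x - b)
    print(f"iters={knob:3d} residual={residual:.2e} "
          f"time={time.perf_counter() - t0:.4f}s")

Solving the inverse problem then amounts to choosing the smallest knob setting whose expected quality degradation stays within the deployment's tolerance.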
Advisors/Committee Members: Pingali, Keshav (advisor), Chiou, Derek (committee member), Dhillon, Inderjit (committee member), Fussell, Donald S. (committee member), Ramachandran, Vijaya (committee member).
Subjects/Keywords: Approximate computing; Error model; Cost model; Tunable approximate algorithms; Energy efficiency; Information efficiency; Systematic; Control; Tunable approximate programs
APA (6th Edition):
Sui, Xin, P. D. (2015). Principled control of approximate programs. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/43722
Chicago Manual of Style (16th Edition):
Sui, Xin, Ph D. “Principled control of approximate programs.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/43722.
MLA Handbook (7th Edition):
Sui, Xin, Ph D. “Principled control of approximate programs.” 2015. Web. 24 Jan 2021.
Vancouver:
Sui, Xin PD. Principled control of approximate programs. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/43722.
Council of Science Editors:
Sui, Xin PD. Principled control of approximate programs. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/43722

University of Texas – Austin
17.
Mir arabbaygi, Siavash.
Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/31377
The amount of biological sequence data is increasing rapidly, a promising development that would transform biology if we could develop methods that analyze large-scale data efficiently and accurately. A fundamental question in evolutionary biology is building the tree of life: a reconstruction of relationships between organisms in evolutionary time. Reconstructing phylogenetic trees from molecular data is an optimization problem that involves many steps. In this dissertation, we argue that to answer long-standing phylogenetic questions with large-scale data, several challenges need to be addressed in various steps of the pipeline. One challenge is aligning a large number of sequences so that evolutionarily related positions in all sequences are put in the same column. Constructing alignments is necessary for phylogenetic reconstruction, but also for many other types of evolutionary analyses. In response to this challenge, we introduce PASTA, a scalable and accurate algorithm that can align datasets with up to a million sequences. A second challenge is related to the interesting fact that various parts of the genome can have different evolutionary histories. Reconstructing a species tree from genome-scale data needs to account for these differences. A main approach for species tree reconstruction is to first reconstruct a set of "gene trees" from different parts of the genome, and to then summarize these gene trees into a single species tree. We argue that this approach can suffer from two challenges: reconstruction of individual gene trees from limited data can be plagued by estimation error, which translates into errors in the species tree; moreover, methods that summarize gene trees are not scalable or accurate enough under some conditions. To address the first challenge, we introduce statistical binning, a method that re-estimates gene trees by grouping them into bins. We show that binning improves gene tree accuracy and, consequently, species tree accuracy. To address the second challenge, we introduce ASTRAL, a new summary method that can run on a thousand genes and a thousand species in a day and has outstanding accuracy. We show that the development of these methods has enabled biological analyses that were otherwise not possible.
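To give a flavor of the binning step: statistical binning can be read as grouping gene trees so that no two trees with strongly supported conflicts share a bin, for example via a balanced greedy coloring of an incompatibility graph. A toy sketch under that reading, where the conflicts predicate is a stand-in for the real support-based compatibility test:

def bin_gene_trees(trees, conflicts, target_bins):
    """Greedy balanced grouping: place each gene tree into the smallest
    bin containing no tree it conflicts with. conflicts(a, b) -> bool
    stands in for a support-based incompatibility test."""
    bins = [[] for _ in range(target_bins)]
    for t in trees:
        for b in sorted(bins, key=len):          # smallest bin first
            if all(not conflicts(t, other) for other in b):
                b.append(t)
                break
        else:
            bins.append([t])                     # no compatible bin: open one
    return [b for b in bins if b]

The alignments within each bin would then be concatenated and a tree re-estimated per bin, before handing the re-estimated trees to a summary method such as ASTRAL.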
Advisors/Committee Members: Pingali, Keshav (advisor), Warnow, Tandy, 1955- (advisor), Hillis, David (committee member), Ghosh, Joydeep (committee member), Berger, Bonnie (committee member), Mooney, Ray (committee member).
Subjects/Keywords: Phylogenetics; Multi-species coalescent model; Multiple sequence alignment; Statistical binning; Species tree estimation
APA (6th Edition):
Mir arabbaygi, S. (2015). Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/31377
Chicago Manual of Style (16th Edition):
Mir arabbaygi, Siavash. “Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/31377.
MLA Handbook (7th Edition):
Mir arabbaygi, Siavash. “Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction.” 2015. Web. 24 Jan 2021.
Vancouver:
Mir arabbaygi S. Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/31377.
Council of Science Editors:
Mir arabbaygi S. Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/31377

University of Texas – Austin
18.
Gopinath, Divya.
Systematic techniques for more effective fault localization and program repair.
Degree: PhD, Electrical and Computer engineering, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33386
Debugging faulty code is a tedious process that is often quite expensive and can require much manual effort. Developers typically perform debugging in two key steps: (1) fault localization, i.e., identifying the location of faulty line(s) of code; and (2) program repair, i.e., modifying the code to remove the fault(s). Automating debugging to reduce its cost has been the focus of a number of research projects during the last decade, which have introduced a variety of techniques.
However, existing techniques suffer from two basic limitations. First, they lack the accuracy needed to handle real programs. Second, they automate only one of the two key steps, leaving the other to the developer.
Our thesis is that an approach that integrates systematic search based on state-of-the-art constraint solvers with techniques to analyze artifacts that describe application specific properties and behaviors, provides the basis for developing more effective debugging techniques. We focus on faults in programs that operate on structurally complex inputs, such as heap-allocated data or relational databases.
Our approach lays the foundation for a unified framework for localization and repair of faults in programs. We embody our thesis in a suite of integrated techniques based on propositional satisfiability solving, correctness specifications analysis, test-spectra analysis, and rule-learning algorithms from machine learning, implement them as a prototype tool-set, and evaluate them using several subject programs.
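Among the ingredients listed above, test-spectra analysis is the easiest to illustrate: rank statements by how strongly their execution correlates with failing tests. A minimal sketch using the classic Tarantula suspiciousness score (a standard spectrum formula chosen here for illustration; the dissertation's techniques go well beyond it):

def tarantula(coverage, outcomes):
    """coverage: test -> set of executed statement ids;
    outcomes: test -> True if the test passed.
    Returns statements ranked by suspiciousness."""
    tests = list(coverage)
    total_pass = sum(outcomes[t] for t in tests) or 1
    total_fail = sum(not outcomes[t] for t in tests) or 1
    scores = {}
    for s in set().union(*coverage.values()):
        p = sum(outcomes[t] and s in coverage[t] for t in tests) / total_pass
        f = sum(not outcomes[t] and s in coverage[t] for t in tests) / total_fail
        scores[s] = f / (p + f) if p + f else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy run: statement 3 is executed only by failing tests, so it ranks first.
cov = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {1, 2, 3}}
ok = {"t1": False, "t2": True, "t3": False}
print(tarantula(cov, ok))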
Advisors/Committee Members: Khurshid, Sarfraz (advisor), Perry, Dewayne (committee member), Pingali, Keshav (committee member), Julien, Christine (committee member), Bias, Randolph (committee member).
Subjects/Keywords: Software debugging; Program repair; Fault localization; SAT; Correctness specifications; Classifier learning
APA (6th Edition):
Gopinath, D. (2015). Systematic techniques for more effective fault localization and program repair. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33386
Chicago Manual of Style (16th Edition):
Gopinath, Divya. “Systematic techniques for more effective fault localization and program repair.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/33386.
MLA Handbook (7th Edition):
Gopinath, Divya. “Systematic techniques for more effective fault localization and program repair.” 2015. Web. 24 Jan 2021.
Vancouver:
Gopinath D. Systematic techniques for more effective fault localization and program repair. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/33386.
Council of Science Editors:
Gopinath D. Systematic techniques for more effective fault localization and program repair. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/33386

University of Texas – Austin
19.
-5275-5590.
Elixir: synthesis of parallel irregular algorithms.
Degree: PhD, Computer Science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33278
Algorithms in new application areas like machine learning and data analytics usually operate on unstructured sparse graphs. Writing efficient parallel code to implement these algorithms is very challenging for a number of reasons.
First, there may be many algorithms to solve a problem and each algorithm may have many implementations. Second, synchronization, which is necessary for correct parallel execution, introduces potential problems such as data-races and deadlocks. These issues interact in subtle ways, making the best solution dependent both on the parallel platform and on properties of the input graph. Consequently, implementing and selecting the best parallel solution can be a daunting task for non-experts, since we have few performance models for predicting the performance of parallel sparse graph programs on parallel hardware.
This dissertation presents a synthesis methodology and a system, Elixir, that addresses these problems by (i) allowing programmers to specify solutions at a high level of abstraction, and (ii) generating many parallel implementations automatically and using search to find the best one. An Elixir specification consists of a set of operators capturing the main algorithm logic and a schedule specifying how to efficiently apply the operators. Elixir employs sophisticated automated reasoning to merge these two components, and uses techniques based on automated planning to insert synchronization and synthesize efficient parallel code.
Experimental evaluation of our approach demonstrates that the performance of the Elixir-generated code is competitive with, and can even outperform, hand-optimized code written by expert programmers for many interesting graph benchmarks.
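The operator-plus-schedule split can be pictured with a hand-written example: the operator encodes the algorithm's local update, and the schedule decides which active node is processed next. The sketch below shows only the shape of such code, for SSSP-style relaxation and in our own notation; Elixir synthesizes (and parallelizes) code of this kind from a higher-level specification:

import heapq

def relax_operator(graph, dist, u):
    """Operator: relax edges out of u; returns newly activated nodes."""
    activated = []
    for v, w in graph[u]:
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            activated.append(v)
    return activated

def run(graph, source, schedule="priority"):
    """Schedule: the order in which active nodes are processed."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    if schedule == "priority":            # Dijkstra-like ordering
        wl = [(0, source)]
        while wl:
            d, u = heapq.heappop(wl)
            if d > dist[u]:
                continue                  # stale worklist entry
            for v in relax_operator(graph, dist, u):
                heapq.heappush(wl, (dist[v], v))
    else:                                 # FIFO: Bellman-Ford-like ordering
        wl = [source]
        while wl:
            u = wl.pop(0)
            wl.extend(relax_operator(graph, dist, u))
    return dist

Swapping the schedule from a priority queue to FIFO changes the algorithm without touching the operator, which is exactly the degree of freedom a synthesis-plus-search system can explore.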
Advisors/Committee Members: Pingali, Keshav (advisor), Misra, Jayadev (committee member), Batory, Don (committee member), Cook, William (committee member), Sagiv, Mooly (committee member), Gulwani, Sumit (committee member).
Subjects/Keywords: Parallel programming; Programming languages; Compilers; Static analysis; Synthesis
APA (6th Edition):
-5275-5590. (2015). Elixir: synthesis of parallel irregular algorithms. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33278
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-5275-5590. “Elixir: synthesis of parallel irregular algorithms.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/33278.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-5275-5590. “Elixir: synthesis of parallel irregular algorithms.” 2015. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-5275-5590. Elixir: synthesis of parallel irregular algorithms. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/33278.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-5275-5590. Elixir: synthesis of parallel irregular algorithms. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/33278
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
20.
-1624-1733.
Exploiting structure in large-scale optimization for machine learning.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/31381
With the immense growth of data, there is a great need for solving large-scale machine learning problems. Classical optimization algorithms usually cannot scale up due to the huge amount of data and/or model parameters. In this thesis, we will show that these scalability issues can often be resolved by exploiting three types of structure in machine learning problems: problem structure, model structure, and data distribution. This central idea can be applied to many machine learning problems. We describe in detail how to exploit structure for kernel classification and regression, matrix factorization for recommender systems, and structure learning for graphical models. We further provide comprehensive theoretical analysis for the proposed algorithms, showing both local and global convergence rates for a family of inexact first-order and second-order optimization methods.
Advisors/Committee Members: Dhillon, Inderjit S. (advisor), Ravikumar, Pradeep (committee member), Pingali, Keshav (committee member), Tewari, Ambuj (committee member), Wright, Stephen J (committee member).
Subjects/Keywords: Machine learning; Optimization
APA (6th Edition):
-1624-1733. (2015). Exploiting structure in large-scale optimization for machine learning. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/31381
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-1624-1733. “Exploiting structure in large-scale optimization for machine learning.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/31381.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-1624-1733. “Exploiting structure in large-scale optimization for machine learning.” 2015. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-1624-1733. Exploiting structure in large-scale optimization for machine learning. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/31381.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-1624-1733. Exploiting structure in large-scale optimization for machine learning. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/31381
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
21.
-7791-5469.
Performance, power, and confidence modeling of digital designs.
Degree: PhD, Electrical and Computer engineering, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/31420
This dissertation presents three modeling methodologies. The first methodology constructs power models for system-on-chip (SoC) designs. The modeling approach uses a gate-level netlist description of the design and training data to automatically select a handful of nets to use as model parameters. The resulting models use the cycle-by-cycle values of the chosen nets to predict cycle-by-cycle power consumption. These models are compact, fast to evaluate, and accurate. The second methodology constructs confidence models to complement the SoC power models. One confidence model is constructed for each power model. At run time, each confidence model is evaluated in parallel with its corresponding power model. For each power prediction made, the confidence model generates a confidence value. These confidence values provide feedback that the model user can use to estimate each power prediction's expected error. The third methodology constructs performance and power models for general-purpose computation on graphics processing units (GPGPU). The approach uses K-means clustering and neural networks to construct a model that can predict the scaling behavior of previously unseen kernels across a range of graphics processing unit (GPU) hardware configurations. The model is capable of predicting scalability across three GPU parameters: core frequency, memory frequency, and compute unit count. The SoC power and confidence models were evaluated using the Rocket Core, a pipelined processor that implements the RISC-V Instruction Set Architecture (ISA), and benchmarks from the MiBench suite. Across all modules modeled and benchmarks run, the power models achieved an average cycle-by-cycle prediction error of 5.8%. An average confidence and predicted power value was generated for each benchmark; correlating these two values resulted in a 0.77 correlation coefficient. The GPGPU scalability models were evaluated against real hardware measurements: the GPGPU performance and power scalability models were shown to be accurate to within 15% and 10%, respectively.
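The pairing of a power model with a confidence model can be sketched as fitting a second regressor to the held-out residuals of the first, so that at run time every power prediction comes with an expected-error estimate. The construction below (synthetic data, linear models) is our toy version, not the dissertation's methodology:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 4))                                # model inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 * X[:, 0] ** 2  # "true" power

# Fit the power model on one half, the confidence model on the other.
Xa, ya, Xb, yb = X[:1500], y[:1500], X[1500:], y[1500:]
A = np.hstack([Xa, np.ones((1500, 1))])
w, *_ = np.linalg.lstsq(A, ya, rcond=None)            # linear power model

B = np.hstack([Xb, np.ones((1500, 1))])
resid = np.abs(B @ w - yb)                            # held-out errors
Q = np.hstack([Xb ** 2, np.ones((1500, 1))])
wc, *_ = np.linalg.lstsq(Q, resid, rcond=None)        # confidence model

def predict_with_confidence(x):
    x = np.asarray(x, dtype=float)
    power = float(np.dot(w[:-1], x) + w[-1])
    expected_err = float(np.dot(wc[:-1], x ** 2) + wc[-1])
    return power, expected_err                        # lower err = more trust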
Advisors/Committee Members: Chiou, Derek (advisor), Patt, Yale (committee member), Abraham, Jacob (committee member), Gerstlauer, Andreas (committee member), Pingali, Keshav (committee member), Kishinevsky, Michael (committee member).
Subjects/Keywords: Power modeling; Confidence modeling; Machine learning
APA (6th Edition):
-7791-5469. (2015). Performance, power, and confidence modeling of digital designs. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/31420
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-7791-5469. “Performance, power, and confidence modeling of digital designs.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/31420.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-7791-5469. “Performance, power, and confidence modeling of digital designs.” 2015. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-7791-5469. Performance, power, and confidence modeling of digital designs. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/31420.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-7791-5469. Performance, power, and confidence modeling of digital designs. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/31420
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
22.
Yu, Hsiang-Fu.
Scalable algorithms for latent variable models in machine learning.
Degree: PhD, Computer science, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/41635
Latent variable modeling (LVM) is a popular approach in many machine learning applications, such as recommender systems and topic modeling, due to its ability to succinctly represent data, even in the presence of several missing entries. Existing learning methods for LVMs, while attractive, are infeasible for the large-scale datasets required in modern big data applications. In addition, such applications often come with various types of side information such as the text description of items and the social network among users in a recommender system. In this thesis, we present scalable learning algorithms for a wide range of latent variable models such as low-rank matrix factorization and latent Dirichlet allocation. We also develop simple but effective techniques to extend existing LVMs to exploit various types of side information and make better predictions in many machine learning applications such as recommender systems, multi-label learning, and high-dimensional time-series prediction. In addition, we also propose a novel approach for the maximum inner product search problem to accelerate the prediction phase of many latent variable models.
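As a concrete example of the first LVM mentioned, low-rank matrix factorization for recommender systems can be sketched with alternating least squares over the observed ratings only (a common baseline; the dissertation develops far more scalable algorithms):

import numpy as np

def als(ratings, rank=8, reg=0.1, iters=10):
    """ratings: list of (user, item, value). Alternating least squares
    for R ~= U @ V.T, fit on the observed entries only."""
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    I = reg * np.eye(rank)
    for _ in range(iters):
        for u, obs in by_user.items():       # solve each user factor
            if obs:
                Vs = V[[i for i, _ in obs]]
                r = np.array([val for _, val in obs])
                U[u] = np.linalg.solve(Vs.T @ Vs + I, Vs.T @ r)
        for i, obs in by_item.items():       # then each item factor
            if obs:
                Us = U[[u for u, _ in obs]]
                r = np.array([val for _, val in obs])
                V[i] = np.linalg.solve(Us.T @ Us + I, Us.T @ r)
    return U, V

Predicting a missing rating is then just the inner product U[u] @ V[i], which is also where a fast maximum inner product search pays off at prediction time.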
Advisors/Committee Members: Dhillon, Inderjit S. (advisor), Pingali, Keshav (committee member), Qiu, Lili (committee member), Vishwananthan, S.V.N. (committee member), Lin, Chih-Jen (committee member).
Subjects/Keywords: Latent variable modeling; Matrix factorization; Algorithms; Data
APA (6th Edition):
Yu, H. (2016). Scalable algorithms for latent variable models in machine learning. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/41635
Chicago Manual of Style (16th Edition):
Yu, Hsiang-Fu. “Scalable algorithms for latent variable models in machine learning.” 2016. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/41635.
MLA Handbook (7th Edition):
Yu, Hsiang-Fu. “Scalable algorithms for latent variable models in machine learning.” 2016. Web. 24 Jan 2021.
Vancouver:
Yu H. Scalable algorithms for latent variable models in machine learning. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2016. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/41635.
Council of Science Editors:
Yu H. Scalable algorithms for latent variable models in machine learning. [Doctoral Dissertation]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/41635

University of Texas – Austin
23.
-5112-1839.
Lightweight offload engines for worklist management and worklist-directed prefetching.
Degree: PhD, Electrical and Computer Engineering, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/68554
The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low compute intensity, frequent synchronization, uneven task sizes, and dynamic task generation. At high thread counts, execution time is dominated by worklist synchronization overhead and cache misses. Researchers have proposed hardware worklist accelerators to address scheduling costs, but these proposals often harden a specific scheduling policy and do not address high cache miss rates.
This thesis presents Minnow, a technique that addresses these bottlenecks by augmenting each core in a CMP with a memory throughput-optimized lightweight engine connected through an accelerator interface. These engines offload worklist operations from worker threads, reducing synchronization costs and improving scalability. The engines also perform worklist-directed prefetching, a software prefetching technique that exploits knowledge of upcoming tasks to perform nearly perfectly accurate and timely prefetch operations.
In this thesis, we first characterize several graph applications within a popular graph analytics framework to determine their performance and bottlenecks. Next, Minnow and worklist-directed prefetching are discussed in detail, including the Minnow accelerator interface, microarchitecture, and prefetch flow control mechanism. Finally, the benefits of Minnow and worklist-directed prefetching are evaluated within a cycle-accurate microarchitectural simulator.
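The core of worklist-directed prefetching (peek at tasks that are already queued and fetch their data before the worker reaches them) can be mimicked in software. Below is a toy model in which an explicit software-managed cache stands in for the hardware prefetch buffer; Minnow does this in hardware behind an accelerator interface, so all names here are illustrative:

from collections import deque

PREFETCH_DISTANCE = 4

def process_worklist(initial_tasks, load, compute):
    """load(task) fetches the task's data (the slow, cache-missing part);
    compute(task, data) returns newly generated tasks."""
    wl = deque(initial_tasks)
    cache = {}
    while wl:
        # The worklist tells us exactly which tasks come next, so these
        # fetches are accurate (never wasted) and issued ahead of use.
        for t in list(wl)[1:1 + PREFETCH_DISTANCE]:
            if t not in cache:
                cache[t] = load(t)
        task = wl.popleft()
        data = cache.pop(task) if task in cache else load(task)
        wl.extend(compute(task, data))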
Advisors/Committee Members: Chiou, Derek (advisor), Erez, Mattan (advisor), Gerstlauer, Andreas (committee member), Pingali, Keshav (committee member), Khubaib, Khubaib (committee member).
Subjects/Keywords: Computer architecture; Graph algorithms; Parallel processors; Parallel programming; Prefetching; Accelerators; Scheduling; Task partitioning
APA (6th Edition):
-5112-1839. (2018). Lightweight offload engines for worklist management and worklist-directed prefetching. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/68554
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-5112-1839. “Lightweight offload engines for worklist management and worklist-directed prefetching.” 2018. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/68554.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-5112-1839. “Lightweight offload engines for worklist management and worklist-directed prefetching.” 2018. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-5112-1839. Lightweight offload engines for worklist management and worklist-directed prefetching. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2018. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/68554.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-5112-1839. Lightweight offload engines for worklist management and worklist-directed prefetching. [Doctoral Dissertation]. University of Texas – Austin; 2018. Available from: http://hdl.handle.net/2152/68554
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
24.
Hung, Wei-Lun.
Asynchronous automatic-signal monitors with multi-object synchronization.
Degree: PhD, Electrical and Computer engineering, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/46269
One of the fundamental problems in parallel programming is that there is no simple programming paradigm that provides mutual exclusion and synchronization together with an efficient implementation. For monitor [Hoa74, Han75] (lock-based) systems, only experienced programmers can develop high-performance fine-grained lock-based implementations, and programmers frequently introduce bugs with traditional monitors. Researchers have proposed transactional memory [HM93, ST95], which provides a simple and elegant mechanism for programmers to atomically execute a set of memory operations, so there is no deadlock in transactional memory systems. However, most transactional memory systems lack conditional synchronization support [WLS14, LW14]; hence, writing multi-threaded programs with conditional synchronization is rather difficult. In this dissertation, we develop a parallel programming framework that provides simple constructs for mutual exclusion and synchronization as well as an efficient implementation.
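The automatic-signal (implicit-signal) monitor idea can be approximated in a few lines: every exit from a monitor method wakes waiting threads to re-check their predicates, so user code never calls signal or notify explicitly. A sketch of one such construction (a naive broadcast-on-exit version; efficient designs, including this dissertation's, avoid that cost):

import threading

class AutoSignalMonitor:
    """Monitor whose wait_until(pred) needs no explicit signal: every
    exit from a monitor section wakes waiters to re-check predicates."""
    def __init__(self):
        self._cond = threading.Condition()

    def __enter__(self):
        self._cond.acquire()
        return self

    def __exit__(self, *exc):
        self._cond.notify_all()      # implicit signal on every exit
        self._cond.release()

    def wait_until(self, pred):
        self._cond.wait_for(pred)    # re-checks pred on each wake-up

# Usage: a bounded buffer with no notify calls anywhere in user code.
class BoundedBuffer(AutoSignalMonitor):
    def __init__(self, capacity):
        super().__init__()
        self.items, self.capacity = [], capacity

    def put(self, x):
        with self:
            self.wait_until(lambda: len(self.items) < self.capacity)
            self.items.append(x)

    def get(self):
        with self:
            self.wait_until(lambda: self.items)
            return self.items.pop(0)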
Advisors/Committee Members: Garg, Vijay K. (Vijay Kumar), 1963- (advisor), Julien, Christine (committee member), Khurshid, Sarfraz (committee member), Mittal, Neeraj (committee member), Perry, Dewayne E (committee member), Pingali, Keshav (committee member).
Subjects/Keywords: Automatic signal; Implicit signal; Monitor; Synchronization; Concurrency; Parallel programming
APA (6th Edition):
Hung, W. (2016). Asynchronous automatic-signal monitors with multi-object synchronization. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/46269
Chicago Manual of Style (16th Edition):
Hung, Wei-Lun. “Asynchronous automatic-signal monitors with multi-object synchronization.” 2016. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/46269.
MLA Handbook (7th Edition):
Hung, Wei-Lun. “Asynchronous automatic-signal monitors with multi-object synchronization.” 2016. Web. 24 Jan 2021.
Vancouver:
Hung W. Asynchronous automatic-signal monitors with multi-object synchronization. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2016. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/46269.
Council of Science Editors:
Hung W. Asynchronous automatic-signal monitors with multi-object synchronization. [Doctoral Dissertation]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/46269

University of Texas – Austin
25.
Kim, Sangman.
Networking abstractions for GPU programs.
Degree: PhD, Computer Sciences, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/46765
Graphics Processing Units (GPUs) are becoming major general-purpose computing hardware for high-performance parallel computation. Despite their general computation capability and impressive performance, GPUs still lack important operating system (OS) services like networking, which makes building networking services or distributed systems on GPUs challenging. This thesis presents GPUnet, a native GPU networking layer that provides a socket abstraction and high-level networking APIs for GPU programs. GPUnet abstracts the complicated coordination of processors and network interfaces away from GPU applications and streamlines the development of server applications on GPUs. We develop several applications that harness the benefits of GPUnet: a matrix multiplication server built on GPUnet matches or surpasses the performance of the same server without GPUnet, with only 24-43% of the lines of code. We also show the scalability of in-GPU-memory MapReduce (GimMR) applications across multiple GPUs: its word count and K-means workloads scale to four GPUs with speedups of 2.9-3.5x over one GPU. GPUnet addresses three key challenges: the massive parallelism of GPU programs, memory copy overhead between CPU memory and GPU memory, and the slow single-threaded performance of GPUs. To better support massively parallel GPU programs, networking API invocations from multiple threads at the same point in data-parallel code are coalesced. Direct communication between GPUs and the network devices reduces the copy overhead, and, to minimize the time spent in single-threaded operation, control-intensive networking operations are offloaded to the network device.
Advisors/Committee Members: Witchel, Emmett (advisor), Alvisi, Lorenzo (committee member), Ford, Bryan (committee member), Pingali, Keshav (committee member), Porter, Donald E. (committee member), Silberstein, Mark (committee member).
Subjects/Keywords: GPU; Operating systems; Network; Socket; RDMA; CUDA; Infiniband
APA (6th Edition):
Kim, S. (2015). Networking abstractions for GPU programs. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/46765
Chicago Manual of Style (16th Edition):
Kim, Sangman. “Networking abstractions for GPU programs.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/46765.
MLA Handbook (7th Edition):
Kim, Sangman. “Networking abstractions for GPU programs.” 2015. Web. 24 Jan 2021.
Vancouver:
Kim S. Networking abstractions for GPU programs. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/46765.
Council of Science Editors:
Kim S. Networking abstractions for GPU programs. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/46765
26.
-6059-0490.
Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.
Degree: PhD, Computer science, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/33276
A goal of computer science is to develop practical methods to automate tasks that are otherwise too complex or tedious to perform manually. Complex tasks can include determining a practical algorithm and creating the associated implementation for a given problem specification. Goal-oriented programming can make this systematic. Therefore, we can rely on automated tools to create implementations by expressing the task of creating implementations in terms of goal-oriented programming. To do so, pertinent knowledge must be encoded, which requires a notation and language to define relevant abstractions.
This dissertation focuses on distributed-memory parallel tensor computations arising from computational chemistry. Specifically, we focus on applications based on the tensor contraction operation of dense, non-symmetric tensors. Creating an efficient algorithm for a given problem specification in this domain is complex; creating an optimized implementation of a developed algorithm is even more complex, tedious, and error-prone. To this end, we encode pertinent knowledge for distributed-memory parallel algorithms for tensor contractions of dense, non-symmetric tensors. We do this by developing a notation for data distribution and redistribution that exposes a systematic procedure for deriving a family of algorithms for this operation for which efficient implementations exist.
We validate the developed ideas by implementing them in the Redistribution Operations and Tensor Expressions application programming interface (ROTE API) and encoding them into an automated system, DxTer, for systematically generating efficient implementations from problem specifications. Experiments performed on the IBM Blue Gene/Q and Cray XC30 architectures, testing generated implementations of the spin-adapted coupled cluster singles and doubles method from computational chemistry, demonstrate impact both in terms of performance and storage requirements.
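The computational pattern at the heart of this work, a tensor contraction whose distributed operands require local contractions glued together by redistributions and reductions, can be illustrated by partitioning a contracted index across P simulated processes and summing the partial results (a toy data-parallel rendering in our own terms, not the ROTE notation):

import numpy as np

# Contraction C[a,b] = sum_{i,j} A[a,i,j] * B[b,i,j], with the contracted
# index i block-distributed across P "processes".
P = 4
rng = np.random.default_rng(3)
A = rng.random((6, 8, 5))
B = rng.random((7, 8, 5))

blocks = np.array_split(np.arange(A.shape[1]), P)   # distribute index i
partials = [np.einsum('aij,bij->ab', A[:, blk, :], B[:, blk, :])
            for blk in blocks]                      # local contractions
C = sum(partials)                                   # global reduction

assert np.allclose(C, np.einsum('aij,bij->ab', A, B))

Which indices to distribute, and which redistributions connect consecutive operations, is precisely the choice space that a formal notation for distributions makes explicit and searchable.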
Advisors/Committee Members: Van de Geijn, Robert A. (advisor), Kolda, Tamara G. (advisor), Stanton, John F (committee member), Pingali, Keshav (committee member), Hammond, Jeff R (committee member), Batory, Don S (committee member).
Subjects/Keywords: High-performance computing; Parallel programming; Tensor computations
APA (6th Edition):
-6059-0490. (2015). Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/33276
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-6059-0490. “Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/33276.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-6059-0490. “Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation.” 2015. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-6059-0490. Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/33276.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-6059-0490. Distributed tensor computations: formalizing distributions, redistributions, and algorithm derivation. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/33276
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
27.
-4372-6891.
Predicate detection for parallel computations.
Degree: PhD, Electrical and Computer engineering, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/44038
One of the fundamental problems in runtime verification of parallel programs is to check whether a predicate could become true in any global state of the system. The problem is challenging because of the nondeterministic process or thread scheduling of the system. Predicate detection alleviates this problem by analyzing the computation of the program and predicting whether the predicate could become true by exercising an alternative process schedule. The technique was first introduced by Cooper et al. and Garg et al. for distributed debugging; later, jPredictor applied this technique to concurrent debugging. We improve the technique of predicate detection in three ways. The first part of this dissertation presents the first online-and-parallel predicate detector for general-purpose predicate detection, named ParaMount. ParaMount partitions the set of consistent global states so that each subset can be enumerated in parallel using existing sequential enumeration algorithms. Our experimental results show that ParaMount speeds up the existing sequential algorithms by a factor of 6 with 8 threads. Moreover, ParaMount can run alongside the execution of the user's program and hence is applicable even to non-terminating programs. The second part develops a fast enumeration algorithm, named QuickLex, for consistent global states. In comparison with the original lexical algorithm (Lex), QuickLex uses an additional O(n²) space to reduce the time complexity from O(n²) to O(n·∆(P)), where n is the number of processes or threads in the computation and ∆(P) is the maximal number of incoming edges of any event. The third part introduces Loset, a new model for parallel computations with locking constraints. We show that the reachability problem in a loset is NP-complete. To tackle the NP-completeness, we present several useful properties. Specifically, if the final global state is reachable, then all lock-free feasible global states are reachable. In addition, we show that the reachability of a global state G can be determined using a sub-computation instead of the entire computation. Moreover, we introduce the strong feasibility of a global state, an upper approximation of reachability that can be calculated efficiently. Our experiments show that this property accurately models reachability for all 11 benchmarks.
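A primitive underlying all three parts is deciding whether a global state (a cut of the computation) is consistent. With vector clocks this reduces to a pairwise check on the cut's frontier events; the sketch below states the standard criterion and is not ParaMount's actual implementation:

def is_consistent(frontier):
    """frontier[i] = vector clock (length-n list) of the last event of
    process i included in the cut. The cut is consistent iff no included
    event depends on an excluded one:
        for all i, j: frontier[i][i] >= frontier[j][i]."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))

# Two processes; P1's frontier event received a message sent by P0's
# second event, which the first cut excludes.
print(is_consistent([[1, 0], [2, 1]]))   # False: the send is outside the cut
print(is_consistent([[2, 0], [2, 1]]))   # True once the send is included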
Advisors/Committee Members: Garg, Vijay K. (Vijay Kumar), 1963- (advisor), Chase, Craig M. (committee member), Julien, Christine (committee member), Khurshid, Sarfraz (committee member), Pingali, Keshav (committee member), Zhang, Lingming (committee member).
Subjects/Keywords: Predicate detection; Parallel computation; Reachability; Consistent global states; Global states enumeration
APA (6th Edition):
-4372-6891. (2016). Predicate detection for parallel computations. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/44038
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-4372-6891. “Predicate detection for parallel computations.” 2016. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/44038.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-4372-6891. “Predicate detection for parallel computations.” 2016. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-4372-6891. Predicate detection for parallel computations. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2016. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/44038.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-4372-6891. Predicate detection for parallel computations. [Doctoral Dissertation]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/44038
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
28.
-7595-5539.
Practical fast matrix multiplication algorithms.
Degree: PhD, Computer Science, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/69013
Matrix multiplication is a core building block for numerous scientific computing and, more recently, machine learning applications. Strassen's algorithm, the original Fast Matrix Multiplication (FMM) algorithm, has long fascinated computer scientists due to its startling property of reducing the number of computations required for multiplying n x n matrices from O(n³) to O(n^2.807). Over the last half century, this has fueled many theoretical improvements, such as other variations of Strassen-like FMM algorithms. Previous implementations of these FMM algorithms led to the "street wisdom" that they are only practical for large, relatively square matrices, that they require considerable workspace, and that achieving thread-level parallelism with them is difficult. This dissertation dispels these notions by demonstrating significant benefits for small and non-square matrices, requiring no workspace beyond what is already incorporated in high-performance implementations of matrix multiplication, and achieving performance benefits on multi-core, many-core, and distributed-memory architectures.
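For reference, the heart of Strassen's algorithm: seven recursive half-size multiplications instead of eight, which is where the O(n^2.807) = O(n^log2(7)) exponent comes from. A plain numpy sketch for square power-of-two sizes (practical implementations, including this dissertation's, instead operate on submatrix blocks inside a high-performance GEMM):

import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen's algorithm for square power-of-two matrices."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                       # fall back to ordinary GEMM
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)    # 7 multiplies, not 8
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)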
Advisors/Committee Members: Van de Geijn, Robert A. (advisor), Batory, Don S (committee member), Biros, George (committee member), Henry, Greg M (committee member), Pingali, Keshav K (committee member).
Subjects/Keywords: High performance computing; Matrix multiplication; Strassen's algorithm; Numerical algorithm; Mathematical software; Linear algebra library; BLAS; Performance model; Tensor contraction; GPU; Performance optimization; Machine learning; Parallel computing
APA (6th Edition):
-7595-5539. (2018). Practical fast matrix multiplication algorithms. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/69013
Note: the citations for this record may be lacking information needed for their formats: the author name may be incomplete.
Chicago Manual of Style (16th Edition):
-7595-5539. “Practical fast matrix multiplication algorithms.” 2018. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/69013.
MLA Handbook (7th Edition):
-7595-5539. “Practical fast matrix multiplication algorithms.” 2018. Web. 24 Jan 2021.
Vancouver:
-7595-5539. Practical fast matrix multiplication algorithms. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2018. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/69013.
Council of Science Editors:
-7595-5539. Practical fast matrix multiplication algorithms. [Doctoral Dissertation]. University of Texas – Austin; 2018. Available from: http://hdl.handle.net/2152/69013

University of Texas – Austin
29.
Hassaan, Muhammad Amber.
Parallelization of ordered irregular algorithms.
Degree: PhD, Electrical and Computer engineering, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/46175
Parallel computing hardware is ubiquitous, ranging from cell phones with multiple cores to supercomputers with hundreds of thousands of cores. Applications need to take advantage of this parallelism to reduce their execution time and to scale to larger input sizes. This has proved very challenging, because programmers are used to thinking serially and because current programming notations for writing parallel software are very low level and hard to use in general. At the same time, new problem domains like social network analysis and data analytics are giving rise to "irregular" applications that have far more complicated patterns of parallelism and locality than applications in more traditional parallel computing fields like computational science. This dissertation focuses on ordered irregular applications: applications in which computations must appear to have been performed in an application-specific order. One example considered in this dissertation is the simulation of colliding balls on a billiards table. Collisions in different parts of the table may be processed in parallel as long as, in the end, all collisions appear to have been performed in time order. Ordered algorithms are more complex to parallelize than unordered algorithms, such as some mesh refinement algorithms, which lack a notion of algorithmic order. Existing languages and runtime systems are limited in their ability to express and parallelize ordered applications. To address these problems, we introduce a new abstraction for understanding and exploiting parallelism in ordered applications, called the Kinetic Dependence Graph (KDG). We have implemented a high-level interface that permits programmers to write ordered algorithms without needing to understand the Kinetic Dependence Graph or its implementation details. This interface permits programmers to write sequential C++ programs with implicitly parallel loop constructs, using our parallel data structures and runtime libraries. We have also implemented several runtimes that use the KDG to run these applications in parallel. One runtime uses the KDG directly to find and exploit parallelism. The other runtime is based on optimistic (speculative) execution, and it uses the KDG to improve the efficiency of speculation. Our results show a speedup of up to 33× on a 40-core Intel server, performing better than or on par with third-party implementations, most of which were implemented and optimized by hand using low-level tools and intricate knowledge of algorithms and hardware.
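To make the programming model concrete, the following sequential sketch captures the semantics that an ordered irregular loop must preserve: events are processed in application time order, and handling an event may create new events. The event contents and names here are made up for illustration and are not the actual Galois API; the KDG-based and speculative runtimes described above parallelize exactly this pattern, running mutually independent events concurrently while preserving the appearance of time order.

// Sequential reference semantics for an ordered irregular loop.
// `Event` and the scheduling rule below are illustrative assumptions.
#include <queue>
#include <vector>
#include <cstdio>

struct Event {
    double time;  // application-specific order: simulated collision time
    int id;
};
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;  // makes the priority queue a min-heap on time
    }
};

int main() {
    std::priority_queue<Event, std::vector<Event>, LaterFirst> worklist;
    worklist.push({0.5, 0});
    worklist.push({0.2, 1});
    while (!worklist.empty()) {
        Event e = worklist.top();
        worklist.pop();
        std::printf("processing event %d at t=%.2f\n", e.id, e.time);
        // Handling an event may schedule follow-on work, e.g. the next
        // collision caused by the balls involved in this one:
        if (e.time < 1.0) worklist.push({e.time + 0.6, e.id});
    }
    return 0;
}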
Advisors/Committee Members: Pingali, Keshav (advisor), Patt, Yale N (committee member), Arvind (committee member), Garg, Vijay K (committee member), Chiou, Derek (committee member), Chase, Craig M (committee member).
Subjects/Keywords: Ordered; Irregular; Parallelization; Runtime; Galois; Kinetic Dependence Graph; Dependence graph; Parallel languages
APA (6th Edition):
Hassaan, M. A. (2016). Parallelization of ordered irregular algorithms. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/46175
Chicago Manual of Style (16th Edition):
Hassaan, Muhammad Amber. “Parallelization of ordered irregular algorithms.” 2016. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/46175.
MLA Handbook (7th Edition):
Hassaan, Muhammad Amber. “Parallelization of ordered irregular algorithms.” 2016. Web. 24 Jan 2021.
Vancouver:
Hassaan MA. Parallelization of ordered irregular algorithms. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2016. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/46175.
Council of Science Editors:
Hassaan MA. Parallelization of ordered irregular algorithms. [Doctoral Dissertation]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/46175
30.
Chauhan, Himanshu.
Algorithms for analyzing parallel computations.
Degree: PhD, Electrical and Computer Engineering, 2018, University of Texas – Austin
URL: http://hdl.handle.net/2152/63631
Predicate detection is a powerful technique to verify parallel programs. Verifying correctness of programs using this technique involves two steps: first, we create a partial-order-based model, called a computation, of an execution of a parallel program; then, we check all possible global states of this model against a predicate that encodes a faulty behavior. A partial order encodes many total orders, so even with one execution of the program we can reason over multiple possible alternate execution scenarios. This dissertation makes algorithmic contributions to predicate detection in three directions.
Enumerating all consistent global states of a computation is a fundamental requirement in predicate detection. Multiple algorithms have been proposed to perform this enumeration. Among these, the breadth-first search (BFS) enumeration algorithm is especially useful, as it finds an erroneous consistent global state with the fewest events possible. The traditional algorithm for BFS enumeration of consistent global states was given more than two decades ago and is still widely used. This algorithm, however, requires space that in the worst case may be exponential in the number of processes in the computation. We give the first algorithm that performs BFS-based enumeration of consistent global states of a computation in space that is polynomial in the number of processes.
Detecting a predicate on a computation is a hard problem in general. Thus, to devise efficient detection and analysis algorithms, it becomes necessary to exploit knowledge about the properties of the predicate. We present algorithms that exploit the properties of two classes of predicates, called stable and counting predicates, and provide significant reductions (exponential in many cases) in the time and space required to detect them.
The technique of computation slicing creates a compact representation, called a slice, of all global states that satisfy a class of predicates called regular predicates. We present the first distributed and online algorithm to create a slice of a computation with respect to a regular predicate. In addition, we give efficient algorithms to create slices for two important temporal logic formulas even when their underlying predicate is not regular, provided either the predicate or its negation is efficiently detectable.
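As context for the space bound discussed above, here is a minimal sketch of the traditional BFS enumeration of consistent global states that the dissertation improves on. The encoding of cross-process dependencies is an illustrative assumption; the point to note is that the frontier of cuts kept by this algorithm can, in the worst case, grow exponentially in the number of processes, which is exactly what the dissertation's polynomial-space algorithm avoids.

// Traditional BFS enumeration of consistent global states (cuts).
// Sketch only; the dependency encoding is an illustrative assumption,
// and the polynomial-space algorithm is more involved.
#include <vector>
#include <set>
#include <queue>
#include <cstdio>

using Cut = std::vector<int>;  // Cut[i] = number of events of process i included

// deps[i][j][k] = for the (j+1)-th event of process i, the minimum number
// of events process k must have executed first (models message receives).
using Deps = std::vector<std::vector<Cut>>;

static bool consistent(const Cut& c, const Deps& deps) {
    for (size_t i = 0; i < c.size(); ++i)
        for (int j = 0; j < c[i]; ++j)
            for (size_t k = 0; k < c.size(); ++k)
                if (deps[i][j][k] > c[k]) return false;
    return true;
}

int main() {
    // Two processes, two events each; the second event of process 0
    // requires the first event of process 1 (a message receive).
    Deps deps = {
        {{0, 0}, {0, 1}},  // process 0
        {{0, 0}, {0, 0}},  // process 1
    };
    std::vector<int> len = {2, 2};

    std::set<Cut> seen;     // frontier bookkeeping: worst-case exponential
    std::queue<Cut> frontier;
    Cut initial = {0, 0};
    frontier.push(initial);
    seen.insert(initial);
    while (!frontier.empty()) {
        Cut c = frontier.front();
        frontier.pop();
        std::printf("consistent cut: (%d, %d)\n", c[0], c[1]);
        // Here one would check the cut against the predicate being detected.
        for (size_t i = 0; i < c.size(); ++i) {
            if (c[i] == len[i]) continue;
            Cut next = c;   // advance process i by one event
            ++next[i];
            if (consistent(next, deps) && seen.insert(next).second)
                frontier.push(next);
        }
    }
    return 0;
}

Because each step adds exactly one event, the queue visits cuts level by level, so the first cut that satisfies the faulty-behavior predicate is one with the fewest events possible.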
Advisors/Committee Members: Garg, Vijay K. (Vijay Kumar), 1963- (advisor), Julien, Christine (committee member), Mittal, Neeraj (committee member), Nikolova, Evdokia (committee member), Pingali, Keshav (committee member), Reddi, Vijay Janapa (committee member).
Subjects/Keywords: Algorithms; Theory; Distributed computing; Verification; Predicate detection
APA (6th Edition):
Chauhan, H. (2018). Algorithms for analyzing parallel computations. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/63631
Chicago Manual of Style (16th Edition):
Chauhan, Himanshu. “Algorithms for analyzing parallel computations.” 2018. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021.
http://hdl.handle.net/2152/63631.
MLA Handbook (7th Edition):
Chauhan, Himanshu. “Algorithms for analyzing parallel computations.” 2018. Web. 24 Jan 2021.
Vancouver:
Chauhan H. Algorithms for analyzing parallel computations. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2018. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/2152/63631.
Council of Science Editors:
Chauhan H. Algorithms for analyzing parallel computations. [Doctoral Dissertation]. University of Texas – Austin; 2018. Available from: http://hdl.handle.net/2152/63631
◁ [1] [2] ▶