You searched for subject:(hardware acceleration)
Showing records 1 – 30 of 111 total matches.

University of Arkansas
1. Ding, Hongyuan.
Exploiting Hardware Abstraction for Parallel Programming Framework: Platform and Multitasking.
Degree: PhD, 2017, University of Arkansas
URL: https://scholarworks.uark.edu/etd/1985
With the help of the parallelism provided by the fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are required to have excellent hardware programming skills and unique optimization techniques to fully exploit the potential of FPGA resources. Intermediate frameworks above hardware circuits have been proposed to improve either performance or productivity by leveraging parallel programming models beyond the multi-core era.

In this work, we propose the PolyPC (Polymorphic Parallel Computing) framework, which targets enhancing productivity without losing performance. It helps designers develop parallelized applications and implement them on FPGAs. The PolyPC framework implements a custom hardware platform on which programs written in an OpenCL-like programming model can launch. Additionally, the PolyPC framework extends vendor-provided tools into a complete development environment, including an intermediate software framework and automatic system builders. Designers' programs can be either synthesized as hardware processing elements (PEs) or compiled to executable files running on software PEs. Benefiting from nontrivial features such as re-loadable PEs and independent group-level schedulers, multitasking is enabled for both software and hardware PEs to improve the efficiency of utilizing hardware resources.

The PolyPC framework is evaluated with respect to performance, area efficiency, and multitasking. The results show a maximum 66-times speedup over a dual-core ARM processor and a 1043-times speedup over a high-performance MicroBlaze, with 125 times the area efficiency. With priority-aware scheduling, the framework delivers a significant improvement in response time for high-priority tasks. Multitasking overheads are evaluated to analyze the trade-offs. With the help of the design flow, OpenCL application programs are converted into executables through front-end source-to-source transformation and back-end synthesis/compilation to run on PEs, and the framework is generated from users' specifications.
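As a concrete illustration of the OpenCL-like programming model described above, the sketch below emulates an NDRange launch in plain Python: each work-item receives a global ID and processes one element, and work-groups are the unit that would be dispatched to a software or hardware PE. The sequential dispatch loop is purely illustrative and is not PolyPC's actual scheduling policy.

```python
# Emulation of an OpenCL-style NDRange launch. In PolyPC, each work-group
# would be handed to a software or hardware PE; here the groups simply run
# sequentially (illustrative only, not the framework's scheduler).

def launch_ndrange(kernel, global_size, group_size, *buffers):
    """Run `kernel` once per work-item, grouped into work-groups."""
    assert global_size % group_size == 0
    for group_id in range(global_size // group_size):  # one group per PE
        for local_id in range(group_size):             # work-items in a group
            gid = group_id * group_size + local_id     # cf. get_global_id(0)
            kernel(gid, *buffers)

def vec_add(gid, a, b, out):
    # Kernel body, equivalent to: out[gid] = a[gid] + b[gid] in OpenCL C.
    out[gid] = a[gid] + b[gid]

a, b, out = list(range(8)), [10] * 8, [0] * 8
launch_ndrange(vec_add, 8, 4, a, b, out)
print(out)  # -> [10, 11, 12, 13, 14, 15, 16, 17]
```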
Advisors/Committee Members: Miaoqing Huang, David Andrews, Wing Ning Li.
Subjects/Keywords: FPGA; Hardware Abstraction; Hardware Acceleration; Hardware Multitasking; MPSoC; OpenCL; Hardware Systems

University of Pretoria
2. Jacobson, Jared Neil.
Assessing OpenGL for 2D rendering of geospatial data.
Degree: MSc, Geography, Geoinformatics and Meteorology, 2015, University of Pretoria
URL: http://hdl.handle.net/2263/45917
The purpose of this study was to investigate the suitability of using the OpenGL and OpenCL graphics application programming interfaces (APIs) to increase the speed at which 2D vector geographic information could be rendered. The research focused on rendering APIs available on the Windows operating system.

In order to determine the suitability of OpenGL for efficiently rendering geographic data, this dissertation looked at how software- and hardware-based rendering performed. The results were then compared to those of the different rendering APIs. In order to collect the necessary data, an in-depth study of geographic information systems (GIS), geographic coordinate systems, OpenGL and OpenCL was conducted. A simple 2D geographic rendering engine was then constructed using a number of graphics APIs, including GDI, GDI+, DirectX, OpenGL and the Direct2D API. The purpose of the developed rendering engine was to provide a tool on which to perform a number of rendering experiments. A large dataset was then rendered via each of the implementations. The processing times as well as image quality were recorded and analysed. This research also investigated potential issues such as acquiring the data to be rendered for the API as fast as possible, which was needed to ensure saturation at the API level. Other aspects, such as difficulty of implementation and implementation differences, were examined.

Additionally, leveraging the OpenCL API in conjunction with the TopoJSON storage format as a means of data compression was investigated. Compression is beneficial in that, to get optimal rendering performance from OpenGL, the graphic data to be rendered needs to reside in the graphics processing unit (GPU) memory bank. More data in GPU memory in turn theoretically provides faster rendering times. The aim was to utilise the extra processing power of the GPU to decode the data and then pass it to the OpenGL API for rendering and display. This was achievable via OpenGL/OpenCL context sharing.

The results of the research showed that, on average, the OpenGL API provided a significant speedup of between nine and fifteen times that of GDI and GDI+. This means a faster and more performant rendering engine could be built with OpenGL at its core. Additional experiments showed that the OpenGL API performed faster than GDI and GDI+ even when a dedicated graphics device was not present. A challenge early in the experiments related to the supply of data to the graphics API. Disk access is orders of magnitude slower than the rest of the computer system; as such, in order to saturate the different graphics APIs, data had to be loaded into main memory.

Using the TopoJSON storage format yielded decent data compression, allowing a larger amount of data to be stored on the GPU. However, in an initial experiment, it took longer to process the TopoJSON file into a flat structure that could be utilised by OpenGL than to simply use the actual created objects, process them on the central processing unit (CPU)…
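The TopoJSON decoding step mentioned above is simple enough to sketch: a quantized TopoJSON arc stores integer coordinate deltas, and the decoder accumulates them and applies the file's scale/translate transform. The transform values below are made up for illustration; the thesis's approach would offload this per-arc work to the GPU via OpenCL.

```python
# Decode one quantized TopoJSON arc: coordinates are stored as integer
# deltas, so decoding is a running sum followed by the transform
# (scale/translate values below are illustrative, not from the thesis).

def decode_arc(arc, scale, translate):
    x = y = 0
    points = []
    for dx, dy in arc:
        x += dx                              # accumulate the deltas
        y += dy
        points.append((x * scale[0] + translate[0],
                       y * scale[1] + translate[1]))
    return points

arc = [[4000, 2000], [10, 0], [0, 10], [-10, -10]]  # delta-encoded
coords = decode_arc(arc, scale=(0.001, 0.001), translate=(-180.0, -90.0))
print(coords[0])  # -> (-176.0, -88.0)
```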
Advisors/Committee Members: Coetzee, Serena Martha (advisor), Kourie, Derrick G. (coadvisor).
Subjects/Keywords: UCTD; 2D; OpenGL; Rendering; GIS; Hardware Acceleration

McMaster University
3. Thong, Jason.
FPGA Acceleration of Decision-Based Problems using Heterogeneous Computing.
Degree: PhD, 2014, McMaster University
URL: http://hdl.handle.net/11375/16419
The Boolean satisfiability (SAT) problem is central to many applications involving the verification and optimization of digital systems. These combinatorial problems are typically solved using a decision-based approach; however, the lengthy compute time of SAT can make it prohibitively impractical for some applications.

We discuss how the underlying physical characteristics of various technologies affect the practicality of SAT solvers. Power dissipation and other physical limitations are increasingly restricting the improvement in performance of conventional software on CPUs. We use heterogeneous computing to maximize the strengths of different underlying technologies as well as different computing architectures.

In this thesis, we present a custom hardware architecture for accelerating the common computation within a SAT solver. Algorithms and data structures must be fundamentally redesigned in order to maximize the strengths of customized computing. Generalizable optimizations are proposed to maximize throughput, minimize communication latencies, and aggressively compact the memory. We tightly integrate and jointly optimize the hardware accelerator and the software host.

Our fully implemented system is significantly faster than pure software on real-life SAT problems. Due to our insights and optimizations, we are able to benchmark SAT in uncharted territory.
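The "common computation" inside a decision-based SAT solver is typically Boolean constraint propagation (unit propagation), the loop that dominates solver runtime and the usual target for hardware acceleration. A plain-Python sketch of that loop follows — the algorithm only, not the thesis's hardware architecture.

```python
# Unit propagation: repeatedly find clauses with all-but-one literal false
# and force the remaining literal. Literals are nonzero ints; -v means
# "variable v is false". This is the hot loop a SAT accelerator targets.

def unit_propagate(clauses, assignment):
    """assignment: dict var -> bool. Returns False on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return False                  # clause fully false: conflict
            if len(unassigned) == 1:          # unit clause: forced assignment
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return True

# (x1) AND (not x1 OR x2) AND (not x2 OR x3) forces x1 = x2 = x3 = True.
asg = {}
unit_propagate([[1], [-1, 2], [-2, 3]], asg)
print(asg)  # -> {1: True, 2: True, 3: True}
```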
Advisors/Committee Members: Nicolici, Nicola, Electrical and Computer Engineering.
Subjects/Keywords: FPGA; heterogeneous computing; Boolean satisfiability; hardware acceleration

University of Windsor
4. Tang, Qing Yun.
FPGA Based Acceleration of Matrix Decomposition and Clustering Algorithm Using High Level Synthesis.
Degree: MA, Electrical and Computer Engineering, 2016, University of Windsor
URL: https://scholar.uwindsor.ca/etd/5669
FPGAs have shown great promise for accelerating computationally intensive algorithms. However, FPGA-based accelerator design is tedious and time-consuming if we rely on traditional HDL-based design methods. The recent introduction of the Altera SDK for OpenCL (AOCL) high-level synthesis tool enables developers to utilize an FPGA's potential without long development times and extensive hardware knowledge.

AOCL is used in this thesis to accelerate computationally intensive algorithms in the fields of machine learning and scientific computing. The algorithms studied are k-means clustering, k-nearest neighbour search, N-body simulation and LU decomposition. The performance and power consumption of the algorithms synthesized using AOCL for the FPGA are evaluated against state-of-the-art CPU and GPU implementations. The k-means clustering and k-nearest neighbour kernels designed for the FPGA significantly outperformed optimized CPU implementations while achieving similar or better power efficiency than the GPU.
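Of the kernels listed above, k-means clustering shows why these workloads suit an FPGA pipeline: the assignment step is embarrassingly parallel, with each point independently finding its nearest centroid. One assignment-plus-update iteration, sketched in plain Python (the algorithm only, not the AOCL kernel code):

```python
# One iteration of k-means: the assignment step is a data-parallel map
# (each point -> nearest centroid), which is what an FPGA/OpenCL version
# pipelines; the update step recomputes centroids as cluster means.

def kmeans_step(points, centroids):
    k = len(centroids)

    def sqdist(pt, c):
        return sum((p - q) ** 2 for p, q in zip(pt, c))

    # Assignment: conceptually one work-item per point.
    labels = [min(range(k), key=lambda c: sqdist(pt, centroids[c]))
              for pt in points]

    # Update: mean of each cluster's members (unchanged if a cluster is empty).
    new_centroids = []
    for c in range(k):
        members = [pt for pt, lab in zip(points, labels) if lab == c]
        if members:
            new_centroids.append(tuple(sum(d) / len(members)
                                       for d in zip(*members)))
        else:
            new_centroids.append(centroids[c])
    return labels, new_centroids

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, cents = kmeans_step(pts, [(0.0, 0.0), (10.0, 10.0)])
print(labels, cents)  # -> [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)]
```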
Advisors/Committee Members: Khalid, Mohammed.
Subjects/Keywords: FPGA; Hardware Acceleration; High Level Synthesis; OpenCL

University of Guelph
5. Lacey, Griffin James.
Deep Learning on FPGAs.
Degree: 2016, University of Guelph
URL: https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9887
The recent successes of deep learning are largely attributed to the advancement of hardware acceleration technologies, which can accommodate the incredible growth of data sizes and model complexity. The current solution involves using clusters of graphics processing units (GPU) to achieve performance beyond that of general purpose processors (GPP), but the use of field programmable gate arrays (FPGA) is gaining popularity as an alternative due to their low power consumption and flexible architecture. However, there is a lack of infrastructure available for deep learning on FPGAs compared to what is available for GPPs and GPUs, and the practical challenges of developing such infrastructure are often ignored in contemporary work. Through the development of a software framework which extends the popular Caffe framework, this thesis demonstrates the viability of FPGAs as an acceleration platform for deep learning, and addresses many of the associated technical and practical challenges.
Advisors/Committee Members: Taylor, Graham W (advisor), Areibi, Shawki (advisor).
Subjects/Keywords: Deep Learning; FPGA; Machine Learning; Hardware Acceleration

Cornell University
6. Ilbeyi, Berkin.
Co-Optimizing Hardware Design and Meta-Tracing Just-in-Time Compilation.
Degree: 2019, Cornell University
URL: http://hdl.handle.net/1813/67316
Performance of computers has enjoyed consistent gains due to the availability of faster and cheaper transistors, more complex hardware designs, and better hardware design tools. Increasing computing performance has also led to the ability to develop more complex software, the rising popularity of higher-level languages, and better software design tools. Unfortunately, technology scaling is slowing, and hardware and software depend more on each other to continue to deliver performance gains in the absence of technology scaling. As single-threaded computing performance slows down, emerging domains such as machine learning are increasingly using custom hardware. The proliferation of domain-specific hardware drives the need for more agile hardware design methodologies. Another trend in the software industry is the rising popularity of dynamic languages. These languages can be slow, but improvements in single-threaded performance and just-in-time (JIT) compilation have improved their performance over the years. As single-threaded performance slows down and software-only JIT techniques provide limited benefits into the future, new approaches are needed to improve the performance of dynamic languages. This thesis aims to address these two related challenges by co-optimizing hardware design and meta-tracing JIT compilation technology.

The first thrust of this thesis is to demonstrate that meta-tracing JIT virtual machines (VMs) can be instrumental in building agile hardware simulators across different abstraction levels, including functional level, cycle level, and register-transfer level (RTL). I first introduce an instruction-set simulator that makes use of meta-tracing JIT compilation to productively define instruction semantics and encodings while rivaling the performance of purpose-built simulators. I then build on this simulator and add JIT-assisted cycle-level modeling and embedding within an existing simulator to achieve a very fast cycle-level simulation methodology. I also demonstrate that a number of simulation-aware JIT and JIT-aware simulation techniques can help productive hardware generation and simulation frameworks close the performance gap with commercial RTL simulators.

The second thrust of this thesis explores hardware acceleration opportunities in meta-tracing JIT VMs and proposes a software/hardware co-optimization scheme that can significantly reduce the dynamic instruction count of meta-tracing JIT VMs. For this, I first present a methodology to study and research meta-tracing JIT VMs and perform a detailed cross-layer workload characterization of these VMs. I then quantify a type of value locality in JIT-compiled code called object dereference locality, and propose a software/hardware co-optimization technique to improve the performance of meta-tracing JIT VMs.
Advisors/Committee Members: Martinez, Jose F. (committeeMember), Zhang, Zhiru (committeeMember).
Subjects/Keywords: Computer engineering; Computer science; compiler; Hardware; Hardware Acceleration; Hardware Design; JIT; Meta-Tracing

The Ohio State University
7. XUE, Daqing.
Volume Visualization Using Advanced Graphics Hardware Shaders.
Degree: PhD, Computer Science and Engineering, 2008, The Ohio State University
URL: http://rave.ohiolink.edu/etdc/view?acc_num=osu1219382224
Graphics hardware based volume visualization techniques have been an active research topic over the last decade. With more powerful computation, the availability of large texture memory, and high programmability, modern graphics hardware has been playing an increasingly important role in volume visualization.

In the first part of the thesis, we focus on graphics hardware acceleration techniques. In particular, we develop a fast X-ray volume rendering technique using point convolution. An X-ray image is generated by convolving the voxel projection in the rendering buffer with a reconstruction kernel. Our technique allows users to interactively view large datasets at their original resolutions on standard PC hardware. Later, an acceleration technique for slice-based volume rendering (SBVR) is examined. By means of the early z-culling feature of modern graphics hardware, we can properly set up the z-buffer from isosurfaces to gain a significant improvement in rendering speed for SBVR.

The high programmability of the graphics processing unit (GPU) has spurred a great deal of research on exploiting this advanced graphics hardware feature. In the second part of the thesis, we first revisit the texture splat for flow visualization. We develop a texture splat vertex shader to achieve fast animated flow visualization. Furthermore, we develop a new rendering shader for implicit flow. By careful tracking and encoding of the advection parameters into a three-dimensional texture, we achieve high appearance control and flow representation in real-time rendering. Finally, we present an indirect shader synthesizer that combines different shader rendering effects to create a highly informative image for visualizing the data under investigation. One or more different shaders are associated with the voxels or geometries, and the shader to use for rendering is resolved at run time. Our indirect shader synthesizer provides a novel method to control the appearance of the rendering over multiple shaders.
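The X-ray rendering described above integrates voxel values along each view ray and then convolves the projected (splatted) result with a reconstruction kernel. A toy orthographic sketch of those two steps follows; the 3-tap kernel weights are illustrative, not the thesis's reconstruction kernel.

```python
# X-ray style rendering in two steps: (1) sum voxel values along each view
# ray (orthographic projection along z); (2) convolve the projection with a
# small reconstruction kernel (here a separable 3-tap row filter, weights
# chosen for illustration only).

def xray_project(volume):
    """Sum a volume[z][y][x] along z to get a 2D attenuation image."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[sum(volume[z][y][x] for z in range(depth))
             for x in range(width)] for y in range(height)]

def convolve_rows(image, kernel=(0.25, 0.5, 0.25)):
    """Convolve each row with a 3-tap kernel (zero-padded at the borders)."""
    out = []
    for row in image:
        padded = [0.0] + list(row) + [0.0]
        out.append([sum(k * padded[i + j] for j, k in enumerate(kernel))
                    for i in range(len(row))])
    return out

vol = [[[1, 0], [0, 0]], [[1, 0], [0, 2]]]   # 2x2x2 toy volume
proj = xray_project(vol)                      # [[2, 0], [0, 2]]
print(convolve_rows(proj))  # -> [[1.0, 0.5], [0.5, 1.0]]
```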
Advisors/Committee Members: Crawfis, Roger (Advisor).
Subjects/Keywords: Computer Science; volume visualization; graphics hardware; hardware acceleration; flow visualization; multi-shader rendering

Penn State University
8. Snyder, Joshua Scott.
Optimization and Hardware Acceleration of Consensus-based Matching and Tracking.
Degree: MS, Computer Science and Engineering, 2015, Penn State University
URL: https://etda.libraries.psu.edu/catalog/25036
Image and video understanding has become an increasingly valuable capability for many emerging applications such as smart retail, intelligent surveillance, and autonomous robotic systems. The critical barrier to enabling these applications is the high execution latency of complex vision tasks, which makes real-time system constraints difficult, or impossible, to achieve. One specific instance of a complex vision task is object tracking, which is the focus of this thesis. Object tracking is a necessary component of grocery shopping assistance applications that track a grocery item and a person's hand and guide the hand to the item to pick it up. Although there are many object tracking algorithms to choose from, this work investigates the performance bottlenecks and optimizations of the Consensus-based Matching and Tracking (CMT) algorithm. To circumvent the limitations of standard optical-flow based trackers, CMT uses a descriptor matching step to redetect an object's key features that would be permanently lost in the standard approach. This allows an object to be hidden or occluded from view and redetected once it reappears in the view of the camera. For fully autonomous systems, in which re-initialization of a failed object track may not be possible or may be prohibitively costly, robustness of the tracker is of critical importance. As such, this work introduces an enhanced version of the CMT algorithm that exhibits improvements in accuracy and robustness as evaluated against a standardized benchmark. The improvement in accuracy and robustness of the enhanced CMT comes at the cost of a significant increase in computational latency. Accordingly, this work also proposes a hybrid system that integrates high-performance custom hardware accelerators with a traditional processor to alleviate these new performance bottlenecks and to support real-time throughput.
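The descriptor matching step described above is, at its core, a nearest-neighbour search: each descriptor extracted from the current frame is compared against the object's model descriptors, typically under Hamming distance for binary descriptors. A brute-force sketch (descriptor width and distance threshold are illustrative; CMT's actual descriptors and matching criteria differ):

```python
# Brute-force binary descriptor matching under Hamming distance: for each
# frame descriptor, find the closest model descriptor, rejecting matches
# that are too distant. Descriptors are small ints here for illustration.

def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match_descriptors(model, frame, max_dist=2):
    """For each frame descriptor: index of the nearest model descriptor,
    or None when even the best match exceeds max_dist."""
    matches = []
    for d in frame:
        best = min(range(len(model)), key=lambda i: hamming(model[i], d))
        matches.append(best if hamming(model[best], d) <= max_dist else None)
    return matches

model = [0b10110100, 0b01001011]
frame = [0b10110110,   # 1 bit away from model[0] -> match 0
         0b01001011,   # identical to model[1]    -> match 1
         0b11111111]   # far from both            -> None
print(match_descriptors(model, frame))  # -> [0, 1, None]
```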
Subjects/Keywords: hardware acceleration; object tracking; FPGA; CMT; hardware architecture; vision system; computer vision
9. Merchant, Murtaza.
Testing and Validation of a Prototype Gpgpu Design for FPGAs.
Degree: MS, Electrical & Computer Engineering, 2013, University of Massachusetts
URL: https://scholarworks.umass.edu/theses/1012
Due to their suitability for highly parallel and pipelined computation, field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPGPUs) have emerged as top contenders for hardware acceleration of high-performance computing applications. FPGAs are highly specialized devices that can be customized to a specific application, whereas GPGPUs are made of a fixed array of multiprocessors with a rigid architectural model. To alleviate this rigidity, as well as to combine some other benefits of the two platforms, it is desirable to explore the implementation of a flexible GPGPU (soft GPGPU) using the reconfigurable fabric found in an FPGA. This thesis describes an aggressive effort to test and validate a prototype GPGPU design targeted to a Virtex-6 FPGA. Individual design stages are tested and integrated together using manually generated RTL testbenches and logic simulation tools. The soft GPGPU design is validated by benchmarking the platform against five standard CUDA benchmarks. The platform is fully CUDA-compatible and supports direct execution of CUDA-compiled binaries. Platform scalability is validated by varying the number of processing cores as well as multiprocessors and evaluating their effects on area and performance. Experimental results show an average speedup of 25x for a 32-core soft GPGPU configuration over a fully optimized MicroBlaze soft microprocessor, accentuating the benefits of the thread-based execution model of GPUs and their ability to perform complex control flow operations in hardware. The testing and validation of the designed soft GPGPU system serves as a prerequisite for rapid design exploration of the platform in the future.
Advisors/Committee Members: Russell G Tessier.
Subjects/Keywords: GPGPU; FPGA; hardware acceleration; CUDA compatible; scalable; flexible; VLSI and Circuits, Embedded and Hardware Systems

Penn State University
10.
DeBole, Michael Vincent.
CONFIGURABLE ACCELERATORS FOR VIDEO ANALYTICS.
Degree: PhD, Computer Science and Engineering, 2011, Penn State University
URL: https://etda.libraries.psu.edu/catalog/11829
▼ Video analytics is the science of analyzing image sequences and video with the aim of gaining a cognitive understanding of a scene. The applications that can take advantage of video analytics are diverse, ranging from media measurement systems and surveillance to medical imaging and traffic systems. Unfortunately, many of these algorithms still cannot be deployed in embedded environments, or achieve real-time performance, because of the computational and size, weight, and power (SWaP) constraints of such systems. In particular, performing complex imaging tasks in real time is still beyond the capabilities of general CPUs and embedded microcontrollers alone. Alternatively, systems that can perform video analytics in real time usually require a high SWaP that forbids their use within an embedded system. The goal of this dissertation is to explore several areas with the potential to enable low-SWaP accelerators that meet the performance goals of real-time systems. These areas include low-cost field programmable gate arrays (FPGAs), graphics processing units (GPUs), three-dimensional (3D) integrated circuits (ICs), and flexible, high-performance FPGA systems that enable algorithm exploration. FPGAs have become a highly competitive platform for implementing low-power systems aimed at real-time applications. This dissertation describes the implementation of two popular machine learning algorithms, the artificial neural network (ANN) and the support vector machine (SVM), targeting embedded FPGA systems. These algorithms were chosen because they can have direct impacts on commercial applications where they are used extensively. Both implementations demonstrate the ability to perform at the 30 frames per second necessary to support real-time operation and can be configured to meet the resource constraints of the system. The second class of accelerator, the GPU, consists of tens to hundreds of functional units with a fixed underlying hardware architecture. This dissertation examines a key algorithm for recognizing salient features within an image, known as center-surround distribution distance. Through the use of a GPU platform, the algorithm was accelerated by up to 30 times over an optimized CPU implementation, enabling the algorithm's use in real-time applications. The third area, the 3D IC, targets an application-specific integrated circuit (ASIC) design, which has historically been the most efficient choice for accelerating custom applications, as it provides the highest performance at the lowest SWaP. This dissertation demonstrates the design and implementation of a custom accelerator chip using 3D technology, targeted towards a complete embedded camera accelerator platform. The chip implements a popular pre-processing algorithm which extracts skin regions from an image and can operate at 312 frames per second (10X real-time performance). Lastly, this dissertation explores the Falcon framework, which allows high-performance FPGA systems to automatically be…
Subjects/Keywords: FPGA; Hardware Acceleration; Image Processing; 3D IC; FPGA Framework

University of Debrecen
11.
Gacsal, Patrik.
Hardveres algoritmusgyorsítás FPGA segítségével [Hardware algorithm acceleration using an FPGA].
Degree: DE – TEK – Informatikai Kar, 2012, University of Debrecen
URL: http://hdl.handle.net/2437/128233
▼ Available solutions for hardware acceleration of the x86 architecture, and a discussion of custom acceleration options using an FPGA. The thesis summarizes the history of hardware acceleration in x86-based computers and examines the possibilities of custom FPGA-based hardware acceleration, outlining a possible future represented by partially FPGA-based hybrid computers.
Advisors/Committee Members: Végh, János (advisor).
Subjects/Keywords: hardware; acceleration; fpga; custom; ISA; extension; coprocessor; x86

Linköping University
12.
Holmér, Johan.
Acceleration and Integration of Sound Decoding in FPGA.
Degree: Electrical Engineering, 2011, Linköping University
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70180
▼ The task has been to develop a network media renderer on an embedded Linux system running on a Spartan-6 FPGA. One of the challenges has been to make the best use of the limited FPGA area. MP3 has been the prioritized format. To achieve fast MP3 decoding, a MicroBlaze soft processor has been configured for speed with regard to the small area available. The software MP3 decoding process has also been accelerated with hardware. MP3 files with full quality (320 kbit/s) can be decoded within real-time requirements. Sound interface hardware has been designed to handle the decoded sound samples and convert them to the S/PDIF standard interface. UPnP commands have also been implemented with the MP3 player software to complete the renderer's network functionality.
Subjects/Keywords: Hardware acceleration; digital signal processing; embedded systems; sound encoding; Electronics; Elektronik

University of Toronto
13.
Boutros, Andrew Maher Mansour.
Enhancing FPGA Architecture for Efficient Deep Learning Inference.
Degree: 2018, University of Toronto
URL: http://hdl.handle.net/1807/91435
▼ Deep Learning (DL) has become best-in-class for numerous applications, but at a high computational cost that necessitates high-performance, energy-efficient acceleration. FPGAs offer an appealing DL inference acceleration platform due to their flexibility and energy efficiency. This thesis explores FPGA architectural changes to enhance the efficiency of a class of DL models, convolutional neural networks (CNNs), on FPGAs. We first build three state-of-the-art CNN computing architectures (CAs) as benchmarks representative of the DL domain and quantify the FPGA vs. ASIC efficiency gaps for these CAs to highlight the bottlenecks of current FPGA architectures. Then, we enhance the flexibility of digital signal processing (DSP) blocks on current FPGAs for low-precision DL. Our DSP block increases the performance of 8-bit and 4-bit CNN inference by 1.3x and 1.6x, respectively, with minimal block area overhead. Finally, we present a preliminary evaluation of logic block architectural changes, leaving their detailed evaluation for future work.
M.A.S.
Advisors/Committee Members: Betz, Vaughn, Electrical and Computer Engineering.
Subjects/Keywords: Convolutional Neural Networks; Deep Learning; FPGA Architecture; Hardware Acceleration; 0464

University of Windsor
14.
Janik, Ian Spencer.
High Level Synthesis and Evaluation of the Secure Hash Standard for FPGAs.
Degree: MA, Electrical and Computer Engineering, 2015, University of Windsor
URL: https://scholar.uwindsor.ca/etd/5470
▼ Secure hash algorithms (SHAs) are important components of cryptographic applications. SHA performance on central processing units (CPUs) is slow; therefore, acceleration must be done using hardware such as Field Programmable Gate Arrays (FPGAs). Considerable work has been done in academia using FPGAs to accelerate SHAs. These designs were implemented using Hardware Description Language (HDL) based design methodologies, which are tedious and time consuming. High Level Synthesis (HLS) enables designers to synthesize optimized FPGA hardware from algorithm specifications in programming languages such as C/C++. This substantially reduces the design cost and time. In this thesis, the Altera SDK for OpenCL (AOCL) HLS tool was used to synthesize the SHAs on FPGAs and to explore the design space of the algorithms. The results were evaluated against previous HDL-based designs. Synthesized FPGA hardware performance was comparable to the HDL-based designs despite the simpler and faster design process.
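For readers unfamiliar with the algorithm families this record evaluates (SHA-1, SHA-2, SHA-3), a quick software reference sketch; this uses Python's standard `hashlib` and is not drawn from the thesis itself:

```python
import hashlib

# Compute one member of each Secure Hash Standard family over the same
# message, to show the relative digest sizes the hardware must produce.
msg = b"hello"
for name in ("sha1", "sha256", "sha3_256"):
    digest = hashlib.new(name, msg).hexdigest()
    print(f"{name}: {len(digest) * 4}-bit digest -> {digest[:16]}...")
```

SHA-1 yields a 160-bit digest, while SHA-256 and SHA3-256 both yield 256-bit digests; in an FPGA implementation these widths translate directly into datapath and register sizing.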
Advisors/Committee Members: Khalid, Mohammed.
Subjects/Keywords: FPGAs; Hardware Acceleration; High Level Synthesis; SHA1; SHA2; SHA3

University of Illinois – Urbana-Champaign
15.
Kesler, David R.
A hardware acceleration technique for gradient descent and conjugate gradient.
Degree: MS, 1200, 2011, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/24241
▼ Gradient descent, conjugate gradient, and other iterative algorithms are a powerful class of algorithms; however, they can take a long time to converge. Baseline accelerator designs feature insufficient coverage of operations and do not work well on the problems we target. In this thesis we present a novel hardware architecture for accelerating gradient descent and other similar algorithms. To support this architecture, we also present a sparse matrix-vector storage format, and software support for utilizing the format, so that it can be efficiently mapped onto hardware which is also well suited for dense operations. We show that the accelerator design outperforms similar designs which target only the most dominant operation of a given algorithm, providing substantial energy and performance benefits. We further show that the accelerator can be reasonably implemented on a general purpose CPU with small area overhead.
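The iterative update pattern this record targets for acceleration can be sketched in a few lines; this minimal gradient descent on a one-dimensional quadratic is illustrative only and not taken from the thesis:

```python
# Illustrative gradient descent minimizing f(x) = (x - 3)^2.
# Each iteration repeats the same gradient-evaluate-and-step kernel,
# which is the structure a hardware accelerator can exploit.

def gradient_descent(grad, x0, lr=0.1, iters=100):
    """Repeatedly step against the gradient for a fixed iteration count."""
    x = x0
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```

In practice the gradient evaluation is a (often sparse) matrix-vector product, which is why the thesis pairs the accelerator with a sparse storage format.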
Advisors/Committee Members: Kumar, Rakesh (advisor).
Subjects/Keywords: Gradient Descent; Conjugate Gradient; Hardware Acceleration; Matrix Multiplication

Penn State University
16.
Okafor, Ikenna.
Hardware Acceleration of Visual Object Search.
Degree: 2017, Penn State University
URL: https://etda.libraries.psu.edu/catalog/14067izo5011
▼ Visual Object Search, the process of locating an object within an image, is a key task in many automated vision systems, with applications ranging from surveillance to medical imaging. The task is typically performed using one of two methods: an exhaustive/semi-exhaustive search, or region proposal followed by classification. In practice, reasonable classification accuracies, especially in real-time systems, have been achieved by incorporating the latter to avoid expensive searching of the entire scene. However, localizing the object within the scene still presents a challenge to visual object search systems. Hardware acceleration has the potential to remove this dependency by making exhaustive/semi-exhaustive image search feasible from a latency perspective. This work aims to investigate the computational performance benefits of either method, given the opportunity for hardware acceleration.
Advisors/Committee Members: Vijaykrishnan Narayanan, Thesis Advisor.
Subjects/Keywords: Computer Vision; Sliding Window; ROI; Hardware Acceleration; FPGA

University of North Texas
17.
Amarasinghe, Dhanyu Eshaka.
Real-time Rendering of Burning Objects in Video Games.
Degree: 2013, University of North Texas
URL: https://digital.library.unt.edu/ark:/67531/metadc500131/
▼ In recent years there has been growing interest in limitless realism in computer graphics applications. Among those, my foremost concentration falls on complex physical simulation and modeling, with diverse applications in the gaming industry. Various simulations have achieved virtual realism by replicating the details of physical processes; as a result, some were strong enough to lure the user into believable virtual worlds. In this research, I focus on fire simulation and its deformation process across various virtual objects. In most game engines, model loading takes place at the beginning of the game or when the game is transitioning between levels, and game models are stored in large data structures. Changing or adjusting a large data structure while the game is running may adversely affect performance, so developers may choose to avoid procedural simulations to save resources and avoid interruptions to performance. I introduce a process that implements real-time model deformation while maintaining performance. It is a challenging task to achieve high-quality simulation while utilizing minimal resources to represent multiple events in a timely manner. Especially in video games, this demanding criterion must be met to sustain the player's willing suspension of disbelief. I have implemented and tested my method on a relatively modest GPU using CUDA. My experiments show that this method gives a believable visual effect while using a small fraction of CPU and GPU resources.
Advisors/Committee Members: Parberry, Ian, Mikler, Armin, Renka, Robert, Akl, Robert G., Tarau, Paul.
Subjects/Keywords: Hardware acceleration; volume rendering; CUDA; free form deformation; polygonal modeling

Boston University
18.
Zhou, Boyou.
A multi-layer approach to designing secure systems: from circuit to software.
Degree: PhD, Electrical & Computer Engineering, 2019, Boston University
URL: http://hdl.handle.net/2144/36149
▼ In the last few years, security has become one of the key challenges in computing systems. Failures in the secure operation of these systems have led to massive information leaks and cyber-attacks. Case in point: the identity leaks from Equifax in 2016, the Spectre and Meltdown attacks on Intel and AMD processors in 2017, and the cyber-attacks on Facebook in 2018. These recent attacks have shown that intruders attack different layers of the systems, from low-level hardware to software as a service (SaaS). To protect the systems, the defense mechanisms should confront the attacks in the different layers of the systems. In this work, we propose four security mechanisms for computing systems: (i) using backside imaging to detect Hardware Trojans (HTs) in Application Specific Integrated Circuit (ASIC) chips, (ii) developing energy-efficient reconfigurable cryptographic engines, and (iii) examining the feasibility of malware detection using Hardware Performance Counters (HPCs). Most threat models assume that the root of trust is the hardware running beneath the software stack. However, attackers can insert malicious hardware blocks, i.e. HTs, into the Integrated Circuits (ICs) that provide back-doors to the attackers or leak confidential information. HTs inserted during fabrication are extremely hard to detect, since their overheads in performance and power are below the variations in performance and power caused by manufacturing. In our work, we have developed an optical method that identifies modified or replaced gates in the ICs. We use near-infrared light to image the ICs, because silicon is transparent to near-infrared light while metal reflects it. We leverage near-infrared imaging to identify the location of each gate, based on the signatures of metal structures reflected by the lowest metal layer. By comparing the imaged results to the pre-fabrication design, we can identify any modifications, shifts, or replacements in the circuits to detect HTs. With the trust of the silicon established, the computing system must use secure communication channels for its applications. Low-energy-cost devices, such as the Internet of Things (IoT), leverage strong cryptographic algorithms (e.g. AES, RSA, and SHA) during communication. The cryptographic operations cost the IoT devices a significant amount of power, so the power budget limits their applications. To mitigate the high power consumption, modern processors embed these cryptographic operations into hardware primitives, which also improves system performance. The hardware unit embedded into the processor provides high energy efficiency and low energy cost. However, hardware implementations limit flexibility. The longevity of the IoT devices can exceed the lifetime of the cryptographic algorithms, and the replacement of the IoT devices is costly and sometimes prohibitive, e.g., monitors in nuclear reactors. In order to reconfigure cryptographic algorithms in hardware, we have developed a system with a reconfigurable encryption engine on the Zedboard platform. The…
Advisors/Committee Members: Joshi, Ajay (advisor), Egele, Manuel (advisor).
Subjects/Keywords: Computer engineering; Cryptographic acceleration; Hardware Trojan detection; Malware detection; Security

New Jersey Institute of Technology
19.
Li, Gang.
High-performance matrix multiplication on Intel and FGPA platforms.
Degree: MSin Computer Engineering - (M.S.), Electrical and Computer Engineering, 2012, New Jersey Institute of Technology
URL: https://digitalcommons.njit.edu/theses/136
▼ Matrix multiplication is at the core of high-performance numerical computation. Software methods of accelerating matrix multiplication fall into two categories: one is based on calculation simplification, and the other is based on increasing memory access efficiency. Matrix multiplication can also be accelerated using vector processors. In this investigation, various matrix multiplication algorithms and the vector-based hardware acceleration method are analyzed and compared in terms of performance and memory requirements. Results are shown for Intel and Xilinx FPGA platforms. They show that when the CPU is fast, Goto's algorithm runs faster than Strassen's algorithm, because data access speed is the bottleneck in this case. On the contrary, when the CPU is slow, Strassen's algorithm runs faster, because computation complexity becomes the key factor. The results also show that SIMD platforms, such as an Intel Xeon with SIMD extensions and an in-house developed VP (Vector co-Processor) for an FPGA, can accelerate matrix multiplication substantially. It is even shown that the VP runs faster than MKL (Intel's optimized Math Kernel Library). This is because not only can the VP take advantage of larger vector lengths, but it also minimizes inherent hardware overheads.
Advisors/Committee Members: Sotirios Ziavras, Roberto Rojas-Cessa, Edwin Hou.
Subjects/Keywords: Matrix multiplication algorithms; Vector-based hardware acceleration; Computer Engineering
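The Goto-versus-Strassen trade-off described in the abstract can be made concrete. Below is a minimal illustrative sketch of Strassen's recursion in Python/NumPy (not the thesis's implementation; the square power-of-two shapes and the `cutoff` value are assumptions for illustration). It replaces eight half-size products with seven, trading lower arithmetic complexity for extra temporaries and less regular memory access, which is why it only wins when computation rather than data access is the bottleneck.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen multiply for square matrices with power-of-two size."""
    n = A.shape[0]
    if n <= cutoff:                      # fall back to the cache-friendly kernel
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of eight
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Reassemble the four quadrants of the result
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128))
B = rng.standard_normal((128, 128))
assert np.allclose(strassen(A, B), A @ B)
```

The extra matrix additions and temporaries are the memory traffic that makes Strassen lose to blocked algorithms like Goto's on fast CPUs with slow memory.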

20.
Hansson, Karl.
Performance and Perceived Realism in Rasterized 3D Sound Propagation for Interactive Virtual Environments.
Degree: 2019, Department of Software Engineering
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18251
► Background. 3D sound propagation is important for immersion and realism in interactive and dynamic virtual environments. However, this is difficult to model in a…
(more)
▼ Background. 3D sound propagation is important for immersion and realism in interactive and dynamic virtual environments. However, it is difficult to model in a physically accurate manner under real-time constraints. Computer graphics techniques are used in acoustics research to increase performance, yet there is little utilization of the especially efficient rasterization techniques, possibly due to concerns about physical accuracy. Fortunately, psychoacoustics has shown that perceived realism does not equate to physical accuracy. This indicates that perceptually realistic, high-performance 3D sound propagation may be achievable with rasterization techniques. Objectives. This thesis investigates whether 3D sound propagation can be modelled with high performance and perceived realism using rasterization-based techniques. Methods. A rasterization-based solution for 3D sound propagation is implemented. Its perceived realism is measured using psychoacoustic evaluations. Its performance is analyzed through computation-time measurements with varying sound source and triangle counts, and through theoretical calculations of memory consumption. The performance and perceived realism of the rasterization-based solution are compared with those of an existing solution. Results. The rasterization-based solution shows both higher performance and perceived realism than the existing solution. Conclusions. 3D sound propagation can be modelled with high performance and perceived realism using rasterization-based techniques. Thus, rasterized 3D sound propagation may provide efficient, low-cost, perceptually realistic 3D audio for areas where immersion and perceptual realism are important, such as video games, serious games, live entertainment events, architectural design, art production and training simulations.
Subjects/Keywords: psychoacoustics; rasterization; hardware-acceleration; psykoakustik; rasterisering; hårdvaruacceleration; Media Engineering; Mediateknik

University of Victoria
21.
Kanan, Awos.
Optimized hardware accelerators for data mining applications.
Degree: Department of Electrical and Computer Engineering, 2018, University of Victoria
URL: https://dspace.library.uvic.ca//handle/1828/9079
► Data mining plays an important role in a variety of fields including bioinformatics, multimedia, business intelligence, marketing, and medical diagnosis. Analysis of today’s huge and…
(more)
▼ Data mining plays an important role in a variety of fields including bioinformatics, multimedia, business intelligence, marketing, and medical diagnosis. Analysis of today's huge and complex data involves several data mining algorithms, including clustering and classification. The computational complexity of the machine learning and data mining algorithms frequently used in today's applications, such as embedded systems, makes the design of efficient hardware architectures for these algorithms a challenging issue in the development of such systems. The aim of this work is to optimize the performance of hardware acceleration for data mining applications in terms of speed and area. Most of the accelerator architectures previously proposed in the literature were obtained using ad hoc techniques that do not allow for design space exploration, and some did not consider the size (number of samples) and dimensionality (number of features in each sample) of the datasets. To obtain practical architectures that are amenable to hardware implementation, the size and dimensionality of input datasets are taken into consideration in this work. For one-dimensional data, algorithm-level optimizations are investigated to design a fast and area-efficient hardware accelerator for clustering one-dimensional datasets using the well-known K-Means clustering algorithm. Experimental results show that the optimizations adopted in the proposed architecture result in faster convergence of the algorithm using fewer hardware resources, while maintaining the quality of the clustering results. The computation of similarity distance matrices is one of the computational kernels generally required by several machine learning and data mining algorithms to measure the degree of similarity between data samples. For these algorithms, distance calculation is a computationally intensive task that accounts for a significant portion of the processing time. A systematic methodology is presented to explore the design space of 2-D and 1-D processor array architectures for the similarity distance computation involved in processing datasets of different sizes and dimensions. Six 2-D and six 1-D processor array architectures are developed systematically using linear scheduling and projection operations. The obtained architectures are classified based on the size and dimensionality of input datasets, analyzed in terms of speed and area, and compared with previous architectures in the literature. Motivated by the necessity to accommodate large-scale and high-dimensional data, nonlinear scheduling and projection operations are finally introduced to design a scalable processor array architecture for the computation of similarity distance matrices. Implementation results of the proposed architecture show an improved compromise between area and speed. Moreover, it scales better for large and high-dimensional datasets, since the architecture is fully parameterized and only has to deal with one data dimension in each time step.
Advisors/Committee Members: Gebali, Fayez (supervisor), Ibrahim, Atef (supervisor).
Subjects/Keywords: Data Mining; Parallel Algorithms; Hardware Acceleration; Systolic Arrays; Design Methodology
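The similarity-distance kernel the abstract singles out is easy to state in software. The NumPy sketch below (illustrative only, not the thesis's architecture) computes the full matrix of squared Euclidean distances between N samples of dimension D; its regular O(N²·D) multiply-accumulate structure is what scheduling and projection operations map onto 1-D and 2-D processor arrays.

```python
import numpy as np

def distance_matrix(X):
    """Squared Euclidean distances between all rows of X (shape N x D)."""
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 * (x_i . x_j)
    sq = np.sum(X * X, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D2, 0.0)   # clamp tiny negatives from rounding

X = np.array([[0.0, 0.0],
              [3.0, 4.0]])
print(distance_matrix(X))        # off-diagonal entries are 25.0
```

Each output cell is an independent dot-product-style accumulation over the D features, which is why a hardware array that streams one data dimension per time step, as the scalable architecture above does, covers datasets of any dimensionality.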

22.
Siddiqui, Fahad Manzoor.
FPGA-based programmable embedded platform for image processing applications.
Degree: PhD, 2018, Queen's University Belfast
URL: https://pure.qub.ac.uk/portal/en/theses/fpgabased-programmable-embedded-platform-for-image-processing-applications(a59d226f-253a-475b-b064-fd79359dcccb).html
;
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.766276
► A vast majority of electronic systems including medical, surveillance and critical infrastructure employs image processing to provide intelligent analysis. They use onboard pre-processing to reduce…
(more)
▼ A vast majority of electronic systems, including medical, surveillance and critical infrastructure, employ image processing to provide intelligent analysis. They use onboard pre-processing to reduce data bandwidth and memory requirements before sending information to the central system. Field Programmable Gate Arrays (FPGAs) represent a strong platform as they permit reconfigurability and pipelining for streaming applications. However, rapid advances and changes in these application use cases demand adaptable hardware architectures that can process dynamic data workloads and be easily programmed to achieve efficient solutions in terms of area, time and power. FPGA-based development needs iterative design cycles and hardware synthesis and place-and-route times that are alien to software developers. This work proposes an FPGA-based programmable hardware acceleration approach to reduce design effort and time. It allows developers to use FPGAs to profile, optimise and quickly prototype algorithms using a more familiar software-centric, edit-compile-run design flow that enables the platform to be programmed by software rather than by high-level synthesis (HLS) engineering principles. Central to the work has been the development of an optimised FPGA-based processor called the Image Processing Processor (IPPro), which uses the underlying resources efficiently and presents a programmable environment to the programmer following a dataflow design principle. This gives superior performance compared to competing alternatives. From this, a three-layered platform has been created which enables the realisation of parallel computing skeletons on FPGAs, used to express designs efficiently in high-level programming languages. From the bottom up, these layers represent programming (actor, multiple actors, parallel skeletons) and hardware (IPPro core, multicore IPPro, system infrastructure) abstraction.
The platform allows acceleration of parallel and non-parallel dataflow applications. A set of point and area image pre-processing functions was implemented on the Avnet Zedboard platform to evaluate performance. The point function achieved 2.53 times better performance than the area functions, and the point and area functions achieved performance improvements of 7.80 and 5.27 times over a single-core IPPro by exploiting data parallelism. The pipelined execution of multiple stages revealed that a dataflow graph can be decomposed into balanced actors to deliver maximum performance by hiding data transfer and processing time through task parallelism; otherwise, the maximum achievable performance is limited by the slowest actor, due to the ripple effect caused by unbalanced actors. The platform delivered better performance in terms of fps/Watt/Area than an embedded Graphics Processing Unit (GPU), considering that both technologies allow a software-centric design flow.
Subjects/Keywords: FPGA; Dataflow; Multicore; Zynq; Parallel computing; Hardware acceleration; Image Processing; Programmable

Virginia Tech
23.
Odom, Jacob Henry.
Indexing Large Permutations in Hardware.
Degree: MS, Computer Engineering, 2019, Virginia Tech
URL: http://hdl.handle.net/10919/89906
► In computing, some applications need the ability to shuffle or rearrange items based on run time information during their normal operations. A similar task is…
(more)
▼ In computing, some applications need the ability to shuffle or rearrange items based on run-time information during their normal operations. A similar task is a partial shuffle, where only an information-dependent selection of the total items is returned in a shuffled order. Initially, there may be the assumption that these are trivial tasks. However, the applications that rely on this ability are typically related to security, which requires repeatable, unbiased operations. These requirements quickly turn seemingly simple tasks into complex ones. Worse, they are often done incorrectly and only appear to meet these requirements, which has disastrous implications for security. A current and dominating method to shuffle items that meets these requirements was developed over fifty years ago and is based on an even older algorithm referred to as Fisher-Yates, after its original authors. Fisher-Yates based methods shuffle items in memory, which is seen as advantageous in software but only serves as a disadvantage in hardware, since memory access is significantly slower than other operations. Additionally, when performing a partial shuffle, Fisher-Yates methods require the same resources as when performing a complete shuffle. This is because, with Fisher-Yates methods, each element in a shuffle is dependent on all of the other elements. Alternate methods that meet these requirements are known, but they are only able to shuffle a very small number of items before becoming too slow for practical use. To combat the disadvantages of current shuffling methods, this thesis proposes an alternate approach to performing shuffles. This alternate approach meets the previously stated requirements while outperforming current methods, and it can be extended to shuffling any number of items while maintaining a usable level of performance. Further, unlike current popular shuffling methods, the proposed method has no inter-item dependency and thus offers great advantages over current popular methods for partial shuffles.
Advisors/Committee Members: Athanas, Peter M. (committeechair), Martin, Thomas L. (committee member), Tront, Joseph G. (committee member).
Subjects/Keywords: permutations; combinatorics; hardware acceleration; Fisher-Yates; Knuth-Shuffle
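The inter-item dependency the abstract criticizes is visible in a few lines of code. Below is a plain software sketch of Fisher-Yates and a partial-shuffle variant (illustrative baselines, not the thesis's proposed method): every swap reads state written by earlier swaps, and even the partial variant keeps the entire array as its working set, which becomes memory traffic in hardware.

```python
import random

def fisher_yates(items, rng=None):
    """Unbiased in-place shuffle; each swap depends on all earlier swaps."""
    rng = rng or random.Random()
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)       # uniform pick from the unshuffled prefix
        a[i], a[j] = a[j], a[i]     # mutates shared state
    return a

def partial_shuffle(items, k, rng=None):
    """Return k uniformly selected items in random order.

    Fewer iterations than a full shuffle, but the working set is still
    the whole array -- the resource cost the thesis highlights.
    """
    rng = rng or random.Random()
    a = list(items)
    for i in range(k):
        j = rng.randint(i, len(a) - 1)
        a[i], a[j] = a[j], a[i]
    return a[:k]

print(fisher_yates(range(5), random.Random(0)))
print(partial_shuffle(range(10), 3, random.Random(0)))
```

An index-based alternative, by contrast, would compute each output position independently of the others, which is what removes the memory bottleneck in hardware.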

Brno University of Technology
24.
Bareš, Jan.
Návrh protokolu hardwarového akcelerátoru náročných výpočtů nad více jádry.
Degree: 2018, Brno University of Technology
URL: http://hdl.handle.net/11012/80760
► This work deals with the design of a communication protocol enabling data transfer between a control computer and computing cores implemented on FPGA chips. The purpose of the communication is to accelerate…
(more)
▼ This work deals with the design of a communication protocol for data transfer between a control computer and computing cores implemented on FPGA chips. The purpose of the communication is to speed up computationally demanding software algorithms for non-streamed data processing by computing them in hardware on an acceleration system. The work defines the terminology used for the protocol design and analyses current solutions to the stated problem. It then designs the structure of the acceleration system and the communication protocol. The main part of the work describes the implementation of the protocol in VHDL and the simulation of the implemented modules. Finally, it presents how the designed solution can be applied and discusses possible extensions of this work.
Advisors/Committee Members: Šťáva, Martin (advisor).
Subjects/Keywords: Hardwarová akcelerace; urychlovač; akcelerační systém; FPGA; návrh protokolu; komunikační protokol; Hardware acceleration; accelerator; acceleration system; FPGA; design of protocol; communication protocol

University of Illinois – Chicago
25.
Di Tucci, Lorenzo.
Efficient High Performance FPGA-Based Applications Design via SDAccel.
Degree: 2016, University of Illinois – Chicago
URL: http://hdl.handle.net/10027/21325
► Custom hardware accelerators are widely used to improve the performance of software applications in terms of execution times and to reduce energy consumption. However,…
(more)
▼ Custom hardware accelerators are widely used to improve the performance of software applications in terms of execution times and to reduce energy consumption. However, the realization of a hardware accelerator and its integration into the final system is a difficult and error-prone process. For this reason, both industry and academia continuously develop Computer Aided Design (CAD) tools to assist the designer in the development process. Although many of the design steps are now automated, system integration, SW/HW interface definition and driver generation are still almost completely manual tasks. The latest tool by Xilinx, however, aims at improving the hardware design experience by automating the majority of the steps in the design flow and by leveraging the OpenCL standard to enhance overall productivity and to enable code portability.
This work provides an analysis and an overview of the new Xilinx SDAccel framework, comparing its design flow to other state-of-the-art frameworks. In this context we use the tool to accelerate two case studies from the bioinformatics field: the first concerns pairwise alignment, and the second the protein folding problem. The work is organized as follows:
• We start with an introduction to our work, followed by a brief introduction to the context and our contributions in Chapter 1.
• Chapter 2 gives the reader an overview of Field Programmable Gate Arrays (FPGAs), followed by an introduction to the Hardware Design Flow (HDF). The chapter ends with a theoretical introduction to the two case studies that we developed.
• Chapter 3 describes some state-of-the-art tools used in the design of hardware applications, comparing them and highlighting the main features of each.
• Chapter 4 analyzes the problem addressed in this dissertation. It starts by describing design for HPC and ends by discussing how new CAD tools aim at automating the steps of the hardware design flow.
• Chapter 5 introduces the tool we used to accelerate our case studies, SDAccel. It starts with an introduction to the framework, then presents its architecture and main features. Finally, it discusses how we faced the tool's problems and our contributions to its development.
• Chapter 6 describes how we decided to accelerate the two case studies introduced in Chapter 2. It explains the architectural choices that we made, as well as the reasons that led us to choose them.
• Chapter 7 presents the results of the two case studies, the experimental settings and the comparison of our results with state-of-the-art implementations of the same algorithm.
• Finally, in Chapter 8 we draw the conclusions of this work and provide some insights into possible future work.
Advisors/Committee Members: Rao, Wejing (advisor).
Subjects/Keywords: FPGA; SDAccel; EDA; CAD; Smith-Waterman; Protein Folding; Hardware Acceleration; Hardware Design Flow; High Level Synthesis; System Level Design; Hardware Architecture; Custom Hardware Accelerators
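The pairwise-alignment case study corresponds to the Smith-Waterman algorithm listed in the record's keywords. A minimal scoring-only sketch of its recurrence follows (the match/mismatch/gap parameters here are illustrative, not the thesis's settings); FPGA implementations typically exploit the fact that all cells on an anti-diagonal of the score matrix are mutually independent and can be computed in parallel.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # score matrix, first row/col = 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))
```

Each cell depends only on its left, upper and upper-left neighbours, so a hardware pipeline can stream one anti-diagonal per clock cycle instead of iterating cell by cell as this software version does.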

Brno University of Technology
26.
Novotňák, Jiří.
Hardwarová akcelerace šifrování síťového provozu.
Degree: 2010, Brno University of Technology
URL: http://hdl.handle.net/11012/54260
► The aim of this work is to design and implement a high-speed network traffic encryptor with a throughput of 10 Gb/s in one direction. The implementation platform is a Xilinx Virtex-5 vlx155t FPGA placed on the card…
(more)
▼ The aim of this thesis is to design and implement a high-speed network traffic encryptor with a throughput of 10 Gb/s in one direction. The implementation platform is a Xilinx Virtex5vlx155t FPGA located on the COMBOv2-LXT card. Encryption is based on the AES algorithm with a 128-bit key. The security protocol used is ESP operating over the IPv4 protocol. The design is fully synthesizable with the Xilinx ISE 11.3 tool; unfortunately, it could not be tested on real hardware. Successful tests were performed in simulation.
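The encryptor wraps IPv4 packets in ESP (RFC 4303). Independent of the FPGA design, the packet layout targeted by the thesis can be sketched in Python; `esp_frame` below is a hypothetical helper, not code from the thesis, and it shows only the header, AES-block padding, and trailer layout, with the actual AES encryption and the integrity check value omitted:

```python
import struct

AES_BLOCK = 16  # AES operates on 16-byte blocks regardless of key length

def esp_frame(spi: int, seq: int, payload: bytes, next_header: int = 4) -> bytes:
    """Build a simplified ESP packet body (RFC 4303 layout, unencrypted, no ICV).

    next_header=4 marks an encapsulated IPv4 packet, matching the thesis setup.
    """
    # Pad so that payload + padding + 2 trailer bytes fill whole AES blocks.
    pad_len = (-(len(payload) + 2)) % AES_BLOCK
    padding = bytes(range(1, pad_len + 1))   # RFC 4303 default pad pattern 1,2,3,...
    header = struct.pack("!II", spi, seq)    # 32-bit SPI + 32-bit sequence number
    trailer = struct.pack("!BB", pad_len, next_header)
    return header + payload + padding + trailer

pkt = esp_frame(spi=0x1234, seq=1, payload=b"hello")
# Everything after the 8-byte header must be a multiple of the AES block size.
assert (len(pkt) - 8) % AES_BLOCK == 0
```

In the thesis, the payload-plus-trailer portion would be AES-128 encrypted and an integrity check value appended before transmission.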
Advisors/Committee Members: Žádník, Martin (advisor).
Subjects/Keywords: Hardware; acceleration; encryption; AES; FPGA; VHDL; IPSEC; ESP
APA (6th Edition):
Novotňák, J. (2010). Hardwarová akcelerace šifrování síťového provozu. (Thesis). Brno University of Technology. Retrieved from http://hdl.handle.net/11012/54260
Chicago Manual of Style (16th Edition):
Novotňák, Jiří. “Hardwarová akcelerace šifrování síťového provozu.” 2010. Thesis, Brno University of Technology. Accessed December 09, 2019.
http://hdl.handle.net/11012/54260.
MLA Handbook (7th Edition):
Novotňák, Jiří. “Hardwarová akcelerace šifrování síťového provozu.” 2010. Web. 09 Dec 2019.
Vancouver:
Novotňák J. Hardwarová akcelerace šifrování síťového provozu. [Internet] [Thesis]. Brno University of Technology; 2010. [cited 2019 Dec 09].
Available from: http://hdl.handle.net/11012/54260.
Council of Science Editors:
Novotňák J. Hardwarová akcelerace šifrování síťového provozu. [Thesis]. Brno University of Technology; 2010. Available from: http://hdl.handle.net/11012/54260

Texas A&M University
27.
Gulati, Kanupriya.
Hardware Acceleration of Electronic Design Automation Algorithms.
Degree: 2010, Texas A&M University
URL: http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7471
► With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single core microprocessors,…
(more)
▼ With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single-core microprocessors, now faces the tough challenge of taking advantage of this parallelism, made available by the scaling of hardware. The work presented in this dissertation studies the acceleration of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field programmable gate arrays (FPGAs), and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heavily used in the VLSI design flow and also have varying degrees of inherent parallelism in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing analysis, circuit simulation, fault simulation, and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single-core microprocessor) are studied. In addition, this dissertation also presents an automated approach to accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU's hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative platforms such as custom designed ICs, FPGAs, and graphics processors, and obtain speedups of up to 800X. The approaches in this dissertation collectively aim to contribute towards enabling the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary hardware platforms.
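Of the EDA kernels listed, Monte Carlo based statistical static timing analysis is a natural fit for massive parallelism: every sample is an independent trial, so thousands can run concurrently on GPU threads. A toy sequential sketch of the idea (the path/delay model and function names here are illustrative, not the dissertation's implementation):

```python
import random
import statistics

def mc_ssta(paths, n_samples=10_000, seed=0):
    """Toy Monte Carlo statistical static timing analysis.

    paths: list of circuit paths, each a list of (mean_delay, stddev) gate entries.
    Returns (mean, stddev) of the circuit delay, where each trial samples every
    gate delay from a Gaussian and takes the max path delay over all paths.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        circuit_delay = max(
            sum(rng.gauss(mu, sigma) for mu, sigma in path) for path in paths
        )
        samples.append(circuit_delay)
    return statistics.mean(samples), statistics.stdev(samples)

# Two competing paths; the nominally longer one usually dominates the max.
paths = [[(1.0, 0.1), (2.0, 0.2)], [(1.5, 0.1), (1.0, 0.1)]]
mean, std = mc_ssta(paths)
```

On a GPU, each trial (or batch of trials) would map to one thread, which is exactly the kind of kernel partitioning the dissertation automates.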
Advisors/Committee Members: Khatri, Sunil (advisor), Walker, Hank (committee member), Li, Peng (committee member), Ji, Jim (committee member), Kirkpatrick, Desmond (committee member).
Subjects/Keywords: Hardware Acceleration; Graphics Processing Units; FPGA; Custom IC; Boolean Satisfiability; Fault Simulation
APA (6th Edition):
Gulati, K. (2010). Hardware Acceleration of Electronic Design Automation Algorithms. (Thesis). Texas A&M University. Retrieved from http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7471
Chicago Manual of Style (16th Edition):
Gulati, Kanupriya. “Hardware Acceleration of Electronic Design Automation Algorithms.” 2010. Thesis, Texas A&M University. Accessed December 09, 2019.
http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7471.
MLA Handbook (7th Edition):
Gulati, Kanupriya. “Hardware Acceleration of Electronic Design Automation Algorithms.” 2010. Web. 09 Dec 2019.
Vancouver:
Gulati K. Hardware Acceleration of Electronic Design Automation Algorithms. [Internet] [Thesis]. Texas A&M University; 2010. [cited 2019 Dec 09].
Available from: http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7471.
Council of Science Editors:
Gulati K. Hardware Acceleration of Electronic Design Automation Algorithms. [Thesis]. Texas A&M University; 2010. Available from: http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7471

NSYSU
28.
Wu, Bo-sheng.
Acceleration of Image Feature Extraction Algorithms.
Degree: Master, Computer Science and Engineering, 2014, NSYSU
URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810114-020324
► The description of local features of images has been successfully applied to many areas, including wide baseline matching, object recognition, texture recognition, image retrieval, robot…
(more)
▼ The description of local features of images has been successfully applied to many areas, including wide baseline matching, object recognition, texture recognition, image retrieval, robot localization, video data mining, etc. However, pure software implementations usually cannot achieve the requirement of real-time processing. In this thesis, we present software acceleration of general-purpose computing on graphics processing units (GPGPU) for two popular image feature extraction/description algorithms, Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Feature (SURF). Furthermore, several versions of hardware SURF accelerators are also implemented. The four major parts of SIFT are scale-space extrema detection, keypoint localization, orientation assignment, and keypoint description, where scale-space extrema detection and keypoint description, the most critical parts, take most of the total execution time. SURF is composed of four major steps: integral image calculation, fast Hessian detection, orientation assignment, and keypoint description. In terms of software implementation, the computation complexity of SURF is significantly reduced compared with that of SIFT. However, hardware acceleration of SURF is still required to meet real-time processing requirements. In this thesis, we slightly modify the original SURF algorithms in order to significantly reduce the hardware complexity of the implementations of fast Hessian detection and keypoint description without sacrificing too much speed performance. Experimental results of both software and hardware acceleration are also given and compared.
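SURF's first step, integral image calculation, is what makes the rest of the pipeline fast: once the summed-area table is built, the box filters that approximate the Hessian cost only four table lookups each, regardless of filter size. A minimal sketch of this standard technique (not the thesis's accelerator code):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y else 0)
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of img over the inclusive rectangle (x0,y0)-(x1,y1), in O(1)."""
    total = ii[y1][x1]
    if x0:
        total -= ii[y1][x0 - 1]
    if y0:
        total -= ii[y0 - 1][x1]
    if x0 and y0:
        total += ii[y0 - 1][x0 - 1]
    return total

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
assert box_sum(ii, 1, 1, 2, 2) == 5 + 6 + 8 + 9  # bottom-right 2x2 block
```

The constant cost per box filter is also what makes the step attractive for hardware: the table can be built in a single raster pass and queried with fixed-latency reads.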
Advisors/Committee Members: Shiann-Rong Kuang (chair), Jih-Ching Chiu (chair), Chuen-Yau Chen (chair), Shen-Fu Hsiao (committee member), Ming-Chih Chen (chair).
Subjects/Keywords: scale-invariant feature transform; Speeded-Up Robust Feature; hardware acceleration; image feature extraction; OpenCL; GPGPU
APA (6th Edition):
Wu, B. (2014). Acceleration of Image Feature Extraction Algorithms. (Thesis). NSYSU. Retrieved from http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810114-020324
Chicago Manual of Style (16th Edition):
Wu, Bo-sheng. “Acceleration of Image Feature Extraction Algorithms.” 2014. Thesis, NSYSU. Accessed December 09, 2019.
http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810114-020324.
MLA Handbook (7th Edition):
Wu, Bo-sheng. “Acceleration of Image Feature Extraction Algorithms.” 2014. Web. 09 Dec 2019.
Vancouver:
Wu B. Acceleration of Image Feature Extraction Algorithms. [Internet] [Thesis]. NSYSU; 2014. [cited 2019 Dec 09].
Available from: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810114-020324.
Council of Science Editors:
Wu B. Acceleration of Image Feature Extraction Algorithms. [Thesis]. NSYSU; 2014. Available from: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810114-020324

Penn State University
29.
Chandrashekhar, Anusha.
Acceleration of monocular depth extraction for images.
Degree: MS, Computer Science and Engineering, 2014, Penn State University
URL: https://etda.libraries.psu.edu/catalog/23442
► This thesis evaluates and profiles a monocular depth estimation algorithm in which depth maps are generated from a single image using a non-parametric depth transfer…
(more)
▼ This thesis evaluates and profiles a monocular depth estimation algorithm in which depth maps are generated from a single image using a non-parametric depth transfer approach. 3D depth from images has a wide range of applications in surveillance, tracking, robotics, and general scene understanding. Recent work shows that depth can be used as an important cue in visual saliency in order to distinguish between similar objects. The depth transfer algorithm is evaluated on the Make3D and NYU datasets, and the relative, logarithmic, and RMS errors are reported for these datasets. It is shown that the depth transfer algorithm performs better than the state-of-the-art depth estimation algorithms. A multi-core CPU implementation of the depth transfer algorithm is profiled in order to determine the compute-intensive stages in the algorithm. A Graphics Processing Unit (GPU) architecture using NVIDIA Compute Unified Device Architecture (CUDA) for accelerating the execution time of the bottleneck is proposed. The architecture makes efficient use of the GPU threads and memory, which results in significant speedup. The GPU implementation is compared with the multi-core CPU implementation, and it is shown that the proposed GPU architecture is capable of accelerating the algorithm by up to 4.3x (depending on image size) over the CPU-based implementation. A fast depth estimation technique is proposed to accelerate the computation of depth of moving objects in a video sequence. This method achieves significant speedup over the CPU and GPU implementations of the depth transfer algorithm, with processing rates that are closer to real time. The depth values from the fast depth estimator are compared to the ground truth depth values to show that the RMS error is significantly low and within an acceptable range.
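The relative, logarithmic, and RMS error metrics mentioned above are the standard scores on Make3D/NYU-style depth benchmarks. A sketch of how such metrics are commonly computed (flat lists of positive depths stand in for per-pixel depth maps; this is illustrative, not the thesis code):

```python
import math

def depth_errors(pred, gt):
    """Per-pixel relative, log10, and RMS error between predicted and
    ground-truth depth, averaged over all pixels.

    pred and gt are equal-length sequences of positive depth values.
    """
    n = len(gt)
    rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in zip(pred, gt)) / n
    rms = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    return rel, log10, rms

# Only the third "pixel" is mispredicted (4.0 vs ground truth 5.0).
rel, log10, rms = depth_errors([1.0, 2.0, 4.0], [1.0, 2.0, 5.0])
```

Acceleration work like this thesis's must leave these numbers unchanged (or nearly so) while reducing wall-clock time, which is why all three are reported alongside the speedups.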
Subjects/Keywords: Monocular depth extraction; GPU; CUDA; Hardware acceleration; non-parametric depth; fast depth estimator; computer vision
APA (6th Edition):
Chandrashekhar, A. (2014). Acceleration of monocular depth extraction for images. (Masters Thesis). Penn State University. Retrieved from https://etda.libraries.psu.edu/catalog/23442
Chicago Manual of Style (16th Edition):
Chandrashekhar, Anusha. “Acceleration of monocular depth extraction for images.” 2014. Masters Thesis, Penn State University. Accessed December 09, 2019.
https://etda.libraries.psu.edu/catalog/23442.
MLA Handbook (7th Edition):
Chandrashekhar, Anusha. “Acceleration of monocular depth extraction for images.” 2014. Web. 09 Dec 2019.
Vancouver:
Chandrashekhar A. Acceleration of monocular depth extraction for images. [Internet] [Masters thesis]. Penn State University; 2014. [cited 2019 Dec 09].
Available from: https://etda.libraries.psu.edu/catalog/23442.
Council of Science Editors:
Chandrashekhar A. Acceleration of monocular depth extraction for images. [Masters Thesis]. Penn State University; 2014. Available from: https://etda.libraries.psu.edu/catalog/23442
30.
Irick, Kevin Maurice.
A Configurable Platform for Sensor and Image Processing.
Degree: PhD, Computer Science and Engineering, 2009, Penn State University
URL: https://etda.libraries.psu.edu/catalog/9989
► Smart Environments are environments that exhibit ambient intelligence to those that interact with them. Smart Environments represent the next generation of pervasive computing enabled by…
(more)
▼ Smart Environments are environments that exhibit ambient intelligence to those that interact with them. Smart Environments represent the next generation of pervasive computing enabled by the proliferation of low cost, high performance, embedded devices. The enabling technology behind this new paradigm of pervasive computing is the Smart Sensor. A Smart Sensor is a device that combines physical sensing apparatus with a computational entity that allows localized interpretation of sensor data. In time critical applications the interpretation of the data can result in an immediate process decision such as alerting mine occupants of the rapidly decreasing quantity of oxygen in a corridor. In less time sensitive applications, such as shopper behavior analysis, a Smart Camera can analyze the behavior of retail store patrons captured by an onboard CMOS camera and send statistics to the store manager periodically or by request. Advances in semiconductor manufacturing technologies have resulted in extremely small transistor feature sizes, low operating voltages, and increased operating frequencies. Overall, embedded semiconductor devices have experienced increasing computational power in decreasing package sizes. In addition, sensor technology has seen advancements that yield more reliable, accurate, and smaller sensors that are easier to interface. Advanced research in polymer nanowires has resulted in the development of gas sensors, consisting of nanowire arrays assembled on CMOS wafers, that can detect the presence of volatile compounds at extremely low concentrations. Moreover, CCD imaging sensors are rapidly being replaced by cheaper, lower power consuming, faster, and higher resolution CMOS alternatives. Consequently, the case for Smart Sensor utility has been proven by numerous and diverse application scenarios. What's left is the development of novel computational architectures that implement the intelligence for a wide range of Smart Sensing applications while adhering to tight footprint constraints, strict performance requirements, and green power profiles. Issues in hardware acceleration, algorithm mapping, low power design, networking, and hardware/software co-design are all prevalent in this new embedded system paradigm. Of particular interest, and consequently the focus of this dissertation, are reconfigurable architectures that can be tuned to leverage the appropriate amount of computational resources to meet the performance requirements and power budget of a particular application. Specifically, this dissertation contributes novel hardware architectures for gas sensing, face detection, and gender recognition algorithms that are suitable for deployment on FPGAs. Further, this dissertation contributes an FPGA programming framework that reduces the complexities associated with the design of reconfigurable hardware systems.
Subjects/Keywords: Image Processing; Hardware Acceleration; FPGA
APA (6th Edition):
Irick, K. M. (2009). A Configurable Platform for Sensor and Image Processing. (Doctoral Dissertation). Penn State University. Retrieved from https://etda.libraries.psu.edu/catalog/9989
Chicago Manual of Style (16th Edition):
Irick, Kevin Maurice. “A Configurable Platform for Sensor and Image Processing.” 2009. Doctoral Dissertation, Penn State University. Accessed December 09, 2019.
https://etda.libraries.psu.edu/catalog/9989.
MLA Handbook (7th Edition):
Irick, Kevin Maurice. “A Configurable Platform for Sensor and Image Processing.” 2009. Web. 09 Dec 2019.
Vancouver:
Irick KM. A Configurable Platform for Sensor and Image Processing. [Internet] [Doctoral dissertation]. Penn State University; 2009. [cited 2019 Dec 09].
Available from: https://etda.libraries.psu.edu/catalog/9989.
Council of Science Editors:
Irick KM. A Configurable Platform for Sensor and Image Processing. [Doctoral Dissertation]. Penn State University; 2009. Available from: https://etda.libraries.psu.edu/catalog/9989