You searched for +publisher:"Delft University of Technology" +contributor:("Al-Ars, Zaid").
Showing records 1 – 30 of 46 total matches.
No search limiters apply to these results.

Delft University of Technology
1.
Noordsij, Lennart (author).
Parallelization of Variable Rate Decompression for GPU Acceleration.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:ebf17e05-4d9f-4a73-8c77-c1e7073e932f
Data movement has long been identified as the biggest challenge facing designers of modern computer systems. To tackle this challenge, many novel data compression algorithms have been developed. These compression algorithms can be embedded into bandwidth-bound applications to reduce their memory traffic volume. As a result, data decompression is in many instances on the critical path of application execution, while compression itself can happen offline or outside the critical path. Fast data decompression is therefore of utmost importance. However, most existing parallel decompression schemes adopt a single parallelization strategy suited to a particular hardware platform, an approach that fails to harness the parallelism found in diverse modern hardware architectures. To this end, we propose multiple parallelization strategies for variable-rate data decompression, aimed at using parallel architectures efficiently. Our strategies are based on generating extra information during the encoding phase and passing it to the decoder through a side channel. The decoder can then use this extra information to speed up decoding substantially. To demonstrate the effectiveness of our strategies, we implement them in a state-of-the-art compression algorithm called ZFP and apply it to a real-life industrial application from ASML. This application is a feed-forward control model for controlling wafer heat in EUV lithography machines. It is dominated by matrix-vector multiplication (which is bandwidth-bound) and is executed on GPUs. Our implementation is publicly available on GitHub. We show that the parallelization strategies suited to multicore CPUs differ from those suited to GPUs. On a CPU, we achieve a near-optimal speedup and an overhead size consistently below 0.04% of the compressed data size.
On a GPU, we achieve a decoding throughput of more than 130 GiB/s which allows us to execute the ASML application within the given time budget.
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
APA (6th Edition):
Noordsij, L. (2019). Parallelization of Variable Rate Decompression for GPU Acceleration. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:ebf17e05-4d9f-4a73-8c77-c1e7073e932f
Chicago Manual of Style (16th Edition):
Noordsij, Lennart. “Parallelization of Variable Rate Decompression for GPU Acceleration.” 2019. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:ebf17e05-4d9f-4a73-8c77-c1e7073e932f.
MLA Handbook (7th Edition):
Noordsij, Lennart. “Parallelization of Variable Rate Decompression for GPU Acceleration.” 2019. Web. 18 Apr 2021.
Vancouver:
Noordsij L. Parallelization of Variable Rate Decompression for GPU Acceleration. [Internet] [Master’s thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:ebf17e05-4d9f-4a73-8c77-c1e7073e932f.
Council of Science Editors:
Noordsij L. Parallelization of Variable Rate Decompression for GPU Acceleration. [Master’s thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:ebf17e05-4d9f-4a73-8c77-c1e7073e932f
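The side-channel idea in Noordsij's abstract can be illustrated with a minimal Python sketch. It assumes that each block is compressed independently and that the encoder records every block's compressed size as the side information; `zlib` stands in for ZFP's variable-rate mode and the function names are hypothetical, so this is not the thesis's implementation. An exclusive prefix sum over the recorded sizes yields each block's start offset, so decoder workers can begin anywhere in the stream instead of decoding it sequentially.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def encode_with_side_channel(blocks):
    """Compress each block independently (zlib stands in for a variable-rate codec).

    Returns the concatenated stream plus the per-block compressed sizes,
    which play the role of the side-channel information.
    """
    compressed = [zlib.compress(block) for block in blocks]
    return b"".join(compressed), [len(c) for c in compressed]


def decode_parallel(stream, sizes, workers=4):
    """Decode all blocks concurrently using offsets derived from the side channel."""
    # Exclusive prefix sum: block i starts where blocks 0..i-1 end.
    offsets = list(accumulate(sizes, initial=0))[:-1]

    def decode_block(i):
        start = offsets[i]
        return zlib.decompress(stream[start:start + sizes[i]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_block, range(len(sizes))))


# Blocks of different sizes compress to different lengths (variable rate).
blocks = [bytes([i]) * (100 * (i + 1)) for i in range(8)]
stream, sizes = encode_with_side_channel(blocks)
assert decode_parallel(stream, sizes) == blocks
```

In this sketch the side channel costs one integer per block, so its relative overhead shrinks as blocks grow; on a GPU the prefix sum itself would typically be computed with a parallel scan.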

Delft University of Technology
2.
Kulkarni, Rujuta (author).
Exploring Multicore Architectures For Streaming Applications.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:3cbbd723-5fc8-481a-9dc3-263163589f0e
The Smith-Waterman algorithm performs local alignment of biological sequences by calculating a similarity matrix. This process is computation-intensive. Because of the dependencies in the algorithm, only the elements along an anti-diagonal (minor diagonal) of the matrix can be calculated in parallel. In the past, CPUs, GPUs and FPGAs have all been used to implement the Smith-Waterman algorithm. While GPUs offer better performance than FPGAs and are easier to program, they have higher power consumption. FPGA implementations typically employ systolic arrays, which consist of processing elements connected in a regular pattern through which data is streamed. Custom-designing the processing elements for an FPGA implementation requires considerable effort. In this thesis, we investigate alternative architectures that provide performance with a lower power profile and ease of programmability. We design a systolic array architecture built from general-purpose processors and map the Smith-Waterman algorithm onto it. The systolic array design uses scratchpad memories to store intermediate data. Since employing multiple processors is a common way to extract more performance nowadays, we compare our architecture with a multicore architecture. Simulation results show that the systolic array architecture promises a higher speedup than the multicore architecture, reaching up to 1.5 MCUPS with 16 processing elements, which is 4x faster than a 16-processor multicore architecture. Moreover, the performance of the systolic array architecture scales better with an increasing number of processors than that of the multicore architecture. Mapping the Smith-Waterman algorithm onto the systolic array architecture takes only 100 lines of code, written within 2 person-weeks in C, a standard and familiar language. Our experience with this mapping suggests that it could lead to a CUDA-like programming paradigm.
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Multicore architecture; Systolic array; Smith-Waterman
APA (6th Edition):
Kulkarni, R. (2019). Exploring Multicore Architectures For Streaming Applications. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:3cbbd723-5fc8-481a-9dc3-263163589f0e
Chicago Manual of Style (16th Edition):
Kulkarni, Rujuta. “Exploring Multicore Architectures For Streaming Applications.” 2019. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:3cbbd723-5fc8-481a-9dc3-263163589f0e.
MLA Handbook (7th Edition):
Kulkarni, Rujuta. “Exploring Multicore Architectures For Streaming Applications.” 2019. Web. 18 Apr 2021.
Vancouver:
Kulkarni R. Exploring Multicore Architectures For Streaming Applications. [Internet] [Master’s thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:3cbbd723-5fc8-481a-9dc3-263163589f0e.
Council of Science Editors:
Kulkarni R. Exploring Multicore Architectures For Streaming Applications. [Master’s thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:3cbbd723-5fc8-481a-9dc3-263163589f0e
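The anti-diagonal dependency described in Kulkarni's abstract can be made concrete with a short Python sketch. Each cell H[i][j] depends only on its left, upper and upper-left neighbours, so all cells with the same i + j are independent of one another. The scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions that the abstract does not specify, and the loop runs serially here; the inner loop is the part a systolic array or multicore design would spread across processing elements.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the matrix one anti-diagonal at a time.

    Scores use a linear gap penalty; the default values are illustrative only.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Anti-diagonal d holds the cells (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        # Every iteration of this inner loop reads only diagonals d-1 and d-2,
        # so the cells of one anti-diagonal could be computed in parallel.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score,  # diagonal: match/mismatch
                          H[i - 1][j] + gap,        # up: gap in b
                          H[i][j - 1] + gap)        # left: gap in a
            best = max(best, H[i][j])
    return best


print(smith_waterman_wavefront("ACGT", "ACGT"))  # 8
```

Visiting cells by anti-diagonal gives the same matrix as the usual row-by-row order; only the traversal changes, which is what makes the wavefront a natural fit for a linear array of processing elements.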

Delft University of Technology
3.
Helmiriawan, Helmi (author).
Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:dbf3e77c-e624-47ef-b951-3f1948b1609a
Modern refineries typically use a large number of sensors that generate an enormous amount of data about the condition of the plants. This data can be used for predictive maintenance, an approach that predicts impending failures to mitigate downtime in refineries. This research analyzes the scalability of machine learning methods for a predictive maintenance solution in an oil refinery. The approach models the normal behavior of the plant and uses the prediction error to identify anomalies that might become failures. Several methods and learning algorithms are explored to model the normal behavior of multiple components in the plant. The experiments use historical process data from a crude distiller unit at Shell Pernis Refinery. The results show that the proposed approach, using a multiple-target model, can predict the behavior of multiple components in the plant. It can not only detect anomalies but also identify the faulty component. Furthermore, it reduces the time required to model the normal behavior of the plant, which improves the scalability of the predictive maintenance approach in the refinery.
Computer Science
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Machine Learning; Predictive Maintenance; Anomaly Detection; Deep Learning
APA (6th Edition):
Helmiriawan, H. (2018). Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:dbf3e77c-e624-47ef-b951-3f1948b1609a
Chicago Manual of Style (16th Edition):
Helmiriawan, Helmi. “Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries.” 2018. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:dbf3e77c-e624-47ef-b951-3f1948b1609a.
MLA Handbook (7th Edition):
Helmiriawan, Helmi. “Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries.” 2018. Web. 18 Apr 2021.
Vancouver:
Helmiriawan H. Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries. [Internet] [Master’s thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:dbf3e77c-e624-47ef-b951-3f1948b1609a.
Council of Science Editors:
Helmiriawan H. Scalability Analysis of Predictive Maintenance Using Machine Learning in Oil Refineries. [Master’s thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:dbf3e77c-e624-47ef-b951-3f1948b1609a
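The prediction-error approach in Helmiriawan's abstract can be sketched as follows, assuming a model has already produced a prediction error per monitored component. The component names, the "mean plus k standard deviations" threshold and k = 3 are illustrative assumptions, not the thesis's settings. Thresholds are learned from errors during normal operation; at run time, any component whose error exceeds its own threshold is reported, which is how a multiple-target model can point at the faulty component rather than only raising a plant-wide alarm.

```python
from statistics import mean, stdev


def fit_thresholds(normal_errors, k=3.0):
    """Per-component threshold: mean + k * sample std of prediction errors on normal data."""
    return {comp: mean(errs) + k * stdev(errs) for comp, errs in normal_errors.items()}


def flag_components(current_errors, thresholds):
    """Return the components whose current prediction error exceeds their threshold."""
    return sorted(comp for comp, err in current_errors.items() if err > thresholds[comp])


# Hypothetical components and error values, for illustration only.
normal = {
    "pump_a": [0.10, 0.12, 0.09, 0.11],
    "valve_b": [0.20, 0.22, 0.19, 0.21],
}
thresholds = fit_thresholds(normal)
print(flag_components({"pump_a": 0.11, "valve_b": 0.45}, thresholds))  # ['valve_b']
```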

Delft University of Technology
4.
Suursalu, Sander (author).
Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:e95f39a4-569a-470e-a431-962b9766a302
This research project evaluates the suitability of machine learning methods for early fault prediction and predictive maintenance in petrochemical refineries, based on real-life use cases at Shell Pernis. Refineries are mature industrial installations; however, unplanned shutdowns still occur due to equipment failures. Refineries have petabytes of process control data available from past years, but all of that data is unlabelled. The goal of this research project was to evaluate whether useful information can be extracted from the process control data. The resulting approach had to be compatible with Shell IT, scalable to larger sections of the refinery, reusable in other parts of the refinery and capable of detecting the components that cause potential faults. During this project, multiple solutions based on artificial neural networks and statistical approaches were implemented to model the normal behaviour of the monitored systems. Abnormal predictions for the modelled systems were then used to predict failures in advance, with a prediction horizon of more than a month for some use cases. 4-layer GRUs with tanh activation functions and an input sequence length of 4 samples provided the best results. GRUs were 7% faster to train than LSTMs while reducing the prediction error by 15%. Furthermore, the prediction error was less than 3% under normal operating conditions while exceeding 15% prior to failures. Therefore, machine learning models can predict failures in petrochemical refineries without any industry-specific knowledge, provided the model is trained with clean data that does not contain abnormal time series.
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
APA (6th Edition):
Suursalu, S. (2017). Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:e95f39a4-569a-470e-a431-962b9766a302
Chicago Manual of Style (16th Edition):
Suursalu, Sander. “Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries.” 2017. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:e95f39a4-569a-470e-a431-962b9766a302.
MLA Handbook (7th Edition):
Suursalu, Sander. “Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries.” 2017. Web. 18 Apr 2021.
Vancouver:
Suursalu S. Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries. [Internet] [Master’s thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:e95f39a4-569a-470e-a431-962b9766a302.
Council of Science Editors:
Suursalu S. Predictive Maintenance Using Machine Learning Methods in Petrochemical Refineries. [Master’s thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:e95f39a4-569a-470e-a431-962b9766a302
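The "input sequence length of 4 samples" in Suursalu's abstract refers to how the time series is framed for the recurrent model. A minimal sketch of that framing is below, together with a relative prediction error of the kind that separates normal operation (below about 3%) from pre-failure behaviour (above about 15%). Treating the error as relative to the measured value is an assumption, and the 10% `alarm_level` in the hypothetical `is_abnormal` helper is an illustrative choice between those two figures, not a value from the thesis.

```python
def make_windows(series, seq_len=4):
    """Pair each run of seq_len consecutive samples with the sample that follows it."""
    return [(series[i:i + seq_len], series[i + seq_len])
            for i in range(len(series) - seq_len)]


def relative_error(actual, predicted):
    """Absolute prediction error as a fraction of the actual value (actual must be non-zero)."""
    return abs(actual - predicted) / abs(actual)


def is_abnormal(actual, predicted, alarm_level=0.10):
    """Hypothetical alarm rule: flag a sample whose relative error exceeds alarm_level."""
    return relative_error(actual, predicted) > alarm_level


print(make_windows([1, 2, 3, 4, 5, 6]))  # [([1, 2, 3, 4], 5), ([2, 3, 4, 5], 6)]
```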

Delft University of Technology
5.
Bhosale, Parag (author).
GPU based image registration.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:740d4ebd-2436-4b1a-a8db-cb1e0d56a980
Currently, non-rigid image registration algorithms are too time-intensive for time-critical applications. To address this problem, stochastic gradient descent (SGD) has been applied to image registration. However, SGD depends on manual step-size selection, which is difficult and time-consuming. To avoid such manual selection, SGD has been extended into adaptive stochastic gradient descent (ASGD) and fast adaptive stochastic gradient descent (FASGD), which select an optimal step size automatically. Although FASGD has reduced the computation time drastically, non-rigid registration still cannot be used in time-critical applications. So far, a serial implementation of FASGD has been tested on a CPU architecture in the elastix toolbox. A parallel implementation of SGD is therefore a possible solution to this problem. This thesis implements an extension of the NiftyReg toolbox for graphics processing units (GPUs), divided into two methods: first, NiftyReg2, a possible optimization of the current NiftyReg; second, NiftyRegSGD, a high-performance implementation of SGD on the GPU framework of NiftyReg. A novel sampling strategy tailored to the GPU architecture, random chunk sampling, is also proposed. Random chunk sampling is an optimization that uses the GPU's memory bandwidth effectively to increase the throughput of CUDA kernels. Experiments were performed on 3D lung CT data of 19 patients, comparing NiftyRegSGD (with and without the random chunk sampler) with CPU-based elastix FASGD and NiftyReg. The registration runtime was 21.5 s, 13.02 s, 4.4 s and 2.8 s for elastix-FASGD, NiftyReg2, NiftyRegSGD without random chunk sampling, and NiftyRegSGD with random chunk sampling, respectively, while similar accuracy was obtained. Thus, with further extensions, the proposed GPU-based non-rigid registration can be used for time-critical applications.
An abstract describing the work in this thesis has been accepted for publication in the medical imaging conference of the Society of Photo-Optical Instrumentation Engineers (SPIE).
Embedded Systems
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: GPGPU; Image registration; Image processing; stochastic gradient; memory access optimization
APA (6th Edition):
Bhosale, P. (2017). GPU based image registration. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:740d4ebd-2436-4b1a-a8db-cb1e0d56a980
Chicago Manual of Style (16th Edition):
Bhosale, Parag. “GPU based image registration.” 2017. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:740d4ebd-2436-4b1a-a8db-cb1e0d56a980.
MLA Handbook (7th Edition):
Bhosale, Parag. “GPU based image registration.” 2017. Web. 18 Apr 2021.
Vancouver:
Bhosale P. GPU based image registration. [Internet] [Master’s thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:740d4ebd-2436-4b1a-a8db-cb1e0d56a980.
Council of Science Editors:
Bhosale P. GPU based image registration. [Master’s thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:740d4ebd-2436-4b1a-a8db-cb1e0d56a980
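Bhosale's abstract does not spell out how random chunk sampling works. Assuming it means drawing random start positions and then taking a run of contiguous voxel indices from each (so that neighbouring GPU threads read neighbouring memory addresses), the index generation could look like the Python sketch below. The function and parameter names are hypothetical, and this is an interpretation rather than the thesis's CUDA code.

```python
import random


def random_chunk_sample(num_voxels, num_samples, chunk_size, rng=None):
    """Hypothetical sketch: pick random chunk starts, then take chunk_size contiguous indices from each.

    Contiguous runs let adjacent GPU threads touch adjacent addresses
    (coalesced loads), unlike fully independent random samples.
    """
    if chunk_size > num_voxels:
        raise ValueError("chunk_size must not exceed num_voxels")
    rng = rng or random.Random()
    num_chunks = num_samples // chunk_size
    starts = [rng.randrange(0, num_voxels - chunk_size + 1) for _ in range(num_chunks)]
    return [start + offset for start in starts for offset in range(chunk_size)]


samples = random_chunk_sample(num_voxels=1_000_000, num_samples=2048,
                              chunk_size=32, rng=random.Random(0))
```

Under this reading, ordinary uniform random sampling is the special case `chunk_size=1`; larger chunks trade some statistical independence between samples for better memory access patterns.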

Delft University of Technology
6.
Lu, Yun (author).
Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:88ece5de-f233-4c13-a940-f5c862a9b154
This work introduces a possible solution for scalable MATLAB deployment on big data clusters through Spark without using the official MATLAB toolbox. Other possible approaches to accelerating existing MATLAB code are also investigated, including calling modules written for Graphics Processing Units (GPUs) and using a Python Pool with multiple processors. Among these approaches, the Spark solution accesses PySpark through Python. Unlike the official Spark approach in the newest MATLAB version, which requires MATLAB's distributed computing server, our approach is low-cost, easy to set up, flexible and general enough to handle changes, and able to scale up. All solutions are analyzed for bottlenecks based on their performance in initialization, memory transfer, data conversion and computational throughput. Our analysis shows that initialization and memory transfer for the GPU, and data conversion for Python/PySpark when the input or output data has high dimensions, can be bottlenecks. As a use case, a medical image registration MATLAB application using NCC was accelerated with multiple solutions. The results indicate that GPU and PySpark on a cluster perform best, being 5.7x and 7.8x faster, respectively, than MATLAB with Pool. Based on the overall performance of these solutions, a decision tree for choosing the most suitable solution is built for future research.
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: MATLAB; Image processing; Big Data; Spark
APA (6th Edition):
Lu, Y. (2018). Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:88ece5de-f233-4c13-a940-f5c862a9b154
Chicago Manual of Style (16th Edition):
Lu, Yun. “Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods.” 2018. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:88ece5de-f233-4c13-a940-f5c862a9b154.
MLA Handbook (7th Edition):
Lu, Yun. “Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods.” 2018. Web. 18 Apr 2021.
Vancouver:
Lu Y. Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods. [Internet] [Master’s thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:88ece5de-f233-4c13-a940-f5c862a9b154.
Council of Science Editors:
Lu Y. Enabling Big Data Analytics For MATLAB Programs Using High Performance Compute Methods. [Master’s thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:88ece5de-f233-4c13-a940-f5c862a9b154
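Lu's abstract names NCC only by its acronym. Assuming it stands for normalized cross-correlation, the usual similarity metric in intensity-based image registration, the quantity being computed for two images flattened to equal-length intensity sequences can be sketched in Python as follows. This is a plain reference formula, not the MATLAB application from the thesis.

```python
from math import sqrt


def ncc(x, y):
    """Normalized cross-correlation of two equal-length intensity sequences, in [-1, 1].

    NCC = sum((x_i - mean_x) * (y_i - mean_y))
          / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2))
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("NCC is undefined for constant inputs")
    return numerator / denominator
```

Because the means are subtracted and the result is normalized, NCC is unaffected by brightness offsets and positive contrast scaling between the two images, which is why it is a common choice when the images come from differently calibrated acquisitions.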

Delft University of Technology
7.
Wang, Saiyi (author).
Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:871a645a-81e4-4686-abc0-53944366c6a4
Next-generation sequencing (NGS) technology has dramatically increased the availability of RNA-seq data. Though primarily used for novel gene identification, expression quantification and splice analysis, RNA-seq is also a cheap and efficient alternative to genome sequencing data for variant calling. RNA sequencing costs less than genome sequencing. In addition, the variants discovered from RNA-seq data are expressed, which is a desirable feature for researchers who want to study the relation between genotype and phenotype. Moreover, variants called in RNA-seq data can be used to validate discoveries from whole-genome sequencing (WGS) or whole-exome sequencing (WES). The GATK team has adapted its Best Practices pipeline to process RNA-seq data from raw FASTQ reads to variants. However, some components of the pipeline are not optimized to process large datasets efficiently. We have studied several solutions that scale up the DNA-seq Best Practices pipeline, with the aim of applying the most efficient framework among them to scaling up the RNA-seq pipeline. We select Spark and implement a parallel RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Whereas the original sequential pipeline takes ~29 hours to process a 50 GB dataset with one thread, and ~16 hours with 40 threads on a node with 20 Hyper-Threading cores, our implementation takes only ~2 hours on 16 nodes, each with 8 CPU cores without Hyper-Threading. Our implementation is also 24.77% faster than the alternative solution while producing equally accurate results.
Embedded Systems
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
APA (6th Edition):
Wang, S. (2018). Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:871a645a-81e4-4686-abc0-53944366c6a4
Chicago Manual of Style (16th Edition):
Wang, Saiyi. “Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark.” 2018. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:871a645a-81e4-4686-abc0-53944366c6a4.
MLA Handbook (7th Edition):
Wang, Saiyi. “Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark.” 2018. Web. 18 Apr 2021.
Vancouver:
Wang S. Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark. [Internet] [Master’s thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:871a645a-81e4-4686-abc0-53944366c6a4.
Council of Science Editors:
Wang S. Scaling up the GATK RNA-seq Variant Calling Pipeline with Apache Spark. [Master’s thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:871a645a-81e4-4686-abc0-53944366c6a4
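The runtimes in Wang's abstract translate into the following approximate speedups for the Spark implementation. Both figures are rough, because the abstract reports rounded (~) timings:

```latex
S_{\text{1 thread}} \approx \frac{29\ \text{h}}{2\ \text{h}} \approx 14.5,
\qquad
S_{\text{40 threads}} \approx \frac{16\ \text{h}}{2\ \text{h}} = 8
```

The Spark run uses 16 × 8 = 128 physical cores, against 20 physical cores on the single node, so these ratios reflect both the larger cluster and the parallelized pipeline.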

Delft University of Technology
8.
Li, Minfeng (author).
Early DNA Analysis Using Incomplete DNA Data.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208
In the past few years, considerable attention has been paid to reducing the computational time of genome data analysis, which has eliminated critical computational bottlenecks in analyzing DNA information. However, genome analysis still faces time-consuming challenges due to the slow speed of DNA sequencing machines. DNA sequencing is a time-consuming process that can take days for even a single sample. This limits the speed of existing DNA analysis methods, since they all wait for the fully sequenced DNA data before starting the analysis. As a result, DNA analysis pipelines cannot benefit from the reduced computational analysis time. Recently, a new method called early DNA analysis was introduced, in which the genome analysis pipeline starts on incomplete DNA data before sequencing finishes; this opens the door to decreasing the total time of DNA analysis, including the sequencing time. In this thesis, a parallel implementation of the early DNA analysis approach based on the Apache Spark big data framework is proposed to improve its performance. Using incomplete DNA data sets, however, also causes a slight drop in the accuracy of genome analysis. The original method proposed a few simple methods to complete the unknown DNA data, but these can be improved to increase accuracy. Therefore, several new algorithms are also proposed and tested in this thesis to increase accuracy. Results show that the proposed scalable solution for early DNA analysis achieves a 7.6× speed-up with 97.48% correctness when deployed on a 4-node Power7+ cluster, while one of the advanced completion algorithms increases the classification accuracy for unknown DNA data by 0.006%.
Embedded Systems
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
APA (6th Edition):
Li, M. (2018). Early DNA Analysis Using Incomplete DNA Data. (Master’s thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208
Chicago Manual of Style (16th Edition):
Li, Minfeng. “Early DNA Analysis Using Incomplete DNA Data.” 2018. Master’s thesis, Delft University of Technology. Accessed April 18, 2021. http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208.
MLA Handbook (7th Edition):
Li, Minfeng. “Early DNA Analysis Using Incomplete DNA Data.” 2018. Web. 18 Apr 2021.
Vancouver:
Li M. Early DNA Analysis Using Incomplete DNA Data. [Internet] [Master’s thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18]. Available from: http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208.
Council of Science Editors:
Li M. Early DNA Analysis Using Incomplete DNA Data. [Master’s thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208

Delft University of Technology
9.
Qiu, Tongdong (author).
GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:589a0a2c-cfec-434c-8740-651b5ae5cc40
Third-generation sequencing machines produce reads with tens of thousands of base pairs. To perform de novo assembly, every read must be compared with every other read to find overlaps. Finding overlaps with the optimal Smith-Waterman algorithm is not feasible, since its complexity is quadratic in the length of the reads. Heuristics are designed to be faster, but are not guaranteed to give the optimal solution. Two heuristic DNA aligners are Daligner and Darwin. Daligner uses an edit-graph-based algorithm with O(ND) complexity, where N is the read length and D the number of differences between the two aligned reads. Darwin creates overlapping tiles to search promising areas of the Smith-Waterman matrix, and is empirically shown to be optimal. This work implements these algorithms on a GPU and compares the two with respect to sensitivity and specificity. Daligner is not suitable for GPU acceleration, but Darwin shows a speedup of 109x over 8 CPU threads on a Tesla K40. The speedup increases to 148x when the Smith-Waterman scores are not calculated. Despite Darwin's large speedups, Daligner is 2-6x faster than Darwin, and slightly more sensitive and specific. An advantage of Darwin is that it generally produces longer overlaps, calculates the Smith-Waterman score, and can report the aligned sequences, whereas Daligner only reports the start and end of the overlap.
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: GPU; DNA; Alignment; Acceleration; Daligner; Darwin; CUDA
APA (6th Edition):
Qiu, T. (2018). GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:589a0a2c-cfec-434c-8740-651b5ae5cc40
Chicago Manual of Style (16th Edition):
Qiu, Tongdong (author). “GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:589a0a2c-cfec-434c-8740-651b5ae5cc40.
MLA Handbook (7th Edition):
Qiu, Tongdong (author). “GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly.” 2018. Web. 18 Apr 2021.
Vancouver:
Qiu T. GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:589a0a2c-cfec-434c-8740-651b5ae5cc40.
Council of Science Editors:
Qiu T. GPU Acceleration of DNA Alignment Algorithms of Long Reads for DNA Assembly. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:589a0a2c-cfec-434c-8740-651b5ae5cc40

Delft University of Technology
10.
Enthoven, David (author).
Privacy in federated deep learning on medical data.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:a6f05abc-fe60-446d-a0fc-a1818edd25e2
With the increasing number of data collectors such as smartphones, immense amounts of data are available. These data have great value for training machine learning models. Federated learning is a distributed machine learning approach that allows a model to be trained on a distributed data set without transferring any data, and it therefore claims to preserve privacy. In this thesis, privacy is considered specifically for the use case of medical data, which are sensitive and distinct for different patients. A step-wise argument as to what constitutes privacy preservation is formulated. Notably, it requires systems to be able to train on single samples without compromising their privacy. Accordingly, the federated averaging algorithm (FedAvg) is demonstrated to be critically insecure against certain attack methods. A chosen attack method is used to show how training data can be reconstructed from the model update alone. The viability of this attack method is demonstrated to a great extent for fully connected neural networks and convolutional neural networks. To adhere to the strict privacy formulation, a novel federated learning method called Locally Encoded Federated Averaging (LEFedAvg) is presented in this thesis. This method works on the premise that a part of the model remains private throughout. Subsequently, the method is demonstrated to be usable, and it is shown how it allows for collaborative training. The privacy benefits of this federated learning method are shown empirically. The trade-off between performance and privacy is demonstrated and discussed for a more realistic operational setting.
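For reference, the server-side aggregation step of federated averaging (FedAvg) can be sketched as a sample-count-weighted mean of the clients' parameter vectors. This minimal Python sketch assumes that model parameters are flat lists of floats and that each client reports its local sample count; the function name is hypothetical, and LEFedAvg's privately kept model part is not modeled.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg aggregation).

    Client k's parameters are weighted by n_k / n, where n_k is its number
    of local training samples and n is the total over all clients.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        weight = n_k / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


if __name__ == "__main__":
    # The second client has three times as many samples, so it dominates.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

In this sketch the server receives every client's parameters individually before averaging, which is the per-client update that the attack described above works from.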
Electrical Engineering | Embedded Systems
Advisors/Committee Members: Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Federated learning; Deep learning; privacy; Model sharing
APA (6th Edition):
Enthoven, D. (2019). Privacy in federated deep learning on medical data. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:a6f05abc-fe60-446d-a0fc-a1818edd25e2
Chicago Manual of Style (16th Edition):
Enthoven, David (author). “Privacy in federated deep learning on medical data.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:a6f05abc-fe60-446d-a0fc-a1818edd25e2.
MLA Handbook (7th Edition):
Enthoven, David (author). “Privacy in federated deep learning on medical data.” 2019. Web. 18 Apr 2021.
Vancouver:
Enthoven D. Privacy in federated deep learning on medical data. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:a6f05abc-fe60-446d-a0fc-a1818edd25e2.
Council of Science Editors:
Enthoven D. Privacy in federated deep learning on medical data. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:a6f05abc-fe60-446d-a0fc-a1818edd25e2

Delft University of Technology
11.
Gkougkoulias, Konstantinos (author).
Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:f0dc25f1-9edb-4e87-9154-bafc4da58084
In recent years, due to the slowdown of Moore's Law and Dennard scaling, alternative architectures have started to be used instead of plain CPU implementations. These new architectures, such as FPGAs and GPUs, offer a higher performance-to-power ratio than a CPU-only implementation. But these new approaches have to sacrifice programmability in favor of performance gains. While GPUs are relatively easy to program and provide high performance, this comes at the cost of high power consumption. FPGA programming, on the other hand, is a tedious and time-consuming task. Specialized personnel are required, as programming FPGAs requires a background in designing with hardware description languages (HDLs). Furthermore, an implementation is specific to a certain algorithm and cannot be used for any other algorithm, even a slightly different one. So if a new algorithm for a particular task is found, part of the design process has to be redone. Designing for FPGAs is also computationally intensive, as after simulation the whole design has to be synthesized and then placed and routed (P&R) for a particular FPGA every time the design changes slightly. For large designs, this mapping process can take hours or even days. In recent years, developments in High Level Synthesis (HLS) and OpenCL have made designing for FPGAs easier. But this solution is not without problems either, as the algorithm still has to be implemented for a specific FPGA device. A solution to the FPGA synthesis and P&R problem has recently been proposed under the name FPGA Overlay Architectures. The core concept is to abstract the FPGA by creating a virtual FPGA on top of the underlying physical one, in order to reduce configuration and compile time. In this thesis, we investigate available overlay architectures and select the most appropriate one for our analysis.
We extended the selected architecture to be deployed on alternative FPGA hardware and to work in a shared CPU/FPGA system. Then, we implemented a number of benchmarks to evaluate various aspects of system performance. Our results show that our architecture can be reconfigured in only 11.9 µs, compared to seconds for a full FPGA reconfiguration. However, the overlay architecture uses 10.5x more LUTs and causes a drop in frequency of about 30% for the chosen architecture. For future work, there is room to improve these results by optimizing the interconnect network of the device.
Advisors/Committee Members: Al-Ars, Zaid (mentor), Peltenburg, J.W. (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: FPGA Overlays; Zynq
APA (6th Edition):
Gkougkoulias, K. (2017). Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:f0dc25f1-9edb-4e87-9154-bafc4da58084
Chicago Manual of Style (16th Edition):
Gkougkoulias, Konstantinos (author). “Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels.” 2017. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:f0dc25f1-9edb-4e87-9154-bafc4da58084.
MLA Handbook (7th Edition):
Gkougkoulias, Konstantinos (author). “Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels.” 2017. Web. 18 Apr 2021.
Vancouver:
Gkougkoulias K. Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels. [Internet] [Masters thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:f0dc25f1-9edb-4e87-9154-bafc4da58084.
Council of Science Editors:
Gkougkoulias K. Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels. [Masters Thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:f0dc25f1-9edb-4e87-9154-bafc4da58084

Delft University of Technology
12.
Chi, Huang-Da (author).
Parallelizing a Video Filter-chain for Multi- and Many-core Systems.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa
Developing parallel applications that make efficient use of current and emerging parallel architectures remains a big challenge in modern application development, where performance is a first-class citizen. Optimizing conventional video filter applications to take advantage of current multi- and many-core architectures is one such challenge. This thesis investigates how to parallelize a video filter-chain to obtain maximum performance on Intel's newest many-core architecture, the Knights Landing platform, and on a conventional Haswell Xeon server platform. Implemented optimizations include line parallelization, multiple frames in flight, and AVX-512. Line parallelization and multiple frames in flight are both coarse-grained parallelism strategies, focusing on minimizing synchronization and communication overhead, while the AVX-512 optimization is a fine-grained parallelism strategy. The challenges found with the coarse-grained strategies are primarily load-balancing issues. The line parallelization approach paired with the multiple-frames-in-flight optimization achieved a speedup of 27.14x on the 28-core Xeon server system and 95.47x on the Knights Landing system for the compute-intensive 8K color conversion benchmark. Memory-intensive benchmarks such as blend had lower but still decent overall speedups of 9.76x and 25.34x for the Xeon server and Knights Landing platform, respectively. The AVX-512 optimization for color conversion and scale improved single-threaded performance by 1.41x and 1.60x, respectively. We conclude from the experimental data that data parallelization strategies are very effective for video filter applications. Especially for compute-intensive filters such as color conversion, they can yield near-linear speedup in the number of cores. The main factor limiting speedup in some other filters is memory bandwidth.
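The line-parallelization idea can be illustrated by splitting a frame's rows into contiguous bands, one per worker, and converting each band independently. The following minimal Python sketch assumes frames are lists of rows of 8-bit RGB tuples and uses an integer approximation of BT.601 luma as a stand-in color conversion; the function names are hypothetical and this is not the thesis's filter-chain code. Because of CPython's global interpreter lock, the threads here show the decomposition only and should not be expected to reproduce the speedups reported above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def line_bands(height: int, workers: int) -> List[range]:
    """Split rows 0..height-1 into `workers` contiguous bands.

    Band sizes differ by at most one row: a simple static load balance.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bands.append(range(start, start + size))
        start += size
    return bands


def to_gray(frame: List[List[Pixel]], workers: int = 4) -> List[List[int]]:
    """Color-convert a frame to 8-bit luma, one band of lines per worker."""
    out: List[List[int]] = [[] for _ in frame]

    def convert(rows: range) -> None:
        for y in rows:
            # Integer BT.601 approximation; the weights sum to 256, so >> 8
            # maps white (255, 255, 255) to 255.
            out[y] = [(77 * r + 150 * g + 29 * b) >> 8 for r, g, b in frame[y]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each band writes disjoint rows of `out`, so no locking is needed.
        list(pool.map(convert, line_bands(len(frame), workers)))
    return out


if __name__ == "__main__":
    print(line_bands(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

A static, equal-height split like this is the simplest choice; filters whose cost varies per line would expose the load-balancing issues mentioned in the abstract.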
Advisors/Committee Members: Al-Ars, Zaid (mentor), Alvarez-Mesa, Mauricio (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: parallelism; filter; Video; scalability; Performance
APA (6th Edition):
Chi, H. (2018). Parallelizing a Video Filter-chain for Multi- and Many-core Systems. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa
Chicago Manual of Style (16th Edition):
Chi, Huang-Da (author). “Parallelizing a Video Filter-chain for Multi- and Many-core Systems.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa.
MLA Handbook (7th Edition):
Chi, Huang-Da (author). “Parallelizing a Video Filter-chain for Multi- and Many-core Systems.” 2018. Web. 18 Apr 2021.
Vancouver:
Chi H. Parallelizing a Video Filter-chain for Multi- and Many-core Systems. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa.
Council of Science Editors:
Chi H. Parallelizing a Video Filter-chain for Multi- and Many-core Systems. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa

Delft University of Technology
13.
Ritsma, Folkert (author).
Advanced Set Bounding Methods for Fault Detection.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:b6bad7a5-0afd-4268-873d-32a4a18b4281
The performance of set-based fault detection is highly dependent on the complexity of the set bounding methods used to bound the healthy residual set. Existing methods achieve robust performance with complex set bounding that narrowly defines healthy system behavior, yet at the cost of higher computation times. In this thesis, a major improvement in both accuracy and computation time is achieved by applying machine learning methods to set bounding. A method is developed that performs fault detection several orders of magnitude faster than an existing set-based fault detection method, without sacrificing robust performance.
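To make the detection principle concrete: residuals produced by a healthy system are enclosed in a bounding set, and a residual that falls outside that set signals a fault. The following minimal Python sketch uses a deliberately simple axis-aligned box fitted to healthy residual samples. This box is a stand-in chosen for illustration, not the machine-learning set bound developed in the thesis (whose keywords mention support vector machines), and the function names and `margin` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds)


def fit_box(healthy: Sequence[Sequence[float]], margin: float = 0.0) -> Box:
    """Bound healthy residual samples by an axis-aligned box, widened by margin."""
    if not healthy:
        raise ValueError("need at least one healthy residual")
    dim = len(healthy[0])
    lower = [min(r[k] for r in healthy) - margin for k in range(dim)]
    upper = [max(r[k] for r in healthy) + margin for k in range(dim)]
    return lower, upper


def is_fault(residual: Sequence[float], box: Box) -> bool:
    """Flag a fault when any residual component leaves the healthy set."""
    lower, upper = box
    return any(not (lo <= x <= hi) for x, lo, hi in zip(residual, lower, upper))


if __name__ == "__main__":
    healthy = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
    box = fit_box(healthy, margin=0.05)
    print(is_fault([0.05, 0.0], box), is_fault([1.0, 0.0], box))  # False True
```

A box is cheap to evaluate but can enclose far more than the true healthy set when residual components are correlated, which illustrates the accuracy versus complexity trade-off the abstract describes.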
Mechanical Engineering | Systems and Control
Advisors/Committee Members: Ferrari, Riccardo (mentor), Al-Ars, Zaid (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Fault Detection; Machine Learning; Anomaly Detection; Outlier Detection; Support Vector Machines; Model Based Fault Detection; Set Based Fault Detection
APA (6th Edition):
Ritsma, F. (2019). Advanced Set Bounding Methods for Fault Detection. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:b6bad7a5-0afd-4268-873d-32a4a18b4281
Chicago Manual of Style (16th Edition):
Ritsma, Folkert (author). “Advanced Set Bounding Methods for Fault Detection.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:b6bad7a5-0afd-4268-873d-32a4a18b4281.
MLA Handbook (7th Edition):
Ritsma, Folkert (author). “Advanced Set Bounding Methods for Fault Detection.” 2019. Web. 18 Apr 2021.
Vancouver:
Ritsma F. Advanced Set Bounding Methods for Fault Detection. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:b6bad7a5-0afd-4268-873d-32a4a18b4281.
Council of Science Editors:
Ritsma F. Advanced Set Bounding Methods for Fault Detection. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:b6bad7a5-0afd-4268-873d-32a4a18b4281

Delft University of Technology
14.
Hesam, Ahmad (author).
Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:2fa2203b-ca26-4aa2-9861-1a4352391e09
Life scientists face the tough challenge of developing high-performance computer simulations of their increasingly complex models. BioDynaMo is an open-source biological simulation platform that aims to relieve them of the intricacies of such development. Life scientists can base their models on top of BioDynaMo’s highly optimized core execution engine. At the core of all biological simulations are the mechanical interactions between possibly millions of objects. In this work, we investigate the currently implemented method of handling mechanical interactions, and ways to improve its performance in order to enable large-scale and complex simulations. We propose to replace the existing kd-tree implementation for neighborhood operations with a uniform grid method that allows us to take advantage of hardware accelerator architectures such as GPUs and FPGAs. As a result, the multi-threaded uniform grid implementation achieves a 14× speedup with respect to the serial baseline version. Among the hardware accelerators, the GPU performed best for the mechanical interactions, with a resulting speedup of 134×.
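A uniform grid answers neighborhood queries by bucketing each object into a cubic cell no smaller than the interaction radius, so that all neighbors of an object lie in its own cell or one of the 26 adjacent cells. The following minimal Python sketch shows the idea for point objects in 3D; it is a generic illustration under these assumptions, not BioDynaMo's implementation, and the function names are hypothetical. The fixed, regular access pattern (27 cells per query) is plausibly what makes this structure easier to map onto GPUs and FPGAs than a kd-tree traversal.

```python
from collections import defaultdict
from itertools import product
from math import floor
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Integer coordinates of the grid cell that contains p."""
    return (floor(p[0] / cell_size), floor(p[1] / cell_size), floor(p[2] / cell_size))


def build_grid(points: List[Point], cell_size: float) -> Dict[Cell, List[int]]:
    """Bucket point indices by the grid cell that contains them."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, cell_size)].append(idx)
    return grid


def neighbors(i: int, points: List[Point], grid: Dict[Cell, List[int]],
              cell_size: float, radius: float) -> List[int]:
    """Indices of points within `radius` of points[i] (excluding i).

    Correct only when radius <= cell_size, so that the 3x3x3 block of
    cells around points[i] contains every candidate.
    """
    if radius > cell_size:
        raise ValueError("radius must not exceed cell_size")
    cx, cy, cz = cell_of(points[i], cell_size)
    px, py, pz = points[i]
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        # dict.get does not insert empty cells into the defaultdict.
        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
            qx, qy, qz = points[j]
            if j != i and (qx - px) ** 2 + (qy - py) ** 2 + (qz - pz) ** 2 <= radius ** 2:
                found.append(j)
    return sorted(found)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.9, 0.9, 0.0)]
    g = build_grid(pts, cell_size=1.0)
    print(neighbors(1, pts, g, cell_size=1.0, radius=1.0))  # [0, 3]
```

Building the grid is linear in the number of objects, and each query inspects a constant number of cells, so with bounded cell occupancy a full neighborhood pass is also linear.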
BioDynaMo
Advisors/Committee Members: Al-Ars, Zaid (mentor), Rademakers, Fons (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: HPC; GPU computing; FPGA; Simulation; Computational Biology; BioDynaMo; Heterogeneous acceleration
APA (6th Edition):
Hesam, A. (2018). Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:2fa2203b-ca26-4aa2-9861-1a4352391e09
Chicago Manual of Style (16th Edition):
Hesam, Ahmad (author). “Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:2fa2203b-ca26-4aa2-9861-1a4352391e09.
MLA Handbook (7th Edition):
Hesam, Ahmad (author). “Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs.” 2018. Web. 18 Apr 2021.
Vancouver:
Hesam A. Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:2fa2203b-ca26-4aa2-9861-1a4352391e09.
Council of Science Editors:
Hesam A. Faster than the Speed of Life: Accelerating Developmental Biology Simulations with GPUs and FPGAs. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:2fa2203b-ca26-4aa2-9861-1a4352391e09

Delft University of Technology
15.
Tuna, Ozan Dogu (author).
HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:f5bab9f2-e67e-41a6-815c-05dc21987ea0
This thesis project designs and compares two parallel implementations of an exhaustive grid search over a large model space to find the optimum mapping model for overlay predictions used in ASML lithography machines. With a sequential implementation, the search is effectively intractable, but a parallel implementation using the technologies provided by the ASML High Performance Cluster (HPC) paves the way to tackle the challenge. A number of parallel execution concepts have been developed using different frameworks that are exposed to the ASML HPC developer community by the platform maintainers. Among these concepts, the most promising ones with respect to a defined set of criteria were chosen for the implementation effort. It is shown that a PBS-based Lab implementation can scale on HPC with a parallel efficiency of 66%, with most of the efficiency loss stemming from scheduler overhead. A second, Spark-based Fab implementation has an increased efficiency of 82%, paving the way for a speedup of almost 1700x on a Spark cluster with 2048 cores. Moreover, it is shown experimentally that performance scales linearly with the model space dimensions. By extrapolation, the baseline sequential implementation is estimated to take 2590 hours to execute on a single core for a typical model space use case. Refactoring the sequential implementation to use multiple CPU cores through multiprocessing can bring execution time down to 115 hours on a 24-core machine. The Fab parallel implementation executes the same use case in 1.6 hours, enabling exploratory and iterative approaches to modeling for data scientists and domain experts.
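The figures quoted above are consistent with the usual definitions of speedup, S = T_serial / T_parallel, and parallel efficiency, E = S / p on p cores. The short Python calculation below reproduces them from the numbers in the abstract. The helper names are hypothetical, and the 2590-hour single-core baseline is itself an extrapolated estimate.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T_serial / T_parallel (same time unit for both)."""
    return t_serial / t_parallel


def efficiency(s: float, cores: int) -> float:
    """E = S / p: the fraction of ideal linear speedup achieved on p cores."""
    return s / cores


if __name__ == "__main__":
    baseline_h = 2590.0  # extrapolated single-core runtime for the use case

    # 24-core multiprocessing refactor: 115 h.
    s_mp = speedup(baseline_h, 115.0)
    print(f"multiprocessing: {s_mp:.1f}x, efficiency {efficiency(s_mp, 24):.0%}")

    # Spark (Fab) at 82% efficiency on 2048 cores: S = E * p.
    print(f"Fab projected on 2048 cores: {0.82 * 2048:.0f}x")

    # Fab measured run: 1.6 h (its core count is not stated in the abstract).
    print(f"Fab measured vs. baseline: {speedup(baseline_h, 1.6):.0f}x")
```

Run as a script, it reports a 22.5x speedup at about 94% efficiency for the 24-core run, 1679x for the projected 2048-core Fab configuration (the "almost 1700x" above), and 1619x for the measured 1.6-hour Fab run relative to the extrapolated baseline.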
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor), Valente, Frederico (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Parallel Frameworks; Predictive Model Optimization; Spark; PySpark
APA (6th Edition):
Tuna, O. D. (2019). HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:f5bab9f2-e67e-41a6-815c-05dc21987ea0
Chicago Manual of Style (16th Edition):
Tuna, Ozan Dogu (author). “HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:f5bab9f2-e67e-41a6-815c-05dc21987ea0.
MLA Handbook (7th Edition):
Tuna, Ozan Dogu (author). “HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling.” 2019. Web. 18 Apr 2021.
Vancouver:
Tuna OD. HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:f5bab9f2-e67e-41a6-815c-05dc21987ea0.
Council of Science Editors:
Tuna OD. HPC Based Acceleration for Optimization of Predictive Models: Lithography Overlay Performance Modeling. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:f5bab9f2-e67e-41a6-815c-05dc21987ea0

Delft University of Technology
16.
Berkers, Martijn (author).
Solving convex optimization problems on FPGA using OpenCL.
Degree: 2020, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:be4ad409-872d-49a2-a092-4d8e2164709d
The application of accelerators in HPC applications has seen enormous growth in the last decade. In the field of HPC, demands on throughput are steadily growing. Not all of the algorithms used have a clear hardware architecture that performs best. Our work explores the performance of different hardware architectures in solving a convex optimization problem. Such algorithms are a sequence of dependent operations, which makes them an interesting use case because parallelism is not easily found. Our work focuses on the use case of an on-machine computational model at ASML: we explore the acceleration of a quadratic programming Active-Set algorithm on dedicated hardware. Libraries are available to do this on both the CPU and GPU, while nothing is available for the FPGA. Our work fills this gap by implementing the algorithm in a high-level parallel programming language in order to ease development for FPGA accelerators. We use the Intel FPGA SDK for OpenCL framework to evaluate the performance trade-offs involved with FPGA acceleration and compare the performance to both the CPU and GPU using library functions. To fit the FPGA architecture, the algorithm is converted to a dataflow algorithm that streams data between kernels. The implementation leverages features of the Intel FPGA SDK for OpenCL framework to stream data between kernels using low-latency on-chip communication. We demonstrate that such a complicated algorithm can be implemented efficiently using the OpenCL framework. Our implementation achieves competitive performance compared to optimized library functions on both the CPU and GPU. The OpenCL framework allows for easy design space exploration, and we explored different optimization strategies. The execution time of the final FPGA implementation is 3.5x and 1.2x longer than that of the CPU and GPU, respectively, in double-precision floating point.
If the FPGA implementation is reduced to single precision, execution is 2.2x faster than the double-precision variant. Higher throughput can be achieved by duplicating the implementation; with the current size of the algorithm, two additional copies are possible. A handcrafted implementation could further improve FPGA performance by manually managing local memory structures and reusing processing elements. However, using the OpenCL framework requires significantly fewer lines of code and substantially reduces development time compared to traditional hardware description languages.
Advisors/Committee Members: Al-Ars, Zaid (mentor), van der Vlugt, Steven (graduation committee), Delft University of Technology (degree granting institution).
Subjects/Keywords: FPGA; OpenCL; HLS; GPU; BLAS; LAPACK; convex optimisation
APA (6th Edition):
Berkers, M. (2020). Solving convex optimization problems on FPGA using OpenCL. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:be4ad409-872d-49a2-a092-4d8e2164709d
Chicago Manual of Style (16th Edition):
Berkers, Martijn (author). “Solving convex optimization problems on FPGA using OpenCL.” 2020. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:be4ad409-872d-49a2-a092-4d8e2164709d.
MLA Handbook (7th Edition):
Berkers, Martijn (author). “Solving convex optimization problems on FPGA using OpenCL.” 2020. Web. 18 Apr 2021.
Vancouver:
Berkers M. Solving convex optimization problems on FPGA using OpenCL. [Internet] [Masters thesis]. Delft University of Technology; 2020. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:be4ad409-872d-49a2-a092-4d8e2164709d.
Council of Science Editors:
Berkers M. Solving convex optimization problems on FPGA using OpenCL. [Masters Thesis]. Delft University of Technology; 2020. Available from: http://resolver.tudelft.nl/uuid:be4ad409-872d-49a2-a092-4d8e2164709d

Delft University of Technology
17.
Hilmarsson, Saevar (author).
Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:c372f6e8-92f2-4793-9b14-e4b445b1a6c8
Image processing is found in many fields and domains. Advances in digital image capturing technology allow for faster video rates at higher quality than ever before, and that trend continues. With greater resolution and increased data flow, there is also a need for faster and better hardware for image processing. As the trend described by Moore's law is slowing down, and possibly reaching saturation in the coming years, there is an ongoing search for new and different solutions in processor architecture. The trend went from single-core to multi-core and many-core designs, and now we are looking into other designs such as memory-streaming architectures and runtime-reconfigurable computers. This thesis designs, implements and evaluates a programming interface for a dynamically-reconfigurable memory-streaming platform for image processing, with a focus on programmability, power consumption, reconfigurability and performance. An application programming interface (API) is created to aid new code development for the platform. The API is a library of functions that run on an ARM processor and are used to set up, and communicate with, a stream of ρ-VEX soft processors running on a field programmable gate array (FPGA). In this research, we look at other state-of-the-art solutions that focus on programmability, reconfiguration and performance, for comparison and inspiration. The platform is reconfigurable at runtime, and experiments show that it takes under 200 ms to completely reconfigure the fabric and initialize a new configuration of ρ-VEX processors. The platform is tested on a Zynq-7000 chip from Xilinx. The streaming architecture is compared with a many-core setup using the same number of ρ-VEX soft processors. The results show a speedup by a factor of 2 when using a single processing stream of seven cores compared with seven cores individually running the same algorithm.
The result is a working fully-programmable and open-source streaming platform for the image processing domain.
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor), Wong, Stephan (graduation committee), van Leuken, Rene (graduation committee), Delft University of Technology (degree granting institution).
Subjects/Keywords: FPGA; Image processing; rVEX; Streaming architecture
APA (6th Edition):
Hilmarsson, S. (2018). Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:c372f6e8-92f2-4793-9b14-e4b445b1a6c8
Chicago Manual of Style (16th Edition):
Hilmarsson, Saevar (author). “Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:c372f6e8-92f2-4793-9b14-e4b445b1a6c8.
MLA Handbook (7th Edition):
Hilmarsson, Saevar (author). “Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing.” 2018. Web. 18 Apr 2021.
Vancouver:
Hilmarsson S. Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:c372f6e8-92f2-4793-9b14-e4b445b1a6c8.
Council of Science Editors:
Hilmarsson S. Hala ρ-VEX: Highly-Programmable Dynamically-Reconfigurable FPGA-based Streaming Platform for Image Processing. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:c372f6e8-92f2-4793-9b14-e4b445b1a6c8

Delft University of Technology
18.
Hes, Robin (author).
Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:643ed1a3-5eeb-4fc1-8a92-73a0b72f50cf
This thesis explores the use of machine learning techniques in an effort to increase insurer competitiveness. It asks whether it is possible to accurately estimate the expected financial loss of a given insurance contract and how this information can be used to gain a competitive edge in the business. To answer these questions, some basic principles of insurance are introduced, with a focus on statistical modeling. Furthermore, potentially successful algorithms and techniques are described, such as ordinary least squares, generalized linear models (GLMs), generalized additive models, clustering, random forests and gradient boosting trees. It is shown that theory originally developed for GLMs can easily be generalized to other methods, chiefly gradient boosting, often with better results. A new form of evaluation is introduced that helps to rate the efficiency of an insurance portfolio. This and other metrics are finally applied to several of the designed models to demonstrate their effectiveness.
Computer Engineering and Electrical Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor),
Tax, David (graduation committee),
de Voogd, G.W.H. (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: insurance; Machine Learning; pure premium
APA (6th Edition):
Hes, R. (2018). Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:643ed1a3-5eeb-4fc1-8a92-73a0b72f50cf
Chicago Manual of Style (16th Edition):
Hes, Robin (author). “Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:643ed1a3-5eeb-4fc1-8a92-73a0b72f50cf.
MLA Handbook (7th Edition):
Hes, Robin (author). “Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods.” 2018. Web. 18 Apr 2021.
Vancouver:
Hes R. Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:643ed1a3-5eeb-4fc1-8a92-73a0b72f50cf.
Council of Science Editors:
Hes R. Insurance – A Machine Learning Perspective: Predicting Automobile Liability Insurance Pure Premiums Using Machine Learning Methods. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:643ed1a3-5eeb-4fc1-8a92-73a0b72f50cf

Delft University of Technology
19.
Plaisant van der Wal, Renzo (author).
The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:935a0d46-2e26-4af5-b308-32b5fe54926b
Machine learning methods are explored in an attempt to achieve better predictive performance than the legacy rule-based fraud detection systems currently used to detect fraudulent car insurance claims. Two key principles guide the exploration of machine learning techniques and algorithms in this thesis: applicability to imbalanced data and interpretability of predictions. The dataset used for model training and evaluation contains only 0.3% fraudulent claims compared to 99.7% non-fraudulent claims, and can therefore be considered highly imbalanced. Furthermore, prediction interpretability is of great importance, since fraud experts interface directly with the output of the machine learning models. With these key principles in mind, this thesis considers four algorithms: Logistic Regression, Random Forest, LightGBM and a Stacking classifier. The algorithms are trained on the imbalanced learning problem using a combination of undersampling (random and Edited Nearest Neighbors), oversampling (SMOTE) and class weighting. In conclusion, each trained model meets the objective, with the Stacking classifier combining the best performance with the lowest variance. By benchmarking the baseline for two different parameters, the models can be evaluated for two boundary conditions, which leads to tunable performance between the two conditions. Ultimately, the performance of the Stacking classifier is tunable (by moving its classification threshold) to roughly a 70-80% increase in extra fraud caught or a 75% reduction in effort. Catching extra fraud increases the number of real fraudulent claims that fraud experts get to see, and effort reduction increases capacity, which enables fraud experts to spend more time on other, more relevant tasks.
Computer Engineering
Advisors/Committee Members: Al-Ars, Zaid (mentor),
Verwer, Sicco (graduation committee),
de Voogd, G.W.H. (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: Insurance; Machine Learning; fraud detection; fraud; imbalanced
APA (6th Edition):
Plaisant van der Wal, R. (2018). The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:935a0d46-2e26-4af5-b308-32b5fe54926b
Chicago Manual of Style (16th Edition):
Plaisant van der Wal, Renzo (author). “The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:935a0d46-2e26-4af5-b308-32b5fe54926b.
MLA Handbook (7th Edition):
Plaisant van der Wal, Renzo (author). “The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods.” 2018. Web. 18 Apr 2021.
Vancouver:
Plaisant van der Wal R. The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:935a0d46-2e26-4af5-b308-32b5fe54926b.
Council of Science Editors:
Plaisant van der Wal R. The Future of Fraud Detection: Detecting Fraudulent Insurance Claims Using Machine Learning Methods. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:935a0d46-2e26-4af5-b308-32b5fe54926b
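The fraud-detection abstract describes tuning the Stacking classifier by moving its classification threshold, trading extra fraud caught against expert effort. The sketch below illustrates that general trade-off only: the function name `threshold_tradeoff` and the toy scores and labels are hypothetical and invented for illustration, not the thesis's data or code. It reports the share of fraudulent claims flagged and the share of all claims sent to experts for a given threshold.

```python
from typing import List, Tuple


def threshold_tradeoff(scores: List[float], labels: List[int],
                       threshold: float) -> Tuple[float, float]:
    """Return (fraction of fraudulent claims flagged, fraction of all claims flagged).

    A claim is flagged for expert review when its model score >= threshold.
    Lowering the threshold catches more fraud but increases expert workload;
    raising it reduces workload at the cost of missed fraud.
    """
    flagged = [s >= threshold for s in scores]
    n_fraud = sum(labels)
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    recall = caught / n_fraud if n_fraud else 0.0
    workload = sum(flagged) / len(scores) if scores else 0.0
    return recall, workload


# Toy scores and labels (hypothetical, for illustration only; 1 = fraudulent).
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
for t in (0.5, 0.25):
    recall, workload = threshold_tradeoff(scores, labels, t)
    print(f"threshold={t}: fraud caught={recall:.0%}, claims flagged={workload:.0%}")
# threshold=0.5: fraud caught=50%, claims flagged=40%
# threshold=0.25: fraud caught=100%, claims flagged=60%
```

On a real, highly imbalanced dataset (0.3% fraud), the same sweep would be run over many thresholds to pick an operating point between the two boundary conditions.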

Delft University of Technology
20.
Lévy, Jonathan (author).
Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:bd22471f-058a-4071-95bb-5126e263124b
DNA alignment is a compute-intensive and time-consuming task required for all further DNA processing. It consists of finding, for each DNA string from a sample, its location in a reference genome. Usual techniques for short reads (hundreds of bases) involve seed extension, where a small matching seed is found with a quick search through an FM-index and then extended on both ends with a custom Smith-Waterman algorithm, giving an optimal solution. However, this seed extension takes a tremendous amount of time. This thesis therefore presents a solution that offloads the extension to a GPU, where it is performed in parallel. This is possible because the DNA reads are independent of each other. We used the Burrows-Wheeler Aligner - Maximal Exact Match (BWA-MEM), a reference program in the field, into which we integrated GASAL2, a GPU-accelerated library for extension. However, BWA-MEM has a left-right dependency on the extension starting scores: the left alignment starts with the seed score, and the right part then starts with the previously calculated left score. Our solution starts both extensions with the seed score, a method we call the "seed-only" paradigm. On our test machine, featuring 12 hyperthreaded cores and an NVIDIA Tesla K40c, running with 12 threads, we observe a raw kernel speedup of 4.8x; if we allow non-blocking calls that let the CPU run the seeding tasks while the GPU computes the extension, we reach a 16x effective speedup for the extension. The extension takes around 27% of the total time, and our acceleration introduces a small overhead due to memory preparation and copying; as a result, the whole application becomes 1.28x faster, close to the theoretical maximum of 1.37x. Additionally, the paradigm shift creates minimal differences in the final main scores on good-quality alignments, with a 1.82% difference for our data set of 5.2 million sequences.
This makes our acceleration with GASAL2 a solid and efficient solution for a single machine.
Advisors/Committee Members: Al-Ars, Zaid (mentor),
Ahmed, Nauman (mentor),
Zandrahimi, Mahroo (graduation committee),
Ishihara, Ryoichi (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: GPU; DNA; biology; parallel processing
APA (6th Edition):
Lévy, J. (2019). Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:bd22471f-058a-4071-95bb-5126e263124b
Chicago Manual of Style (16th Edition):
Lévy, Jonathan (author). “Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:bd22471f-058a-4071-95bb-5126e263124b.
MLA Handbook (7th Edition):
Lévy, Jonathan (author). “Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs.” 2019. Web. 18 Apr 2021.
Vancouver:
Lévy J. Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:bd22471f-058a-4071-95bb-5126e263124b.
Council of Science Editors:
Lévy J. Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:bd22471f-058a-4071-95bb-5126e263124b
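The 1.37x ceiling quoted in the BWA-MEM abstract is consistent with Amdahl's law: if roughly 27% of the runtime is accelerated and the rest is unchanged, the whole-program speedup can never exceed 1/(1 - 0.27). The sketch below applies the standard formula to the abstract's figures; it is an idealized model that ignores overheads, so its prediction for a 16x extension speedup (about 1.34x) sits above the measured 1.28x, in line with the memory preparation and copying overhead the abstract mentions.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Whole-program speedup when `fraction` of the runtime is sped up by `local_speedup`.

    Idealized Amdahl's law: no offload overhead is modeled.
    """
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def amdahl_limit(fraction: float) -> float:
    """Upper bound on speedup as the accelerated part's runtime goes to zero."""
    return 1.0 / (1.0 - fraction)


extension_share = 0.27  # approximate share of total runtime spent in seed extension (from the abstract)
print(f"limit: {amdahl_limit(extension_share):.2f}x")                 # limit: 1.37x
print(f"16x extension: {amdahl_speedup(extension_share, 16):.2f}x")   # 16x extension: 1.34x
```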

Delft University of Technology
21.
de Haan, Erwin (author).
Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:b1059caf-8eee-48d8-8a78-c14d3b0d7db3
Despite its advantages in performance and control, hardware design is mainly bottlenecked by high design complexity and long development time. This thesis explores the use of domain-specific languages for high-level synthesis (HLS) of hardware data filters and transformations. The main goal of this thesis’ prototype is to automate the transpiling of SQL to HLS C++ in order to generate hardware for filtering and transforming data streams using CAPI on POWER systems. This work uses the Fletcher framework to automate the handling of data movement between memory and the field-programmable gate array (FPGA). The use of HLS technologies can greatly reduce FPGA development time compared to manual FPGA development workflows. Deploying FPGAs in fast-changing data processing pipelines can be very complicated or limit the use of the FPGA hardware. This work investigates whether HLS can be used for these kinds of applications to reduce total development time while still maintaining performance. Additionally, the use of the Fletcher framework further reduces the required developer time. The proof-of-concept shows that it is possible to use HLS efficiently for data filtering and transformations, and that usable designs and filters can be generated without significant effort from the designer. For example, some of the simpler kernels can reach upwards of 1 GB/s while using less than 1% of a Xilinx Kintex UltraScale XCKU060 FPGA. By using multiple instances of these kernels, the design can saturate the system bandwidth. Although this approach is not without issues, it lends itself to extending the tool, and some extra development effort could improve the current proof-of-concept. The project code is released under the Apache 2.0 license on GitHub at: https://github.com/EraYaN/FletcherFiltering.
Advisors/Committee Members: Al-Ars, Zaid (mentor),
Hofstee, Peter (graduation committee),
Rellermeyer, Jan (graduation committee),
Peltenburg, Johan (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: FPGA; HLS; SQL; compiler; automated; python
APA (6th Edition):
de Haan, E. (2019). Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:b1059caf-8eee-48d8-8a78-c14d3b0d7db3
Chicago Manual of Style (16th Edition):
de Haan, Erwin (author). “Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:b1059caf-8eee-48d8-8a78-c14d3b0d7db3.
MLA Handbook (7th Edition):
de Haan, Erwin (author). “Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow.” 2019. Web. 18 Apr 2021.
Vancouver:
de Haan E. Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:b1059caf-8eee-48d8-8a78-c14d3b0d7db3.
Council of Science Editors:
de Haan E. Automated FPGA Hardware Synthesis for High-Throughput Big Data Filtering and Transformation: An SQL query transpiler targeting Vivado HLS C++ tools for high-level stream transformation and filtering on FPGAs using Apache Arrow. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:b1059caf-8eee-48d8-8a78-c14d3b0d7db3

Delft University of Technology
22.
Yadav, Amitabh (author).
CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:5afeea39-eabf-4561-832f-35f93e6bcc3e
Quantum computing is an emerging field of technology with the promise that engineered quantum systems can address hard problems, such as problems with exponential computational complexity in chemistry, genomics, optimization and many more applications. Quantum computer architecture is an area of research targeting NISQ-era quantum computing, and little research has been done on the development of a scalable classical control and read-out infrastructure for quantum processors. The project aims to study SoC-FPGA design methodology and architecture design for the control of a quantum processor. The targeted quantum hardware is spin qubits in a semiconductor quantum dot chip. The project is intended to help a computer (architecture) engineer understand the design and operation of a silicon spin qubit. It further helps identify the requirements for an architecture, instruction set requirements, bottlenecks and future challenges (specific to spin-qubit quantum processors) that would lead to better designs for new control architectures. The objective of this thesis is to address the architectural challenges of the quantum-classical hardware for controlling NISQ-era quantum devices and beyond. We analyze the control infrastructure requirements and propose a micro-architecture and waveform generation methodology to integrate the physical device with the quantum compilation tool-chain.
Computer Engineering
Advisors/Committee Members: Khammassi, Nader (mentor), Bertels, Koen (mentor), Sebastiano, Fabio (graduation committee), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: Quantum Computing; Quantum Control Microarchitecture; Computer Architecture
APA (6th Edition):
Yadav, A. (2019). CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:5afeea39-eabf-4561-832f-35f93e6bcc3e
Chicago Manual of Style (16th Edition):
Yadav, Amitabh (author). “CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:5afeea39-eabf-4561-832f-35f93e6bcc3e.
MLA Handbook (7th Edition):
Yadav, Amitabh (author). “CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor.” 2019. Web. 18 Apr 2021.
Vancouver:
Yadav A. CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:5afeea39-eabf-4561-832f-35f93e6bcc3e.
Council of Science Editors:
Yadav A. CC-Spin: A Micro-architecture design for scalable control of Spin-Qubit Quantum Processor. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:5afeea39-eabf-4561-832f-35f93e6bcc3e

Delft University of Technology
23.
Bălan, Alex (author).
MANtIS: a novel information seeking dialogues dataset.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:0ab2d1e4-385e-43cf-9883-cfc6c2f3f19c
Nowadays, most users access the web through search engine portals. However, information needs can often be ill-defined or too broad to be satisfied by a list of results the user has to scroll through, which means the user most likely has to refine the need themselves to reach the desired result. In recent years, researchers have attempted to tackle these issues through conversations, more specifically through conversational search. This topic has seen increasing interest from the research community, as shown by the appearance of specialized workshops and seminars. The general public has also started to show interest, as shown by the emergence of a wide range of virtual assistants, such as Google Assistant, Microsoft Cortana or Amazon Alexa. As such conversational systems seek to fulfill a user's information need, they should be able to elicit and fully understand the user's requirements regardless of the domain, track the conversation as it evolves while attempting to clarify the initial information need, and provide suggestions and answers based on concrete knowledge sources. Although various developments in domains adjacent to conversational search have enabled us to better understand natural language, there is a lack of large-scale datasets appropriate for training models to perform conversational search tasks. Through our research, we have built a collection of over 80,000 conversations that fulfill the requirements of a conversational search dataset. We have benchmarked this dataset on three distinct tasks using multiple baselines.
Computer Science | Data Science
Advisors/Committee Members: Hauff, Claudia (mentor), Tintarev, Nava (graduation committee), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: conversational search; Conversation; Information Retrieval; Conversational Agent; Ranking
APA (6th Edition):
Bălan, A. (2019). MANtIS: a novel information seeking dialogues dataset. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:0ab2d1e4-385e-43cf-9883-cfc6c2f3f19c
Chicago Manual of Style (16th Edition):
Bălan, Alex (author). “MANtIS: a novel information seeking dialogues dataset.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:0ab2d1e4-385e-43cf-9883-cfc6c2f3f19c.
MLA Handbook (7th Edition):
Bălan, Alex (author). “MANtIS: a novel information seeking dialogues dataset.” 2019. Web. 18 Apr 2021.
Vancouver:
Bălan A. MANtIS: a novel information seeking dialogues dataset. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:0ab2d1e4-385e-43cf-9883-cfc6c2f3f19c.
Council of Science Editors:
Bălan A. MANtIS: a novel information seeking dialogues dataset. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:0ab2d1e4-385e-43cf-9883-cfc6c2f3f19c

Delft University of Technology
24.
Bloom, Rens (author).
Channel Analysis for Passive Communication with Ambient Light.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:167a3e1a-5d54-45d1-87fc-dc4f1f7be9e5
The idea of the internet of things (IoT) states that devices with embedded electronics will outnumber the devices operated by humans. All those "things" will require connectivity to exchange information gathered from processed sensor data. A large part of the data will be sent through the air, so that mobile devices can be deployed without the need to connect any wire. Radio frequencies are widely used for wireless communication and include many systems such as WiFi, Bluetooth, FM radio stations and mobile telephony. The capacity of radio channels is limited, so IoT devices might require different methods of communication. In this work we study the behaviour of a relatively new communication channel, called the passive light channel. The passive light channel uses reflections of light to send information in the visible spectrum. It works somewhat like barcode readers in stores, but it does not require a laser: reflecting sunlight or light from other sources is sufficient, providing a sustainable method of transmitting information. Patterns consisting of black and white parts, barcodes in particular, are used to regulate the reflected light. A light sensor detects the reflected light and reads the information from it. The key question in this research is how much information can be conveyed in this manner. This work makes three main contributions. First, we show that the passive light channel works with barcodes and reflections of ambient light. Second, we present a mathematical model and simulation tools that describe the behaviour of the channel. Finally, the performance of the passive light channel with ambient light is analysed using empirical testbeds and compared with the results of optical simulations.
Embedded Systems
Advisors/Committee Members: Zuñiga Zamalloa, Marco (mentor), Langendoen, Koen (graduation committee), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: Passive Sensing; Visible light communication; communication channel; diffuse reflections
APA (6th Edition):
Bloom, R. (2017). Channel Analysis for Passive Communication with Ambient Light. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:167a3e1a-5d54-45d1-87fc-dc4f1f7be9e5
Chicago Manual of Style (16th Edition):
Bloom, Rens (author). “Channel Analysis for Passive Communication with Ambient Light.” 2017. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:167a3e1a-5d54-45d1-87fc-dc4f1f7be9e5.
MLA Handbook (7th Edition):
Bloom, Rens (author). “Channel Analysis for Passive Communication with Ambient Light.” 2017. Web. 18 Apr 2021.
Vancouver:
Bloom R. Channel Analysis for Passive Communication with Ambient Light. [Internet] [Masters thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:167a3e1a-5d54-45d1-87fc-dc4f1f7be9e5.
Council of Science Editors:
Bloom R. Channel Analysis for Passive Communication with Ambient Light. [Masters Thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:167a3e1a-5d54-45d1-87fc-dc4f1f7be9e5

Delft University of Technology
25.
Feenstra, Bastiaan (author).
High Throughput Sorting on FPGAs using High Bandwidth Memory.
Degree: 2020, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:64d44f78-3c0b-400d-afbe-b33e6be9c023
In this thesis we explore the acceleration of sorting algorithms on FPGAs using high bandwidth memory (HBM). The target application is an FPGA used as an accelerator in an OpenCAPI-enabled system, which enables the accelerator to access the host's main memory at a bandwidth of 25 GB/s for either read or write. We explore under which read and write access patterns the HBM bandwidth of 460.8 GB/s can be reached, and identify specific circumstances under which this bandwidth can be achieved. The sorting algorithm is implemented in hardware in two steps: partitioning and sorting. We design two partitioning architectures and one sorting architecture. The sorting architecture sorts the buckets generated in the partitioning step and is based on merge sort. It uses HBM and wide merge trees to reduce the number of passes through memory. The architectures are meant to be instantiated multiple times on the FPGA to achieve a higher sorting throughput. Simulated at 225 MHz, each architecture is designed to output up to 3.6 GB/s of 8+8 byte key-value pairs under ideal conditions. We measure the first and second partitioning architectures and identify an HBM bottleneck for the former, resulting in only 0.44 GB/s with a (uniformly) random input, due to a strided access pattern. With a sorted input, its throughput is 2.18 GB/s. The second partitioning architecture is not affected by this and achieves a throughput of approximately 1.7 GB/s for both types of input. The sorter performs best, sorting buckets at approximately 2.7 GB/s. Synthesis results show that the target FPGA has enough resources for tens of partitioners and sorters, making it possible to build sorting hardware that saturates the system bandwidth.
Advisors/Committee Members: Al-Ars, Zaid (mentor),
Bertels, Koen (graduation committee),
Hofstee, Peter (graduation committee),
Fang, Jian (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: FPGA; acceleration; OpenCAPI; High Bandwidth Memory; HBM; partitioning; sorting; bucket
APA (6th Edition):
Feenstra, B. (2020). High Throughput Sorting on FPGAs using High Bandwidth Memory. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:64d44f78-3c0b-400d-afbe-b33e6be9c023
Chicago Manual of Style (16th Edition):
Feenstra, Bastiaan (author). “High Throughput Sorting on FPGAs using High Bandwidth Memory.” 2020. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:64d44f78-3c0b-400d-afbe-b33e6be9c023.
MLA Handbook (7th Edition):
Feenstra, Bastiaan (author). “High Throughput Sorting on FPGAs using High Bandwidth Memory.” 2020. Web. 18 Apr 2021.
Vancouver:
Feenstra B. High Throughput Sorting on FPGAs using High Bandwidth Memory. [Internet] [Masters thesis]. Delft University of Technology; 2020. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:64d44f78-3c0b-400d-afbe-b33e6be9c023.
Council of Science Editors:
Feenstra B. High Throughput Sorting on FPGAs using High Bandwidth Memory. [Masters Thesis]. Delft University of Technology; 2020. Available from: http://resolver.tudelft.nl/uuid:64d44f78-3c0b-400d-afbe-b33e6be9c023

Delft University of Technology
26.
de Koning, Dorian (author).
Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator.
Degree: 2020, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:afb49e79-0595-4b6c-a337-ab6b448db543
The gap between CPU and memory performance keeps growing. Together with the increasing memory pressure caused by higher CPU core counts and multi-tenant systems, this creates a need for new memory technologies. Several such technologies, for example non-volatile RAM, have recently become commercially available. They generally have different characteristics from traditional DRAM. To fully utilise the potential of these new memory types, a better understanding of memory usage in modern systems is required, and memory traces are one way to gain it. Currently available solutions either do not support multi-core architectures or cause a severe slowdown. Therefore, this thesis presents a novel approach to gather full-system after-cache memory access traces. The proposed system is a hybrid framework that combines the QEMU emulator with a custom distributed cache and page table simulator. A modified version of QEMU, called QMEMU, is devised to improve tracing performance and to allow tracing of instruction fetches. Because it leverages QEMU's existing tracing functionality, only a small number of modifications to QEMU are needed. The traces produced by QMEMU contain virtual addresses, but accurate cache simulation requires physical addresses. Tracing the physical instead of the virtual address for each memory access is shown to cause a 70% slowdown in QMEMU. To find the physical addresses of the traced accesses, a novel approach is employed instead: the guest page tables are simulated outside the critical path of memory tracing, so tracing performance is not reduced. Using QMEMU, traces can be gathered up to 42.6 times faster than with the gem5 simulator for benchmarks of the PARSEC suite.
In the second part of the framework, which performs memory, cache, and page table simulation, cache simulation is found to be the most computationally intensive task. The proposed framework therefore performs cache simulation in a parallel and distributed manner. Because most modern systems use set-associative caches, the simulation can be parallelised without reducing accuracy by dividing the memory access traces based on cache sets. With this approach, the simulator processes 10 million accesses per second when simulating a single modern cache hierarchy, and 6 million accesses per second when simulating 7 different cache hierarchies concurrently. The simulated guest page tables also provide additional information, such as the number of accesses or the virtual memory size of each process in the guest workload, which can be used to narrow the semantic gap between memory traces and their meaning. The proposed framework is evaluated by comparing it to CMP$im and gem5 using the PARSEC benchmark suite.
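The set-based split of the trace can be sketched in a few lines. This is a minimal Python model, not the thesis's simulator: it assumes a single cache level with 64-byte lines, 64 sets, 4 ways, and LRU replacement (all hypothetical parameters), and the names `LRUSet`, `shard_by_set`, and `simulate_sharded` are illustrative. Because a set-associative cache never moves a line from one set to another, each shard can be simulated independently and the hit counts summed, giving exactly the serial result.

```python
from collections import OrderedDict, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

LINE_BYTES = 64  # assumed cache-line size in bytes
NUM_SETS = 64    # assumed number of sets
WAYS = 4         # assumed associativity


def set_index(addr: int) -> int:
    # The bits just above the line offset select the set.
    return (addr // LINE_BYTES) % NUM_SETS


class LRUSet:
    """One cache set with LRU replacement."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: "OrderedDict[int, None]" = OrderedDict()

    def access(self, line_addr: int) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # now most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)     # evict least recently used
        self.lines[line_addr] = None
        return False


def simulate(trace: List[int]) -> int:
    """Sequentially simulate a trace (or one shard of it); returns the hit count."""
    sets: Dict[int, LRUSet] = defaultdict(lambda: LRUSet(WAYS))
    hits = 0
    for addr in trace:
        hits += sets[set_index(addr)].access(addr // LINE_BYTES)
    return hits


def shard_by_set(trace: List[int], num_shards: int) -> List[List[int]]:
    """Route all accesses to a given set into the same shard, keeping their order."""
    shards: List[List[int]] = [[] for _ in range(num_shards)]
    for addr in trace:
        shards[set_index(addr) % num_shards].append(addr)
    return shards


def simulate_sharded(trace: List[int], num_shards: int = 4) -> int:
    # Shards share no sets, so they can run independently. Threads are used here
    # only to show that independence; a distributed simulator would send each
    # shard to a separate process or machine.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return sum(pool.map(simulate, shard_by_set(trace, num_shards)))
```

For any trace, `simulate_sharded(trace)` returns the same hit count as `simulate(trace)`; only the order of accesses within each set has to be preserved, not the global order.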
Computer Science
Advisors/Committee Members: Rellermeyer, Jan S. (mentor), Pouwelse, Johan (mentor), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: Cache simulation; QEMU; NVRAM; Distributed; Full-system; Tracing; Memory; gem5
APA (6th Edition):
de Koning, D. (2020). Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:afb49e79-0595-4b6c-a337-ab6b448db543
Chicago Manual of Style (16th Edition):
de Koning, Dorian (author). “Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator.” 2020. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:afb49e79-0595-4b6c-a337-ab6b448db543.
MLA Handbook (7th Edition):
de Koning, Dorian (author). “Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator.” 2020. Web. 18 Apr 2021.
Vancouver:
de Koning D. Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator. [Internet] [Masters thesis]. Delft University of Technology; 2020. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:afb49e79-0595-4b6c-a337-ab6b448db543.
Council of Science Editors:
de Koning D. Full-system After-cache Memory Tracing for Multi-core Systems using a Distributed Cache Simulator. [Masters Thesis]. Delft University of Technology; 2020. Available from: http://resolver.tudelft.nl/uuid:afb49e79-0595-4b6c-a337-ab6b448db543

Delft University of Technology
27.
van Bremen, Lennart (author).
ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:a007db49-10c7-4ae6-ac84-1cf0dfdb818d
The ρ-VEX is a runtime-reconfigurable VLIW processor. It can exploit both instruction-level parallelism (ILP) and thread-level parallelism (TLP) by running one program across multiple lanes or several programs concurrently. To quantify its performance accurately against other processors, it is implemented as an IC. This thesis describes a fully automatic scripted flow that constructs an optimized, error-free ASIC. The design area is limited to 3.5 um² so that it can be manufactured in the cheapest 65 nm prototyping run available at IMEC. To achieve this small area, a new 24-port register file had to be designed; it is currently implemented using standard logic cells. Optimizations involving cell padding and minimum-overhead power routing are necessary to reduce the total routing overcongestion from 7% to zero. The implementation is expected to achieve a clock speed of 141 MHz, limited only by the ρ-VEX reconfiguration logic. Lowering the frequency to 100 MHz reduces the energy dissipation by 96%, to a total of 367 uW/MHz, which is comparable to similar VLIW designs. Using LVDS communication clocked at four times the core frequency, a data bandwidth of 4 Gbit/s is achieved with only 26 external pins, which allows the main data and instruction memory to reside off-chip. To ensure correctness, the design is verified using DRC, LVS, and ERC. For improved reliability, IR drop is kept within 1% of the core voltage using minimal power routing. Furthermore, electromigration is kept low enough that the mean time to failure of the nets is 90 years. The scripted flow will aid prototyping, allowing accurate estimates to be calculated in only 7% of the original time. Proposals for a future design are expected to reduce the register file area and power dissipation by more than 40%. By improving the pipeline and shortening the critical path, a two- to fourfold improvement in clock speed can be expected without adversely affecting power dissipation.
Advisors/Committee Members: Wong, Stephan (mentor), van Leuken, Rene (graduation committee), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: r-VEX; VLIW; ASIC; LVDS; Register file; UMC; 65nm
APA (6th Edition):
van Bremen, L. (2017). ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:a007db49-10c7-4ae6-ac84-1cf0dfdb818d
Chicago Manual of Style (16th Edition):
van Bremen, Lennart (author). “ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File.” 2017. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:a007db49-10c7-4ae6-ac84-1cf0dfdb818d.
MLA Handbook (7th Edition):
van Bremen, Lennart (author). “ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File.” 2017. Web. 18 Apr 2021.
Vancouver:
van Bremen L. ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File. [Internet] [Masters thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:a007db49-10c7-4ae6-ac84-1cf0dfdb818d.
Council of Science Editors:
van Bremen L. ρ-VEX ASIC: The Design of an ASIC for a Dynamically Reconfigurable VLIW Processor with 24-port Register File. [Masters Thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:a007db49-10c7-4ae6-ac84-1cf0dfdb818d

Delft University of Technology
28.
van der Vlag, Michiel (author).
Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator.
Degree: 2019, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:6e93f997-1b8a-4af6-bfaa-261c205a9b04
Current brain simulators do not scale linearly to realistic problem sizes (e.g. >100,000 neurons), which makes them impractical for researchers. The goal of this thesis is to explore true multi-GPU acceleration of computationally challenging brain models and to assess the scalability of such models given sufficient access to multi-node acceleration platforms. The brain model used is a state-of-the-art, extended Hodgkin-Huxley, biophysically meaningful, three-compartment model of the inferior-olivary nucleus. Both the simulation of the cells and the setup of the network are taken into account when designing and benchmarking the multi-GPU version. For network sizes from 65K to 4M cells, densities of 10, 100, and 1,000 synapses per neuron are simulated on 8, 16, 32, and 48 GPUs. A Gaussian-distributed network of 4 million cells with 1,000 synapses per neuron, executed on 48 GPUs, is set up and simulated in 465.69 seconds, of which the cell-computation phase takes 4.57 seconds, a speedup of 50 times over execution on a single GPU. A uniformly distributed network of the same size and density is set up and simulated in 889.89 seconds, of which the cell-computation phase takes 10.09 seconds, a speedup of 8 times over execution on a single GPU. In the implemented design, inter-GPU communication becomes the major bottleneck, as latency grows with the size of the packets sent. However, this communication overhead does not dominate the overall execution, and scaling the network size remains tractable. This scalable design is promising for neuroscientists, showing that simulations of large networks are feasible on a multi-GPU setup.
Computer Engineering
Advisors/Committee Members: Strydis, C (mentor), Al-Ars, Zaid (graduation committee),
van Genderen, Arjan (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: multi GPU; brain simulator; cluster; Hodgkin-Huxley; gap junctions; neuron network; Cartesius; multi compartment; linear scaling
APA (6th Edition):
van der Vlag, M. (2019). Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:6e93f997-1b8a-4af6-bfaa-261c205a9b04
Chicago Manual of Style (16th Edition):
van der Vlag, Michiel (author). “Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator.” 2019. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:6e93f997-1b8a-4af6-bfaa-261c205a9b04.
MLA Handbook (7th Edition):
van der Vlag, Michiel (author). “Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator.” 2019. Web. 18 Apr 2021.
Vancouver:
van der Vlag M. Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:6e93f997-1b8a-4af6-bfaa-261c205a9b04.
Council of Science Editors:
van der Vlag M. Multi-GPU Brain: A multi-node implementation for an extended Hodgkin-Huxley simulator. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:6e93f997-1b8a-4af6-bfaa-261c205a9b04

Delft University of Technology
29.
Heiss, Jonathan (author).
Improving the Performance of the Variant Calling Workflow for DNA Sequencing.
Degree: 2017, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:7d02ec4a-0d99-453a-8950-f54287d91e2a
The growing volumes of DNA data produced by novel, efficient DNA sequencing methods pose new challenges to the computer systems used to process this genomic data. Big Data technologies from the Hadoop ecosystem, in particular Apache Spark and the Hadoop Distributed File System (HDFS), are increasingly adopted in state-of-the-art bioinformatics tools. One application domain of such tools is the Variant Calling Workflow (VCW), which is the subject of this work. Executing the different VCW steps with Spark-based open-source tools results in a tool chain of separate programs. The programs are executed consecutively, each consuming the data produced by the preceding program as input. This data sharing adds workload, as the data must be transformed into a file format, written to disk, and read from disk again by the next application. Our first research question examines whether performance can be increased by improving data sharing. To this end, we propose (1) eliminating the single-output-file generation that is native to most open-source bioinformatics tools and (2) using the distributed in-memory file system Alluxio as an in-memory layer for data sharing between any two consecutive VCW applications. In our experiments, (1) yielded a performance boost of 17%, whereas (2) did not improve performance. Our second research question investigates how data throughput can be improved by changing the execution mode of the VCW. The growth in DNA data mainly takes the form of a larger number of DNA samples, so it is important to optimize VCW execution for multi-sample input. We propose different execution modes and show in our experiments that, for 3 input samples of 10 GB, concurrent workflow execution improves the overall runtime of our VCW by 15% compared to sequential execution.
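The difference between the sequential and concurrent execution modes can be illustrated with a small Python sketch. It is not the thesis's Spark tool chain: the three stages `align`, `mark_duplicates`, and `call_variants` are hypothetical placeholders that only tag a string, standing in for separate programs that each consume the previous program's output. In the concurrent mode, all per-sample workflows are submitted to a pool at once, so the stages of one sample can overlap with those of another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Stage = Callable[[str], str]


# Hypothetical placeholder stages; real VCW stages would be separate Spark jobs.
def align(data: str) -> str:
    return data + ":aligned"


def mark_duplicates(data: str) -> str:
    return data + ":dedup"


def call_variants(data: str) -> str:
    return data + ":called"


STAGES: List[Stage] = [align, mark_duplicates, call_variants]


def run_workflow(sample: str, stages: Sequence[Stage] = STAGES) -> str:
    """Run all stages for one sample; each stage consumes the previous output."""
    data = sample
    for stage in stages:
        data = stage(data)
    return data


def run_sequential(samples: Sequence[str]) -> List[str]:
    # One sample's whole workflow finishes before the next one starts.
    return [run_workflow(s) for s in samples]


def run_concurrent(samples: Sequence[str], workers: int = 3) -> List[str]:
    # All per-sample workflows are in flight at once; results keep input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_workflow, samples))
```

Both modes produce identical results; whether the concurrent mode is faster depends on how much of the cluster a single workflow leaves idle, which the thesis measured as a 15% gain for three 10 GB samples.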
ICT Innovation (Cloud Computing and Services)
Advisors/Committee Members: Epema, Dick (mentor), Datema, E (mentor), Al-Ars, Zaid (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: Variant Calling; Workflow; DNA Sequencing; Spark; Hadoop; In-Memory File System; Performance
APA (6th Edition):
Heiss, J. (2017). Improving the Performance of the Variant Calling Workflow for DNA Sequencing. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:7d02ec4a-0d99-453a-8950-f54287d91e2a
Chicago Manual of Style (16th Edition):
Heiss, Jonathan (author). “Improving the Performance of the Variant Calling Workflow for DNA Sequencing.” 2017. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:7d02ec4a-0d99-453a-8950-f54287d91e2a.
MLA Handbook (7th Edition):
Heiss, Jonathan (author). “Improving the Performance of the Variant Calling Workflow for DNA Sequencing.” 2017. Web. 18 Apr 2021.
Vancouver:
Heiss J. Improving the Performance of the Variant Calling Workflow for DNA Sequencing. [Internet] [Masters thesis]. Delft University of Technology; 2017. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:7d02ec4a-0d99-453a-8950-f54287d91e2a.
Council of Science Editors:
Heiss J. Improving the Performance of the Variant Calling Workflow for DNA Sequencing. [Masters Thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:7d02ec4a-0d99-453a-8950-f54287d91e2a

Delft University of Technology
30.
Dirks, Jacko (author).
Efficient inter-thread communication on the reconfigurable ρ-VEX.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:9287bcea-f455-42fc-9efe-9417dbcbd4ad
This thesis documents the implementation of atomic instructions for the ρ-VEX (reconfigurable VEX). These instructions let threads communicate, which enables efficient multithreading. Furthermore, we investigate using inter-thread communication to improve the performance of static ρ-VEX configurations without significantly increasing the area. Benchmark results show that the ρ-VEX can perform up to 1.33 times better with the addition of atomic instructions. We also investigate the performance improvement that can result from combining reconfigurability with inter-thread communication. A theoretical model is created which predicts that a runtime-reconfigurable ρ-VEX can outperform any static ρ-VEX setup: under ideal circumstances, it can be 20% to 100% faster than any static ρ-VEX. In addition, this thesis documents the implementation of a bridge that connects the ρ-VEX to the Xilinx ZYNQ system, a single chip containing an ARM processor and a Xilinx field-programmable gate array (FPGA). This bridge gives the ρ-VEX access to 512 MiB of memory on the Basys PYNQ.
Advisors/Committee Members: Wong, Stephan (mentor), Al-Ars, Zaid (graduation committee),
Spaan, Matthijs (graduation committee),
Delft University of Technology (degree granting institution).
Subjects/Keywords: rVEX; multithreading; Performance
APA (6th Edition):
Dirks, J. (2018). Efficient inter-thread communication on the reconfigurable ρ-VEX. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:9287bcea-f455-42fc-9efe-9417dbcbd4ad
Chicago Manual of Style (16th Edition):
Dirks, Jacko (author). “Efficient inter-thread communication on the reconfigurable ρ-VEX.” 2018. Masters Thesis, Delft University of Technology. Accessed April 18, 2021.
http://resolver.tudelft.nl/uuid:9287bcea-f455-42fc-9efe-9417dbcbd4ad.
MLA Handbook (7th Edition):
Dirks, Jacko (author). “Efficient inter-thread communication on the reconfigurable ρ-VEX.” 2018. Web. 18 Apr 2021.
Vancouver:
Dirks J. Efficient inter-thread communication on the reconfigurable ρ-VEX. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Apr 18].
Available from: http://resolver.tudelft.nl/uuid:9287bcea-f455-42fc-9efe-9417dbcbd4ad.
Council of Science Editors:
Dirks J. Efficient inter-thread communication on the reconfigurable ρ-VEX. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:9287bcea-f455-42fc-9efe-9417dbcbd4ad