You searched for subject:(LU factorization)
.
Showing records 1 – 20 of
20 total matches.
No search limiters apply to these results.

Texas A&M University
1.
Belsare, Aditya Sanjay.
Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms.
Degree: MS, Computer Engineering, 2014, Texas A&M University
URL: http://hdl.handle.net/1969.1/153210
► Direct sparse solvers are traditionally known to be robust, yet difficult to parallelize. In the context of circuit simulators, they present an important bottleneck where…
(more)
▼ Direct sparse solvers are traditionally known to be robust, yet difficult to parallelize. In the context of circuit simulators, they present an important bottleneck where the key steps of
LU factorization and forward-backward substitution are repeatedly performed to reach the solution. Limited speedups have been obtained on multi-core CPUs as well as GPUs owing to the strong data dependency in these steps. With the advent of many-core coprocessors like the Intel Xeon Phi with fewer yet
powerful cores and wider vector units, sparse
LU factorization can be optimized for higher speedups compared to traditional
LU decomposition methods like the Gilbert Peierl's algorithm. In this thesis, we first establish Sparse Compressed Row (CSR) as the preferred data structure amongst other popular sparse matrix representations for parallelizing sparse circuit solvers, irrespective of the architecture used. Next, we propose and implement a sparse circuit solver suited for parallelization on both the Nvidia GPU and Intel Xeon Phi platform, which is amenable to vectorization and takes advantage of hardware support, if any, for gather-scatter operations. Finally, we analyze our implementation on multi-core, SIMD and SIMT architectures namely Intel Xeon CPU, Intel Xeon Phi coprocessor and an Nvidia GPU respectively, each using different programming models suited for the respective platform to determine the architecture best suited for parallelizing direct sparse matrix solvers. Our parallel sparse
LU factorization achieves an average speedup of 7.18x on the Xeon Phi
and 2.75x in case of the GPU implementation on GTX 680 over an Intel 4-core i7 CPU, which is up to 13x faster than a single threaded implementation.
Advisors/Committee Members: Liu, Jyh-Charn (Steve) (advisor), Hu, Jiang (advisor), Gratz, Paul V (committee member).
Subjects/Keywords: Sparse matrix solver; LU Factorization
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Belsare, A. S. (2014). Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms. (Masters Thesis). Texas A&M University. Retrieved from http://hdl.handle.net/1969.1/153210
Chicago Manual of Style (16th Edition):
Belsare, Aditya Sanjay. “Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms.” 2014. Masters Thesis, Texas A&M University. Accessed March 01, 2021.
http://hdl.handle.net/1969.1/153210.
MLA Handbook (7th Edition):
Belsare, Aditya Sanjay. “Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms.” 2014. Web. 01 Mar 2021.
Vancouver:
Belsare AS. Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms. [Internet] [Masters thesis]. Texas A&M University; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/1969.1/153210.
Council of Science Editors:
Belsare AS. Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms. [Masters Thesis]. Texas A&M University; 2014. Available from: http://hdl.handle.net/1969.1/153210

University of Cincinnati
2.
THIYAGARAJAN, SANJEEV.
REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES.
Degree: MS, Engineering : Computer Engineering, 2001, University of Cincinnati
URL: http://rave.ohiolink.edu/etdc/view?acc_num=ucin990556295
► Complete Loop Unrolling (CLU) is a possible approach for improving the execution time of Sparse Lower/Upper (LU) factorization. The amount of memory space occupied by…
(more)
▼ Complete Loop Unrolling (CLU) is a possible approach
for improving the execution time of Sparse Lower/Upper (
LU)
factorization. The amount of memory space occupied by the
interpretive code generated using CLU technique is usually traded
off for better execution gain. But in some problem domains such as
analog circuit simulation, a large number of factorizations are
required to solve sparse matrices and the storage space occupied by
interpretive code for such problems are large and easily consumes
the available memory(or exceeds it!). This thesis investigates
compression techniques to reduce storage space for the interpretive
code sequence. In particular, we present the investigation of a
combination of four compression approaches namely 1) Delta Encoding
2) Clustering the interpretive code 3) Reordering the data elements
in memory and the general compression method 4) LZW compression.
The clustering method is unique to our study for which a greedy
partitioning technique is proposed and analyzed. Two other
partitioning schemes, random partitioning and partitioning on
divide instructions were implemented and compared with the greedy
partitioning approach. The results show that the greedy approach is
better than the other two. Experiments demonstrate that the greedy
partitioning technique applied on delta-encoded sequence is
effective in decreasing the storage space of interpretive code
sequence. We also explored the possibility of cost minimization by
reordering the data elements in memory such that the delta distance
between adjacent operands is decreased. Compression improvement was
seen for matrices 300 X 300 and smaller.
Advisors/Committee Members: Carter, Dr. Harold W. (Advisor).
Subjects/Keywords: compression; LU factorization; loop unrolling
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
THIYAGARAJAN, S. (2001). REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES. (Masters Thesis). University of Cincinnati. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=ucin990556295
Chicago Manual of Style (16th Edition):
THIYAGARAJAN, SANJEEV. “REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES.” 2001. Masters Thesis, University of Cincinnati. Accessed March 01, 2021.
http://rave.ohiolink.edu/etdc/view?acc_num=ucin990556295.
MLA Handbook (7th Edition):
THIYAGARAJAN, SANJEEV. “REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES.” 2001. Web. 01 Mar 2021.
Vancouver:
THIYAGARAJAN S. REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES. [Internet] [Masters thesis]. University of Cincinnati; 2001. [cited 2021 Mar 01].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin990556295.
Council of Science Editors:
THIYAGARAJAN S. REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU
FACTORIZATION OF SPARSE MATRICES. [Masters Thesis]. University of Cincinnati; 2001. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin990556295

University of California – Riverside
3.
Davies, Teresa.
Checksum-Based Fault Tolerance for LU Factorization.
Degree: Computer Science, 2014, University of California – Riverside
URL: http://www.escholarship.org/uc/item/7tk439n9
► In high-performance systems, the probability of failure is higher for larger systems. The probability that a failure will occur before the end of the computation…
(more)
▼ In high-performance systems, the probability of failure is higher for larger systems. The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For long running applications using a large number of processors, it is essential that fault tolerance be used to prevent a total loss of all finished computations after a failure. There are two main classes of errors: errors involving loss of data, and errors involving corruption of data. A fail-stop failure, where a process is lost along with its data, can be handled for any application with checkpointing. While checkpointing has been very useful to tolerate failures for a long time, it often introduces a considerable overhead especially when applications modify a large amount of memory between checkpoints and the number of processors is large. Therefore an application-specific approach to handling fail-stop failures allows fault tolerance with much lower overhead. Errors in calculations that cannot be detected by any other means are called soft errors, and usually are errors in the data that cause the program to deliver the wrong result at the end, but do not have any other effect that would indicate an error occurred. It has been proved that, if LU factorization is used to factor a checksum matrix, the checksum relationship will be preserved at the end of the computation. For some LU factorization algorithms, the checksum relationship in the input checksum matrix is not maintained in the middle of the computation. However, in the right-looking LU factorization algorithm, the checksum relationship will be maintained at each step. Based on this checksum relationship, errors in the LU factorization can be detected and corrected before they are propagated. The frequency of checking can be adjusted for the error rate, resulting in a flexible method of fault tolerance. Experimental results on the supercomputer Jaguar demonstrate that the fault tolerance overhead introduced by the proposed recovery scheme is minimal. These techniques are guaranteed to handle at least one error. In certain cases, multiple simultaneous errors can be handled, but it is not guaranteed. If multiple errors are likely, multiple checksums can be used to handle more than one simultaneous error. Multiple checksums can be created using coefficients to calculate different sums from the same set of elements. Since the calculations to recover from an error use real numbers, it is extremely important that the coefficients be chosen to minimize the error created by a recovery, and this is the main challenge for adding the ability to recover from multiple errors.
Subjects/Keywords: Computer science; algorithm-based recovery; fault tolerance; high performance computing; LU factorization
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Davies, T. (2014). Checksum-Based Fault Tolerance for LU Factorization. (Thesis). University of California – Riverside. Retrieved from http://www.escholarship.org/uc/item/7tk439n9
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Davies, Teresa. “Checksum-Based Fault Tolerance for LU Factorization.” 2014. Thesis, University of California – Riverside. Accessed March 01, 2021.
http://www.escholarship.org/uc/item/7tk439n9.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Davies, Teresa. “Checksum-Based Fault Tolerance for LU Factorization.” 2014. Web. 01 Mar 2021.
Vancouver:
Davies T. Checksum-Based Fault Tolerance for LU Factorization. [Internet] [Thesis]. University of California – Riverside; 2014. [cited 2021 Mar 01].
Available from: http://www.escholarship.org/uc/item/7tk439n9.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Davies T. Checksum-Based Fault Tolerance for LU Factorization. [Thesis]. University of California – Riverside; 2014. Available from: http://www.escholarship.org/uc/item/7tk439n9
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
4.
Herrmann, Julien.
Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel.
Degree: Docteur es, Informatique, 2015, Lyon, École normale supérieure
URL: http://www.theses.fr/2015ENSL1043
► Dans cette thèse, nous nous sommes penchés d’un point de vue à la foisthéorique et pratique sur la conception d’algorithmes et detechniques d’ordonnancement adaptées aux…
(more)
▼ Dans cette thèse, nous nous sommes penchés d’un point de vue à la foisthéorique et pratique sur la conception d’algorithmes et detechniques d’ordonnancement adaptées aux architectures complexes dessuperordinateurs modernes. Nous nous sommes en particulier intéressésà l’utilisation mémoire et la gestion des communications desalgorithmes pour le calcul haute performance (HPC). Nous avonsexploité l’hétérogénéité des superordinateurs modernes pour améliorerles performances du calcul matriciel. Nous avons étudié lapossibilité d’alterner intelligemment des étapes de factorisation LU(plus rapide) et des étapes de factorisation QR (plus stablenumériquement mais plus deux fois plus coûteuses) pour résoudre unsystème linéaire dense. Nous avons amélioré les performances desystèmes d’exécution dynamique à l’aide de pré-calculs statiquesprenants en compte l’ensemble du graphe de tâches de la factorisationCholesky ainsi que l’hétérogénéité de l’architecture. Nous noussommes intéressés à la complexité du problème d’ordonnancement degraphes de tâches utilisant de gros fichiers d’entrée et de sortiesur une architecture hétérogène avec deux types de ressources,utilisant chacune une mémoire spécifique. Nous avons conçu denombreuses heuristiques en temps polynomial pour la résolution deproblèmes généraux que l’on avait prouvés NP-complet aupréalable. Enfin, nous avons conçu des algorithmes optimaux pourordonnancer un graphe de différentiation automatique sur uneplateforme avec deux types de mémoire : une mémoire gratuite maislimitée et une mémoire coûteuse mais illimitée.
Throughout this thesis, we have designed memory-aware algorithms and scheduling techniques suitedfor modern memory architectures. We have shown special interest in improving the performance ofmatrix computations on multiple levels. At a high level, we have introduced new numerical algorithmsfor solving linear systems on large distributed platforms. Most of the time, these linear solvers rely onruntime systems to handle resources allocation and data management. We also focused on improving thedynamic schedulers embedded in these runtime systems by adding static information to their decisionprocess. We proposed new memory-aware dynamic heuristics to schedule workflows, that could beimplemented in such runtime systems.Altogether, we have dealt with multiple state-of-the-art factorization algorithms used to solve linearsystems, like the LU, QR and Cholesky factorizations. We targeted different platforms ranging frommulticore processors to distributed memory clusters, and worked with several reference runtime systemstailored for these architectures, such as P A RSEC and StarPU. On a theoretical side, we took specialcare of modelling convoluted hierarchical memory architectures. We have classified the problems thatare arising when dealing with these storage platforms. We have designed many efficient polynomial-timeheuristics on general problems that had been shown NP-complete beforehand.
Advisors/Committee Members: Robert, Yves (thesis director).
Subjects/Keywords: Ordonnancement multi-critère; Algorithmes numériques; Factorisation LU; Factorisation QR; Factorisation Cholesky; Calcul haute performance; Systèmes linéaires; Différentiation automatique; Scheduling; Numerical algorithms; LU factorization; QR factorization; Cholesky factorization; High performance computing; Linear systems; Automatic differentiation
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Herrmann, J. (2015). Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel. (Doctoral Dissertation). Lyon, École normale supérieure. Retrieved from http://www.theses.fr/2015ENSL1043
Chicago Manual of Style (16th Edition):
Herrmann, Julien. “Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel.” 2015. Doctoral Dissertation, Lyon, École normale supérieure. Accessed March 01, 2021.
http://www.theses.fr/2015ENSL1043.
MLA Handbook (7th Edition):
Herrmann, Julien. “Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel.” 2015. Web. 01 Mar 2021.
Vancouver:
Herrmann J. Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel. [Internet] [Doctoral dissertation]. Lyon, École normale supérieure; 2015. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2015ENSL1043.
Council of Science Editors:
Herrmann J. Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions : Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel. [Doctoral Dissertation]. Lyon, École normale supérieure; 2015. Available from: http://www.theses.fr/2015ENSL1043

Université Paris-Sud – Paris XI
5.
Donfack, Simplice.
Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles.
Degree: Docteur es, Informatique, 2012, Université Paris-Sud – Paris XI
URL: http://www.theses.fr/2012PA112042
► Les processeurs multi-cœurs sont considérés de nos jours comme l'avenir des calculateurs et auront un impact important dans le calcul scientifique. Cette thèse présente une…
(more)
▼ Les processeurs multi-cœurs sont considérés de nos jours comme l'avenir des calculateurs et auront un impact important dans le calcul scientifique. Cette thèse présente une nouvelle approche de résolution des grands systèmes linéaires creux et denses, qui soit adaptée à l'exécution sur les futurs machines pétaflopiques et en particulier celles ayant un nombre important de cœurs. Compte tenu du coût croissant des communications comparé au temps dont les processeurs mettent pour effectuer les opérations arithmétiques, notre approche adopte le principe de minimisation des communications au prix de quelques calculs redondants et utilise plusieurs adaptations pour atteindre de meilleures performances sur les machines multi-cœurs. Nous décomposons le problème à résoudre en plusieurs phases qui sont ensuite mises en œuvre séparément. Dans la première partie, nous présentons un algorithme basé sur le partitionnement d'hypergraphe qui réduit considérablement le remplissage ("fill-in") induit lors de la factorisation LU des matrices creuses non symétriques. Dans la deuxième partie, nous présentons deux algorithmes de réduction de communication pour les factorisations LU et QR qui sont adaptés aux environnements multi-cœurs. La principale contribution de cette partie est de réorganiser les opérations de la factorisation de manière à réduire la sollicitation du bus tout en utilisant de façon optimale les ressources. Nous étendons ensuite ce travail aux clusters de processeurs multi-cœurs. Dans la troisième partie, nous présentons une nouvelle approche d'ordonnancement et d'optimisation. La localité des données et l'équilibrage des charges représentent un sérieux compromis pour le choix des méthodes d'ordonnancement. Sur les machines NUMA par exemple où la localité des données n'est pas une option, nous avons observé qu'en présence de perturbations systèmes (" OS noise"), les performances pouvaient rapidement se dégrader et devenir difficiles à prédire. Pour cela, nous présentons une approche combinant un ordonnancement statique et dynamique pour ordonnancer les tâches de nos algorithmes. Nos résultats obtenues sur plusieurs architectures montrent que tous nos algorithmes sont efficaces et conduisent à des gains de performances significatifs. Nous pouvons atteindre des améliorations de l'ordre de 30 à 110% par rapport aux correspondants de nos algorithmes dans les bibliothèques numériques bien connues de la littérature.
Multicore processors are considered to be nowadays the future of computing, and they will have an important impact in scientific computing. In this thesis, we study methods and algorithms for solving efficiently sparse and dense large linear systems on future petascale machines and in particular these having a significant number of cores. Due to the increasing communication cost compared to the time the processors take to perform arithmetic operations, our approach embrace the communication avoiding algorithm principle by doing some redundant computations and uses several adaptations to achieve better…
Advisors/Committee Members: Grigori, Laura (thesis director).
Subjects/Keywords: Factorisation LU; QR; Réduction des communications; Méthode de renumérotations; Techniques d'ordonnancement; Optimisations; Multi-coeurs; LU factorization; QR; Communication avoiding; Ordering; Scheduling technic; Optimization; Multicore
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Donfack, S. (2012). Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles. (Doctoral Dissertation). Université Paris-Sud – Paris XI. Retrieved from http://www.theses.fr/2012PA112042
Chicago Manual of Style (16th Edition):
Donfack, Simplice. “Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles.” 2012. Doctoral Dissertation, Université Paris-Sud – Paris XI. Accessed March 01, 2021.
http://www.theses.fr/2012PA112042.
MLA Handbook (7th Edition):
Donfack, Simplice. “Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles.” 2012. Web. 01 Mar 2021.
Vancouver:
Donfack S. Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles. [Internet] [Doctoral dissertation]. Université Paris-Sud – Paris XI; 2012. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2012PA112042.
Council of Science Editors:
Donfack S. Methods and algorithms for solving linear systems of equations on massively parallel computers : Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèles. [Doctoral Dissertation]. Université Paris-Sud – Paris XI; 2012. Available from: http://www.theses.fr/2012PA112042

Universidade Estadual de Campinas
6.
Cantane, Daniela Renata.
Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method.
Degree: 2009, Universidade Estadual de Campinas
URL: http://repositorio.unicamp.br/jspui/handle/REPOSIP/260212
► Abstract: Finding efficient solution of linear systems is fundamental in the linear programming problems and the first method to obtain success for this class of…
(more)
▼ Abstract: Finding efficient solution of linear systems is fundamental in the linear programming problems and the first method to obtain success for this class of problems was the Simplex method. With the objective to develop efficient alternatives to its implementation, techniques of the simplex basis
LU factorization update are developed in this thesis to improve the solution of the Simplex method linear systems towards a matrix columns static reordering. A simulation of the Simplex method is implemented, carrying through the change of basis obtained from MINOS and verifying its sparsity. Only the factored columns actually modified by the change of the base are carried through to obtain an efficient
LU factorization update. The matrix columns are reordered according to three strategies: minimum degree; block triangular form and the Björck strategy. Thus, sparse factorizations are obtained for any base without computational effort to obtain the order of columns, since the reordering of the matrix is static and base columns follow this ordering. The application of the block triangular form achieved the best results, for larger scale problems tested, in comparison to minimum degree method and the Björck strategy. Computational results for Netlib problems show the robustness of this approach and good computational performance, since there is no need of periodical factorizations as used in traditional updating methods. The proposed method obtained a reduction of the nonzero entries of the basis with respect to MINOS. This approach was applied in the cutting stock problems and the proposed method achieved a reduction of the computational time in the solution of such problems with respect to the GLPK.
Advisors/Committee Members: UNIVERSIDADE ESTADUAL DE CAMPINAS (CRUESP), Oliveira, Aurelio Ribeiro Leite de, 1962- (advisor), Lyra Filho, Christiano, 1951- (coadvisor), Universidade Estadual de Campinas. Faculdade de Engenharia Elétrica e de Computação (institution), Programa de Pós-Graduação em Engenharia Elétrica (nameofprogram), Campos Filho, Frederico Ferreira (committee member), Arenales, Marcon Nereu (committee member), Ferreira, Paulo Augusto Valente (committee member), Yamakami, Akebo (committee member).
Subjects/Keywords: Programação linear; Simplex (Matemática); Decomposição LU; Linear Programming; Simplex Method; LU Factorization Update
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Cantane, D. R. (2009). Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method. (Thesis). Universidade Estadual de Campinas. Retrieved from http://repositorio.unicamp.br/jspui/handle/REPOSIP/260212
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Cantane, Daniela Renata. “Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method.” 2009. Thesis, Universidade Estadual de Campinas. Accessed March 01, 2021.
http://repositorio.unicamp.br/jspui/handle/REPOSIP/260212.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Cantane, Daniela Renata. “Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method.” 2009. Web. 01 Mar 2021.
Vancouver:
Cantane DR. Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method. [Internet] [Thesis]. Universidade Estadual de Campinas; 2009. [cited 2021 Mar 01].
Available from: http://repositorio.unicamp.br/jspui/handle/REPOSIP/260212.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Cantane DR. Contribuição da atualização da decomposição LU no metodo Simplex: Contribution of the LU factorization update in the Simplex method. [Thesis]. Universidade Estadual de Campinas; 2009. Available from: http://repositorio.unicamp.br/jspui/handle/REPOSIP/260212
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
7.
Somers, Gregory W.
Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
.
Degree: 2016, University of Ottawa
URL: http://hdl.handle.net/10393/35128
► Block-structured matrices arise in several contexts in circuit simulation problems. These matrices typically inherit the pattern of sparsity from the circuit connectivity. However, they are…
(more)
▼ Block-structured matrices arise in several contexts in circuit
simulation problems. These matrices typically inherit the pattern of
sparsity from the circuit connectivity. However, they are also
characterized by dense spots or blocks. Direct factorization of those
matrices has emerged as an attractive approach if the host memory is sufficiently large to store the block-structured matrix. The approach proposed in this thesis aims to accelerate the direct factorization of general block-structured matrices by leveraging the power of multiple OpenCL accelerators such as Graphical Processing Units (GPUs).
The proposed approach utilizes the notion of a Directed Acyclic Graph representing the matrix in order to schedule its factorization on multiple accelerators. This thesis also describes memory management techniques that enable handling large matrices while minimizing the amount of memory transfer over the PCIe bus between the host CPU and the attached devices. The results demonstrate that by using two GPUs the proposed approach can achieve a nearly optimal speedup when compared to a
single GPU platform.
Subjects/Keywords: multi-GPU;
Parallel LU Factorization;
Circuit Simulation
…29
3.1
Gilbert-Peierls Column Level LU factorization… …handling the parallel execution and managing
the memory resources of block LU factorization.
1.4… …method [10]. The core computational part of the Newton
method is LU factorization of… …8 Challenges of Small Block Factorization
97
8.1
Introduction… …82
7.3
Hamrle1 matrix factorization details for GPU…
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Somers, G. W. (2016). Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
. (Thesis). University of Ottawa. Retrieved from http://hdl.handle.net/10393/35128
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Somers, Gregory W. “Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
.” 2016. Thesis, University of Ottawa. Accessed March 01, 2021.
http://hdl.handle.net/10393/35128.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Somers, Gregory W. “Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
.” 2016. Web. 01 Mar 2021.
Vancouver:
Somers GW. Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
. [Internet] [Thesis]. University of Ottawa; 2016. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/10393/35128.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Somers GW. Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms
. [Thesis]. University of Ottawa; 2016. Available from: http://hdl.handle.net/10393/35128
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

KTH
8.
Netzer, Gilbert.
Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors.
Degree: Computer Science and Communication (CSC), 2015, KTH
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170445
► The energy consumption of large-scale high-performance computer (HPC) systems has become one of the foremost concerns of both data-center operators and computer manufacturers. This…
(more)
▼ The energy consumption of large-scale high-performance computer (HPC) systems has become one of the foremost concerns of both data-center operators and computer manufacturers. This has renewed interest in alternative computer architectures that could offer substantially better energy-efficiency.Yet, the for the evaluation of the potential of these architectures necessary well-optimized implementations of typical HPC benchmarks are often not available for these for the HPC industry novel architectures. The in this work presented LU factorization benchmark implementation aims to provide such a high-quality tool for the HPC industry standard high-performance LINPACK benchmark (HPL) for the eight-core Texas Instruments TMS320C6678 digitalsignal processor (DSP). The presented implementation could perform the LU factorization at up to 30.9 GF/s at 1.25 GHz core clock frequency by using all the eight DSP cores of the System-on-Chip (SoC). This is 77% of the attainable peak double-precision floating-point performance of the DSP, a level of efficiency that is comparable to the efficiency expected on traditional x86-based processor architectures. A presented detailed performance analysis shows that this is largely due to the optimized implementation of the embedded generalized matrix-matrix multiplication (GEMM). For this operation, the on-chip direct memory access (DMA) engines were used to transfer the necessary data from the external DDR3 memory to the core-private and shared scratchpad memory. This allowed to overlap the data transfer with computations on the DSP cores. The computations were in turn optimized by using software pipeline techniques and were partly implemented in assembly language. With these optimization the performance of the matrix multiplication reached up to 95% of attainable peak performance. A detailed description of these two key optimization techniques and their application to the LU factorization is included. Using a specially instrumented Advantech TMDXEVM6678L evaluation module, described in detail in related work, allowed to measure the SoC’s energy efficiency of up to 2.92 GF/J while executing the presented benchmark. Results from the verification of the benchmark execution using standard HPL correctness checks and an uncertainty analysis of the experimentally gathered data are also presented.
Energiförbrukningen av storskaliga högpresterande datorsystem (HPC) har blivit ett av de främsta problemen för såväl ägare av dessa system som datortillverkare. Det har lett till ett förnyat intresse för alternativa datorarkitekturer som kan vara betydligt mer effektiva ur energiförbrukningssynpunkt. För detaljerade analyser av prestanda och energiförbrukning av dessa för HPC-industrin nya arkitekturer krävs väloptimerade implementationer av standard HPC-bänkmärkningsproblem. Syftet med detta examensarbete är att tillhandhålla ett sådant högkvalitativt verktyg i form av en implementation av ett bänkmärkesprogram för LU-faktorisering för den åttakärniga digitala…
Subjects/Keywords: LU factorization; digital signal processors; Texas Instruments; Keystone architecture; high-performance LINPACK; benchmark; performance; energy efficiency; software-pipelined loops; direct memory access; optimization; Computer Sciences; Datavetenskap (datalogi)
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Netzer, G. (2015). Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors. (Thesis). KTH. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170445
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Netzer, Gilbert. “Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors.” 2015. Thesis, KTH. Accessed March 01, 2021.
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170445.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Netzer, Gilbert. “Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors.” 2015. Web. 01 Mar 2021.
Vancouver:
Netzer G. Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors. [Internet] [Thesis]. KTH; 2015. [cited 2021 Mar 01].
Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170445.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Netzer G. Efficient LU Factorization for Texas Instruments Keystone Architecture Digital Signal Processors. [Thesis]. KTH; 2015. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170445
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Delft University of Technology
9.
Sheng, Z. (author).
Hierarchically semi-separable representation and its applications.
Degree: Electrical Engineering, Mathematics and Computer Science, Information and Communication Technology (ICT), 2006, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:e7a452de-6859-42bc-97b6-3eb28815dc8b
► In this thesis, we study a important class of structured matrices: "Hierarchically Semi-Separable" matrices, for which an efficient hierarchically state based representation called Hierarchically Semi-…
(more)
▼ In this thesis, we study a important class of structured matrices: "Hierarchically Semi-Separable" matrices, for which an efficient hierarchically state based representation called Hierarchically Semi- Separable (HSS) representation can be used to represent it with a small amount of parameters that are linear in the dimension of the matrix. Under the framework of this HSS representation, efficient matrix transformation algorithms that are linear in the number of equations are given. In particular, a system Ax = b can be solved with linear complexity. Also, LU and URV factorization can be efficiently executed. There is a close connection between HSS matrices and classical Semi Separable matrices (sometimes called Sequentially Semi Separable matrices or SSS matrices or equivalently, time-varying systems), and a number of results of SSS theory can be translated to parallel results in the HSS theory (neither class contains the other however). We also discuss the limitation of the direct HSS algorithms, for which iterative solution algorithms based on HSS representation and its basic algorithms should come to rescue. A number of preconditioner construction algorithms based on HSS representation have been proposed.
Electrical Engineering, Mathematics and Computer Science
Advisors/Committee Members: Dewilde, P. (mentor), van der Veen, A.J. (mentor), van der Meijs, N. (mentor), Remis, R.F. (mentor).
Subjects/Keywords: hierarchically semi-separable matrix; linear solver; preconditioner; lu; urv factorization
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Sheng, Z. (. (2006). Hierarchically semi-separable representation and its applications. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:e7a452de-6859-42bc-97b6-3eb28815dc8b
Chicago Manual of Style (16th Edition):
Sheng, Z (author). “Hierarchically semi-separable representation and its applications.” 2006. Masters Thesis, Delft University of Technology. Accessed March 01, 2021.
http://resolver.tudelft.nl/uuid:e7a452de-6859-42bc-97b6-3eb28815dc8b.
MLA Handbook (7th Edition):
Sheng, Z (author). “Hierarchically semi-separable representation and its applications.” 2006. Web. 01 Mar 2021.
Vancouver:
Sheng Z(. Hierarchically semi-separable representation and its applications. [Internet] [Masters thesis]. Delft University of Technology; 2006. [cited 2021 Mar 01].
Available from: http://resolver.tudelft.nl/uuid:e7a452de-6859-42bc-97b6-3eb28815dc8b.
Council of Science Editors:
Sheng Z(. Hierarchically semi-separable representation and its applications. [Masters Thesis]. Delft University of Technology; 2006. Available from: http://resolver.tudelft.nl/uuid:e7a452de-6859-42bc-97b6-3eb28815dc8b
10.
Li, Nan.
Improving the Execution Time of Large System
Simulations.
Degree: MS, Electrical Engineering, 2012, Arizona State University
URL: http://repository.asu.edu/items/15851
► Today, the electric power system faces new challenges from rapid developing technology and the growing concern about environmental problems. The future of the power system…
(more)
▼ Today, the electric power system faces new challenges
from rapid developing technology and the growing concern about
environmental problems. The future of the power system under these
new challenges needs to be planned and studied. However, due to the
high degree of computational complexity of the optimization
problem, conducting a system planning study which takes into
account the market structure and environmental constraints on a
large-scale power system is computationally taxing. To improve the
execution time of large system simulations, such as the system
planning study, two possible strategies are proposed in this
thesis. The first one is to implement a relative new factorization
method, known as the multifrontal method, to speed up the solution
of the sparse linear matrix equations within the large system
simulations. The performance of the multifrontal method implemented
by UMFAPACK is compared with traditional LU factorization on a wide
range of power-system matrices. The results show that the
multifrontal method is superior to traditional LU factorization on
relatively denser matrices found in other specialty areas, but has
poor performance on the more sparse matrices that occur in
power-system applications. This result suggests that multifrontal
methods may not be an effective way to improve execution time for
large system simulation and power system engineers should evaluate
the performance of the multifrontal method before applying it to
their applications. The second strategy is to develop a small dc
equivalent of the large-scale network with satisfactory accuracy
for the large-scale system simulations. In this thesis, a modified
Ward equivalent is generated for a large-scale power system, such
as the full Electric Reliability Council of Texas (ERCOT) system.
In this equivalent, all the generators in the full model are
retained integrally. The accuracy of the modified Ward equivalent
is validated and the equivalent is used to conduct the optimal
generation investment planning study. By using the dc equivalent,
the execution time for optimal generation investment planning is
greatly reduced. Different scenarios are modeled to study the
impact of fuel prices, environmental constraints and incentives for
renewable energy on future investment and retirement in
generation.
Subjects/Keywords: Electrical engineering; LU factorization; Multifrontal Method; Network Reduction
…retained line i
GHG
Greenhouse gas
GPLU
A LU factorization algorithm developed by Gilbert
and… …multifrontal method on different types
of matrices is compared to the traditional LU factorization… …of the
multifrontal method is compared against traditional LU factorization on different… …consensus that the so-called traditional
sparse matrix methods for LU factorization are the… …method
[31] and traditional LU factorization. The results showed that the combined…
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Li, N. (2012). Improving the Execution Time of Large System
Simulations. (Masters Thesis). Arizona State University. Retrieved from http://repository.asu.edu/items/15851
Chicago Manual of Style (16th Edition):
Li, Nan. “Improving the Execution Time of Large System
Simulations.” 2012. Masters Thesis, Arizona State University. Accessed March 01, 2021.
http://repository.asu.edu/items/15851.
MLA Handbook (7th Edition):
Li, Nan. “Improving the Execution Time of Large System
Simulations.” 2012. Web. 01 Mar 2021.
Vancouver:
Li N. Improving the Execution Time of Large System
Simulations. [Internet] [Masters thesis]. Arizona State University; 2012. [cited 2021 Mar 01].
Available from: http://repository.asu.edu/items/15851.
Council of Science Editors:
Li N. Improving the Execution Time of Large System
Simulations. [Masters Thesis]. Arizona State University; 2012. Available from: http://repository.asu.edu/items/15851

University of Texas – Austin
11.
Ellis, Apollo Isaac Orion.
Jack Rabbit : an effective Cell BE programming system for high performance parallelism.
Degree: MSin Computer Sciences, Computer Science, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-05-3624
► The Cell processor is an example of the trade-offs made when designing a mass market power efficient multi-core machine, but the machine-exposing architecture and raw…
(more)
▼ The Cell processor is an example of the trade-offs made when designing a mass market power efficient multi-core machine, but the machine-exposing architecture and raw communication mechanisms of Cell are hard to manage for a programmer. Cell's design is simple and causes software complexity to go up in the areas of achieving low threading overhead, good bandwidth efficiency, and load balance. Several attempts have been made to produce efficient and effective programming systems for Cell, but the attempts have been too specialized and thus fall short. We present Jack Rabbit, an efficient thread pool work queue implementation, with load balancing mechanisms and double buffering. Our system incurs low threading overhead, gets good load balance, and achieves bandwidth efficiency. Our system represents a step towards an effective way to program Cell and any similar current or future processors.
Advisors/Committee Members: Lin, Yun Calvin (advisor), Fussell, Donald S., 1951- (advisor).
Subjects/Keywords: Cell processor; Parallel processing (Electronic computers); Multi-core systems; High performance computing; Runtime; Barnes Hut; LU factorization; Mandelbrot; Double buffering; Thread pool; Work queue; Load balance
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Ellis, A. I. O. (2011). Jack Rabbit : an effective Cell BE programming system for high performance parallelism. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-05-3624
Chicago Manual of Style (16th Edition):
Ellis, Apollo Isaac Orion. “Jack Rabbit : an effective Cell BE programming system for high performance parallelism.” 2011. Masters Thesis, University of Texas – Austin. Accessed March 01, 2021.
http://hdl.handle.net/2152/ETD-UT-2011-05-3624.
MLA Handbook (7th Edition):
Ellis, Apollo Isaac Orion. “Jack Rabbit : an effective Cell BE programming system for high performance parallelism.” 2011. Web. 01 Mar 2021.
Vancouver:
Ellis AIO. Jack Rabbit : an effective Cell BE programming system for high performance parallelism. [Internet] [Masters thesis]. University of Texas – Austin; 2011. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2152/ETD-UT-2011-05-3624.
Council of Science Editors:
Ellis AIO. Jack Rabbit : an effective Cell BE programming system for high performance parallelism. [Masters Thesis]. University of Texas – Austin; 2011. Available from: http://hdl.handle.net/2152/ETD-UT-2011-05-3624

University of Cincinnati
12.
Pathanjali, Nandini.
Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s.
Degree: MS, Engineering : Computer Engineering, 2002, University of Cincinnati
URL: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1017085297
► Analog and mixed-signal circuit simulation often employs the use of the so-called LU decomposition method to solve a set of linear algebraic equations represented as…
(more)
▼ Analog and mixed-signal circuit simulation often
employs the use of the so-called
LU decomposition method to solve a
set of linear algebraic equations represented as Ax=b, where A is a
square matrix. The
LU method requires the
factorization of A into
two tri-diagonal matrices.
Factorization is of order ‘n power 3’
time and dominates the execution time of the
LU decomposition
method. A number of approaches have been developed for reducing the
execution time of
LU factorization. One approach is to unroll the
factorization algorithm and, considering each resulting assignment
statement to be a machine operation, interpret the instruction
stream. If an interpreter is implemented in a special purpose
hardware engine there may be efficiencies to be gained by using a
uniquely-developed floating-point unit within the hardware
interpreter. This thesis documents the research in exploring
alternatives in the design of a special purpose double precision
floating point unit for a hardware interpreter to perform
LU
factorization using unrolled code. Alternatives explored were
primarily looking at integer adder and multiplier units of the
floating-point unit to determine the speed and area for each
integer unit that were considered. A floating-point divider
algorithm was also explored and studied. The result of this study
for the pipelined floating-point unit gives the best performance
with the Block Carry-look-ahead integer adder, Booth-2 integer
multiplier and the SRT integer divider units. One of the
interesting aspects of this research was heavy use of rapid
prototyping. All models were implemented in VHDL to evaluate system
level performance, synthesized by Synopsys to gate level, and
analyzed for time and area at the logic level. Finally, chip level
area and space characteristics were obtained using Xilinx place and
route tools.
Advisors/Committee Members: Carter, Dr. Harold (Advisor).
Subjects/Keywords: Computer Science; pipelined; floating point unit; TEEE-754 format; lu factorization; virtex FPGA
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Pathanjali, N. (2002). Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s. (Masters Thesis). University of Cincinnati. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=ucin1017085297
Chicago Manual of Style (16th Edition):
Pathanjali, Nandini. “Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s.” 2002. Masters Thesis, University of Cincinnati. Accessed March 01, 2021.
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1017085297.
MLA Handbook (7th Edition):
Pathanjali, Nandini. “Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s.” 2002. Web. 01 Mar 2021.
Vancouver:
Pathanjali N. Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s. [Internet] [Masters thesis]. University of Cincinnati; 2002. [cited 2021 Mar 01].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1017085297.
Council of Science Editors:
Pathanjali N. Pipelined IEEE-754 Double Precision Floating Point
Arithmetic Operators on Virtex FPGA’s. [Masters Thesis]. University of Cincinnati; 2002. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1017085297

University of Cincinnati
13.
Syed, Akber.
A Hardware Interpreter for Sparse Matrix LU
Factorization.
Degree: MS, Engineering : Computer Engineering, 2002, University of Cincinnati
URL: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521
► This thesis investigated a hardware interpreter for sparse matrix LU factorization. LU factorization is one of the most commonly used methods for solving a…
(more)
▼ This thesis investigated a hardware
interpreter for sparse matrix
LU factorization.
LU factorization is
one of the most commonly used methods for solving a system of
linear equations representing an electrical network. In this
method, a system of equations Ax=B is solved by factoring the
matrix A into its Lower (L) and Upper (U) triangular matrices. L
and U are then used to obtain the unknown vector x by forward and
backward substitution.
Factorization of A is
O(n
3) time-complex, requiring techniques to
speed it up. We explored a hardware-based interpreter for the
unrolled and compressed
LU factorization code. The inputs to the
hardware interpreter were a stream of instructions and a list of
non-zeros. The instructions were decompressed on-the-fly and
executed on the non-zero list using a special-purpose
floating-point unit. Three cases of a Harvard-type hardware
architecture were investigated; the architecture cases were modeled
at behavioral and structural levels of abstraction and verified for
functional and performance correctness. The well-known
linear-system-solver software,
SuperLU, was
used for performance comparison. We found that
all the architecture cases studied showed a significant memory
interface throttle; an architecture case which used a 4-port
interleaved memory for storing data, performed the fastest. Another
case explored was a faster to implement
all
FPGA solution, which used FPGA block-RAMs as a true
dual-port memory; this case did not perform as fast as the earlier
case with 4-port memory. The third case with standard one-port
memory was the slowest. We showed that with an efficient
implementation of the floating-point unit resulting in higher
frequency operations, all three cases of the proposed architecture
out-perform the software based
LU
decomposition.
Advisors/Committee Members: Carter, Dr. Hal (Advisor).
Subjects/Keywords: LU factorization; sparse matrices; hardware interpreter; hardware accelerator; loop unrolling
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Syed, A. (2002). A Hardware Interpreter for Sparse Matrix LU
Factorization. (Masters Thesis). University of Cincinnati. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521
Chicago Manual of Style (16th Edition):
Syed, Akber. “A Hardware Interpreter for Sparse Matrix LU
Factorization.” 2002. Masters Thesis, University of Cincinnati. Accessed March 01, 2021.
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521.
MLA Handbook (7th Edition):
Syed, Akber. “A Hardware Interpreter for Sparse Matrix LU
Factorization.” 2002. Web. 01 Mar 2021.
Vancouver:
Syed A. A Hardware Interpreter for Sparse Matrix LU
Factorization. [Internet] [Masters thesis]. University of Cincinnati; 2002. [cited 2021 Mar 01].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521.
Council of Science Editors:
Syed A. A Hardware Interpreter for Sparse Matrix LU
Factorization. [Masters Thesis]. University of Cincinnati; 2002. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521

INP Toulouse
14.
Maurin, Julien.
Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms.
Degree: Docteur es, Electromagnétisme et Systèmes Haute Fréquence, 2015, INP Toulouse
URL: http://www.theses.fr/2015INPT0079
► Cette étude s’inscrit dans le domaine de la simulation électromagnétique des problèmes de grande taille tels que la diffraction d’ondes planes par de larges plateformes…
(more)
▼ Cette étude s’inscrit dans le domaine de la simulation électromagnétique des problèmes de grande taille tels que la diffraction d’ondes planes par de larges plateformes et le rayonnement d’antennes aéroportées. Elle consiste à développer une méthode combinant décomposition en sous-domaines et compression hiérarchique des équations intégrales de frontière. Pour cela, nous rappelons dans un premier temps les points importants de la méthode des équations intégrales de frontière et de leur compression hiérarchique par l’algorithme ACA (Adaptive Cross Approximation). Ensuite, nous présentons la formulation IE-DDM (Integral Equations – Domain Decomposition Method) obtenue à partir d’une représentation intégrale des sous-domaines. Les matrices résultant de la discrétisation de cette formulation sont stockées au format H-matrice (matricehiérarchique). Un solveur spécialement adapté à la résolution de la formulation IE-DDM et à sa représentation hiérarchique a été conçu. Cette étude met en évidence l’efficacité de la décomposition en sous-domaines en tant que préconditionneur des équations intégrales. De plus, la méthode développée est rapide pour la résolution des problèmes à incidences multiples ainsi que la résolution des problèmes basses fréquences
This thesis is about the electromagnetic simulation of large scale problems as the wave scattering from aircrafts and the airborne antennas radiation. It consists in the development of a method combining domain decomposition and hierarchical compression of the surface integral equations. First, we remind the principles of the boundary element method and the hierarchical representation of the surface integral equations with the Adaptive Cross Approximation algorithm. Then, we present the IE-DDM formulation obtained from a sub-domain integral representation. The matrices resulting of the discretization of the formulation are stored in the H-matrix format. A solver especially fitted with the hierarchical representation of the IE-DDM formulation has been developed. This study highlights the efficiency of the sub-domain decomposition as a preconditioner of the integral equations. Moreover, the method is fast for the resolution of multiple incidences and the resolution of low frequencies problems
Advisors/Committee Members: Barka, André (thesis director), Gobin, Vincent (thesis director).
Subjects/Keywords: Equations intégrales de surface; Décomposition en sous-domaines; Matrices hiérarchiques; Adaptive cross approximation; Factorisation LU approchée; Solveur itératif; Simulation électromagnétique; Larges plateformes; Surface integral equations; Domain decomposition; Hierarchical matrices; Adaptive cross approximation; Approximate LU factorization; Iterative solver; Computational electromagnetics; Large scale problems
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Maurin, J. (2015). Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms. (Doctoral Dissertation). INP Toulouse. Retrieved from http://www.theses.fr/2015INPT0079
Chicago Manual of Style (16th Edition):
Maurin, Julien. “Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms.” 2015. Doctoral Dissertation, INP Toulouse. Accessed March 01, 2021.
http://www.theses.fr/2015INPT0079.
MLA Handbook (7th Edition):
Maurin, Julien. “Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms.” 2015. Web. 01 Mar 2021.
Vancouver:
Maurin J. Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms. [Internet] [Doctoral dissertation]. INP Toulouse; 2015. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2015INPT0079.
Council of Science Editors:
Maurin J. Résolution des équations intégrales de surface par une méthode de décomposition de domaine et compression hiérarchique ACA : Application à la simulation électromagnétique des larges plateformes : Resolution of surface integral equations by a domain decomposition method and adaptive cross approximation : Application to the electromagnetic simulation of large platforms. [Doctoral Dissertation]. INP Toulouse; 2015. Available from: http://www.theses.fr/2015INPT0079

Université Paris-Sud – Paris XI
15.
Rémy, Adrien.
Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs.
Degree: Docteur es, Informatique, 2015, Université Paris-Sud – Paris XI
URL: http://www.theses.fr/2015PA112138
► Dans cette thèse de doctorat, nous étudions des algorithmes et des implémentations pour accélérer la résolution de systèmes linéaires denses en utilisant des architectures composées…
(more)
▼ Dans cette thèse de doctorat, nous étudions des algorithmes et des implémentations pour accélérer la résolution de systèmes linéaires denses en utilisant des architectures composées de processeurs multicœurs et d'accélérateurs. Nous nous concentrons sur des méthodes basées sur la factorisation LU. Le développement de notre code s'est fait dans le contexte de la bibliothèque MAGMA. Tout d'abord nous étudions différents solveurs CPU/GPU hybrides basés sur la factorisation LU. Ceux-ci visent à réduire le surcoût de communication dû au pivotage. Le premier est basé sur une stratégie de pivotage dite "communication avoiding" (CALU) alors que le deuxième utilise un préconditionnement aléatoire du système original pour éviter de pivoter (RBT). Nous montrons que ces deux méthodes surpassent le solveur utilisant la factorisation LU avec pivotage partiel quand elles sont utilisées sur des architectures hybrides multicœurs/GPUs. Ensuite nous développons des solveurs utilisant des techniques de randomisation appliquées sur des architectures hybrides utilisant des GPU Nvidia ou des coprocesseurs Intel Xeon Phi. Avec cette méthode, nous pouvons éviter l'important surcoût du pivotage tout en restant stable numériquement dans la plupart des cas. L'architecture hautement parallèle de ces accélérateurs nous permet d'effectuer la randomisation de notre système linéaire à un coût de calcul très faible par rapport à la durée de la factorisation. Finalement, nous étudions l'impact d'accès mémoire non uniformes (NUMA) sur la résolution de systèmes linéaires denses en utilisant un algorithme de factorisation LU. En particulier, nous illustrons comment un placement approprié des processus légers et des données sur une architecture NUMA peut améliorer les performances pour la factorisation du panel et accélérer de manière conséquente la factorisation LU globale. Nous montrons comment ces placements peuvent améliorer les performances quand ils sont appliqués à des solveurs hybrides multicœurs/GPU.
In this PhD thesis, we study algorithms and implementations to accelerate the solution of dense linear systems by using hybrid architectures with multicore processors and accelerators. We focus on methods based on the LU factorization and our code development takes place in the context of the MAGMA library. We study different hybrid CPU/GPU solvers based on the LU factorization which aim at reducing the communication overhead due to pivoting. The first one is based on a communication avoiding strategy of pivoting (CALU) while the second uses a random preconditioning of the original system to avoid pivoting (RBT). We show that both of these methods outperform the solver using LU factorization with partial pivoting when implemented on hybrid multicore/GPUs architectures. We also present new solvers based on randomization for hybrid architectures for Nvidia GPU or Intel Xeon Phi coprocessor. With this method, we can avoid the high cost of pivoting while remaining numerically stable in most cases. The highly parallel architecture of these accelerators…
Advisors/Committee Members: Baboulin, Marc (thesis director).
Subjects/Keywords: Systèmes linéaires denses; Factorisation LU; Bibliothèques logicielles pour l’algèbre linéaire dense; Bibliothèque MAGMA; Calcul hybride multicœur/GPU; Processeurs graphiques; Intel Xeon Phi; . ccNUMA; Communication-avoiding; Randomisation; Placement des processus légers; Dense linear systems; LU factorization; Dense linear algebra libraries; MAGMA library; Hybrid multicore/GPU computing; Graphics process units; Intel Xeon Phi; . ccNUMA; Communication-avoiding algorithms; Randomization; Thread placement
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Rémy, A. (2015). Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs. (Doctoral Dissertation). Université Paris-Sud – Paris XI. Retrieved from http://www.theses.fr/2015PA112138
Chicago Manual of Style (16th Edition):
Rémy, Adrien. “Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs.” 2015. Doctoral Dissertation, Université Paris-Sud – Paris XI. Accessed March 01, 2021.
http://www.theses.fr/2015PA112138.
MLA Handbook (7th Edition):
Rémy, Adrien. “Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs.” 2015. Web. 01 Mar 2021.
Vancouver:
Rémy A. Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs. [Internet] [Doctoral dissertation]. Université Paris-Sud – Paris XI; 2015. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2015PA112138.
Council of Science Editors:
Rémy A. Solving dense linear systems on accelerated multicore architectures : Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs. [Doctoral Dissertation]. Université Paris-Sud – Paris XI; 2015. Available from: http://www.theses.fr/2015PA112138

University of Kentucky
16.
Lee, Eun-Joo.
Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems.
Degree: 2008, University of Kentucky
URL: https://uknowledge.uky.edu/gradschool_diss/650
Please download this dissertation to see the abstract.
Subjects/Keywords: Linear System; Preconditioning; Incomplete LU (ILU); Factorization; Indefinite Matrices; Sparse Approximate Inverse (SAI) Preconditioner; Computer Sciences
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Lee, E. (2008). Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems. (Doctoral Dissertation). University of Kentucky. Retrieved from https://uknowledge.uky.edu/gradschool_diss/650
Chicago Manual of Style (16th Edition):
Lee, Eun-Joo. “Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems.” 2008. Doctoral Dissertation, University of Kentucky. Accessed March 01, 2021.
https://uknowledge.uky.edu/gradschool_diss/650.
MLA Handbook (7th Edition):
Lee, Eun-Joo. “Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems.” 2008. Web. 01 Mar 2021.
Vancouver:
Lee E. Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems. [Internet] [Doctoral dissertation]. University of Kentucky; 2008. [cited 2021 Mar 01].
Available from: https://uknowledge.uky.edu/gradschool_diss/650.
Council of Science Editors:
Lee E. Accurate and Robust Preconditioning Techniques for Solving General Sparse Linear Systems. [Doctoral Dissertation]. University of Kentucky; 2008. Available from: https://uknowledge.uky.edu/gradschool_diss/650
17.
Estecahandy, Elodie.
Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem.
Degree: Docteur es, Mathématiques appliquées, 2013, Pau
URL: http://www.theses.fr/2013PAUU3022
► La détermination de la forme d'un obstacle élastique immergé dans un milieu fluide à partir de mesures du champ d'onde diffracté est un problème d'un…
(more)
▼ La détermination de la forme d'un obstacle élastique immergé dans un milieu fluide à partir de mesures du champ d'onde diffracté est un problème d'un vif intérêt dans de nombreux domaines tels que le sonar, l'exploration géophysique et l'imagerie médicale. A cause de son caractère non-linéaire et mal posé, ce problème inverse de l'obstacle (IOP) est très difficile à résoudre, particulièrement d'un point de vue numérique. De plus, son étude requiert la compréhension de la théorie du problème de diffraction direct (DP) associé, et la maîtrise des méthodes de résolution correspondantes. Le travail accompli ici se rapporte à l'analyse mathématique et numérique du DP élasto-acoustique et de l'IOP. En particulier, nous avons développé un code de simulation numérique performant pour la propagation des ondes associée à ce type de milieux, basé sur une méthode de type DG qui emploie des éléments finis d'ordre supérieur et des éléments courbes à l'interface afin de mieux représenter l'interaction fluide-structure, et nous l'appliquons à la reconstruction d'objets par la mise en oeuvre d'une méthode de Newton régularisée.
The determination of the shape of an elastic obstacle immersed in water from some measurements of the scattered field is an important problem in many technologies such as sonar, geophysical exploration, and medical imaging. This inverse obstacle problem (IOP) is very difficult to solve, especially from a numerical viewpoint, because of its nonlinear and ill-posed character. Moreover, its investigation requires the understanding of the theory for the associated direct scattering problem (DP), and the mastery of the corresponding numerical solution methods. The work accomplished here pertains to the mathematical and numerical analysis of the elasto-acoustic DP and of the IOP. More specifically, we have developed an efficient numerical simulation code for wave propagation associated to this type of media, based on a DG-type method using higher-order finite elements and curved edges at the interface to better represent the fluid-structure interaction, and we apply it to the reconstruction of objects with the implementation of a regularized Newton method.
Advisors/Committee Members: Barucq, Hélène (thesis director).
Subjects/Keywords: Interaction fluide-solide; Problème de diffraction; Fréquence de Jones; Inégalité de Gärding; Alternative de Fredholm; Espace de Sobolev à poids; Méthode de Galerkin discontinue; Méthode élément fini; Raffinement hp; Effet de pollution; Arêtes de frontière courbes; Factorisation LU; Différentiabilité au sens de Fréchet; Dérivée de domaine; Frontière Lipschitzienne; Théorème des fonctions implicites; Méthode de Newton; Régularisation de Tikhonov; Domaine étoilé; B-splines quadratiques; Fluid-solid interaction; Scattering problem; Jones frequency; Gärding's inequality; Fredholm alternative; Weighted Sobolev space; Discontinuous Galerkin method; Finite element method; Hp-refinement,; Pollution effect; Curved boundary edges; LU factorization; Fréchet differentiability; Domain derivative; Lipschitz boundary; Implicit function theorem; Newton method; Tikhonov regularization; Star domain; Quadratic B-splines.
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Estecahandy, E. (2013). Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem. (Doctoral Dissertation). Pau. Retrieved from http://www.theses.fr/2013PAUU3022
Chicago Manual of Style (16th Edition):
Estecahandy, Elodie. “Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem.” 2013. Doctoral Dissertation, Pau. Accessed March 01, 2021.
http://www.theses.fr/2013PAUU3022.
MLA Handbook (7th Edition):
Estecahandy, Elodie. “Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem.” 2013. Web. 01 Mar 2021.
Vancouver:
Estecahandy E. Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem. [Internet] [Doctoral dissertation]. Pau; 2013. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2013PAUU3022.
Council of Science Editors:
Estecahandy E. Contribution à l'analyse mathématique et à la résolution numérique d'un problème inverse de scattering élasto-acoustique : Contribution to the mathematical analysis and to the numerical solution of an inverse elasto-acoustic scattering problem. [Doctoral Dissertation]. Pau; 2013. Available from: http://www.theses.fr/2013PAUU3022
18.
Bomhof, C.W.
Iterative and parallel methods for linear systems, with applications in circuit simulation.
Degree: 2001, University Utrecht
URL: https://dspace.library.uu.nl/handle/1874/860
;
URN:NBN:NL:UI:10-1874-860
;
1874/860
;
URN:NBN:NL:UI:10-1874-860
;
https://dspace.library.uu.nl/handle/1874/860
► Bij het ontwerp van elektronische schakelingen, ie gebruikt wor en in bijvoorbeeld CD-spelers en mobiele telefoons, maakt e ontwerper veelvul ig gebruik van circuitsimulatie Bij…
(more)
▼ Bij het ontwerp van elektronische schakelingen, ie gebruikt wor en in bijvoorbeeld CD-spelers
en mobiele telefoons, maakt e ontwerper veelvul ig gebruik van circuitsimulatie
Bij circuitsimulatie wor t het ge rag van een schakeling (circuit) oorgereken met een
computer Hier oor wor t het maken van ure prototypes groten eels overbo ig Ook
zou zon er eze simulaties het ontwerpen van complexe ge¨integreer e schakelingen, met
vele uizen en transistoren, con ensatoren, weerstan en en ergelijke, niet mogelijk
zijn Om snel een circuit te kunnen ontwerpen is het voor e ontwerper van belang at e
simulatie niet te veel (computer-)rekentij kost Met snellere (slimmere) rekenmetho en
en ook met snellere computers, kan e rekentij verkort wor en
Dit proefschrift gaat groten eels over metho en ie tot oel hebben e rekentij voor
het simuleren van een circuit korter te maken De nieuwe metho en ie we ontwikkel
hebben zou en echter ook nuttig kunnen zijn bij e simulatie van an ere verschijnselen,
zoals bijvoorbeel vloeistofstromingen en chemische processen
Bij het simuleren van circuits wor t e meeste rekentij gebruikt voor het oplossen
van grote stelsels lineaire algebra¨ische vergelijkingen Een stelsel van 2 vergelijkingen
met 2 onbeken en, x en y, is bijvoorbeel
3x +5y =14
2x 3y =3,
met als oplossing x =3eny = 1 Bij circuitsimulatie kunnen e stelsels zeer veel,
bijvoorbeel meer an 50000, vergelijkingen hebben en evenveel onbeken en Deze
stelsels hebben an wel een ijle structuur Dat wil zeggen at er veel vergelijkingen
zijn ie slechts van een klein aantal onbeken en afhangen Door op een slimme manier
gebruik te maken van eze structuur kan er veel rekentij bespaar wor en Na het
inlei en e eerste hoof stuk beschrijven we in e hoof stukken 2 en 3 een gecombineer e
irecte en iteratieve metho e voor het oplossen van eze stelsels vergelijkingen
Bij een irecte metho e wor en onbeken en weggewerkt oor een geschikt veelvou
van een vergelijking bij een an ere vergelijking op te tellen Op eze manier kan uitein-
elijk e oplossing van het grote stelsel uitgereken wor en Bij een iteratieve metho e
gebeurt ongeveer hetzelf e, maar e hoeveelhei rekenwerk wor t sterk beperkt oor op
geschikte plaatsen in het proces co¨effici¨enten te verwaarlozen Het resultaat is an wel
een bena ering van e oplossing in plaats van e exacte oplossing Men tracht e fout
in e oplossing te verkleinen oor een correctie op e oplossing aan te brengen Deze
correctie wor t gevon en oor een vergelijking voor e fout op te stellen en eze even-eens
bij bena ering op te lossen Dit wor t herhaal tot at een vol oen nauwkeurige
oplossing gevonden is In e praktijk maken circuitsimulatie-programma s vooral gebruik van irecte metho-
en, om at eze sneller bleken te zijn an e tot nu toe bestaan e iteratieve metho en
In hoof stuk 2 laten we zien at een gecombineer e irecte en iteratieve metho e wel
rie keer sneller kan zijn an een irecte metho e Een prettige bijkomstighei van eze
aanpak is at hij ook geschikt is voor parallelle computers Dat zijn…
Subjects/Keywords: Parallel methods; mixed direct/iterative method; sparse LU factorization; circuit simulation; iterative solution methods; Schur complement; preconditioned iterative method; p-cyclic matrix; periodic steady state; Krylov subspace methods; GMRES and MINRES methods; matrix polynomial; rational matrix function
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Bomhof, C. W. (2001). Iterative and parallel methods for linear systems, with applications in circuit simulation. (Doctoral Dissertation). University Utrecht. Retrieved from https://dspace.library.uu.nl/handle/1874/860 ; URN:NBN:NL:UI:10-1874-860 ; 1874/860 ; URN:NBN:NL:UI:10-1874-860 ; https://dspace.library.uu.nl/handle/1874/860
Chicago Manual of Style (16th Edition):
Bomhof, C W. “Iterative and parallel methods for linear systems, with applications in circuit simulation.” 2001. Doctoral Dissertation, University Utrecht. Accessed March 01, 2021.
https://dspace.library.uu.nl/handle/1874/860 ; URN:NBN:NL:UI:10-1874-860 ; 1874/860 ; URN:NBN:NL:UI:10-1874-860 ; https://dspace.library.uu.nl/handle/1874/860.
MLA Handbook (7th Edition):
Bomhof, C W. “Iterative and parallel methods for linear systems, with applications in circuit simulation.” 2001. Web. 01 Mar 2021.
Vancouver:
Bomhof CW. Iterative and parallel methods for linear systems, with applications in circuit simulation. [Internet] [Doctoral dissertation]. University Utrecht; 2001. [cited 2021 Mar 01].
Available from: https://dspace.library.uu.nl/handle/1874/860 ; URN:NBN:NL:UI:10-1874-860 ; 1874/860 ; URN:NBN:NL:UI:10-1874-860 ; https://dspace.library.uu.nl/handle/1874/860.
Council of Science Editors:
Bomhof CW. Iterative and parallel methods for linear systems, with applications in circuit simulation. [Doctoral Dissertation]. University Utrecht; 2001. Available from: https://dspace.library.uu.nl/handle/1874/860 ; URN:NBN:NL:UI:10-1874-860 ; 1874/860 ; URN:NBN:NL:UI:10-1874-860 ; https://dspace.library.uu.nl/handle/1874/860

Universiteit Utrecht
19.
Bomhof, C.W.
Iterative and parallel methods for linear systems, with applications in circuit simulation.
Degree: 2001, Universiteit Utrecht
URL: http://dspace.library.uu.nl:8080/handle/1874/860
► Bij het ontwerp van elektronische schakelingen, ie gebruikt wor en in bijvoorbeeld CD-spelers en mobiele telefoons, maakt e ontwerper veelvul ig gebruik van circuitsimulatie Bij…
(more)
▼ Bij het ontwerp van elektronische schakelingen, ie gebruikt wor en in bijvoorbeeld CD-spelers
en mobiele telefoons, maakt e ontwerper veelvul ig gebruik van circuitsimulatie
Bij circuitsimulatie wor t het ge rag van een schakeling (circuit) oorgereken met een
computer Hier oor wor t het maken van ure prototypes groten eels overbo ig Ook
zou zon er eze simulaties het ontwerpen van complexe ge¨integreer e schakelingen, met
vele uizen en transistoren, con ensatoren, weerstan en en ergelijke, niet mogelijk
zijn Om snel een circuit te kunnen ontwerpen is het voor e ontwerper van belang at e
simulatie niet te veel (computer-)rekentij kost Met snellere (slimmere) rekenmetho en
en ook met snellere computers, kan e rekentij verkort wor en
Dit proefschrift gaat groten eels over metho en ie tot oel hebben e rekentij voor
het simuleren van een circuit korter te maken De nieuwe metho en ie we ontwikkel
hebben zou en echter ook nuttig kunnen zijn bij e simulatie van an ere verschijnselen,
zoals bijvoorbeel vloeistofstromingen en chemische processen
Bij het simuleren van circuits wor t e meeste rekentij gebruikt voor het oplossen
van grote stelsels lineaire algebra¨ische vergelijkingen Een stelsel van 2 vergelijkingen
met 2 onbeken en, x en y, is bijvoorbeel
3x +5y =14
2x 3y =3,
met als oplossing x =3eny = 1 Bij circuitsimulatie kunnen e stelsels zeer veel,
bijvoorbeel meer an 50000, vergelijkingen hebben en evenveel onbeken en Deze
stelsels hebben an wel een ijle structuur Dat wil zeggen at er veel vergelijkingen
zijn ie slechts van een klein aantal onbeken en afhangen Door op een slimme manier
gebruik te maken van eze structuur kan er veel rekentij bespaar wor en Na het
inlei en e eerste hoof stuk beschrijven we in e hoof stukken 2 en 3 een gecombineer e
irecte en iteratieve metho e voor het oplossen van eze stelsels vergelijkingen
Bij een irecte metho e wor en onbeken en weggewerkt oor een geschikt veelvou
van een vergelijking bij een an ere vergelijking op te tellen Op eze manier kan uitein-
elijk e oplossing van het grote stelsel uitgereken wor en Bij een iteratieve metho e
gebeurt ongeveer hetzelf e, maar e hoeveelhei rekenwerk wor t sterk beperkt oor op
geschikte plaatsen in het proces co¨effici¨enten te verwaarlozen Het resultaat is an wel
een bena ering van e oplossing in plaats van e exacte oplossing Men tracht e fout
in e oplossing te verkleinen oor een correctie op e oplossing aan te brengen Deze
correctie wor t gevon en oor een vergelijking voor e fout op te stellen en eze even-eens
bij bena ering op te lossen Dit wor t herhaal tot at een vol oen nauwkeurige
oplossing gevonden is In e praktijk maken circuitsimulatie-programma s vooral gebruik van irecte metho-
en, om at eze sneller bleken te zijn an e tot nu toe bestaan e iteratieve metho en
In hoof stuk 2 laten we zien at een gecombineer e irecte en iteratieve metho e wel
rie keer sneller kan zijn an een irecte metho e Een prettige bijkomstighei van eze
aanpak is at hij ook geschikt is voor parallelle computers Dat zijn…
Subjects/Keywords: Wiskunde en Informatica; Parallel methods; mixed direct/iterative method; sparse LU factorization; circuit simulation; iterative solution methods; Schur complement; preconditioned iterative method; p-cyclic matrix; periodic steady state; Krylov subspace methods; GMRES and MINRES methods; matrix polynomial; rational matrix function
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Bomhof, C. W. (2001). Iterative and parallel methods for linear systems, with applications in circuit simulation. (Doctoral Dissertation). Universiteit Utrecht. Retrieved from http://dspace.library.uu.nl:8080/handle/1874/860
Chicago Manual of Style (16th Edition):
Bomhof, C W. “Iterative and parallel methods for linear systems, with applications in circuit simulation.” 2001. Doctoral Dissertation, Universiteit Utrecht. Accessed March 01, 2021.
http://dspace.library.uu.nl:8080/handle/1874/860.
MLA Handbook (7th Edition):
Bomhof, C W. “Iterative and parallel methods for linear systems, with applications in circuit simulation.” 2001. Web. 01 Mar 2021.
Vancouver:
Bomhof CW. Iterative and parallel methods for linear systems, with applications in circuit simulation. [Internet] [Doctoral dissertation]. Universiteit Utrecht; 2001. [cited 2021 Mar 01].
Available from: http://dspace.library.uu.nl:8080/handle/1874/860.
Council of Science Editors:
Bomhof CW. Iterative and parallel methods for linear systems, with applications in circuit simulation. [Doctoral Dissertation]. Universiteit Utrecht; 2001. Available from: http://dspace.library.uu.nl:8080/handle/1874/860
20.
Khabou, Amal.
Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique.
Degree: Docteur es, Informatique, 2013, Université Paris-Sud – Paris XI
URL: http://www.theses.fr/2013PA112011
► Cette thèse traite d’une routine d’algèbre linéaire largement utilisée pour la résolution des systèmes li- néaires, il s’agit de la factorisation LU. Habituellement, pour calculer…
(more)
▼ Cette thèse traite d’une routine d’algèbre linéaire largement utilisée pour la résolution des systèmes li- néaires, il s’agit de la factorisation LU. Habituellement, pour calculer une telle décomposition, on utilise l’élimination de Gauss avec pivotage partiel (GEPP). La stabilité numérique de l’élimination de Gauss avec pivotage partiel est caractérisée par un facteur de croissance qui est reste assez petit en pratique. Toutefois, la version parallèle de cet algorithme ne permet pas d’atteindre les bornes inférieures qui ca- ractérisent le coût de communication pour un algorithme donné. En effet, la factorisation d’un bloc de colonnes constitue un goulot d’étranglement en termes de communication. Pour remédier à ce problème, Grigori et al [60] ont développé une factorisation LU qui minimise la communication(CALU) au prix de quelques calculs redondants. En théorie la borne supérieure du facteur de croissance de CALU est plus grande que celle de l’élimination de Gauss avec pivotage partiel, cependant CALU est stable en pratique. Pour améliorer la borne supérieure du facteur de croissance, nous étudions une nouvelle stra- tégie de pivotage utilisant la factorisation QR avec forte révélation de rang. Ainsi nous développons un nouvel algorithme pour la factorisation LU par blocs. La borne supérieure du facteur de croissance de cet algorithme est plus petite que celle de l’élimination de Gauss avec pivotage partiel. Cette stratégie de pivotage est ensuite combinée avec le pivotage basé sur un tournoi pour produire une factorisation LU qui minimise la communication et qui est plus stable que CALU. Pour les systèmes hiérarchiques, plusieurs niveaux de parallélisme sont disponibles. Cependant, aucune des méthodes précédemment ci- tées n’exploite pleinement ces ressources. Nous proposons et étudions alors deux algorithmes récursifs qui utilisent les mêmes principes que CALU mais qui sont plus appropriés pour des architectures à plu- sieurs niveaux de parallélisme. Pour analyser d’une façon précise et réaliste
This dissertation focuses on a widely used linear algebra kernel to solve linear systems, that is the LU decomposition. Usually, to perform such a computation one uses the Gaussian elimination with partial pivoting (GEPP). The backward stability of GEPP depends on a quantity which is referred to as the growth factor, it is known that in general GEPP leads to modest element growth in practice. However its parallel version does not attain the communication lower bounds. Indeed the panel factorization rep- resents a bottleneck in terms of communication. To overcome this communication bottleneck, Grigori et al [60] have developed a communication avoiding LU factorization (CALU), which is asymptotically optimal in terms of communication cost at the cost of some redundant computation. In theory, the upper bound of the growth factor is larger than that of Gaussian elimination with partial pivoting, however CALU is stable in practice. To improve the upper bound of the growth factor, we study a new pivoting strategy based on…
Advisors/Committee Members: Grigori, Laura (thesis director).
Subjects/Keywords: Factorisation LU; Élimination de Gauss avec pivotage partiel; Acteur de croissance, facto- risation QR avec forte révélation de rang; Minimisation de la communication; Algorithmes parallèles; Systèmes hiérarchiques; Modèles de performance; Stratégies de pivotage; LU factorization; Gaussian elimination with partial pivoting; Growth factor; Minimizing the communication cost; Parallel algorithms; Hierarchical systems; Performance models; Pivoting strategies
…avoiding LU factorization. . . . . 12
2.1
Upper bounds of the growth factor gW obtained from… …of these algorithms, a communication avoiding LU
factorization referred to as CALU was… …The first part introduces a new LU factorization using a new pivoting
strategy based on… …communication, this algorithm is more stable than the original communication avoiding LU factorization… …solving linear systems
of equations, that is the LU factorization and the QR factorization. In…
Record Details
Similar Records
Cite
Share »
Record Details
Similar Records
Cite
« Share





❌
APA ·
Chicago ·
MLA ·
Vancouver ·
CSE |
Export
to Zotero / EndNote / Reference
Manager
APA (6th Edition):
Khabou, A. (2013). Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique. (Doctoral Dissertation). Université Paris-Sud – Paris XI. Retrieved from http://www.theses.fr/2013PA112011
Chicago Manual of Style (16th Edition):
Khabou, Amal. “Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique.” 2013. Doctoral Dissertation, Université Paris-Sud – Paris XI. Accessed March 01, 2021.
http://www.theses.fr/2013PA112011.
MLA Handbook (7th Edition):
Khabou, Amal. “Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique.” 2013. Web. 01 Mar 2021.
Vancouver:
Khabou A. Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique. [Internet] [Doctoral dissertation]. Université Paris-Sud – Paris XI; 2013. [cited 2021 Mar 01].
Available from: http://www.theses.fr/2013PA112011.
Council of Science Editors:
Khabou A. Dense matrix computations : communication cost and numerical stability : Calculs pour les matrices denses : coût de communication et stabilité numérique. [Doctoral Dissertation]. Université Paris-Sud – Paris XI; 2013. Available from: http://www.theses.fr/2013PA112011
.