Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Title
Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Author(s)
Aliaga Estellés, José Ignacio; Badía Sala, Rosa María; Barreda Vayá, María; Bollhöfer, Matthias; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana Ortí, Enrique S.
Date
2016-05
Publisher
Elsevier
Bibliographic citation
ALIAGA ESTELLÉS, José Ignacio; BADÍA SALA, Rosa María; BARREDA VAYÁ, María; BOLLHÖFER, Matthias; DUFRECHOU, Ernesto; EZZATTI, Pablo; QUINTANA ORTÍ, Enrique S. Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators. Parallel Computing (2016), v. 54, pp. 97-107
Type
info:eu-repo/semantics/article
Publisher version
http://www.sciencedirect.com/science/article/pii/S0167819115001581
Abstract
We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For the conventional x86 architectures, our approach exploits task parallelism via the OmpSs runtime as well as a message-passing implementation based on MPI, respectively yielding a dynamic and a static schedule of the work to the cores, with different numeric semantics from those of the sequential ILUPACK. For the graphics processor we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while keeping the numeric semantics of the sequential case.
Is part of
Parallel Computing (2016), v. 54
Rights
http://rightsstatements.org/vocab/CNE/1.0/
info:eu-repo/semantics/openAccess
This item appears in the following collection(s)
- ICC_Articles [427]