Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Metadata
Title
Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Authors
Aliaga Estellés, José Ignacio; Badía Sala, Rosa María; Barreda Vayá, Maria; Bollhöffer, Matthias; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.
Publication date
2016-05
Publisher
Elsevier
Citation
ALIAGA ESTELLÉS, José Ignacio; BADÍA SALA, Rosa María; BARREDA VAYÁ, María; BOLLHÖFFER, Matthias; DUFRECHOU, Ernesto; EZZATTI, Pablo; QUINTANA ORTÍ, Enrique S. Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators. Parallel Computing (2016), v. 54, pp. 97-107
Document type
info:eu-repo/semantics/article
Publisher's version
http://www.sciencedirect.com/science/article/pii/S0167819115001581
Abstract
We present specialized implementations of the preconditioned iterative linear
system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms
and many-core hardware co-processors based on the Intel Xeon Phi
and graphics accelerators. For conventional x86 architectures, our approach
exploits task parallelism via the OmpSs runtime, as well as a message-passing
implementation based on MPI, yielding a dynamic and a static schedule of the
work to the cores, respectively, with numeric semantics that differ from those
of the sequential ILUPACK. For the graphics processor, we exploit data
parallelism by off-loading the computationally expensive kernels to the
accelerator while keeping the numeric semantics of the sequential case.
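The abstract centers on ILUPACK's preconditioned conjugate gradient (PCG) solver. As a minimal, purely sequential sketch of that iteration, the code below assumes a small dense symmetric positive definite matrix and uses a simple Jacobi (diagonal) preconditioner as a stand-in for ILUPACK's multilevel ILU preconditioner, which is not reproduced here. The function names (`pcg`, `jacobi`, `matvec`, `dot`) are hypothetical and are not ILUPACK's API; the comments only mark the kernels (matrix-vector product, preconditioner application, dot products, vector updates) that any parallel version has to distribute or off-load.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Inner product: a global reduction in every CG step."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (ILUPACK itself works with sparse matrices)."""
    return [dot(row, x) for row in A]


def jacobi(A: Matrix) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in preconditioner: z = D^{-1} r with D = diag(A)."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A: Matrix, b: Vector, apply_prec: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vector, int]:
    """Preconditioned CG for a symmetric positive definite A, starting at x = 0.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                        # r = b - A x, with x = 0
    if dot(r, r) ** 0.5 < tol:
        return x, 0
    z = apply_prec(r)                  # preconditioner application
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        q = matvec(A, p)               # matrix-vector product
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]   # vector updates (axpy)
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if dot(r, r) ** 0.5 < tol:     # convergence test on the residual norm
            return x, k + 1
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iters = pcg(A, b, jacobi(A))
    print(x, iters)
```

For this 2x2 system the exact solution is (1/11, 7/11), and CG reaches it in at most two iterations up to rounding. The sketch does not model either parallel strategy: per the abstract, the OmpSs and MPI versions alter the numeric semantics relative to sequential ILUPACK, while the GPU version preserves them.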
Published in
Parallel Computing (2016), v. 54
Access rights
http://rightsstatements.org/vocab/CNE/1.0/
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles