Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Other documents by the authors: Aliaga Estellés, José Ignacio; Badía Sala, Rosa María; Barreda Vayá, Maria; Bollhöffer, Matthias; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.
Metadata
Title
Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators
Authors
Aliaga Estellés, José Ignacio; Badía Sala, Rosa María; Barreda Vayá, Maria; Bollhöffer, Matthias; Dufrechou, Ernesto; Ezzatti, Pablo; Quintana-Orti, Enrique S.
Publication date
2016-05
Publisher
Elsevier
Bibliographic citation
ALIAGA ESTELLÉS, José Ignacio; BADÍA SALA, Rosa María; BARREDA VAYÁ, María; BOLLHÖFFER, Matthias; DUFRECHOU, Ernesto; EZZATTI, Pablo; QUINTANA ORTÍ, Enrique S. Exploiting Task and Data Parallelism in ILUPACK's Preconditioned CG Solver on NUMA Architectures and Many-core Accelerators. Parallel Computing (2016), v. 54, pp. 97-107
Document type
info:eu-repo/semantics/article
Publisher's version
http://www.sciencedirect.com/science/article/pii/S0167819115001581
Abstract
We present specialized implementations of the preconditioned iterative linear
system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms
and many-core hardware co-processors based on the Intel Xeon Phi
and graphics accelerators. For the conventional x86 architectures, our approach
exploits task parallelism via the OmpSs runtime as well as a message-passing
implementation based on MPI, respectively yielding a dynamic and
static schedule of the work to the cores, with different numeric semantics
to those of the sequential ILUPACK. For the graphics processor we exploit
data parallelism by off-loading the computationally expensive kernels to the
accelerator while keeping the numeric semantics of the sequential case.
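
For readers unfamiliar with the solver named in the abstract, the following is a minimal sketch of a preconditioned conjugate gradient (PCG) iteration in plain Python. It is not ILUPACK's code or API. It assumes a simple Jacobi (diagonal) preconditioner as a stand-in for ILUPACK's multilevel ILU preconditioner, and the function names (`spmv`, `pcg`, `jacobi`, `tridiagonal_csr`) and the test matrix are hypothetical choices for illustration. The comments mark the kernels that usually dominate a PCG iteration; which of them the authors actually off-load to the accelerator is not stated in this record.

```python
# Minimal PCG sketch in pure Python with CSR storage.
# Assumption: Jacobi preconditioning replaces ILUPACK's multilevel ILU.

def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x for A stored in CSR format."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))

def pcg(indptr, indices, data, b, apply_prec, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, iterations). apply_prec(r) must return M^{-1} r.
    """
    x = [0.0] * len(b)
    r = list(b)                          # residual b - A x for x = 0
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0                      # zero right-hand side: x = 0
    z = apply_prec(r)                    # kernel: preconditioner application
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        q = spmv(indptr, indices, data, p)   # kernel: sparse mat-vec
        alpha = rz / dot(p, q)               # kernel: dot product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]   # kernel: axpy
        r = [ri - alpha * qi for ri, qi in zip(r, q)]   # kernel: axpy
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def jacobi(indptr, indices, data):
    """Build a diagonal preconditioner: returns a function r -> D^{-1} r."""
    diag = [
        next(data[k] for k in range(indptr[i], indptr[i + 1]) if indices[k] == i)
        for i in range(len(indptr) - 1)
    ]
    return lambda r: [ri / di for ri, di in zip(r, diag)]

def tridiagonal_csr(n):
    """Hypothetical test matrix: 1D Laplacian tridiag(-1, 2, -1) in CSR."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

A typical call builds the matrix with `tridiagonal_csr(n)`, forms a right-hand side with `spmv`, and passes `jacobi(indptr, indices, data)` as `apply_prec` to `pcg`. Each vector update and each row of the sparse product is independent of the others, which is what makes these kernels natural candidates for data-parallel execution.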
Published in
Parallel Computing (2016), v. 54
Access rights
http://rightsstatements.org/vocab/CNE/1.0/
info:eu-repo/semantics/openAccess
Appears in collections
- ICC_Articles [413]