Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
Other documents by the authors: Dolz, Manuel F.; Igual, Francisco D.; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S.
Metadata
RESEARCH
This resource is restricted
http://dx.doi.org/10.1016/j.compeleceng.2015.06.009
Title
Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
Authors
Dolz, Manuel F.; Igual, Francisco D.; Ludwig, Thomas; Piñuel, Luis; Quintana-Orti, Enrique S.
Publication date
2015-08
Publisher
Elsevier
ISSN
0045-7906
Document type
info:eu-repo/semantics/article
Publisher's version
http://www.sciencedirect.com/science/article/pii/S004579061500213X
Keywords / Subjects
Abstract
The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of system. In particular, the exploitation of manycore accelerators requires a holistic solution that simultaneously addresses time-to-response, energy efficiency and ease of programming. In this paper, we adapt the SuperMatrix runtime task scheduler for dense linear algebra algorithms to the many-threaded Intel Xeon Phi, with special emphasis on the performance and energy profile of the solution. From the performance perspective, we optimize the balance between task- and data-parallelism, reporting notable results compared with Intel MKL. From the energy-aware point of view, we propose a methodology that relies on core-level event counters and aggregated power consumption samples to obtain a task-level accounting for the energy. In addition, we introduce a blocking mechanism to reduce power and energy consumption during the idle periods inherent to task-parallel executions.
Published in
Computers & Electrical Engineering, 2015, vol. 46
Access rights
Copyright © 2015 Elsevier Ltd. All rights reserved.
http://rightsstatements.org/vocab/InC/1.0/
info:eu-repo/semantics/restrictedAccess
Appears in collections
- ICC_Articles [415]