Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

Dolz, Manuel F.; Igual, Francisco; Ludwig, Thomas; Piñuel, Luis; Quintana-Ortí, Enrique S.

dc.contributor.author	Dolz, Manuel F.
dc.contributor.author	Igual, Francisco
dc.contributor.author	Ludwig, Thomas
dc.contributor.author	Piñuel, Luis
dc.contributor.author	Quintana-Ortí, Enrique S.
dc.date.accessioned	2016-04-25T17:29:23Z
dc.date.available	2016-04-25T17:29:23Z
dc.date.issued	2015-08
dc.identifier.issn	0045-7906
dc.identifier.uri	http://hdl.handle.net/10234/158945
dc.description.abstract	The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how to adapt existing libraries and applications to this type of system. In particular, the exploitation of manycore accelerators requires a holistic solution that simultaneously addresses time-to-response, energy efficiency and ease of programming. In this paper, we adapt the SuperMatrix runtime task scheduler for dense linear algebra algorithms to the many-threaded Intel Xeon Phi, with special emphasis on the performance and energy profile of the solution. From the performance perspective, we optimize the balance between task- and data-parallelism, reporting notable results compared with Intel MKL. From the energy-aware point of view, we propose a methodology that relies on core-level event counters and aggregated power consumption samples to obtain a task-level accounting of energy. In addition, we introduce a blocking mechanism to reduce power and energy consumption during the idle periods inherent to task-parallel executions.	ca_CA
dc.description.sponsorShip	This research was supported by projects CICYT TIN2011-23283 and CICYT-TIN 2012-32180, FEDER, and the EU Project FP7 318793 “EXA2GREEN”. We thank Rafael Rodríguez, Sandra Catalán, and the members of the FLAME team for their support. This work was partially conducted while Francisco D. Igual and Enrique S. Quintana-Ortí were visiting The University of Texas at Austin, funded by the JTO visitor applications programme from the Institute for Computational Engineering and Sciences (ICES) at UT.
dc.format.extent	17 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Elsevier	ca_CA
dc.relation.isPartOf	Computers & Electrical Engineering, 2015, vol. 46	ca_CA
dc.rights	Copyright © 2015 Elsevier Ltd. All rights reserved.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	Power-aware computing	ca_CA
dc.subject	High performance	ca_CA
dc.subject	Many-core architectures	ca_CA
dc.subject	Runtime task schedulers	ca_CA
dc.subject	Dense linear algebra	ca_CA
dc.title	Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	http://dx.doi.org/10.1016/j.compeleceng.2015.06.009
dc.rights.accessRights	info:eu-repo/semantics/restrictedAccess	ca_CA
dc.relation.publisherVersion	http://www.sciencedirect.com/science/article/pii/S004579061500213X	ca_CA

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

ICC_Articles [420]
