Level-3 BLAS on a GPU: Picking the Low Hanging Fruit

Quintana-Ortí, Gregorio; Van de Geijn, Robert A.

dc.contributor.author	Quintana-Ortí, Gregorio
dc.contributor.author	Van de Geijn, Robert A.
dc.date.accessioned	2011-09-09T07:06:41Z
dc.date.available	2011-09-09T07:06:41Z
dc.date.issued	2009-04
dc.identifier.uri	http://hdl.handle.net/10234/27765
dc.description.abstract	The arrival of hardware accelerators has created a new gold rush to be the first to deliver their promise of high performance for numerical applications. Since they are relatively hard to program, with limited language and compiler support, it is generally accepted that one needs to roll up one's sleeves and tough it out, not unlike the early days of distributed memory parallel computing (or any other period after the introduction of a drastically different architecture). In this paper we remind the community that while this is a noble endeavor, there is a lot of low hanging fruit that can be harvested easily. Picking this low hanging fruit benefits the scientific computing community immediately and prototypes the approach that further optimizations may wish to follow. We demonstrate this by focusing on a widely used set of operations, the level-3 BLAS, targeting the NVIDIA family of GPUs.
dc.description.abstract	[Translated from the Spanish abstract] The arrival of hardware accelerators has set off a new gold rush to be the first to achieve the promised high performance in numerical applications. Because they are relatively hard to program, with limited language and compiler support, it is accepted that one has to roll up one's sleeves and grit one's teeth, much as in the early days of programming distributed-memory machines (or in any other period following the introduction of a drastically different architecture). In this work we remind the community that while this is a noble attitude, there is plenty of fruit that can be picked far more easily. Picking this fruit benefits the scientific community immediately and serves to prototype the approaches that subsequent optimizations should follow. In this article we demonstrate this by applying it to a broad set of operations, the level-3 BLAS, targeting the NVIDIA family of GPUs.
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	Departament d' Enginyeria i Ciència dels Computadors, Universitat Jaume I
dc.relation.isPartOfSeries	Informe técnico ICC;2009-04-01
dc.rights.uri	http://rightsstatements.org/vocab/CNE/1.0/
dc.subject	Numerical linear algebra
dc.subject	Hardware accelerators
dc.subject	BLAS-3
dc.subject	Algebra lineal numérica
dc.subject	Aceleradores Hardware
dc.title	Level-3 BLAS on a GPU: Picking the Low Hanging Fruit
dc.title.alternative	BLAS-3 sobre una GPU: Recogiendo la fruta fácil
dc.type	info:eu-repo/semantics/report
dc.rights.accessRights	info:eu-repo/semantics/openAccess

Files in this item

Name: ICC_2009-04-01.pdf
Size: 199.1 KB
Format: PDF

This item appears in the following collection(s)

ICC_Reports [18]
