Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures

Badia, Jose M.; Amor-Martin, Adrian; BELLOCH, JOSE A.; Garcia-Castillo, Luis Emilio

dc.contributor.author	Badia, Jose M.
dc.contributor.author	Amor-Martin, Adrian
dc.contributor.author	BELLOCH, JOSE A.
dc.contributor.author	Garcia-Castillo, Luis Emilio
dc.date.accessioned	2023-01-30T08:48:17Z
dc.date.available	2023-01-30T08:48:17Z
dc.date.issued	2022-12-02
dc.identifier.citation	BADIA, Jose M., et al. Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures. The Journal of Supercomputing, 79, 7648–7664 (2023).	ca_CA
dc.identifier.issn	0920-8542
dc.identifier.issn	1573-0484
dc.identifier.uri	http://hdl.handle.net/10234/201465
dc.description.abstract	Achieving maximum parallel performance on multi-core CPUs and many-core GPUs is a challenging task that depends on multiple factors. These include, for example, the number and granularity of the computations and the use of the device memories. In this paper, we assess those factors by evaluating and comparing different parallelizations of the same problem on a multiprocessor containing a CPU with 40 cores and four P100 GPUs with Pascal architecture. As a case study, we use the convolutional operation behind a non-standard finite element mesh truncation technique in the context of open region electromagnetic wave propagation problems. A total of six parallel algorithms implemented using OpenMP and CUDA have been used to carry out the comparison by leveraging the same levels of parallelism on both types of platforms. Three of the algorithms are presented for the first time in this paper, including a multi-GPU method, and two others are improved versions of algorithms previously developed by some of the authors. This paper presents a thorough experimental evaluation of the parallel algorithms on a radar cross-section prediction problem. Results show that the performance obtained on the GPU clearly exceeds that obtained on the CPU, even more so when multiple GPUs are used to distribute both data and computations. Speedups close to 30 have been obtained on the CPU, while the multi-GPU version achieves speedups larger than 250.	ca_CA
dc.description.sponsorShip	Funding for open access charge: CRUE-Universitat Jaume I
dc.format.extent	17 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Springer	ca_CA
dc.rights	© The Author(s) 2022	ca_CA
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	ca_CA
dc.subject	Parallel computing	ca_CA
dc.subject	CUDA	ca_CA
dc.subject	OpenMP	ca_CA
dc.subject	Finite elements	ca_CA
dc.subject	GPU	ca_CA
dc.title	Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1007/s11227-022-04975-6
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.type.version	info:eu-repo/semantics/publishedVersion	ca_CA
project.funder.name	Gobierno de España	ca_CA
project.funder.name	Generalitat Valenciana	ca_CA
project.funder.name	Gobierno de la Comunidad de Madrid	ca_CA
oaire.awardNumber	PID2020-113656RB-C21	ca_CA
oaire.awardNumber	PID2019-106455GB-C21	ca_CA
oaire.awardNumber	PROMETEO/2019/109	ca_CA
oaire.awardNumber	MIMACUHSPACE-CM-UC3M	ca_CA

Files in this item

Name: badia_2022_strategies.pdf
Size: 869.0Kb
Format: PDF

This item appears in the following collection(s)

ICC_Articles [424]