Show simple item record

dc.contributor.author       Castelló, Adrián
dc.contributor.author       Catalán Carbó, Mar
dc.contributor.author       Dolz, Manuel F.
dc.contributor.author       Quintana-Orti, Enrique S.
dc.contributor.author       Duato, José
dc.date.accessioned         2022-02-16T12:12:36Z
dc.date.available           2022-02-16T12:12:36Z
dc.date.issued              2022-01-10
dc.identifier.citation      Castelló, A., Catalán, M., Dolz, M.F. et al. Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks. Computing, 105, 1101–1119 (2023). https://doi.org/10.1007/s00607-021-01029-2   ca_CA
dc.identifier.uri           http://hdl.handle.net/10234/196782
dc.description.abstract     For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance of the system increases yet eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, a key to the efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high performance instances of Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2× compared with the default algorithm in the same MPI library, and up to 2.8× when comparing distinct MPI libraries in a number of relevant combinations of CNN model+dataset.   ca_CA
dc.format.extent            19 p.   ca_CA
dc.format.mimetype          application/pdf   ca_CA
dc.language.iso             eng   ca_CA
dc.publisher                Springer   ca_CA
dc.relation.isPartOf        Computing (2023)   ca_CA
dc.rights                   © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021   ca_CA
dc.rights.uri               http://rightsstatements.org/vocab/InC/1.0/   ca_CA
dc.subject                  message passing interface (MPI)   ca_CA
dc.subject                  collective communication primitives   ca_CA
dc.subject                  Allreduce   ca_CA
dc.subject                  deep learning   ca_CA
dc.subject                  distributed training   ca_CA
dc.title                    Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks   ca_CA
dc.type                     info:eu-repo/semantics/article   ca_CA
dc.identifier.doi           https://doi.org/10.1007/s00607-021-01029-2
dc.rights.accessRights      info:eu-repo/semantics/openAccess   ca_CA
dc.type.version             info:eu-repo/semantics/publishedVersion   ca_CA
project.funder.name         Ministerio de Ciencia, Innovación y Universidades (Spain)   ca_CA
project.funder.name         Generalitat Valenciana   ca_CA
oaire.awardNumber           TIN2017-82972-R   ca_CA
oaire.awardNumber           Prometeo/2019/109   ca_CA
oaire.awardNumber           CDEIGENT/2018/014   ca_CA
oaire.awardNumber           FJC2019-039222-I   ca_CA
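
For context, the sketch below (not part of the record or of the paper itself) illustrates the MPI_Allreduce primitive that the abstract analyzes: in data-parallel training, each rank sums its local gradient vector with those of every other rank and receives the global result, which Horovod then uses to update the model replica on each worker. Buffer length and gradient values are placeholders.

    /* Minimal sketch (assumed example, not the paper's code): gradient
     * aggregation with MPI_Allreduce as used in data-parallel training. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        const int n = 1 << 20;                       /* placeholder gradient length */
        float *grad = malloc(n * sizeof(float));     /* local gradients of this rank */
        float *sum  = malloc(n * sizeof(float));     /* globally reduced gradients   */
        for (int i = 0; i < n; i++) grad[i] = 1.0f;  /* placeholder values           */

        /* Sum the local gradient vectors across all ranks; every rank
         * receives the reduced result. */
        MPI_Allreduce(grad, sum, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        free(grad);
        free(sum);
        MPI_Finalize();
        return 0;
    }

Each MPI library studied in the paper (MPICH, OpenMPI, IntelMPI) provides several internal algorithms for this single call, which is why the choice of library and Allreduce realization affects training throughput.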


Files in this item


This item appears in the following collection(s)
