• closedAccess   Evaluating the performance and energy efficiency of the COSMO-ART model system 

      Charles, Joseph; Sawyer, William; Dolz, Manuel F.; Catalán, Sandra Springer Berlin Heidelberg (2015-05)
      In this paper we investigate the energy footprint and performance profiling of COSMO-ART on various HPC platforms. This model is an extension of the operational weather forecast model of the German weather service (DWD), ...
    • closedAccess   Exploring stream parallel patterns in distributed MPI environments 

      López-Gómez, Javier; Fernández Muñoz, Javier; del Río Astorga, David; Dolz, Manuel F.; García, J. Daniel Elsevier (2019)
      In recent years, the large volumes of stream data and the near real-time requirements of data streaming applications have exacerbated the need for new scalable algorithms and programming interfaces for distributed and ...
    • closedAccess   Finding parallel patterns through static analysis in C++ applications 

      del Río Astorga, David; Dolz, Manuel F.; Sánchez García, Luis Miguel; García, J. Daniel; Danelutto, Marco; Torquati, Massimo Sage (2017-03)
      Since the ‘free lunch’ of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given ...
    • openAccess   High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS 

      Castelló, Adrián; Barrachina Mir, Sergio; Dolz, Manuel F.; Quintana-Orti, Enrique S.; San Juan, Pau; Tomás Domínguez, Andrés Enrique Elsevier (2022-03-22)
      We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors ...
    • closedAccess   Hybrid static–dynamic selection of implementation alternatives in heterogeneous environments 

      del Río Astorga, David; Dolz, Manuel F.; Fernández Muñoz, Javier; García Blas, Javier Springer (2019-09)
      With the emergence of heterogeneous architectures, developing parallel software has become an increasingly complex task. The ability of using multiple devices in a single application, such as CPUs, accelerators, or ...
    • closedAccess   Modeling power and energy consumption of dense matrix factorizations on multicore processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Wiley (2013-10-11)
      In this paper, we propose a model for the energy consumption of the concurrent execution of three key dense matrix factorizations, with task parallelism leveraged via the Symmetric Multi-Processing Superscalar (SMPSs) ...
    • closedAccess   Modeling power and energy of the task-parallel Cholesky factorization on multicore processors 

      Alonso-Jordá, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Orti, Enrique S. Springer Berlin Heidelberg (2014-05)
      In this paper we introduce a model for the total energy consumption of the Cholesky factorization on a multicore processor. Our model assumes a task-parallel execution of the factorization process, with concurrency leveraged ...
    • openAccess   Optimising Convolutions for Deep Learning Inference On ARM Cortex-M Processors 

      Maciá-Lillo, Antonio; Barrachina Mir, Sergio; Fabregat Llueca, German; Dolz, Manuel F. Institute of Electrical and Electronics Engineers Inc. (2024-04-30)
      We perform a series of optimisations on the convolution operator within the ARM CMSIS-NN library to improve the performance of deep learning tasks on Arduino development boards equipped with ARM Cortex-M4 and M7 microcontrollers. ...
    • closedAccess   Paving the way towards high-level parallel pattern interfaces for data stream processing 

      del Río Astorga, David; Dolz, Manuel F.; Fernández Muñoz, Javier; García, J. Daniel Elsevier (2018-10)
      The emergence of the Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, ...
    • closedAccess   Performance modeling of the sparse matrix–vector product via convolutional neural networks 

      Barreda Vayá, Maria; Dolz, Manuel F.; Castaño Álvarez, María Asunción; Alonso-Jordá, Pedro; Quintana-Orti, Enrique S. Springer (2020-02-04)
      Modeling the execution time of the sparse matrix–vector multiplication (SpMV) on a current CPU architecture is especially complex due to (i) irregular memory accesses; (ii) indirect memory referencing; and (iii) low ...
    • openAccess   Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

      Dolz, Manuel F.; Barrachina Mir, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat Llueca, German; Tomás, Andrés E. Springer (2023)
      In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) ...
    • openAccess   PyDTNN: A user-friendly and extensible framework for distributed deep learning 

      Barrachina Mir, Sergio; Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Mestre Miravet, Jose Ignacio Springer (2021-02-22)
      We introduce a framework for training deep neural networks on clusters of computers with the following appealing properties: (1) It is developed in Python, exposing an amiable interface that provides an accessible entry ...
    • openAccess   Reformulating the direct convolution for high-performance deep learning inference on ARM processors 

      Barrachina Mir, Sergio; Castelló, Adrián; Dolz, Manuel F.; Low, Tze Meng; Martinez, Hector; Quintana-Orti, Enrique S.; Upasana, Sridhar; Tomás Domínguez, Andrés Enrique Elsevier (2022-12-20)
      We present two high-performance implementations of the convolution operator via the direct algorithm that outperform the so-called lowering approach based on the im2col transform plus the gemm kernel on an ARMv8-based ...
    • openAccess   Towards Automatic Parallelization of Stream Processing Applications 

      Dolz, Manuel F.; del Río Astorga, David; Fernández Muñoz, Javier; García, J. Daniel; Carretero, Jesús IEEE (2018-08)
      Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover ...
    • openAccess   Urban sound classification using neural networks on embedded FPGAs 

      Belloch, Jose A.; Coronado, Raul; Valls, Oscar; Amor, Rocío del; Leon, German; Naranjo, Valery; Dolz, Manuel F.; Amor-Martin, Adrian; Piñero, Gema Springer (2024-03-01)
      Sound classification using neural networks has recently produced very accurate results. A large number of different applications use this type of sound classifiers such as controlling and monitoring the type of activity ...
    • openAccess   Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs 

      Barrachina Mir, Sergio; Castelló, Adrián; Catalán Carbó, Mar; Dolz, Manuel F.; Mestre Miravet, Jose Ignacio Springer (2021-08-30)
      In this work, we build a general piece-wise model to analyze data-parallel (DP) training costs of convolutional neural networks (CNNs) on clusters of GPUs. This general model is based on i) multi-layer perceptrons (MLPs) ...