Browse ICC_Articles by author "5d6aa414-f441-49a6-ab1f-010e79dec6a0"
Showing items 1-6 of 6
Compressed basis GMRES on high-performance graphics processing units
Aliaga Estellés, José Ignacio; Anzt, Hartwig; Tomás Domínguez, Andrés Enrique; Quintana-Orti, Enrique S.; Grützmacher, Thomas Sage (2022-08-05)
Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is ...
Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units
Aliaga Estellés, José Ignacio; Anzt, Hartwig; Grützmacher, Thomas; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique. John Wiley and Sons (2021)
We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the ...
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS
Castelló, Adrián; Barrachina Mir, Sergio; Dolz, Manuel F.; Quintana-Orti, Enrique S.; San Juan, Pau; Tomás Domínguez, Andrés Enrique. Elsevier (2022-03-22)
We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors ...
Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
Rodríguez Sánchez, Rafael; Catalán, Sandra; Herrero, José R.; Quintana-Orti, Enrique S.; Tomás Domínguez, Andrés Enrique. Springer Verlag (2019)
We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). Concretely, in the ...
Reformulating the direct convolution for high-performance deep learning inference on ARM processors
Barrachina Mir, Sergio; Castelló, Adrián; Dolz, Manuel F.; Low, Tze Meng; Martinez, Hector; Quintana-Orti, Enrique S.; Upasana, Sridhar; Tomás Domínguez, Andrés Enrique. Elsevier (2022-12-20)
We present two high-performance implementations of the convolution operator via the direct algorithm that outperform the so-called lowering approach based on the im2col transform plus the gemm kernel on an ARMv8-based ...
Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors
Tomás Domínguez, Andrés Enrique; Quintana-Orti, Enrique S. Springer (2020-01-24)
We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. This approach is very competitive on a hybrid platform ...