Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles

Epifanio, Irene; Ibáñez Gual, Maria Victoria; Simó, Amelia

dc.contributor.author	Epifanio, Irene
dc.contributor.author	Ibáñez Gual, Maria Victoria
dc.contributor.author	Simó, Amelia
dc.date.accessioned	2018-12-19T12:22:08Z
dc.date.available	2018-12-19T12:22:08Z
dc.date.issued	2019-05-13
dc.identifier.citation	Irene Epifanio, M. Victoria Ibáñez & Amelia Simó (2020) Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles, The American Statistician, 74:2, 169-183, DOI: 10.1080/00031305.2018.1545700	ca_CA
dc.identifier.uri	http://hdl.handle.net/10234/178254
dc.description.abstract	In this paper we propose several methodologies for handling missing or incomplete data in Archetype analysis (AA) and Archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, i.e. they are actual data points. With the proposed procedures, missing data are neither discarded nor filled in beforehand by imputation, and the theoretical properties regarding the location of archetypes are guaranteed, unlike in previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified in order to fulfill the theory and a new procedure is proposed, where the missing values are updated by the fitted values. In the second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities in a new space, where AA or ADA is applied, and those results are used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real data sets: a well-known climate data set and a global development data set. We illustrate how these unsupervised methodologies allow complex data to be understood, even by non-experts.	ca_CA
dc.format.extent	40 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	American Statistical Association	ca_CA
dc.publisher	Taylor & Francis	ca_CA
dc.rights	© 2018 Taylor & Francis.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	incomplete data set	ca_CA
dc.subject	archetype analysis	ca_CA
dc.subject	multidimensional scaling	ca_CA
dc.subject	partial distance strategy	ca_CA
dc.title	Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	https://doi.org/10.1080/00031305.2018.1545700
dc.relation.projectID	Spanish Ministry of Ciencia, Innovación y Universidades (AEI/FEDER, EU) (grant DPI2017-87333-R); Universitat Jaume I (UJI-B2017-13).	ca_CA
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2018.1545700#.XBo1VM3ZCUk	ca_CA
dc.type.version	info:eu-repo/semantics/submittedVersion	ca_CA

Files in this item

Name: Epifanio_2018.pdf
Size: 711.7Kb
Format: PDF
Description: preprint version

This item appears in the following collection(s)

IF_Articles [318]
MAT_Articles [765]
Articles de publicacions periòdiques