Show simple item record

dc.contributor: Universitat Jaume I. Departament de Llenguatges i Sistemes Informàtics
dc.contributor.author: Lozano Albalate, Maria Teresa
dc.date.accessioned: 2011-04-12T20:03:43Z
dc.date.accessioned: 2024-05-13T12:17:26Z
dc.date.available: 2007-09-18
dc.date.available: 2024-05-13T12:17:26Z
dc.date.issued: 2007-07-25
dc.date.submitted: 2007-09-18
dc.identifier.isbn: 9788469084823
dc.identifier.uri: http://www.tdx.cat/TDX-0918107-132936
dc.identifier.uri: http://hdl.handle.net/10803/10479
dc.description.abstract: The learning process consists of several steps: building a Training Set (TS), training the system, testing its behaviour and, finally, classifying unknown objects. When a distance-based rule such as the 1-Nearest Neighbour (1-NN) is used as the classifier, the first step (building a training set) includes editing and condensing the data. The main reason is that distance-based rules need a long time to classify each unlabelled sample x, since the distance from x to every point in the training set has to be computed. Hence, the smaller the training set, the shorter the time needed for each new classification. This thesis mainly focuses on building a training set from some already given data, and especially on condensing it; however, different classification techniques are also compared.

The aim of any condensing technique is to obtain a reduced training set so that classification takes as little time as possible, without a significant loss in classification accuracy. Some new approaches to training set size reduction based on prototypes are presented. These schemes basically consist of defining a small number of prototypes that represent all the original instances. They include approaches that select among the existing examples (selective condensing algorithms) and approaches that generate new representatives (adaptive condensing algorithms).

These new reduction techniques are experimentally compared with some traditional ones, for data represented in feature spaces. To test them, the classical 1-NN rule is applied. However, other (fast) classifiers are also considered, such as linear and quadratic classifiers built in dissimilarity spaces based on prototypes, in order to see how the editing and condensing concepts work for this different family of classifiers.

Although the goal of the algorithms proposed in this thesis is to obtain a strongly reduced set of representatives, their performance is empirically evaluated over eleven real data sets by comparing not only the reduction rate but also the classification accuracy with those of other condensing techniques. The ultimate aim is therefore not only to find a strongly reduced set, but also a balanced one.

Several ways to solve the same problem can be found. When a distance-based rule is used as the classifier, reducing the training set is not the only option: a different family of approaches consists of applying efficient search methods. Therefore, the results obtained with the algorithms presented here are also compared, in terms of classification accuracy and time, with several efficient search techniques.

Finally, the main contributions of this PhD report can be briefly summarised in four points. First, two selective algorithms based on the idea of the surrounding neighbourhood; they obtain better results than the other algorithms presented here, as well as than other traditional schemes. Second, a generative approach based on mixtures of Gaussians; it gives better results in classification accuracy and size reduction than traditional adaptive algorithms, and results similar to those of LVQ. Third, it is shown that classification rules other than the 1-NN can be used, even leading to better results. And finally, the experiments carried out show that, with some databases (such as the ones used here), the approaches presented execute the classification processes in less time than the efficient search techniques. (See the illustrative sketch after the record fields below.)
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Universitat Jaume I
dc.source: TDX (Tesis Doctorals en Xarxa)
dc.subject: NCN
dc.subject: dissimilarity
dc.subject: condensing
dc.subject: reduction
dc.subject: mixtures of Gaussians
dc.subject: surrounding neighbourhood
dc.subject: Gaussians
dc.subject: NN
dc.subject.other: Llenguatges i Sistemes Informàtics
dc.title: Data Reduction Techniques in Classification Processes
dc.type: info:eu-repo/semantics/doctoralThesis
dc.type: info:eu-repo/semantics/publishedVersion
dc.subject.udc: 004
dc.contributor.director: Sánchez Garreta, José Salvador
dc.contributor.director: Pla Bañón, Filiberto
dc.rights.license: NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research or teaching activities and materials, under the terms established in art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorisation of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of for-profit exploitation are not authorised, nor is public communication from a site other than the TDX service. Presentation of its content in a window or frame external to TDX (framing) is also not authorised. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
dc.rights.accessLevel: info:eu-repo/semantics/openAccess
dc.local.notes: sanchez@uji.es, pla@uji.es
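
Illustrative sketch. The abstract describes why condensing matters for distance-based classifiers: each 1-NN query computes a distance to every stored training sample, so the per-query cost grows with the training-set size. The Python sketch below is only a minimal illustration of that point and is not one of the thesis algorithms; the function names (one_nn_predict, class_mean_prototypes) are hypothetical, and the class-mean "condensing" step is a toy placeholder for the selective and adaptive condensing schemes the thesis actually studies.

# Illustrative sketch only (assumed helper names, not taken from the thesis):
# plain 1-NN classification plus a toy "condensing" step that keeps one
# prototype (the class mean) per class, showing why a smaller training set
# makes each classification cheaper.
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Label each test sample with the class of its nearest training sample."""
    preds = []
    for x in X_test:
        # One distance per stored sample: the per-query cost grows linearly
        # with the training-set size, which is what condensing reduces.
        dists = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)

def class_mean_prototypes(X_train, y_train):
    """Toy reduction: one prototype per class (a stand-in for real condensing)."""
    classes = np.unique(y_train)
    protos = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return protos, classes

# Usage: classify with the full set, then with the condensed prototypes.
# X, y, X_new = ...  (feature matrix, labels, unlabelled samples)
# protos, labels = class_mean_prototypes(X, y)
# y_full    = one_nn_predict(X, y, X_new)          # len(X) distances per query
# y_reduced = one_nn_predict(protos, labels, X_new)  # n_classes distances per query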


Files in this item

No files are associated with this item.

This item appears in the following collection(s)
