Financial distress prediction using the hybrid associative memory with translation

This paper presents an alternative technique for financial distress prediction systems. The method is based on a type of neural network called the hybrid associative memory with translation. While many different neural network architectures have been used successfully to predict credit risk and corporate failure, the power of associative memories for financial decision-making has not yet been explored in any depth. The performance of the hybrid associative memory with translation is compared to that of four traditional neural networks, a support vector machine and a logistic regression model in terms of their prediction capabilities. The experimental results over nine real-life data sets show that the proposed associative memory constitutes an appropriate solution for bankruptcy and credit risk prediction, performing significantly better than the rest of the models under class imbalance and data overlapping conditions in terms of the true positive rate and the geometric mean of the true positive and true negative rates.


Introduction
A large number of techniques have been developed to help decision-makers and analysts predict financial distress. Traditionally, decisions on the credit risk of a corporate borrower were based exclusively upon subjective judgments made by human experts, using past experience and some guiding principles [69]. However, two major problems with this approach are the difficulty of making consistent estimates and the fact that it tends to be reactive rather than predictive. The world financial crisis has led banks and financial institutions to pay increasing attention to this question because of its significant impact on the decisions made [14], resulting in the development of numerous techniques to face the important challenge of predicting credit risk and bankruptcy from financial ratios using mathematical models. Since the pioneering work by Altman [7], based on multivariate discriminant analysis, a variety of statistical and operations research methods have been applied to credit risk and bankruptcy prediction, including linear and logistic regression, multivariate adaptive regression splines, survival analysis, linear and quadratic programming, and multiple criteria programming. Most of these techniques rely on the assumptions of linear separability, multivariate normality and independence of the predictive variables, which are very often violated in real-life problems [25,34,55].
Popular computational intelligence tools such as decision trees, neural networks, support vector machines, fuzzy systems, rough sets, artificial immune systems, and evolutionary algorithms can deal with non-linearity. Moreover, these methods are highly capable of extracting meaningful information from imprecise data and detecting trends that are too complex to be discovered by either humans or conventional systems. Although various studies have concluded that no technique is clearly superior to other competing algorithms, because performance depends on the characteristics of the problem analyzed [13,15,16], different neural network architectures have shown good performance in comparison to other methods for a range of financial applications [10,19,48,53,78]. However, when the number of examples is relatively small, several works have demonstrated that the accuracy and generalization performance of a support vector machine (SVM) are usually better than those of statistical and other soft computing techniques [23,24,65,67]. While the typical neural networks used in this context are the multi-layer perceptron (MLP), the radial basis function (RBF) network and the probabilistic or Bayesian network (BN), other neural models such as associative memories have not yet been explored.
The ability of the human brain to make associations from partial information has historically attracted great interest among researchers, leading to a variety of theoretical neural networks that act as associative memories. An associative memory [39] is an early type of artificial neural network that relates an input vector x with an output vector y. The functionality of associative memories is achieved in two phases: learning and recall. The learning process consists of building a connection matrix W with a value for each association $(x^k, y^k)$. In the recall phase, the associative memory returns the output vector y that corresponds to the stored pattern most similar to the input vector x. These models are powerful computational tools due to their conceptual and implementational simplicity, their strong mathematical foundation, and their capability of storing huge amounts of data, which makes it possible to recover the patterns most similar to an input vector with low computational effort [77].
Representative examples of associative memories are lernmatrix [66], the linear associator [8,38], the Moore-Penrose generalized inverse associative memory [40], the Hopfield network [28], the bidirectional associative memory [41], the fuzzy associative memory [42], the morphological associative memory [58], and the alpha-beta associative memory [2]. Some of these models have been used to solve very different problems. Sabourin and Mitiche [59] developed a Kohonen associative memory with selective multiresolution for OCR. A fuzzy associative memory was introduced to determine rock types from well-log signatures [17]. The bidirectional associative memory networks were used to find the relations between various cancers and elemental contents in serum samples with the aim of diagnosing cancer [81]. A hybrid classifier based on self-organizing maps and associative memories was designed for speaker recognition [31]. Zhang et al. [79] proposed a modular face recognition scheme by combining the wavelet subband representations and kernel associative memories. An associative memory based on the restricted Coulomb energy was also applied to human face recognition [49]. Namba and Zhang [50] devised an associative memory to recognize Braille images. A novel system for medical diagnosis based on associative memories was proposed by Aldape-Pérez et al. [5]. Itkar and Kulkarni [32] developed an efficient algorithm for mining frequent patterns using an auto-associative memory.
Apart from the associative memories just mentioned, Santiago-Montero [63] introduced the hybrid associative classifier and its extension, the hybrid associative classifier with translation (HACT). Both of these associative memories are based on the learning phase of the linear associator and the recall phase of Steinbuch's lernmatrix. This paper applies the HACT neural network to decision-making problems for financial distress prediction and presents an empirical comparison with other popular prediction methods. To the best of our knowledge, this model has not been used for classification purposes, let alone in the context of finance and management. The aim of this paper is therefore four-fold: (1) to explore the capability of the HACT model in the prediction of bankruptcy and credit risk; (2) to analyze the behavior of this neural network in the presence of imbalance in the class distribution, which constitutes a data complexity often neglected in financial applications; (3) to investigate how class overlapping affects the performance of the associative memory; and (4) to compare the performance of HACT with that of other prediction techniques.
The rest of the paper is organized as follows. Section 2 provides a review of works related to neural networks used for corporate bankruptcy and credit risk prediction. Section 3 introduces the fundamental concepts of associative memories and describes the bases of the HACT model. The experimental set-up and databases are given in Section 4, while the results are discussed in Section 5. Finally, Section 6 presents the concluding remarks and outlines some directions for future research.

A review of neural networks applied to financial distress prediction
Since the beginning of the 1990s, the development of artificial neural network technologies for bankruptcy and credit risk prediction problems has been the subject of considerable attention and research effort. The first reference to using neural networks can be found in the paper by Odom and Sharda [51], showing that a three-layer feed-forward perceptron is more accurate and robust than multivariate discriminant analysis. After this seminal work, many other studies have proposed the use of neural networks in credit scoring, bankruptcy or business failure prediction. For instance, Tam and Kiang [68] compared neural network models to linear discriminant analysis, logistic regression, nearest neighbors and decision trees for evaluating bank status. Salchenberger et al. [61] reported that the neural networks produced an equal or smaller number of total errors, type-I errors and type-II errors compared to the logit model. Lacher et al. [43] investigated the use of the Cascade-Correlation neural network architecture and compared its performance with that of the multivariate discriminant analysis approach. Chang et al. [18] applied the theory and numerical algorithms of the BN to risk scoring and compared the results with traditional methods for computing scores and posterior predictions of performance variables.
Desai et al. [22] concluded that the MLP and the modular neural network can be especially useful to correctly predict the bad loans, but logistic regression models are comparable to the neural networks when the performance is measured by the percentage of good and bad loans correctly classified. West [74] analyzed the credit scoring accuracy of the MLP, the RBF network and several statistical techniques, suggesting that the MLP may not be the most accurate neural network model. An auto-associative memory trained with only data of non-bankrupt firms was developed by Baek and Cho [11]. Baesens et al. [12] used Markov Chain Monte Carlo search to learn unrestricted Bayesian network classifiers for credit scoring, which gave a very good performance in terms of accuracy and area under the ROC curve. Also, Leong [46] showed that the BN performs well against logistic regression and MLP particularly with class imbalance, higher dimensions and a rejection sample, and it can be scaled efficiently when implemented onto a large data set.
The power of probabilistic and MLP neural networks was compared to that of discriminant analysis, probit analysis and logistic regression to evaluate credit risk in Egyptian banks [1]. Khashman [36] explored various back-propagation learning schemes to train three models (each with a different number of hidden neurons) of a three-layer supervised neural network. Angelini et al. [9] developed two neural network systems with a four-layer feed-forward topology, proving their applicability to credit risk prediction. An algorithm based on the threshold accepting meta-heuristic to train the principal component neural network architecture was investigated by Ravi and Pramodh [56], who inferred that their proposal outperformed other classifiers.
Lee and Chen [44] developed a credit scoring system using a hybrid modeling procedure with artificial neural networks whose input nodes were the variables obtained by multivariate adaptive regression splines. Hsieh [29] designed a credit scoring model that employed the SOM and K-means clustering algorithms to obtain the best inputs to a feed-forward MLP. Similarly, Lee et al. [45] explored the performance of credit scoring by integrating the linear discriminant analysis approach into a three-layer back-propagation neural network, revealing that the proposed hybrid approach converges much faster than the conventional neural network model and outperforms the discriminant analysis and logistic regression approaches. Cheng et al. [20] adopted an RBF to construct the financial prediction model and then carried out a logit analysis on the groups of similar firms present in the hidden layer of the network.
Chuang and Lin [21] proposed a reassigning credit scoring model involving two stages: the classification stage builds a neural network-based credit scoring model, which classifies applicants as having good or bad credit; the reassign stage then tries to reduce the type-I error by reassigning the rejected good credit applicants to the conditionally accepted class by using a case-based reasoning classification technique. Khashei et al. [35] employed basic concepts of fuzzy logic and MLP neural networks to implement a hybrid binary credit risk prediction model, where fuzzy numbers were used so that the uncertainties and complexities in financial data sets can be better modeled. Emotional neural networks were successfully applied to credit scoring and evaluation [37], showing higher accuracy and lower computing time than the conventional neural models based on the back-propagation learning algorithm.
Another research direction concerns the application of ensembles of neural networks to credit scoring and bankruptcy prediction problems. For instance, West et al. [75] investigated bagging and boosting with the MLP as the base classifier and found that the ensembles were superior to the single best model in most cases. Conversely, Tsai and Wu [72] concluded that the single best neural network is more suitable than multiple neural network classifiers. Yu et al. [78] employed bagging with three-layer back-propagation neural networks. Hung and Chen [30] proposed a selective ensemble of three classifiers (decision tree, back-propagation neural network and support vector machine) integrated with the concept of the expected probability, showing that it performs better than other stacking ensembles using the weighting or voting strategies. Tsai et al. [70] carried out an extensive comparison of ensembles using MLP, support vector machines and decision trees as the base classifiers for bagging and boosting, suggesting that boosting with decision trees performs significantly better than the other ensembles.
Apart from the many proposals for using different neural network architectures, we can find numerous comprehensive reviews and comparative studies related to credit risk and bankruptcy prediction where the neural network approaches have played an important role. Zhang et al. [80] reviewed the application of neural networks to predict corporate distress and studied their robustness in terms of sampling variability. Atiya [10] gave a thorough survey on the problem of bankruptcy prediction using neural networks and claimed the superiority of these models over other techniques. Verikas et al. [73] presented a comprehensive review of hybrid and ensemble-based soft computing techniques applied to bankruptcy prediction. One of the most complete benchmarking studies of classification algorithms in credit scoring is the one by Baesens et al. [13], which has further been extended with novel learning methods, performance measures and techniques to reliably compare different classifiers [47]. It is also interesting to highlight the paper by Brown and Mues [16], in which an exhaustive comparison of prediction techniques for imbalanced credit scoring data sets was presented.

Hybrid associative classifier with translation
In its most general form, an associative memory is a content-addressable neural network based on matrix algebra [39,57] that maps input patterns (examples) to output patterns by using the p different associated pattern pairs $(x^k, y^k)$ stored during the learning phase. The associative memory takes the form of a connection weight matrix $W = [w_{ij}]_{m \times n}$ generated from a finite set of p encoded associations, called the fundamental set of associations, $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$, where $x^\mu \in \mathbb{R}^n$ are the fundamental input patterns of dimension n and $y^\mu \in \mathbb{R}^m$ are the fundamental output patterns of dimension m. Then, $x^\mu_j$ and $y^\mu_i$ denote the j-th component of an input pattern $x^\mu$ and the i-th component of an output pattern $y^\mu$, respectively.
Associative memories can be of two types depending on the retrieved pattern: hetero-associative (e.g., lernmatrix and linear associator) and auto-associative (e.g., Hopfield network). A hetero-associative memory relates input patterns with output patterns of distinct nature and format ($x^\mu \neq y^\mu$), while an auto-associative memory is a particular case where $x^\mu = y^\mu$ and n = m.
The HACT neural network is an associative memory that merges the learning (or encoding) phase of the linear associator with the recall (or decoding) phase of Steinbuch's lernmatrix to exploit their strengths and improve the performance of the classifier. The main advantages of HACT over its predecessors are: (i) the HACT model can operate with real-valued input patterns, while the lernmatrix only supports the binary values 0 and 1; and (ii) the input vectors are not required to be orthonormal, unlike those in the linear associator.
At this point it is worth stressing that the HACT approach cannot be regarded as a hybrid or combined prediction model in the sense of traditional hybridization. In general, hybrid learning methods are understood as systems that combine two or more different techniques in order to benefit from the synergistic effect between the individual components [71,76]. For instance, a hybrid prediction model may consist of one unsupervised learner to pre-process the training data into homogeneous clusters and one supervised algorithm to build the classifier from the clustering result [29,71]; it may use a feature selection strategy to choose the most relevant explanatory variables, which are then employed to design the predictor [44,45,56]; or it may even be built from different cascading predictors in order to form an ensemble of classifiers [20,21,35]. In contrast, the HACT model simply makes use of the fundamental ideas of two types of associative memories (one for the learning phase and the other for the recall phase), but there is no hybridization between them.

Learning phase
The learning phase of the HACT model, which is based on that of the linear associator, consists of constructing a matrix W such that when an input pattern $x^\mu$ is presented, the stored pattern $y^\mu$ associated with the input pattern is retrieved. This process comprises two basic steps:
1. For each association $(x^\mu, y^\mu)$ in the fundamental set, compute the outer product $y^\mu (x^\mu)^T$, where $(x^\mu)^T$ is the transpose of the input vector $x^\mu$.
2. Sum the p outer products to obtain the matrix $W = \alpha \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$, where $\alpha$ is the normalizing constant (usually set to 1/p).
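The two steps above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than the authors' implementation: the function names and the toy fundamental set are hypothetical, and the output patterns are assumed to be one-hot class vectors.

```python
def outer(y, x):
    """Step 1: outer product y x^T as an m-by-n list of lists."""
    return [[yi * xj for xj in x] for yi in y]


def learn_linear_associator(inputs, outputs, alpha=None):
    """Step 2: W = alpha * sum over mu of y^mu (x^mu)^T."""
    p = len(inputs)
    if alpha is None:
        alpha = 1.0 / p  # normalizing constant, "usually set to 1/p"
    m, n = len(outputs[0]), len(inputs[0])
    W = [[0.0] * n for _ in range(m)]
    for x, y in zip(inputs, outputs):
        product = outer(y, x)
        for i in range(m):
            for j in range(n):
                W[i][j] += alpha * product[i][j]
    return W


# Hypothetical toy fundamental set: p = 2 patterns, n = 2 features, m = 2 classes.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [0, 1]]  # assumed one-hot class vectors
W = learn_linear_associator(X, Y)
```

Because the two toy input patterns are orthonormal, each row of W ends up being a scaled copy of the input pattern of the corresponding class.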
Unlike the standard hybrid associative memory, the HACT model incorporates an initial step in the learning phase, which consists of a translation of the coordinate axes to a new origin located at the centroid of the fundamental input patterns. The aim of shifting the fundamental set is to represent the fundamental input patterns in a new n-dimensional space where patterns belonging to two different classes are located diametrically opposite to each other and the midpoint of the diameter is defined by the mean vector $\bar{x}$. This should allow for a better classification result because patterns of different classes will presumably be grouped quite far apart in different quadrants [63].
Let $A = \{x^1, x^2, \ldots, x^p\}$ be a set of n-dimensional fundamental input patterns that belong to m classes, and let $\hat{A} = \{\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^p\}$ be the corresponding set of fundamental input patterns that have been translated to the new origin $\bar{x}$. The implementation of the learning phase of HACT to construct the connection weight matrix W is described in Algorithm 1.
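The translation followed by the outer-product learning can be sketched as follows. This is not a transcription of Algorithm 1; it is an illustrative reading of the description above that assumes one-hot class vectors, $\alpha = 1/p$, and a translation of the form $\hat{x} = x - \bar{x}$. All function names and the toy data are hypothetical.

```python
def centroid(patterns):
    """Mean vector of the fundamental input patterns (the new origin)."""
    p, n = len(patterns), len(patterns[0])
    return [sum(x[j] for x in patterns) / p for j in range(n)]


def translate(pattern, mean):
    """Shift a pattern so that the centroid becomes the origin."""
    return [xj - mj for xj, mj in zip(pattern, mean)]


def learn_hact(inputs, labels, num_classes):
    """Build W from translated patterns; labels are class indices 0..m-1."""
    mean = centroid(inputs)
    p, n = len(inputs), len(inputs[0])
    W = [[0.0] * n for _ in range(num_classes)]
    for x, k in zip(inputs, labels):
        x_hat = translate(x, mean)
        # With a one-hot y, the outer product y x_hat^T only fills row k.
        for j in range(n):
            W[k][j] += x_hat[j] / p  # alpha = 1/p (assumed)
    return W, mean


# Hypothetical toy data: one pattern per class.
W, mean = learn_hact([[1.0, 1.0], [3.0, 3.0]], [0, 1], num_classes=2)
```

In this toy case the centroid is (2, 2), so the two translated patterns are (-1, -1) and (1, 1), which lie diametrically opposite each other around the new origin.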

Recall phase
Assume that the matrix W has been constructed by using Algorithm 1. The classification of a new input pattern x then consists of two steps: (i) the application of the same translation to x as that used in the learning phase (step 7) in order to obtain $\hat{x}$, and (ii) the use of the recall phase of the lernmatrix in order to assign it to a class.
The recall or decoding phase consists of determining the components of the vector $y^\mu$ associated with a given translated input pattern $\hat{x}^\mu$. The i-th component $y^\mu_i$ of the class vector $y^\mu$ is calculated by means of the following bipolar output function:

$$y^\mu_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} w_{ij}\,\hat{x}^\mu_j = \max_{h=1,\ldots,m} \sum_{j=1}^{n} w_{hj}\,\hat{x}^\mu_j \\ 0 & \text{otherwise} \end{cases}$$

If a given input vector $\hat{x}^\mu$ is assigned to class k, this expression leads to an m-dimensional output vector $y^\mu$ with its k-th component equal to 1 ($y^\mu_k = 1$) and all the remaining components equal to zero ($y^\mu_j = 0$ for j = 1, 2, ..., k − 1, k + 1, ..., m).
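A minimal sketch of the recall step is given below. It assumes the usual lernmatrix-style rule, in which the winning class is the row of W with the largest response to the translated pattern; the matrix W and the centroid used here are hypothetical toy values, not taken from the experiments.

```python
def recall(W, mean, x):
    """Return (class index, one-hot output vector) for a new pattern x."""
    # (i) Apply the same translation that was used in the learning phase.
    x_hat = [xj - mj for xj, mj in zip(x, mean)]
    # (ii) Response of each class row: sum_j w_ij * x_hat_j.
    responses = [sum(w * v for w, v in zip(row, x_hat)) for row in W]
    # Pick the largest response; on ties this sketch keeps the first maximum.
    k = max(range(len(responses)), key=responses.__getitem__)
    y = [1 if i == k else 0 for i in range(len(W))]
    return k, y


# Hypothetical memory for two classes in two dimensions.
W = [[-0.5, -0.5], [0.5, 0.5]]
mean = [2.0, 2.0]

high = recall(W, mean, [2.5, 3.5])  # above the centroid
low = recall(W, mean, [0.0, 1.0])   # below the centroid
```

Resolving ties by taking the first maximum is a simplification made for this sketch; the source does not state how ties are handled.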
It is worth pointing out that, even though an input pattern may contain errors and noise, the HACT model will still be able to retrieve the closest stored output pattern. Like most associative memories in general [52], the HACT network shows a robust and fault-tolerant behavior, which means that noise or errors will cause a certain decrease in generalization performance rather than a total degradation of the classifier effectiveness.

Experimental set-up
Nine data sets related to bankruptcy/creditworthiness have been employed in order to make a comprehensive comparison of the HACT model with four well-known neural networks (MLP, RBF, BN and the voted perceptron, VP), whose architectures and parameter settings are reported in Table 1. In addition, an SVM with a linear kernel (widely acknowledged as one of the best soft computing techniques) and the logit model (a classical econometric method) have also been included in this study. Note that, except for the HACT technique, the WEKA [27] and KEEL [3,4] data mining and knowledge discovery suites have been used to conduct our experiments. Table 2 summarizes the main characteristics of the experimental databases, including the number of explanatory variables with the number of categorical variables given in brackets, the percentage of default cases (a measure of imbalance in the class distribution) and Fisher's discriminant ratio (F1), whose value indicates how well the two classes are separated from each other (the higher the value of F1, the easier the classification problem).
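The text does not give the formula for F1. The sketch below assumes the definition commonly used in the data complexity literature: for each feature, the squared difference of the class means divided by the sum of the class variances, with F1 taken as the maximum over features. The function name, the use of population variances and the toy data are assumptions of this sketch, and categorical variables are not handled.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Maximum over features of (mu_a - mu_b)^2 / (var_a + var_b)."""
    n_features = len(class_a[0])
    best = 0.0
    for j in range(n_features):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        spread = pvariance(a) + pvariance(b)
        if spread == 0:
            continue  # constant feature in both classes: skipped in this sketch
        ratio = (mean(a) - mean(b)) ** 2 / spread
        best = max(best, ratio)
    return best


# Hypothetical two-feature example: feature 0 separates the classes well,
# feature 1 barely separates them.
non_default = [[0.0, 5.0], [2.0, 5.0]]
default = [[4.0, 5.0], [6.0, 6.0]]
f1 = fisher_discriminant_ratio(non_default, default)
```

Here feature 0 gives a ratio of 16 / 2 = 8 and feature 1 a ratio of 0.25 / 0.25 = 1, so the measure reports 8, reflecting the best-separating feature.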
The Australian, German and Japanese data sets are from the UCI Machine Learning Database Repository (http://archive.ics.uci.edu/ml/). The Iranian data set comes from a modification to a corporate client database of a small private bank in Iran [60]. The Polish data set contains bankruptcy information of 120 companies recorded over a two-year period [54]. The SabiSPQ database [6] contains business information of 1180 companies whose accounts are placed on the Spanish Mercantile Registry. The Thomas data set [69] describes applicants for a credit product. The UCSD data set is a reduced version of a database used in the 2007 Data Mining Contest. The USA database [64] consists of the accounting statements from 8293 banks recorded by the Federal Deposit Insurance Corporation.
The common way to assess the performance of financial distress prediction systems when databases are small or medium sized is k-fold cross-validation [26]. Accordingly, a 10-fold cross-validation has been adopted for the experiments: each original data set has been randomly divided into ten stratified parts of equal (or approximately equal) size. For each fold, nine blocks have been pooled as the training data, and the remaining part has been employed as an independent test set. Stratification has been used to preserve the class proportions of the whole data set in each of the subsets obtained by the sampling method, thus reducing the prior probability of data set shift and the variance in the estimation process [62]. The results from classifying the test samples using the training sets have been averaged across the ten runs and then evaluated for significant differences between models by means of statistical tests.
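The stratified partitioning described above can be sketched as follows. The experiments relied on the WEKA and KEEL suites, whose exact sampling procedure is not given here, so this is only an illustrative stand-in; the function names, the seed and the toy label vector are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists: one fold tests, the other k-1 train."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical imbalanced data set: 20 defaulters (1) and 80 non-defaulters (0).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels)
splits = list(train_test_splits(labels))
```

With this toy label vector every fold holds ten examples, two of which are defaulters, so each test set keeps the 20% default rate of the whole data set.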

Evaluation scores
Standard performance evaluation scores in many financial applications, such as bankruptcy and credit risk prediction, are usually calculated from a 2 × 2 confusion matrix (see Table 3), where each entry (i, j) contains the number of correct/incorrect predictions. Most prediction systems employ the accuracy as the criterion for performance evaluation. It represents the proportion of correctly predicted cases (positive and negative), $Acc = (TP + TN)/(TP + FN + TN + FP)$, but it is strongly biased towards the majority class when data are skewed [33]. As bankruptcy and credit risk databases are commonly imbalanced, alternative measures have been used with the aim of obtaining a trade-off between the performance evaluation on both classes. A well-known example is the geometric mean of the true positive and true negative rates,

$$Gmean = \sqrt{TPR \cdot TNR}$$

where $TPR = TP/(TP + FN)$ is the true positive rate (the percentage of defaulters that are correctly predicted) and $TNR = TN/(TN + FP)$ is the true negative rate (the percentage of non-defaulters that are correctly classified).
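The following sketch computes these scores from the four cells of the confusion matrix and illustrates why accuracy is misleading on skewed data. The function name and the counts are hypothetical; defaulters are taken as the positive class, as in the definitions above.

```python
from math import sqrt


def evaluation_scores(tp, fn, tn, fp):
    """Accuracy, TPR, TNR and their geometric mean from a 2 x 2 confusion matrix."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    tpr = tp / (tp + fn)  # defaulters correctly predicted
    tnr = tn / (tn + fp)  # non-defaulters correctly predicted
    gmean = sqrt(tpr * tnr)
    return {"acc": acc, "tpr": tpr, "tnr": tnr, "gmean": gmean}


# Hypothetical skewed sample: 5 defaulters and 95 non-defaulters, with a
# classifier that labels every case as non-defaulter.
scores = evaluation_scores(tp=0, fn=5, tn=95, fp=0)
```

In this example the accuracy is 0.95 even though no defaulter is detected, whereas the geometric mean drops to 0 because the true positive rate is 0.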
Overall performance scores such as the accuracy and the geometric mean help decision-makers to compare firms or borrowers against each other, but they neglect the cost of the different error types. Thus two of the most popular measures for assessing the prediction of corporate or credit default are the true positive and true negative rates, which concentrate only on a part of the data and therefore exhibit the performance on each class separately. This is especially important for this kind of financial application because of the different misclassification costs associated with false positives and false negatives: the cost of predicting a defaulter as a non-defaulter is generally much higher than the expected cost of false positives (non-defaulters classified as defaulters).
Table 4 reports the true positive rate averaged across the 10 runs for each database, the average values across all the databases and the Friedman's average rank for each neural network approach (the one with the lowest average rank is deemed the best solution). The values for the best performing method in each database are underlined. Based on the Friedman's average ranks, the results reveal that the HACT model is the algorithm with the best performance, followed by MLP and the logit method. What is more interesting, however, is that the associative memory has been the best in 6 out of the 9 databases, demonstrating the benefits of applying this technique to the prediction of financial distress. To ease the comparison of the neural networks in terms of TPR, Figure 1 contains a spider plot showing the relationship between the true positive rate and the complexity of the problem: each radius represents a database and each star corresponds to a model. The databases have been placed in clockwise order from the most to the least complex problem based on the percentage of default cases (a) and the value of F1 (b).
The performance of a model on a database is given by the distance from the center and therefore, a larger area corresponds to a better algorithm in terms of TPR. As can be seen, HACT has performed significantly better than the other techniques, especially on the most complex databases, while there have been no differences on the "easiest" problems.
Table 5 includes the true negative rates and the Friedman's average rank for each model. Again the best values in each database are underlined. It is interesting to compare these results with those in Table 4 in order to better understand the behavior of each method as a function of class imbalance and overlapping. For instance, the only case where HACT has performed the worst in terms of TPR is the SabiSPQ database, which corresponds to a non-complex problem with perfectly balanced (50% of default cases) and well-separated classes (F1 = 3.19). Here logistic regression appears to be the best performing algorithm (TPR = 99.15, TNR = 99.49) since differences in true negative rates are not significant at all. In contrast, when the classes are strongly imbalanced (Iranian, USA) and/or overlapped (German, Iranian, Thomas), the associative memory has made significantly better predictions on the default cases than any other method, as reflected by the large differences in the true positive rates. In these cases, the other techniques have in general achieved very high true negative rates (correct predictions on non-defaulters), but failed in the classification of the minority class. Three representative examples of this situation are the BN, RBF and SVM approaches applied to the Iranian database (5% of default samples and F1 = 0.36), which have reached a perfect classification of non-default cases (TNR ≃ 100) while mispredicting the class of all default patterns (TPR = 0).
Similar comments can be made with regard to the Thomas database (26.4% of defaulters and F1 = 0.18), where MLP, BN, RBF, VP, SVM and logit have achieved high true negative rates (94%-100%) and very low true positive rates (0%-19%), whereas the associative model has shown a significant trade-off between both performance rates.

Experimental results and discussion
With the aim of checking whether or not the TPR results are significantly different, the Iman-Davenport statistic has been computed. This is distributed according to an F-distribution with K − 1 and (K − 1)(N − 1) degrees of freedom, where K denotes the number of models and N is the total number of data sets.
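The formula for the statistic is not given in the text. The sketch below assumes the usual textbook form, in which the Friedman statistic $\chi^2_F = \frac{12N}{K(K+1)}\left[\sum_j R_j^2 - \frac{K(K+1)^2}{4}\right]$ is computed from the average ranks $R_j$ and then corrected as $F_F = \frac{(N-1)\chi^2_F}{N(K-1) - \chi^2_F}$. The function names and the score table are hypothetical and the scores are not results from this study.

```python
def rank_row(scores):
    """Rank the models on one data set: 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def average_ranks(table):
    """Friedman's average rank of each model (columns) over data sets (rows)."""
    rows = [rank_row(row) for row in table]
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(table[0]))]


def degrees_of_freedom(k, n):
    """Degrees of freedom of the F-distribution: K - 1 and (K - 1)(N - 1)."""
    return k - 1, (k - 1) * (n - 1)


def iman_davenport(avg_ranks, n):
    """Iman-Davenport statistic; undefined if chi2 equals N(K - 1)."""
    k = len(avg_ranks)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


# Made-up scores for K = 3 models on N = 4 data sets (illustration only).
table = [[0.90, 0.80, 0.70], [0.85, 0.80, 0.60], [0.70, 0.90, 0.50], [0.95, 0.90, 0.80]]
ranks = average_ranks(table)
f_stat = iman_davenport(ranks, n=len(table))
```

For the setting of this study, K = 7 models and N = 9 data sets, `degrees_of_freedom(7, 9)` gives 6 and 48. Turning the statistic into a p-value requires the cumulative F-distribution, which is left out of this dependency-free sketch.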
The p-value computed by F(6, 48) was 0.002955595039, which is less than a significance level of α = 0.05. Therefore, the null hypothesis that all the prediction techniques perform equally well can be rejected. As the Iman-Davenport statistic only detects whether differences exist among all the methods, we have also applied the Holm's and Li's post hoc tests using the HACT model (the one with the lowest Friedman's rank) as the control algorithm. The values in Table 6 show that the Holm's procedure rejects the null hypothesis of equivalence for those methods that have an unadjusted p-value ≤ 0.0125, indicating that HACT has been significantly better than RBF and VP at a significance level of α = 0.05. On the other hand, the Li's post hoc test rejects those hypotheses that have an unadjusted p-value ≤ 0.047279, indicating that HACT has performed significantly better than RBF, VP and BN. Therefore, it is possible to conclude that the associative memory is statistically equivalent to SVM, logit and MLP, while significantly better than the remainder of the neural networks in terms of the true positive rate.
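The Holm's step-down procedure can be sketched as follows, assuming its standard form: the unadjusted p-values of the comparisons against the control are sorted in increasing order and the i-th smallest is compared with α/(m − i + 1), stopping at the first hypothesis that cannot be rejected. The function name, the labels A to F and the p-values are hypothetical and are not the values of Table 6.

```python
def holm_rejections(p_values, alpha=0.05):
    """Return the names of the hypotheses rejected by Holm's step-down test."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = []
    for i, (name, p) in enumerate(ordered):
        threshold = alpha / (m - i)  # alpha/m, alpha/(m-1), ..., alpha/1
        if p > threshold:
            break  # this and all remaining hypotheses are retained
        rejected.append(name)
    return rejected


# Made-up unadjusted p-values for six comparisons against a control method.
p_values = {"A": 0.001, "B": 0.004, "C": 0.03, "D": 0.2, "E": 0.4, "F": 0.5}
rejected = holm_rejections(p_values)
```

With six comparisons and α = 0.05, the successive thresholds are 0.05/6, 0.05/5, 0.05/4 and so on. In this made-up example the third p-value exceeds 0.05/4 = 0.0125, so only the first two hypotheses are rejected, which is the kind of situation the threshold of 0.0125 quoted above would correspond to under this standard form.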
For each database, Figure 2 displays all the prediction models in the space spanned by the true positive rate on the x-axis and the true negative rate on the y-axis. A method with perfect prediction will be located in the upper right corner (100% TPR, 100% TNR) of the plot. Therefore, the closer the classifier is to the upper right corner, the higher the performance on both classes. However, in financial distress applications it is preferable not to miss a defaulter rather than a non-defaulter, which means that it is more important to maximize TPR (points close to the right side) than to maximize TNR (points close to the upper side). In line with our previous findings, one can observe in Figure 2 that the associative memory mostly lies closest to the right side of the chart and, in general, it is not too far from the other models in terms of TNR. Particularly remarkable cases are the Iranian and Thomas databases, where all techniques except HACT are very close to the upper left corner of the plot, which reveals that they have misclassified almost all the default patterns. Note that the classes in both of these databases are strongly imbalanced and highly overlapped. A similar behavior can be observed on the German, UCSD and USA databases, which correspond to data with high/moderate imbalance and/or high overlapping between classes.
The averaged geometric means of the performance on both classes in Table 7 show that the HACT method has achieved the highest balanced trade-off between the true positive and true negative rates. The associative memory has been the best in 4 out of the 9 databases, and it is still very close to the VP algorithm on the Japanese data. The probabilistic neural network, the RBF model and the SVM show a geometric mean of 0 on the very strongly skewed Iranian database because these methods have obtained a TPR = 0 (i.e., they have predicted all cases as non-defaulters), as already discussed with the results in Tables 4 and 5.
The low Friedman's average rank of HACT in Table 7 corroborates that this neural model is also one of the best performing methods in terms of an overall score. In this case, however, logistic regression shows the lowest Friedman's average rank, while the MLP and BN models are very close to HACT. Finally, the RBF, SVM and VP approaches have been the techniques with the poorest performances, as already seen with the Friedman's ranks calculated on the true positive rate.

Conclusions and future work
Since the first works at the beginning of the 1990s, artificial neural networks have emerged as an effective method for bankruptcy and credit risk prediction. They differ from classical financial prediction systems, such as the models based on statistical techniques, mainly in their black-box nature and in the assumption of a non-linear relation among variables. In this paper, the hybrid associative memory with translation has been explored and compared to other well-known neural models (MLP, RBF, BN and VP), a generally well-performing soft computing technique (SVM) and a common econometric model (logit).
The experimental results over nine real-life financial databases suggest that associative memories can be an appropriate approach to the prediction of financial distress, especially in the case of databases where the classes are strongly imbalanced and/or overlapped. The HACT neural network has obtained the highest true positive rates, which means that this model predicts the default cases better than the remainder of the methods analyzed here. When evaluated with the geometric mean of both rates, HACT has still proved to be an appropriate solution, revealing that the degradation of the true negative rate has not been abrupt.
There are several interesting open research questions to work on. For instance, a more thorough and extensive analysis of the HACT model against a larger pool of prediction techniques, such as instance-based learners and decision trees, deserves further consideration. Another avenue for future research is to study the performance of other well-known associative memories (e.g., the Hopfield network and the bidirectional associative memory) when applied to the prediction of financial distress. Other issues include investigating the potential of ensembles of associative memories, both using these as the only base classifier and together with other prediction models.