Hyperspectral Unmixing Based on Dual-Depth Sparse Probabilistic Latent Semantic Analysis

This paper presents a novel approach for spectral unmixing of remotely sensed hyperspectral data. It exploits probabilistic latent topics in order to take advantage of the semantics pervading the latent topic space when identifying spectral signatures and estimating fractional abundances from hyperspectral images. Despite the demonstrated potential of topic models to uncover image semantics, they have merely been used in hyperspectral unmixing as a straightforward data decomposition process, which limits their actual capability to provide semantic representations of the spectral data. The proposed model, called dual-depth sparse probabilistic latent semantic analysis (DEpLSA), makes use of two different levels of topics to exploit the semantic patterns extracted from the initial spectral space in order to relieve the ill-posed nature of the unmixing problem. In other words, DEpLSA defines a first level of deep topics to capture the semantic representations of the spectra, and a second level of restricted topics to estimate endmembers and abundances over this semantic space. An experimental comparison is conducted using two standard topic models and seven state-of-the-art unmixing methods available in the literature. Our experiments, conducted using four different hyperspectral images, reveal that the proposed approach provides competitive advantages over available unmixing approaches.

present in the pixel. In general, the HU process has proven to be an excellent tool to uncover subpixel information from hyperspectral imagery, because endmembers usually correspond to the materials appearing in the scene and, consequently, abundance maps often provide useful information to relieve limited spatial resolutions [4].

A. Brief HU Overview
In the literature, it is possible to find two main trends depending on the characterization scale of the HU process [5]: 1) linear and 2) nonlinear models. On the one hand, the linear HU paradigm assumes that incident light interacts with just one material at a macroscopic scale; on the other hand, the nonlinear model takes into account more complex interactions among the solar radiation scattered by multiple materials in the scene. Despite the potential of nonlinear models, many works in the remote sensing field focus on the linear approach, because common hyperspectral remote sensing sensors usually have a rather limited spatial resolution, which makes it reasonable to assume that the mixing process occurs within the instrument itself [6].
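As a rough illustration of the linear paradigm, a mixed pixel can be simulated as a convex combination of endmember signatures plus noise. This is a minimal sketch with illustrative sizes and variable names (not taken from the paper):

```python
import numpy as np

# Linear mixing model sketch: a pixel is a convex combination of
# endmember signatures plus additive noise. All names/sizes are illustrative.
rng = np.random.default_rng(0)

n_bands, n_endmembers = 188, 4
endmembers = rng.random((n_endmembers, n_bands))   # spectral signatures (rows)

abundances = rng.dirichlet(np.ones(n_endmembers))  # nonnegative, sums to one
noise = rng.normal(scale=1e-3, size=n_bands)

pixel = abundances @ endmembers + noise            # observed mixed spectrum

# The two classical abundance constraints hold by construction:
assert abundances.min() >= 0 and abs(abundances.sum() - 1.0) < 1e-9
```

The Dirichlet draw enforces the nonnegativity and sum-to-one constraints that most linear unmixing methods impose on abundances.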
Broadly speaking, HU methods can be categorized into three different groups [7]: 1) geometrical; 2) statistical; and 3) sparse regression-based methods. Geometrical methods assume that the endmembers of a hyperspectral image define a simplex of minimum volume enclosing the data set; therefore, the geometry of convex sets can be exploited to identify the simplex vertices [8], [9]. Despite their high computational efficiency, geometrical models tend not to capture highly mixed spectral signatures, because simplex facets are often ill-defined in the absence of pure pixels in the scene. In this case, both the statistical and regression-based methods provide a more powerful scheme to deal with the HU problem while accounting for endmember variability.
Statistical algorithms make use of a probabilistic framework to infer endmember and abundance parameters as probability distributions that, precisely, are aimed at modeling the data variability. One of the most relevant works was presented by Nascimento and Bioucas-Dias [10], where abundance fractions are defined as mixtures of Dirichlet densities. In other works, the endmember variability is modeled using other kinds of distributions, for instance, the Gaussian distribution considered in [11].
Regarding sparse regression approaches, these models formulate the unmixing task as a linear regression problem over a given spectral library [12]. That is, they use a semisupervised procedure to express the input image as a combination of spectral signatures that are known in advance. Additionally, they usually include some sort of sparsity regularization constraint to refine and alleviate the computational cost of the regression process.

B. Current Limitations and Trends
Each of the aforementioned methodologies has proven effective under specific conditions. Geometrical models are able to produce better endmember estimates when the pure pixel assumption is fulfilled in the scene [7], whereas statistical and regression-based methods tend to obtain better results in highly mixed scenarios. Statistical models usually lead to computationally demanding algorithms, but they have the advantage of not requiring the availability of a suitable spectral library.
Some recent research lines try to relieve the ill-posed nature of the unmixing problem by taking advantage of so-called semantic representations [13], [14], that is, by modeling the structural patterns of the spectrum domain. Whereas the traditional hyperspectral image characterization scheme relies on directly using the low-level features captured by the spectral bands (e.g., reflectance values), the semantic representation approach pursues a higher level image characterization in which pixels are represented according to the spectral patterns of the input image. In other words, each pixel is managed as a composition of hyperspectral patterns instead of a collection of raw values. The rationale behind this methodology is that spectral patterns can be useful to identify discriminative features in the spectra and, therefore, they can help to reduce the uncertainty when unmixing pixels.
These semantic patterns are often defined in a supervised [15] or semisupervised form [14] with the collaboration of expert users, who make manual associations between the chemical makeup of materials and the shape of absorption bands in spectral signals. However, when it comes to unsupervised learning, these patterns usually rely on a regular clustering process [16]. The high complexity of hyperspectral images makes this straightforward approach unable to capture complex spectral relationships, which eventually limits the semantic power in HU. As a result, additional study is required to improve unmixing techniques under the unsupervised semantic representation research line.

C. Topic Models as Semantic Representations
In recent years, topic models have shown their potential to effectively cope with many kinds of tasks by providing data with a higher level of semantic understanding [17]. Text categorization [18], vocabulary reduction [19], image segmentation [20], object recognition [21], and even video retrieval [22] are some of the applications in which the semantic power of topic models has been successfully exploited.
From a practical point of view, latent topic models are probabilistic models that provide methods to automatically understand and summarize data collections by means of their hidden patterns. In particular, these models are able to express data as probability distributions according to their hidden semantic patterns instead of their low-level features, which makes it easier for the data to be managed at a higher abstraction level. Precisely, this is what makes topic models an interesting tool to improve the semantic characterization level in HU.

D. Work Objectives and Main Contributions
With the aforementioned considerations in mind, this paper is focused on providing a new perspective on the unsupervised statistical HU approach by means of latent topics [17]. That is, we propose to tackle the unmixing problem with a latent topic-based approach in which endmembers and abundances can be estimated according to the semantics encapsulated by the latent topic space. Several works in the literature advocate the use of topic models for remote sensing applications where the image semantics may be important, such as image annotation [23], scene classification [24], [25], or image super-resolution [26]. Nonetheless, there are few research works within the HU field, and this is precisely the gap that motivates this paper. To the best of our knowledge, there is only one work in the literature which relates HU and topic modeling [27]. However, the approach presented here provides a more powerful HU scheme, with a comprehensive motivation and a more robust experimental setting to shed light on the general use of topic models within the HU field.
First, we study the straightforward application of the standard probabilistic latent semantic analysis (pLSA) [28] and latent Dirichlet allocation (LDA) [29] models within the HU field. Later, we propose a new pLSA-based model extension, called dual-depth sparse probabilistic latent semantic analysis (DEpLSA), in order to better adapt the pLSA model to the peculiarities of the unmixing problem. Specifically, we develop a dual-depth pLSA-based architecture using two different levels of topics and a dual entropy-based regularization term to introduce the sparsity constraint, which is also widely taken into account in HU [7]. Finally, we conduct an experimental comparison including some of the most popular HU approaches available in the literature.
The rest of this paper is organized as follows. Section II presents the background of this paper. In Section III, the proposed DEpLSA model is defined, which is specially adapted to HU. Section IV presents the extended HU framework based on the proposed topic model. Section V shows the experimental part of this paper, where seven unmixing methods are tested over four different hyperspectral images. Finally, Section VI discusses the results, and Section VII draws the main conclusions of this paper.

II. BACKGROUND ON TOPIC MODELS
Topic models [17] can be defined as probabilistic graphical models containing one or more hidden random variables useful to uncover the hidden structure of a data collection. Specifically, given the observed probability distribution p(w|d), which describes a corpus of documents D = {d_1, d_2, ..., d_M} over a particular word space W = {w_1, w_2, ..., w_N}, latent topic algorithms are able to obtain two probability distributions: 1) the description of topics in words p(w|z) and 2) the description of documents in topics p(z|d). More specifically, the interpretation that we make of these elements within the HU field is the following: documents (d) are considered hyperspectral image pixels, words (w) are represented by spectral bands, word counts [n(w, d)] contain the pixel-band reflectance values, and topics (z) depict the unmixing process, in which the probabilities p(w|z) and p(z|d) represent the uncovered endmembers and fractional abundances, respectively.
In general, topic methods can be grouped into two reference families, one based on pLSA [28] and another based on LDA [29]. Specifically, pLSA [Fig. 1(a)] defines a semigenerative data model by introducing a latent context variable associated with the different word polysemy occurrences. The pLSA generative process proceeds as follows: 1) select a document d with probability p(d); 2) pick a latent class z with probability p(z|d); and 3) generate a word w with probability p(w|z). Nonetheless, this generative process is usually called ill-defined because the documents themselves are model parameters, so there is no natural way to infer previously unseen documents [17].
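The three-step generative process above can be sketched with toy probability tables (all names and sizes are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the pLSA generative process: d -> z -> w.
# Toy probability tables; every name and size here is illustrative.
rng = np.random.default_rng(1)

M, K, N = 3, 2, 5                                 # documents, topics, vocabulary
p_d = np.full(M, 1.0 / M)                         # p(d)
p_z_given_d = rng.dirichlet(np.ones(K), size=M)   # rows: p(z|d)
p_w_given_z = rng.dirichlet(np.ones(N), size=K)   # rows: p(w|z)

def generate_word():
    d = rng.choice(M, p=p_d)                      # 1) select a document d
    z = rng.choice(K, p=p_z_given_d[d])           # 2) pick a latent class z
    w = rng.choice(N, p=p_w_given_z[z])           # 3) generate a word w
    return d, z, w

d, z, w = generate_word()
```

Note how the document index d is itself part of the model's parameterization, which is precisely what prevents pLSA from naturally generating unseen documents.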
As a result, Blei et al. [29] proposed the LDA model [Fig. 1(b)] as a more general framework to overcome the pLSA limitations. LDA represents documents as multinomial topic mixtures generated by a Dirichlet prior, which makes it possible to predict new documents. The LDA generative process can be defined as follows.
1) Choose the number of words N_d for the document.
2) Choose a parameter vector for the topic distribution, θ ∼ Dirichlet(α). The parameter α is a K-vector (K is the number of topics) with components α_k > 0, and θ is a K-vector such that θ_k ≥ 0 and Σ_{k=1}^{K} θ_k = 1. p(θ|α) is the probability density function of the Dirichlet distribution.
3) For each one of the N_d words w_n:
a) Choose a topic t_n ∼ Multinomial(θ).
b) Choose a word w_n from p(w_n|t_n, β), a multinomial probability conditioned on the topic t_n, where β is a K × N matrix (N is the number of terms in the vocabulary) such that β_ij = P(w_j|t_i) for all 1 ≤ j ≤ N and 1 ≤ i ≤ K.
Although both the pLSA and LDA models have proven effective in many fields, there are some practical differences that need to be reviewed. The number of pLSA parameters grows linearly with the number of training documents, which makes this model particularly memory demanding and susceptible to overfitting. LDA potentially overcomes these drawbacks by using two Dirichlet distributions, one to model documents, θ ∼ Dir(α), and another to model topics, p(w|t, β) ∼ Dir(β). However, the α and β hyperparameters have to be estimated during the topic extraction process, which logically adds extra computational time and makes LDA performance highly sensitive to the quality of this estimation. In practice, α and β are estimated by iterating over the document collection, which results in LDA requiring relatively dense distributions to obtain a good hyperparameter estimation [30]. Chang et al. [31] even state that pLSA is able to obtain a topic structure more correlated with human judgment than LDA, even though the perplexity metric may suggest the opposite. All these facts make pLSA-based models usually preferred when relatively few training samples are available relative to the complexity of the problem [31].
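As an illustration, the LDA generative process for a single document might be sketched as follows; `alpha`, `beta`, and `theta` follow the notation above, while all sizes are toy values of ours:

```python
import numpy as np

# Sketch of the LDA generative process for one document:
# theta ~ Dirichlet(alpha), then each word is drawn from its topic.
# Sizes are illustrative toy values.
rng = np.random.default_rng(2)

K, N, n_words = 3, 10, 8                   # topics, vocabulary size, words N_d
alpha = np.ones(K)                         # Dirichlet hyperparameter (K-vector)
beta = rng.dirichlet(np.ones(N), size=K)   # K x N matrix, beta[i, j] = P(w_j | t_i)

theta = rng.dirichlet(alpha)               # topic proportions for this document
topics = rng.choice(K, size=n_words, p=theta)          # t_n ~ Multinomial(theta)
words = np.array([rng.choice(N, p=beta[t]) for t in topics])
```

Because `theta` is drawn from a prior rather than indexed by a document identifier, the same tables generate arbitrarily many new documents, which is the key difference from pLSA.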
In the context of unsupervised HU, the amount of information available to estimate endmembers and abundances is generally rather limited, due to the fact that the unmixing process is carried out using only the input image itself. As a result, pLSA-based models may take advantage of considering the document collection as model parameters in order to obtain a better spectral semantic characterization than LDA. Certainly, this hypothesis is supported by the fact that, in the unmixing field, it is usual to see approaches based on nonnegative matrix factorization (NMF) [32]-[34], which is, in some sense, connected to pLSA.
The NMF approximation relies on linear algebra to factorize an input matrix into two multiplicative factors in a way analogous to pLSA. Despite the relation between NMF and pLSA [35], there are important implications derived from the use of pLSA that can be highly beneficial in HU. First, pLSA offers a highly consistent probabilistic framework to develop further model extensions [3]. Second, pLSA parameters represent probability distributions, whereas NMF factors are simply sets of values as vectors or arrays. This fact is especially relevant in HU, because the estimation of fractional abundances directly fits this probabilistic nature. Besides, it also allows evaluating the importance of the estimated endmembers as semantic patterns. Third, pLSA topic-word distributions p(w|z) are identifiable, in the vocabulary of mixture models, unlike NMF factors. This has implications for the theoretical properties of the pLSA maximum likelihood estimator, such as strong consistency [36].
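For readers unfamiliar with the connection, the KL-divergence variant of NMF (the variant most directly related to pLSA) can be sketched with the classical multiplicative updates. This is a generic illustration under our own variable names, not one of the unmixing algorithms compared later:

```python
import numpy as np

# Multiplicative-update NMF minimizing the (generalized) KL divergence,
# V ~ W @ H with all factors nonnegative. A generic sketch; names are ours.
rng = np.random.default_rng(3)

V = rng.random((30, 20)) + 1e-6        # nonnegative input matrix
K = 4
W = rng.random((30, K))                # basis ("endmember-like") factor
H = rng.random((K, 20))                # coefficient ("abundance-like") factor

for _ in range(200):                   # Lee-Seung KL updates
    WH = W @ H + 1e-12
    H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
    WH = W @ H + 1e-12
    W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
```

Unlike pLSA, the resulting `W` and `H` are unnormalized; pLSA effectively constrains the analogous factors to be probability distributions.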
All in all, pLSA offers a more convenient framework than NMF, and this is precisely the reason why the HU model we propose in this paper is based on pLSA. Additionally, note that the way we use pLSA and LDA for the HU problem is by assuming that endmembers correspond to the topic-word distributions, i.e., p(w|z), and that abundance factors can be estimated by the document-topic distributions, i.e., p(z|d).
In addition to the NMF approach, different kinds of unmixing models have also been proposed in the recent literature. For instance, Yang et al. [37] present a novel abundance estimation algorithm based on the bilinear mixture model, which constructs a group of hyperplanes in the low-dimensional feature space to reduce computational complexity. Other works, such as [38], take advantage of a modified Gaussian model to investigate the mineralogical extraterrestrial soil composition.
Halimi et al. [39] propose two novel hyperspectral mixture models that account for the presence of nonlinearities by considering a residual term in addition to the linear mixture of endmembers with the sum-to-one and nonnegativity abundance constraints. In addition, Ma et al. [40] define a robust sparse unmixing method that simultaneously handles noise and outliers by adopting an ℓ_{2,1}-norm loss function.
When considering all these recent methods, the proposed approach differs in three main aspects. First, the proposed model is focused on the generative process of the data rather than on defining a specific unmixing model equation. Logically, both concepts are related, since the data eventually contain the result of the unmixing process, but it is important to highlight that the generative approach itself provides a more complete framework than considering a specific mixing equation, because the sum-to-one and nonnegativity properties of fractional abundances are inherently incorporated. Second, the proposed approach does not consider any kind of prior distribution but only the data itself, which eventually simplifies the complexity of the model when compared to other Bayesian approaches that assume prior distributions with some hyperparameters [41], [42]. Third, the proposed approach is able to estimate both the endmembers and the fractional abundances, whereas many of the existing Bayesian methods, such as [43] and [44], are only focused on estimating abundances from a given set of spectral signatures; that is, they deal with the unmixing problem from a semisupervised perspective, whereas the proposed approach does not make use of a spectral library.

III. DUAL-DEPTH SPARSE PROBABILISTIC LATENT SEMANTIC ANALYSIS
The starting point of the proposed DEpLSA model is the asymmetric formulation of pLSA [Fig. 1(a)], where a latent topic z is chosen for each document d conditionally on the p(z|d) probability distribution, and then a word w is generated from that topic according to p(w|z). The proposed model extension introduces a new latent variable z′ in order to create a new level of topics when connecting documents d and words w. Additionally, we introduce two diverging regularization factors, i.e., δ_d and δ_z, to guarantee a dual sparsity constraint when estimating both the abundances and endmembers, represented by the p(z|d) and p(w|z) parameters in the model. Hereinafter, we use the terms deep topic and restricted topic to identify z′ and z, respectively. Fig. 2 shows the DEpLSA graphical model representation, where shaded nodes represent visible random variables.
As in pLSA, the DEpLSA generative process can be described as follows.
a) A restricted topic z is chosen according to the conditional distribution p(z|d), which expresses documents in restricted topics.
b) A deep topic z′ is chosen according to the conditional distribution p(z′|z), which encapsulates the relation between both levels of topics.
c) Finally, a word w is chosen according to the conditional distribution p(w|z′), which expresses deep topics in words.
The rationale behind the use of DEpLSA in HU is based on using the deep topics z′ to generate the semantic representations of the input spectral data, and then using the restricted topics z to learn endmembers and abundances in this semantic space. That is, the deep topic space initially projects the original spectra, defined by the visible words w, onto a high-dimensional space of K′ topics in order to unfold the semantic patterns of the input data. Then, K restricted topics can be learned at a higher semantic level to uncover the K endmembers, i.e., p(w|z), and the corresponding abundance maps, i.e., p(z|d). The main difference between the z and z′ random variables lies in two factors: the space dimensionality and the δ_d and δ_z sparsity constraints. That is, the deep topic space projects the data onto a high-dimensional space (K′ ≫ K) in order to capture fine semantic image patterns of the spectral data. Then, the restricted topic space takes advantage of these patterns to conduct the unmixing process by fixing the number of restricted topics to the number of endmembers. The use of z′ is motivated by the fact that the deep topic space allows extracting endmembers and abundances over a semantic characterization space instead of the original spectra. In other words, this high-dimensional space enables connections among spectral signatures through the image patterns defined by topics and, therefore, it is able to generate a higher abstraction level for the unmixing process.
In addition, the proposed model makes use of two regularization factors to introduce sparsity constraints over the p(z′|z) and p(z|d) probability distributions in order to reduce uncertainty and noise. On the one hand, δ_z aims to sparsify the p(z′|z) distribution, which defines the probability of the deep topics z′ given the restricted topics z. Note that the number of deep topics (K′) is significantly higher than the number of endmembers (K); therefore, it is reasonable to assume that each endmember (restricted topic) is modeled using only a limited number of deep topics. In a sense, this assumption is analogous to the one considered by other sparse coding-based methods, such as [45] and [46], to reduce the uncertainty when choosing endmembers from a given spectral library. Logically, the proposed method does not require any spectral library, but it can still take advantage of this sparsity constraint over the deep topic space. On the other hand, the δ_d factor intends to reduce the entropy of the p(z|d) distribution in order to guarantee a better model convergence. That is, the uniform distribution 1/K is the most uninformative (highest entropy) abundance map configuration from an information theory perspective and, precisely, this regularization encourages sparser and more informative abundance map solutions. In other words, the δ_d regularization aims at neglecting noisy components among the spectral patterns present in a specific pixel (document). This strategy is also common in the sparse-coding field to deal with input noise (see [47]), and this is precisely the reason why we make use of it.

A. Model Relaxation
The proposed model takes advantage of two different levels of topics to connect the d and w random variables; however, this fact has an important implication for the model inference cost: an additional degree of freedom is required to capture the relationships between the z and z′ hidden random variables. From a pLSA-based perspective, this additional level may become unaffordable as the input image size increases, because each variable marginalization over the posterior distribution requires evaluating the Cartesian product between z and z′. As a result, in order to alleviate the computational cost of managing two different levels of topics when estimating the model parameters, we propose to apply the following model relaxation based on two sequential steps.
1) Learning Deep Topics z′ (DEpLSA-1): In the first phase [Fig. 3(a)], the proposed DEpLSA model is simplified to estimate the deep topic space using a regular pLSA approach. Specifically, the components z, δ_d, and δ_z are removed from the model in order to approximate the deep-topic distribution, i.e., p(w|z′), directly from the observable input documents d. As Fig. 3(a) shows, the DEpLSA-1 model relaxation corresponds to the inner part of DEpLSA and is equivalent to the standard pLSA model. Therefore, the parameters p(z′|d) and p(w|z′) can be initially estimated using pLSA over the input hyperspectral image in order to extract K′ deep topics.
2) Learning Restricted Topics z (DEpLSA-2): Once the p(z′|d) and p(w|z′) parameters have been estimated, the outer part of DEpLSA corresponds to a dual sparse pLSA model where the random variable related to the deep topics z′ becomes observable, as Fig. 3(b) shows. In particular, we use the p(z′|d) parameter of DEpLSA-1 as the input word-document distribution for DEpLSA-2, i.e., n(d, z′) ≈ p(z′|d). Note that this assumption implies considering a uniform prior probability over deep topics, which is a quite general premise; however, different prior probability values could also be used instead to encourage specific topics. Eventually, the p(z|d) and p(z′|z) parameters of the DEpLSA-2 model are estimated using K restricted topics, which represent the number of endmembers in the input scene. This model relaxation reduces the DEpLSA computational cost to the pLSA order. Since the DEpLSA-1 model relaxation corresponds to the standard pLSA formulation [28], in Section III-B, we only provide the formulation for the DEpLSA-2 model.
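The two sequential steps can be sketched by running a plain pLSA estimator twice. This minimal sketch omits the δ_d and δ_z sparsity terms and uses toy sizes, so it only illustrates the relaxation, not the full DEpLSA-2 formulation:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Plain pLSA via EM on a document-word count matrix (D x N).

    Returns p(z|d) of shape (D, n_topics) and p(w|z) of shape
    (n_topics, N). Minimal sketch; sparsity regularizers omitted.
    """
    rng = np.random.default_rng(seed)
    D, N = counts.shape
    p_z_d = rng.dirichlet(np.ones(n_topics), size=D)   # p(z|d)
    p_w_z = rng.dirichlet(np.ones(N), size=n_topics)   # p(w|z)
    for _ in range(n_iter):
        # E-step: posterior p(z|d,w) proportional to p(z|d) p(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (D, K, N)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post           # n(w,d) p(z|d,w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Stage 1 (DEpLSA-1-like): learn a wide deep-topic space from raw counts.
counts = np.random.default_rng(4).integers(0, 50, size=(60, 30)).astype(float)
K_deep, K_end = 12, 3                  # stands in for K' >> K; toy sizes
theta_deep, phi_deep = plsa(counts, K_deep)

# Stage 2 (DEpLSA-2-like): re-run over the deep-topic space, using
# p(z'|d) as the new "word-document" input, n(d, z') ~ p(z'|d).
p_z_d, p_zp_z = plsa(theta_deep, K_end)
```

The second call returns p(z|d) (abundance-like) and p(z′|z) (the mapping from restricted to deep topics), mirroring the roles these parameters play in the relaxation.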

B. Expectation-Maximization Formulation for DEpLSA-2
The DEpLSA-2 parameters, p(z|d) and p(z′|z), are estimated by maximizing the complete log-likelihood using the expectation-maximization (EM) algorithm [48]. First, let us define the likelihood function in terms of the density function of a document collection D

L = ∏_{d ∈ D} ∏_{z′=1}^{K′} p(z′, d)^{n(z′, d)}    (1)

where K′ is the considered number of deep topics and n(z′, d) denotes the number of times the deep topic z′ occurs in the document d. The joint probability p(z′, d) can be factorized according to the DEpLSA-2 model as follows:

p(z′, d) = p(d) Σ_{z=1}^{K} p(z′|z) p(z|d)    (2)

where K represents the number of considered restricted topics. Inserting (2) in (1), we obtain the expression of the complete likelihood L_c. The target is to estimate the p(z|d) and p(z′|z) parameters that maximize L_c; nonetheless, multiplicative and exponential factors are hard to optimize. Due to the monotonic nature of the logarithmic function, we can equivalently maximize the complete log-likelihood

ℓ_c = Σ_{d ∈ D} Σ_{z′=1}^{K′} n(z′, d) log [ p(d) Σ_{z=1}^{K} p(z′|z) p(z|d) ]    (4)

which is hard to maximize directly because of the summation inside the logarithm.
Taking advantage of the properties of the log function, we can make use of the concave version of Jensen's inequality, which, for the posterior p(z|z′, d), yields

log Σ_{z} p(z′|z) p(z|d) ≥ Σ_{z} p(z|z′, d) log [ p(z′|z) p(z|d) / p(z|z′, d) ].

As a result, the expression to optimize becomes the expected complete log-likelihood

E = Σ_{d} Σ_{z′} n(z′, d) Σ_{z} p(z|z′, d) log [ p(z′|z) p(z|d) ].

Let us now introduce the normalization constraints for the parameters p(z|d) and p(z′|z) by inserting the appropriate Lagrange multipliers α_d and β_z

E′ = E + Σ_{d} α_d ( 1 − Σ_{z} p(z|d) ) + Σ_{z} β_z ( 1 − Σ_{z′} p(z′|z) ).

Finally, the solution is regularized using the dual sparsity factors δ_d and δ_z to maximize the Kullback-Leibler (KL) divergence between the uniform distributions over restricted topics (U_d = 1/K) and deep topics (U_z = 1/K′) and the parameters p(z|d) and p(z′|z), respectively

E″ = E′ + δ_d Σ_{d} KL( U_d ∥ p(z|d) ) + δ_z Σ_{z} KL( U_z ∥ p(z′|z) ).    (9)

To maximize (9), we use the EM algorithm, which works in two stages: 1) the E-step, where, given the current estimate of the parameters, the expected value of the likelihood is computed [estimating the posterior probability p(z|z′, d)], and 2) the M-step, where the new optimal values of the parameters are computed according to the current setting of the hidden variables.
For the M-step, we compute the partial derivatives of (9), set them equal to zero, and solve the resulting equations to estimate the p(z′|z) and p(z|d) parameters

p(z′|z) = [ Σ_{d} n(z′, d) p(z|z′, d) − δ_z U_z ] / β_z    (10)

p(z|d) = [ Σ_{z′} n(z′, d) p(z|z′, d) − δ_d U_d ] / α_d    (11)

where the α_d and β_z multipliers can be obtained from the normalization constraints on documents and topics, respectively. For the E-step, the p(z|z′, d) probabilities can be computed by applying Bayes' rule together with the chain rule as follows:

p(z|z′, d) = p(z′|z) p(z|d) / Σ_{z} p(z′|z) p(z|d).    (12)

The EM process is performed as follows. First, p(z|d) and p(z′|z) are randomly initialized. Then, the E-step (12) and the M-step (10) and (11) are alternated until the p(z′|z) and p(z|d) parameters converge. As convergence conditions, we use a 10^{-6} stability threshold on the difference of the log-likelihood (4) between two consecutive iterations and a maximum of 1000 EM iterations.

IV. HU FRAMEWORK BASED ON DEPLSA
In order to enable the use of the LDA, pLSA, and DEpLSA models over hyperspectral images, we make use of the bag-of-words characterization scheme [49] adapted to the spectral image domain. Specifically, pixels are considered as topic model documents, spectral bands define the vocabulary terms, and document word counts are represented by the reflectance values of the bands. Note that, considering an image size of (r × c × b), this characterization generates a total of D = (r × c) documents with a vocabulary size of N = b. Fig. 4 shows a general overview of the proposed HU framework based on the DEpLSA model. From left to right, the proposed methodology consists of three sequential steps: 1) learning the deep topic space; 2) extracting the restricted topics; and 3) generating the output endmember signatures and abundance maps.
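This characterization amounts to a simple reshape of the hyperspectral cube; a minimal sketch with illustrative dimensions:

```python
import numpy as np

# Bag-of-words characterization of a hyperspectral cube: pixels become
# documents, bands become vocabulary terms, and reflectance values play
# the role of word counts. Toy dimensions; names are ours.
r, c, b = 10, 8, 188                   # rows, columns, spectral bands
cube = np.random.default_rng(5).random((r, c, b))

documents = cube.reshape(r * c, b)     # D = r*c documents, N = b terms
D, N = documents.shape
```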
Once the input hyperspectral image is characterized as a collection of spectral documents, the first step applies the DEpLSA-1 model to learn the deep topic space using K′ latent units. Note that this step aims at obtaining a high-dimensional deep topic space to unfold the semantic patterns of the original spectral data; therefore, the number of deep topics K′ has to be substantially higher than the vocabulary size N.
In step 2), the DEpLSA-2 model is used to extract K restricted topics over the previously learned deep topic space. That is, the p(z′|d) parameter of DEpLSA-1 acts as the input word-document distribution for DEpLSA-2 in order to uncover the K endmembers, considering the δ_d and δ_z sparsity factors.
Finally, step 3) is focused on obtaining the final endmember and abundance results. The p(z|d) parameter of DEpLSA-2 provides a direct estimate of the fractional abundances, due to the fact that it expresses pixels according to the probability of belonging to each one of the K endmembers. However, the considered model relaxation (Section III-A) does not provide any straightforward endmember estimation. In order to obtain such a result, we approximate it by factorizing the p(w|z) probability over the deep topics as follows:

p(w|z) = Σ_{z′} p(w|z′) p(z′|z).

Note that the p(w|z) distribution represents the K restricted topics in the initial vocabulary space of spectral bands; therefore, it provides an estimation of the spectral signatures. In particular, we initially factorize p(w|z) according to the proposed DEpLSA model (Section III). Then, the p(w|z′) and p(z′|z) parameters can be used to connect the deep topic space and the restricted topic space estimated by the DEpLSA-1 and DEpLSA-2 models, respectively.
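Under this factorization, the endmember signatures reduce to a product of the two estimated distributions. A toy sketch with random Dirichlet-valid matrices standing in for the learned parameters:

```python
import numpy as np

# Recovering endmember signatures: restricted topics are mapped back to
# the spectral-band vocabulary by marginalizing over the deep topics,
# p(w|z) = sum_z' p(w|z') p(z'|z). Toy matrices for illustration only.
rng = np.random.default_rng(6)
K_end, K_deep, n_bands = 3, 12, 188

p_w_zp = rng.dirichlet(np.ones(n_bands), size=K_deep)  # p(w|z'), stage 1
p_zp_z = rng.dirichlet(np.ones(K_deep), size=K_end)    # p(z'|z), stage 2

endmembers = p_zp_z @ p_w_zp           # rows are p(w|z), one per endmember
assert np.allclose(endmembers.sum(axis=1), 1.0)        # valid distributions
```

Because both factors are row-stochastic, each recovered signature is itself a probability distribution over the spectral bands.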

V. EXPERIMENTAL RESULTS
The experimental part of this paper aims at validating the performance of LDA, pLSA, and DEpLSA models within the HU field against several unmixing algorithms available in the literature. In particular, Section V-A introduces the five hyperspectral images used in the experiments, Section V-B describes the experimental setting, and Section V-C shows the obtained results.

A. Data Sets
We have considered five different hyperspectral data sets in our experiments: one synthetic image, called Fractal [50], and four real hyperspectral images, namely, the Samson, Jasper, Urban, and Cuprite data sets [51]. These images have been selected because they are used in many recent works [52]-[55] and also because they are publicly available and can be easily downloaded from websites [56], [57]. In the following, we provide a description of the considered hyperspectral data sets.
1) Fractal [Fig. 5]: a synthetic image generated as in [50].
Cuprite [51], [59], [60] is one of the most challenging data sets for HU; it covers the Cuprite mining district in NV, USA. This image contains 224 channels, ranging from 370 to 2480 nm. Similar to the other data sets, a region of 250 × 190 pixels is considered, and we also remove the noisy channels (1-2 and 221-224) and water absorption channels (104-113 and 148-167) in order to maintain a total of 188 channels. A total of 12 types of minerals are present in the scene: Alunite, Andradite, Buddingtonite, Dumortierite, Kaolinite1, Kaolinite2, Muscovite, Montmorillonite, Nontronite, Pyrope, Sphene, and Chalcedony.

B. Experimental Settings
The proposed approach has been validated against nine different methods selected from the literature. In particular, three different families of methods have been included in this experimental comparison: geometrical-based methods, NMF-based methods, and latent topic-based methods. Regarding the first group, the vertex component analysis (VCA) [8] and minimum volume simplex analysis (MVSA) [9] unmixing methods have been considered. For the second one, four different variations of the standard NMF procedure [61] have been taken into account: NMF-div [62], which uses the KL divergence criterion to perform the decomposition; NMF-mse [62], which employs a Euclidean objective function; NMF-sp [63], which also introduces a sparsity constraint; and collaborative nonnegative matrix factorization (CNMF) [34], which uses two different kinds of regularization terms. Finally, three different topic models have been tested for the unmixing problem: LDA [29], pLSA [28], and pLSA-sp [27], which adds a sparsity constraint over documents.
All these methods have been selected because their implementations are publicly available and because they allow estimating both the endmembers and the abundances in the same form as the proposed framework does. That is, we assume that the number of endmembers K is known in advance; therefore, all the tested methods make use of this information when conducting the unmixing experiments. Note that multiple works in the literature are focused on estimating the number of endmembers; thus, some methods, such as [64], could be used as a preprocessing step to estimate this number.
Whenever possible, the considered methods have been tested using a similar parameter configuration in order to conduct experimental comparisons that are as fair as possible given the diversity of the approaches. In particular, the NMF-sp abundance and endmember sparsity constraints have been fixed to 10^-2 and 1/N, respectively. Similarly, the pLSA-sp abundance sparsity factor has been set to 10^-2. Finally, the δ_d and δ_z sparsity factors of the proposed approach have been fixed to 10^-2 and 1/K, respectively. Note that for the δ_z term, we use 1/K instead of 1/N, because here K denotes the vocabulary length of the restricted topics (i.e., the number of deep topics), which is set to 1000 in this paper. Regarding the convergence of the algorithms, we have considered a maximum of 1000 iterations for both the NMF-based and topic-based models.
In order to perform a quantitative evaluation of the HU results, two reference metrics are used: the spectral angle distance (SAD) and the root-mean-squared error (RMSE). On the one hand, the SAD [65] is used to assess endmember estimates by considering each spectral band as a coordinate axis and then computing the average angle between the estimated endmembers M̂ and the ground-truth ones M:

SAD = \frac{1}{K} \sum_{k=1}^{K} \arccos \left( \frac{ m_k^T \hat{m}_k }{ \| m_k \| \, \| \hat{m}_k \| } \right)

where m_k and m̂_k denote the kth columns of M and M̂, respectively.
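A compact NumPy sketch of this metric follows (endmembers stored column-wise; the pairing between estimated and ground-truth signatures is assumed to be already resolved):

```python
import numpy as np

def sad(M_true, M_est):
    """Average spectral angle (radians) between paired endmember columns
    of two (N_bands, K) matrices."""
    cos = np.sum(M_true * M_est, axis=0) / (
        np.linalg.norm(M_true, axis=0) * np.linalg.norm(M_est, axis=0))
    # Clip guards against round-off pushing |cos| slightly above 1
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Identical matrices yield an angle of 0, while orthogonal signatures yield π/2.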
On the other hand, the RMSE metric is useful to evaluate the abundance estimation results by computing the differences between the estimated abundances (Â) and the ground-truth ones (A) as follows:

RMSE = \sqrt{ \frac{1}{K R C} \sum_{k=1}^{K} \sum_{i=1}^{R} \sum_{j=1}^{C} \left( a_{kij} - \hat{a}_{kij} \right)^2 }

where R and C represent the input image size. Regarding the ground-truth information used to compute these metrics, we use the ground-truth data available in [56] and [57]. In the case of the simulated image (Fractal), the real endmembers and abundances are logically known in advance. For the Samson, Jasper, and Urban images, we make use of the ground-truth information in [66], which has been obtained using a semisupervised approach also employed in [67] and [68]. First, the virtual dimensionality method [69] is used to find out the number of endmembers.
Second, the endmember spectral signatures are manually chosen from the USGS library 1 and other hyperspectral libraries, according to the acquisition area. Finally, the corresponding fractional abundances are generated by solving the constrained convex optimization also applied in [68]. For the Cuprite data set, we consider the USGS library signatures of the most representative minerals in the scene, available in [57].
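The RMSE computation above reduces to a few lines; this sketch assumes a (K, R, C) array layout for the abundance maps, which is an assumption about the data arrangement rather than a detail taken from the paper:

```python
import numpy as np

def rmse(A_true, A_est):
    """RMSE between ground-truth and estimated abundance maps,
    both of shape (K, R, C)."""
    return float(np.sqrt(np.mean((A_true - A_est) ** 2)))
```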

C. Results
Tables I and II show the SAD and RMSE quantitative assessments for the considered unmixing methods and data sets. In the case of the synthetic Fractal data, four different levels of Gaussian noise are considered, i.e., without noise (noNoise) and with SNRs of 30, 50, and 70. In the case of the real data, four different hyperspectral images are included, i.e., Samson, Jasper, Urban, and Cuprite. Regarding the considered unmixing methods, two column groups are differentiated: 1) nontopic-based methods, i.e., VCA [8], MVSA [9], NMF-div [62], NMF-mse [62], NMF-sp [63], and CNMF [34] and 2) topic-based methods, i.e., LDA [29], pLSA [28], and pLSA-sp [27], together with the results obtained by the proposed approach. Note that two different columns are shown for the proposed approach: DEpLSA-noReg, which shows the result obtained without regularization (that is, δ_d = δ_z = 0), and DEpLSA, where the configuration explained in Section V-B is used.
In addition to the quantitative evaluation provided by the SAD and RMSE metrics, some visual results are presented as a qualitative evaluation for the tested HU methods. Specifically, Figs. 6-8 show the abundance estimation results for Samson, Jasper, and Urban data sets obtained by a subset of the tested methods: VCA, NMF-sp, CNMF, LDA, pLSA, pLSA-sp, and the proposed DEpLSA model. Besides, the acronym GT represents the ground-truth abundances. Since, for the Cuprite data set, there is no GT abundance map that can be used as a reference, we do not display the abundance maps in this case. Additionally, Fig. 9 provides the Cuprite endmember results and abundance maps for the proposed method.

VI. DISCUSSION
According to the unmixing results obtained from the synthetic hyperspectral data (Tables I and II), the proposed approach is able to provide a remarkable performance improvement with respect to the rest of the tested methods when considering a certain amount of input noise. That is, DEpLSA obtains the best SAD and RMSE values for Fractal 30 SNR, and the second and third best average quantitative results for Fractal 50 SNR, which indicates the good performance of the proposed approach under noisy conditions. Nonetheless, it is also possible to observe that other methods, especially CNMF, tend to obtain better results in a low-noise scenario (Fractal 70 SNR) and in noise-free conditions (Fractal noNoise). Note that the noncomplex nature of the simulated data means that pure pixels can easily occur when considering no noise or only a small amount of it; therefore, straightforward methods may generate more accurate unmixing estimates than the probabilistic approach adopted by the proposed model. However, the absence of noise is not a realistic premise in a real remotely sensed hyperspectral data production scenario, where images are typically affected by many different kinds of perturbations, as well as by image corrections that introduce some noise. Precisely, this is the reason why we also use real hyperspectral data in Section V of this paper.
The quantitative assessment reported in Tables I and II reveals that the proposed DEpLSA model is able to achieve a competitive HU performance with the four considered real hyperspectral data sets. When considering the SAD metric (Table I), the proposed approach shows a remarkable reduction in the angular deviation between the estimated endmembers and the ground-truth ones. Although the geometrical method (VCA) and the NMF-based one (CNMF) also exhibit good capabilities to extract endmembers, the result provided by DEpLSA tends to be more accurate and robust. That is, although VCA and CNMF decrease their effectiveness for the Jasper and Urban data sets, the proposed approach is able to maintain a good performance for all the considered hyperspectral images.
In the case of the RMSE metric, it is possible to identify a similar behavior. More specifically, DEpLSA achieves the best results, and the second best average value is obtained by NMF-sp. Nonetheless, the performance of the proposed approach remains significantly higher on average. According to the reported results, we can also see that VCA and CNMF are certainly less effective at estimating abundances than at estimating endmembers.
When analyzing the unmixing results obtained by topic models, several interesting observations can be made. First, the pLSA model clearly shows a better performance for unmixing tasks than LDA. As mentioned in Section II, LDA requires an initial estimation of two Dirichlet hyperparameters, and the quality of this estimation may be affected by the number of available documents. In the blind HU application, the number of pixels is constrained by the input image size, and this number is usually too limited to generate a good initial hyperparameter estimate for LDA. However, pLSA takes advantage of using the input pixels as model parameters and, therefore, it is able to extract more accurate semantic patterns using less input information than LDA.
Another important point is related to the use of regularization to relieve the ill-posed nature of the HU problem. As we can see in Tables I and II, the standard LDA and pLSA models present a poor global performance, especially when estimating endmembers. However, a substantial improvement can be reached when introducing sparse regularization terms. The presented results show that pLSA-sp and NMF-sp achieve an important performance improvement over the standard models (i.e., pLSA, NMF-div, and NMF-mse) by considering sparsity constraints on the final estimation of fractional abundances. In general, sparse regularization is an excellent tool to reduce the uncertainty in the unmixed space [7], and this is precisely the reason why this technique is also effective in topic-based HU.
Overall, VCA and CNMF show a good performance in the task of extracting endmembers, and NMF-sp in the task of estimating abundances. Nonetheless, the proposed approach is able to provide superior results in both tasks. The abundance maps shown in Section V also support this observation. In the case of the Samson data set (Fig. 6), the soil, tree, and water endmembers are represented using pure red, green, and blue colors, respectively. As we can see, VCA, CNMF, and DEpLSA are the methods that best distinguish among these three endmembers. However, DEpLSA is able to provide a more accurate result. In the coastal area of Fig. 6, VCA and CNMF generate a blurring effect, whereas DEpLSA obtains a result that is more similar to the ground-truth abundances. In Figs. 7 and 8, it is also possible to find similar examples that validate the results obtained by the proposed approach. Regarding the Cuprite endmember results provided in Fig. 9, we can also observe that the proposed approach is able to obtain spectral signatures very similar to the corresponding ground-truth endmembers.
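Fig. 6-style renderings can be reproduced with a simple channel mapping. This sketch assumes a three-endmember abundance array of shape (3, R, C), with each endmember (e.g., soil, tree, water) driving one of the red, green, and blue channels; the layout is an assumption, not the paper's actual plotting code.

```python
import numpy as np

def abundances_to_rgb(A):
    """Map a 3-endmember abundance array of shape (3, R, C) to an
    (R, C, 3) RGB image, one endmember per color channel."""
    rgb = np.transpose(A, (1, 2, 0))  # move the endmember axis last
    return np.clip(rgb, 0.0, 1.0)    # keep values in the displayable range
```

The resulting array can be displayed directly, e.g., with `matplotlib.pyplot.imshow`.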

A. Performance Analysis for Different Sparsity Factors
As commented in Section V-B, the δ_d and δ_z sparsity factors of the proposed approach have been set to 10^-2 and 10^-3, respectively, for all the data sets, in order to use a general configuration. That is, according to the information provided in Section III, these two regularization factors allow us to neglect the small noisy components appearing in the p(z|d) and p(w|z) probability distributions throughout the EM optimization process. In order to highlight this point, Fig. 10 shows the quantitative assessment for different parameter configurations over the Samson data set.
Specifically, Fig. 10(a) and (b) provides the SAD and RMSE evaluation when considering δ_d and δ_z within the range 0.00-0.04. In addition, Fig. 10(c) and (d) shows the corresponding SAD and RMSE details in the range 0.000-0.004. As we can see in Fig. 10(a) and (b), the optimal δ_d parameter seems to lie between 0.002 and 0.03 for the SAD metric and between 0.00 and 0.02 for the RMSE result. Regarding the δ_z parameter, it initially seems not to have a significant impact on the performance, at least within the considered value range. However, the details provided in Fig. 10(c) and (d) reveal that a small amount of regularization is convenient in both cases.
That is, both the SAD and RMSE details show that there is a small area close to the axes representing δ_d = 0 and δ_z = 0 where the metric performance tends to decrease. This effect is produced because tiny probability values, which can be considered noise in a real scenario, are not regularized in the EM process when null sparsity factors are used. As a result, the proposed approach can take advantage of small regularization factors to increase the resulting performance, and this is the reason why we set δ_d and δ_z to 10^-2 and 10^-3 as a general setting for all the data sets. Logically, this configuration may not be optimal for all the considered hyperspectral images; however, it seeks to avoid the aforementioned effect while providing the most general scheme. This assertion is also supported by the fact that the proposed approach, with the considered configuration, is able to outperform the corresponding nonregularized version (DEpLSA-noReg) in all the experiments except for the noise-free synthetic image.
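A Fig. 10-style sweep can be organized as a plain grid search over the two sparsity factors. The objective below is a toy stand-in with a minimum placed near the chosen setting; in practice it would fit DEpLSA with each (δ_d, δ_z) pair and return the resulting SAD or RMSE.

```python
import itertools
import numpy as np

def evaluate(delta_d, delta_z):
    """Toy stand-in objective (NOT the real model): a bowl-shaped
    surface with its minimum near (0.01, 0.001). In practice this
    would run a full DEpLSA fit and score it with SAD or RMSE."""
    return (delta_d - 0.01) ** 2 + (delta_z - 0.001) ** 2

grid_d = np.linspace(0.0, 0.04, 21)  # step 0.002, matching Fig. 10's range
grid_z = np.linspace(0.0, 0.04, 21)

# Pick the factor pair minimizing the objective over the grid
best = min(itertools.product(grid_d, grid_z), key=lambda p: evaluate(*p))
```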

B. Advantages and Limitations of the Proposed Approach
The main advantage of the proposed DEpLSA-based HU framework lies in the deep-topic structure that it offers to uncover endmembers and abundances. Even though some methods in the literature have tried to use pLSA [27] (or, analogously, NMF [34], [62], [63]) for the unmixing task, they mainly use these models as a straightforward data decomposition process. That is, endmembers and abundances are estimated over the initial spectral space, and this fact limits the semantic potential of topic models in HU. The proposed DEpLSA model introduces a new level of topics, called deep topics (z'), in order to extract the semantic representations of the input spectral data. Then, another level of topics, called restricted topics (z), is used to uncover the endmembers and abundances over these semantic representations. In this way, the unmixing process is conducted over a semantic representation space, unlike the classical straightforward approach. Additionally, we introduce a dual sparsity constraint over the restricted topics to guarantee regularized solutions.
The underlying rationale behind the improvement provided by DEpLSA is based on its potential to better discern similar spectral patterns in the deep topic space. Let us explain this concept through a simple visual example. Fig. 11(a) shows the original Samson data projected onto the first two principal component analysis (PCA) components, where soil, vegetation, and water pure pixels are colored in red, green, and blue, respectively.
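A Fig. 11(a)-style projection can be obtained in a few lines. This is a generic PCA sketch via SVD of the centered data, not the paper's exact preprocessing:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project pixel spectra X, shape (num_pixels, N_bands),
    onto the first principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Coloring the projected points by their dominant ground-truth endmember then reproduces the simplex-style scatter plot.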
When using pLSA-sp over the original spectral space [Fig. 11(b)], we can see that pure pixels tend to maintain essentially the same data variability, that is, the extracted topics do not provide any substantial improvement on the data simplex geometry. Nonetheless, the restricted topic space uncovered by the proposed approach [Fig. 11(d)] is able to define a clearer geometry over the extracted topics by reducing the intramember variability. In other words, the deep topic space [Fig. 11(c)] initially compacts the data that share the same hidden spectral patterns. Then, the restricted topic space [Fig. 11(d)] is able to reduce the uncertainty in dense areas of the simplex while maintaining the variability in the less dense parts of it.
Despite the potential of the proposed approach, it still has some limitations that need to be mentioned at this point. Specifically, its computational time is one of them. Table III shows a summary of the computational time required by the VCA, NMF-sp, CNMF, pLSA-sp, and DEpLSA unmixing methods over the tested images. As we can see, the proposed model is, on average, the most computationally demanding one, and this fact may limit its application.
According to the model relaxation introduced in Section III-A, we can reduce the DEpLSA computational load to a cost of the order of pLSA by using a regular pLSA model to estimate the deep topic space and a regularized pLSA model to uncover the restricted topic space. However, the computational time obtained in practice depends on the number of deep topics, that is, K = 1000, which is significantly higher than the number of initial spectral bands (N), i.e., 156, 198, 162, and 188 bands for Samson, Jasper, Urban, and Cuprite, respectively. Even though the high dimensionality of the deep topic space may allow a faster convergence of the model, the total computational time of DEpLSA remains, on average, 1.3, 2.4, and 3.2 times higher than the CNMF, NMF-sp, and pLSA-sp costs. At this point, the high efficiency of the geometrical VCA method should also be mentioned.
Another limitation of the proposed approach is related to the absence of noise in the input image. Even though this may not be an actual limitation in a real-life scenario, because of the inherent complexity of real remotely sensed hyperspectral data, the DEpLSA model has shown a limited performance with the noise-free synthetic data. The proposed approach has been specially designed to deal with the hyperspectral data complexity through the semantic patterns uncovered by the deep topic space; however, the simpler nature of the simulated data makes straightforward methods more suitable than the probabilistic formulation of the proposed approach.

VII. CONCLUSION
In this paper, we have presented a new topic-based unmixing framework specially designed to estimate both the endmembers and abundances from remotely sensed hyperspectral imagery. Specifically, we introduce a so-called DEpLSA model in order to deal with the unmixing problem, following a pLSA-based dual-depth architecture. The proposed model generates a first level of deep topics to extract the semantic representations of the input hyperspectral data. Then, the second level of restricted topics is computed to estimate endmember spectral signatures and fractional abundances according to the uncovered spectral patterns. Our experimental comparison, conducted using four different hyperspectral data sets, reveals that the proposed approach is able to provide competitive performance with respect to standard topic models, as well as several state-of-the-art unmixing methods available in the literature.
One of the first conclusions that arise from this paper is the potential of pLSA-based models to cope with the HU problem. In general, pLSA-based models have been shown to obtain better unmixing results than LDA, because they can take advantage of using spectral pixels as model parameters to generate better topic estimates.
Another important conclusion is related to the use of sparse regularization within the HU field. As the conducted experiments reveal, pLSA and NMF obtain a substantial performance improvement when considering sparsity constraints over spectral signatures and fractional abundances. In a sense, these sparse assumptions help unmixing models to avoid uninformative solutions.
Finally, the most relevant conclusion of this paper is related to the effectiveness of the proposed pLSA-based dual-depth architecture to cope with the unmixing problem, especially with real remotely sensed hyperspectral data. Although the common algorithm design trend relies on using models like pLSA as a straightforward data decomposition process, the proposed DEpLSA model transforms this classical perspective into a new probabilistic framework under which the HU process is conducted according to the semantics encapsulated by the deep topic space.
Although the results of the proposed approach are encouraging for HU based on semantic representations, it still has some limitations that provide room for improvement and motivate further research on topic-based HU. Specifically, our future work is aimed at the following directions: 1) the development of a parallel implementation of the DEpLSA model to significantly reduce its computational time, using graphics processing units; 2) the extension of our model to estimate the ideal sparsity factors for each input image; 3) the design of automatic procedures to set the most appropriate number of topics in the deep topic space; and 4) an extension of the proposed HU framework to automatically find the number of endmembers in the original hyperspectral image.

ACKNOWLEDGMENT
The authors would like to thank the Editors and the Anonymous Reviewers for their outstanding comments and suggestions, which greatly helped them to improve the technical quality and presentation of this paper.