Sparse Multi-modal probabilistic Latent Semantic Analysis for Single-Image Super-Resolution

This paper presents a novel single-image super-resolution (SR) approach based on latent topics in order to take advantage of the semantics pervading the topic space when super-resolving images. Image semantics has been shown to be useful for relieving the ill-posed nature of the SR problem; however, the most widely accepted clustering-based approach used to define semantic concepts limits the capability of representing complex visual relationships. The proposed approach provides a new probabilistic perspective where the SR process is performed according to the semantics encapsulated by a new topic model, the Sparse Multi-modal probabilistic Latent Semantic Analysis (sMpLSA). Firstly, the sMpLSA model is formulated. Subsequently, a new SR framework based on sMpLSA is defined. Finally, an experimental comparison is conducted using seven learning-based SR methods over three different image datasets. Experiments reveal the potential of latent topics in SR by showing that the proposed approach is able to provide a competitive performance.


Introduction
The objective of image Super-Resolution (SR) is to improve image resolution, not only by increasing the number of pixels but also by providing spatial details beyond the acquisition sensor precision. In the case of single-image SR (hereafter referred to as SR), a single Low-Resolution (LR) image of the target scene is used to generate the super-resolved output, which aims to recover High-Resolution (HR) features as if the input image had been acquired using a sensor with a higher nominal resolution.
SR techniques have found a fertile domain in many applications where resolution enhancement is important. For instance, biometric identification, video surveillance, medical diagnosis, microscopic observation and remote sensing are some of the most popular application fields where SR is useful to overcome the limits of the acquisition sensor.

Related work
In the literature, it is possible to find several quality works that provide a good overview of the existing SR algorithms [1,2,3,4]. Roughly speaking, SR algorithms can be categorized into three different groups, image REconstruction (RE), image LEarning (LE) and HYbrid (HY) methods.
RE methods try to reconstruct HR details in the super-resolved output assuming a specific degradation model along the image acquisition process. The imaging model is typically defined by the concatenation of three operators: blurring, decimation and noise. As a result, RE methods can be seen as an inverse problem of deblurring, upsampling and denoising the input LR image. Each RE method makes its own assumptions to introduce a certain prior knowledge in order to well-pose the inverse nature of the SR problem. For instance, iterative back projection [5], gradient profile prior [6] or Point Spread Function deconvolution [7,8] are some of the most popular RE approaches. Although these and other RE methods have shown to be effective to reduce the noise as well as the blur and aliasing inherent to interpolation kernel functions, the lack of relevant high-frequency information in the LR input image limits their effectiveness to small magnification factors [9].
LE methods provide a more powerful scheme by learning the relationships between LR and HR domains from an external training set. Over the past years, different machine learning paradigms have been successfully applied in SR. Sparse coding [10], neighbourhood embedding [11] and mapping functions [12,13] are amongst the most popular LE methods in the literature.
Sparse coding-based techniques take advantage of the fact that natural images tend to be sparse when they are characterised as a linear combination of small patches. In this way, dictionary atoms can be initially learnt by forcing LR and HR training images to share the same sparse codes. Then, the LR input image sparse codes can be estimated using the LR dictionary and finally these sparse codes can be used over the HR dictionary to generate the final super-resolved output.
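As a toy illustration of this code-sharing scheme, the following sketch codes an LR patch over the LR dictionary with a tiny orthogonal matching pursuit and reuses the resulting sparse code over the HR dictionary. This is our own simplified pursuit, not the actual implementation of [10], and the coupled dictionaries are assumed to have been learnt already:

```python
import numpy as np

def omp(D, x, n_nonzero=3):
    """Tiny orthogonal matching pursuit (illustrative, not production)."""
    residual, idx = x.copy(), []
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        sub = D[:, idx]
        # Re-fit the coefficients of all chosen atoms jointly.
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code

def sc_super_resolve(lr_patch, D_lr, D_hr, n_nonzero=3):
    """Sparse-coding SR of one patch: estimate the sparse code over the
    LR dictionary, then reuse the same code over the HR dictionary."""
    return D_hr @ omp(D_lr, lr_patch, n_nonzero)
```

The key assumption, as stated above, is that LR and HR patches share the same sparse code because the dictionaries were trained jointly.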
Neighbourhood embedding techniques assume that small image patches of LR images describe a low-dimensional non-linear manifold with a similar local geometry to their HR counterparts. As a result, HR patches can be generated as a weighted average of local neighbours using the same weights as those used in the LR domain. An example of this approach can be found in [11].
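A minimal sketch of this weight-transfer idea (our own illustrative implementation, not the method of [11]) can be written as follows, using locally linear embedding weights:

```python
import numpy as np

def ne_super_resolve(lr_patch, lr_train, hr_train, k=5, eps=1e-6):
    """Neighbourhood-embedding SR for a single patch (illustrative sketch).

    lr_patch: (d_lr,) flattened LR input patch.
    lr_train: (n, d_lr) flattened LR training patches.
    hr_train: (n, d_hr) corresponding HR training patches.
    """
    # 1. Find the k nearest LR neighbours.
    dists = np.linalg.norm(lr_train - lr_patch, axis=1)
    idx = np.argsort(dists)[:k]

    # 2. Solve for the reconstruction weights minimising
    #    ||x - sum_i w_i n_i||^2 subject to sum_i w_i = 1.
    N = lr_train[idx] - lr_patch          # centred neighbours, (k, d_lr)
    G = N @ N.T + eps * np.eye(k)         # regularised local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()

    # 3. Transfer the same weights to the HR counterparts.
    return w @ hr_train[idx]
```

The assumption of a shared local geometry is what justifies reusing the LR-domain weights in the HR domain.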
However, the work in [11] extends the classical idea of neighbourhood embedding by learning an initial sparse dictionary to reduce the number of atoms used to perform the embedding, thereby reducing the computational time.
Mapping-based methods consider the SR task as a regression problem between the HR and LR spaces. The underlying idea is based on learning a mapping function between LR and HR images from a specific training set. Then, this function can be used to generate the final SR result from the LR input image. In the literature, we can find different kinds of techniques to perform that regression. Neural networks [12] and Bayesian models [13] are some of the most recent approaches. Despite the fact that LE methods are able to learn spatial details that are impossible to recover by RE approaches, their main limitation is the need for a suitable training set containing HR images.
HY methods work towards reaching an agreement between RE and LE methods. In particular, they perform a training process but using only the LR input image. The rationale behind HY methods is based on the patch redundancy property pervading natural images, which assumes that natural images tend to contain repetitive structures within the same scale and across scales as well. Taking this principle into account, it is possible to find patches which appear at a lower scale, without any blurring or decimation, and then extract their corresponding HR counterparts from the higher-scale image. Eventually, the super-resolved image can be generated using the LR/HR relations learnt across scales. Each specific HY approach defines its own assumptions about the imaging model and the patch searching criteria. For example, the work presented in [14] approximates the blur operator by a Gaussian kernel and the patch redundancy search is carried out by an approximation of the nearest neighbour search.
In other works, such as in [15], the blur operator is estimated at the same time as the SR output is generated through an optimisation process. Despite their advantages, HY-based methods are not able to learn as many LR/HR relations as LE methods do, which limits their potential in SR. Note that the starting point of any HY method is an LR image, and the lower its resolution, the lower the probability of finding patches satisfying the redundancy property at a lower scale.

Current limitations and trends
LE methods have been shown to be the most effective ones given suitable training data. However, each learning model has its own generalisation constraints, which makes the SR performance highly dependent on the application field [3]. Recent research lines try to overcome this limitation by taking advantage of the so-called image semantics [16], that is, by modelling the visual interpretation humans make of images. Uncertainty is one of the most important issues in SR because of the ill-posed nature of the problem; therefore, modelling semantic concepts may help to discover semantic connections among patches and, consequently, to alleviate some ambiguities when super-resolving LR images. The idea behind this methodology is based on learning a specific model for each semantic concept appearing in the training data and then super-resolving the LR input image using the most suitable model for each patch.
These semantic concepts are usually defined in an unsupervised way according to an initial clustering process over training patches. Then, a classifier is trained to predict the semantic concept related to each LR input patch and, therefore, the corresponding SR model to be used. A representative semantic-based method can be found in [17], where the authors present a SR approach that makes use of the Expectation-Maximisation (EM) algorithm to initially cluster the data so that a linear regression function can then be learnt for each group.
Nonetheless, the high complexity of visual patterns in the image domain makes this straightforward approach unable to capture complex semantic concepts and relationships, which eventually limits the semantic power in SR [16]. As a result, more research is required to keep improving the SR process via the image semantics research line.
During the last years, topic models have shown their potential to effectively cope with all kinds of tasks by providing data with a higher level of semantic understanding [18]. Text categorisation [19], vocabulary reduction [20], visual encoding [21], image recognition [22] or even video retrieval [23] are some of the applications where topic models have been successfully used.
From a practical point of view, latent topic models are a kind of probabilistic model which provides methods to automatically understand and summarize data collections by means of their hidden patterns. Specifically, given the observed probability distribution p(w|d), which describes a corpus of documents, these algorithms are able to obtain two probability distributions: (1) the description of topics in words, p(w|z), and (2) the description of documents in topics, p(z|d).
Within the image processing field, image patches usually represent documents, the pixel positions in each patch generally define the vocabulary words, and document word-counts are typically represented by pixel intensity values. In this scenario, latent topics can be seen as distinctive pixel distributions that represent the hidden image patterns of the input data. In other words, p(w|z) is able to describe image patterns not explicitly present in the input data and, consequently, p(z|d) characterises image patches at a higher abstraction or semantic level.
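The two distributions can be read as a non-negative factorisation of the observed word-document matrix. The following toy snippet (illustrative shapes only) makes the decomposition explicit:

```python
import numpy as np

# Illustrative pLSA factorisation view: the observed word-document
# distribution p(w|d) is approximated by a mixture over K latent topics,
#     p(w|d) = sum_z p(w|z) * p(z|d),
# i.e. in matrix form P (W x D) ~= Theta (W x K) @ Phi (K x D),
# where every column is a probability distribution (sums to one).

rng = np.random.default_rng(1)
W, D, K = 64, 10, 4                                      # vocabulary, documents, topics

Theta = rng.random((W, K)); Theta /= Theta.sum(axis=0)   # p(w|z), topics in words
Phi = rng.random((K, D)); Phi /= Phi.sum(axis=0)         # p(z|d), documents in topics

P = Theta @ Phi                                          # reconstructed p(w|d)
assert np.allclose(P.sum(axis=0), 1.0)                   # columns are valid distributions
```

In the image setting above, a column of Theta is a "hidden image pattern" over pixel positions, and a column of Phi is the semantic characterisation of one patch.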
The majority of topic methods can be grouped into two model families, one based on probabilistic Latent Semantic Analysis (pLSA) [24] and another based on Latent Dirichlet Allocation (LDA) [25]. Although both pLSA and LDA models have shown to be effective in many fields [26,27,28,29,30], pLSA usually takes advantage of considering the document collection as model parameters in order to obtain a set of topics more correlated to the human judgement than the topics obtained by LDA [31].
The point which makes pLSA and other topic models a suitable tool for SR is their capability to represent samples in a higher-level characterisation space, the so-called topic space Z = {z_1, z_2, ..., z_K}. In this space, documents are expressed as probability distributions according to their feature patterns instead of their low-level features, which makes it easier to manage documents at a higher abstraction level.
Despite the fact that several works in the literature advocate the use of topic models for semantics-related image processing tasks [32,16], almost no research work has been done within the SR field. Besides, the few works using topic models do not take advantage of the inherent semantics of the topic space to super-resolve images. For instance, the work presented in [33] uses pLSA just as the clustering algorithm of an LE-based approach, not as a model to super-resolve the data.

Work objectives and main contributions
The main objective of this work is to super-resolve images following a generative framework provided by topic models in order to manage the SR semantic variability through the patterns defined by topics. That is, this work transforms the classical LE-based SR approach into a latent topic-based probabilistic approach where the SR process can be conducted according to the semantics encapsulated by the latent topic space. Specifically, we first define a pLSA-based extension, the Sparse Multi-modal probabilistic Latent Semantic Analysis (sMpLSA), aimed at learning a common topic space between the LR and HR domains. Later, we use sMpLSA to super-resolve LR input images by super-resolving latent topics instead of image patches themselves. In a sense, sMpLSA allows us to tackle the SR problem as a neighbourhood embedding approach, but taking into account the semantic nature of the topic space when generating the super-resolved result.
This paper extends our previous work [34], where the LDA model was initially used to super-resolve remote sensing imagery. That initial approach has two main limitations. On the one hand, the use of standard LDA forces the LR and HR topics to be independent; however, this is not a realistic premise. In fact, it seems logical to think that semantic patterns should be essentially the same whatever the resolution used to represent them. On the other hand, only remote sensing images were tested, which limits the validation domain of the algorithm.
In the present work, the SR framework is extended and the topic model is revised using more realistic assumptions, which leads to an improvement of the SR performance. In addition, this work extends the experimental part with a more comprehensive comparison, adding more relevant methods from the literature and using more databases from different application domains.
The rest of the paper is organized as follows: Section 2 defines the proposed sMpLSA model, which is specially designed for SR. Section 3 presents the extended SR framework based on the proposed topic model. Section 4 shows the experimental part of the work, where the proposed approach is compared against seven reference LE methods and the bi-cubic interpolation baseline over three different image databases considering two scaling factors. Finally, Section 5 discusses the results and Section 6 draws the main conclusions arising from the work.

Sparse Multi-modal probabilistic Latent Semantic Analysis
The starting point of sMpLSA is the asymmetric formulation of pLSA (Fig. 1a), where for each document d a latent topic z is chosen conditionally on the document according to the p(z|d) probability distribution, and then a word w is generated from that topic according to p(w|z). The proposed sMpLSA model extends pLSA by considering two diverging random variables to manage different vocabulary modalities, that is, w_H to represent HR words and w_L to represent LR words. Additionally, sMpLSA incorporates a λ factor to guarantee a certain level of sparsity when representing documents in the latent topic space. As in pLSA, the sMpLSA generative process (Fig. 1b) can be described as follows: (i) for each document d, a topic mixture Φ ∼ p(z|d), which expresses documents in topics, is chosen subject to the λ sparsity factor; (ii) for each one of the N_d words in the document d, (a) a topic z is drawn according to p(z|d), and (b) words w_H and w_L are chosen according to the conditional distributions Θ_H ∼ p(w_H|z) and Θ_L ∼ p(w_L|z), which express topics in HR and LR words respectively. Note that we use Θ_* to refer to both Θ_H and Θ_L.

Model relaxation
In order to alleviate the computational cost of managing two different vocabularies when estimating the sMpLSA parameters, we propose to apply the following model relaxation based on three sequential steps: 1. Learning LR training topics (sMpLSA-L): as Figure 2a shows, the LR part of sMpLSA corresponds to a sparse pLSA model; therefore, the parameters Φ_tra ∼ p(z|d) and Θ_L ∼ p(w_L|z) can be initially estimated using the pLSA structure over the LR training domain. 2. Learning HR training topics (sMpLSA-H): as Figure 2b shows, the HR topics Θ_H ∼ p(w_H|z) can then be estimated over the HR training domain while keeping the previously learnt topic mixture Φ_tra fixed, so that the LR and HR topics share the same latent space. 3. Projecting test documents (sMpLSA-tst): as Figure 2c shows, test documents can finally be represented in the learnt topic space by estimating Φ_tst ∼ p(z|d_tst) while keeping the LR topics Θ_L fixed. Note that this model relaxation enables dealing with the sMpLSA model at a pLSA-order computational cost.

Expectation-Maximisation learning framework
In this section, the three model reductions presented in Figure 2 are formulated. For the sMpLSA-L model, we provide a detailed description of the parameter estimation process. In the case of sMpLSA-H and sMpLSA-tst, we only provide the final expressions due to the similarity of the process.
The sMpLSA-L parameters, Φ_tra and Θ_L, are estimated by maximising the complete log-likelihood using the Expectation-Maximisation (EM) algorithm. First, let us define the likelihood function in terms of the density function of a document collection D:

L = ∏_{d∈D} ∏_{w_L=1}^{N} p(w_L, d)^{n(w_L,d)},    (1)

where N represents the LR vocabulary size and n(w_L,d) represents the number of times the LR word w_L occurs in the document d. The joint probability p(w_L,d) can be factorised according to the sMpLSA-L model as follows:

p(w_L, d) = p(d) ∑_{z=1}^{K} p(w_L|z) p(z|d).    (2)

Note that K represents the number of topics. Inserting Eq. (2) in Eq. (1), we obtain the expression of the complete likelihood:

L_c = ∏_{d∈D} ∏_{w_L=1}^{N} [ p(d) ∑_{z=1}^{K} p(w_L|z) p(z|d) ]^{n(w_L,d)}.    (3)

The target is to estimate the Φ_tra ∼ p(z|d) and Θ_L ∼ p(w_L|z) parameters which maximise the complete likelihood function L_c; nonetheless, multiplicative and exponential factors are hard to optimise. Due to the monotonic nature of the logarithmic function, we can equivalently maximise the complete log-likelihood,

ℓ_c = ∑_{d∈D} ∑_{w_L=1}^{N} n(w_L,d) log ∑_{z=1}^{K} p(w_L|z) p(z|d),    (4)

so the optimisation problem becomes

argmax_{Φ_tra, Θ_L} ℓ_c.    (5)

Despite the performed simplifications, this expression is still hard to maximise because of the summation inside the logarithm. Taking advantage of the log function properties, we can make use of the concave version of Jensen's inequality as follows:

log ∑_{z=1}^{K} p(w_L|z) p(z|d) ≥ ∑_{z=1}^{K} p(z|w_L,d) log [ p(w_L|z) p(z|d) / p(z|w_L,d) ].    (6)

As a result, the expression to optimise remains as follows:

∑_{d∈D} ∑_{w_L=1}^{N} n(w_L,d) ∑_{z=1}^{K} p(z|w_L,d) [ log p(w_L|z) + log p(z|d) ].    (7)

Next, we introduce the normalisation constraints for the parameters p(z|d) and p(w_L|z) by inserting the appropriate Lagrange multipliers α and β:

(7) + ∑_{d∈D} α_d ( 1 − ∑_{z=1}^{K} p(z|d) ) + ∑_{z=1}^{K} β_z ( 1 − ∑_{w_L=1}^{N} p(w_L|z) ).    (8)

Finally, the solution is regularised using the sparsity factor λ to maximise the Kullback-Leibler divergence between the uniform distribution over topics (U) and the parameter p(z|d):

(8) + λ ∑_{d∈D} KL( U ‖ p(z|d) ).    (9)

To maximise the above expression, we use the EM algorithm, which works in two alternating steps. For the M-step, we calculate the partial derivatives of Eq. (9), set them equal to zero and solve the resulting equations to estimate the p(w_L|z) (Eq. (10)) and p(z|d) (Eq. (11)) parameters:

p(w_L|z) = ∑_{d∈D} n(w_L,d) p(z|w_L,d) / ∑_{w'_L=1}^{N} ∑_{d∈D} n(w'_L,d) p(z|w'_L,d),    (10)

p(z|d) = max( ∑_{w_L=1}^{N} n(w_L,d) p(z|w_L,d) − λ/K, 0 ) / α_d.    (11)
Note that the α and β multipliers can be obtained from the normalization constraints on topics and documents, respectively.
For the E-step, the p(z|w_L,d) probabilities can be computed by applying Bayes' rule and the chain rule:

p(z|w_L,d) = p(w_L|z) p(z|d) / ∑_{z'=1}^{K} p(w_L|z') p(z'|d).    (12)
The EM process is performed as Algorithm 1 shows. First, p(w_L|z) and p(z|d) are randomly initialized. Then, the E-step (Eq. (12)) and the M-step (Eqs. (10)-(11)) are alternated until the p(w_L|z) and p(z|d) parameters converge.
As convergence conditions, we use a 10^-6 stability threshold on the difference of the log-likelihood (Eq. (4)) between two consecutive iterations or a maximum of 1000 EM iterations. Following the same procedure, it is possible to deduce the equations for the sMpLSA-H and sMpLSA-tst models. In particular, sMpLSA-H lacks the sparsity regularisation and its Φ_tra ∼ p(z|d) parameter is fixed to the estimation provided by sMpLSA-L. Therefore, the M-step and E-step equations for sMpLSA-H remain as Eqs. (13)-(14) show. Note that the arguments n(w_H,d), K and p(z|d) are now the input of the EM process and the M-step only estimates Θ_H ∼ p(w_H|z). Regarding sMpLSA-tst, this model remains essentially the same as sMpLSA-L but fixing Θ_L ∼ p(w_L|z). As a result, the M-step and E-step equations are given by Eqs. (15)-(16). Besides, the EM process takes n(w_L,d_tst), K, λ and p(w_L|z) as input arguments and the M-step only estimates p(z|d_tst).
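As an illustration, the EM alternation of Algorithm 1 for the sMpLSA-L step can be sketched as follows. The dense-array implementation and variable names are our own simplifications for readability; the 10^-6 threshold and the λ/K truncation follow the description above:

```python
import numpy as np

def sparse_plsa_em(n_wd, K, lam=1.0, max_iter=1000, tol=1e-6, seed=0):
    """EM for the sparse pLSA (sMpLSA-L) step -- an illustrative sketch.

    n_wd: (W, D) word-document count matrix n(w_L, d).
    Returns Theta ~ p(w_L|z) with shape (W, K) and Phi ~ p(z|d) with shape (K, D).
    """
    rng = np.random.default_rng(seed)
    W, D = n_wd.shape
    # Random initialisation of both distributions (columns sum to one).
    Theta = rng.random((W, K)); Theta /= Theta.sum(axis=0)
    Phi = rng.random((K, D)); Phi /= Phi.sum(axis=0)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step (Eq. 12): responsibilities p(z|w,d).
        joint = Theta[:, :, None] * Phi[None, :, :]              # (W, K, D)
        resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # Expected counts n(w,d) * p(z|w,d).
        Nz = n_wd[:, None, :] * resp
        # M-step (Eq. 10): p(w|z).
        Theta = Nz.sum(axis=2)
        Theta /= np.maximum(Theta.sum(axis=0), 1e-12)
        # M-step (Eq. 11): sparse p(z|d), truncating mass below lam/K.
        Phi = np.maximum(Nz.sum(axis=0) - lam / K, 0.0)
        Phi /= np.maximum(Phi.sum(axis=0), 1e-12)
        # Convergence check on the log-likelihood (Eq. 4).
        ll = (n_wd * np.log(np.maximum(Theta @ Phi, 1e-12))).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return Theta, Phi
```

The sMpLSA-H and sMpLSA-tst variants follow the same loop but keep Phi (respectively Theta) fixed and update only the remaining parameter.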

SR framework based on sMpLSA
Regarding the image characterisation framework, we make use of the Bag-of-Words (BoW) approach [35], where each image patch is characterised as a document over the pixel-position vocabulary. In order to super-resolve multi-spectral RGB images, we follow the standard SR procedure based on the YCbCr color space transformation [2]. Initially, the input RGB bands are converted to the YCbCr color space. Then, the luminance channel Y is super-resolved and the rest of the components, i.e. Cb (blue-difference chroma) and Cr (red-difference chroma), are interpolated to the target resolution. Finally, the inverse YCbCr transformation is used to generate the super-resolved output.
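The colour pipeline can be sketched as follows. The BT.601 conversion matrix is an assumption on our part (the text only states that the YCbCr transform of [2] is used), and `sr_luma` / `interp_chroma` are placeholders for the TSR model and the bi-cubic interpolator:

```python
import numpy as np

# Full-range BT.601 RGB -> YCbCr matrix (assumed; any invertible YCbCr
# transform would fit the pipeline equally well).
A = np.array([[0.299,     0.587,     0.114],
              [-0.168736, -0.331264, 0.5],
              [0.5,       -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    ycc = rgb @ A.T
    ycc[..., 1:] += 0.5                      # centre the chroma channels
    return ycc

def ycbcr_to_rgb(ycc):
    ycc = ycc.copy()
    ycc[..., 1:] -= 0.5
    return ycc @ np.linalg.inv(A).T

def super_resolve_rgb(rgb_lr, sr_luma, interp_chroma):
    """Colour SR pipeline: super-resolve the luminance, interpolate the chroma.
    `sr_luma` and `interp_chroma` are callables standing in for the TSR
    model and the bi-cubic interpolator respectively."""
    ycc = rgb_to_ycbcr(rgb_lr)
    y = sr_luma(ycc[..., 0])                 # super-resolve Y
    cb = interp_chroma(ycc[..., 1])          # interpolate Cb
    cr = interp_chroma(ycc[..., 2])          # interpolate Cr
    return ycbcr_to_rgb(np.stack([y, cb, cr], axis=-1))
```

Processing only the luminance is the usual design choice because the human visual system is far more sensitive to luminance detail than to chroma detail.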

Topic-space learning
As a LE method, the proposed approach requires a suitable training set in order to learn the relationships between both LR and HR image domains.
Specifically, these relationships are learned following the sMpLSA model relaxation proposed in Section 2.1. First, the training LR images I_LR are up-sampled to the target resolution using bi-cubic interpolation as Ĩ_LR and, subsequently, image patches are characterised as documents. Then, the sMpLSA-L model (Fig. 2a) is used to obtain the LR topics, Θ_L ∼ p(w_L|z), and the shared latent topic space, Φ_tra ∼ p(z|d), between the LR and HR domains. Finally, the sMpLSA-H model (Fig. 2b) can be used to estimate the HR topics, Θ_H ∼ p(w_H|z), from the HR training images I_HR by fixing the Φ_tra parameter. Note that the number of topics K and the λ sparsity factor are training parameters when applying the sMpLSA-L model.

Document projection
In this step, the LR input test image I_TST is represented in the previously learnt LR topic space Θ_L. Initially, I_TST is interpolated to the target resolution as Ĩ_TST. Then, documents are extracted following the aforementioned image patch characterisation scheme. Finally, the sMpLSA-tst model (Fig. 2c) is used to estimate the Φ_tst ∼ p(z|d_tst) parameter considering Θ_L fixed. That is, the EM process (Algorithm 1) takes n(w_L,d_tst), K, λ and p(w_L|z) as input arguments and the M-step only estimates the p(z|d_tst) parameter. Note that λ represents the sparsity factor of the Φ_tst distribution.

Topic-based SR
The target in this step is to reconstruct an initial super-resolved result I*_SR following the sMpLSA generative scheme. To achieve this goal, we initially estimate the probability p(w_H|d_tst) that each Ĩ_TST test input patch d_tst generates each word of the HR vocabulary w_H. This estimation can be easily worked out by marginalising the sMpLSA model over topics as follows:

p(w_H|d_tst) = ∑_{z=1}^{K} p(w_H|z) p(z|d_tst).    (17)

Note that this distribution provides probability values, but word-counts are required to reconstruct the super-resolved grey-level values. Therefore, we use the number of words in each Ĩ_TST patch, represented by the δ_tst prior term, to estimate the output number of words:

n(w_H|d_tst) = δ_tst · p(w_H|d_tst).    (18)

Finally, we reconstruct I*_SR using a Gaussian-like windowing function [36] to alleviate possible misregistration effects when reconstructing the image from nearby overlapping patches. That is, a Gaussian kernel is initially applied to each document (image patch), and the corresponding document word-counts are then averaged over the overlapping pixel positions to generate the final pixel values (Eq. (19)).
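The two estimation steps above can be sketched as follows (the function and variable names are our own; Θ_H and the projected mixture p(z|d_tst) are assumed already estimated):

```python
import numpy as np

def reconstruct_hr_patch(theta_h, phi_tst, delta_tst):
    """Topic-based SR of one test document -- an illustrative sketch.

    theta_h:   (W_H, K) HR topics p(w_H|z).
    phi_tst:   (K,) topic mixture p(z|d_tst) of the projected LR patch.
    delta_tst: scalar prior, the number of words in the interpolated patch.
    """
    # Eq. (17): marginalise over topics, p(w_H|d_tst) = sum_z p(w_H|z) p(z|d_tst).
    p_wh = theta_h @ phi_tst
    # Eq. (18): convert probabilities back to word-counts (grey levels).
    return delta_tst * p_wh
```

Since the columns of Θ_H and the mixture p(z|d_tst) are probability distributions, the reconstructed counts of each patch sum exactly to the δ_tst prior, preserving the overall patch energy.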

Post-processing
When considering a patch-based learning scheme, each patch is independently super-resolved and this may generate small pixel value discrepancies among patches in the final result, especially when the SR process is not conducted in the original image space. Precisely, this is the case of many manifold-based approaches and also the case of the proposed approach. In this situation, it is possible to use a final post-processing step [37] in order to guarantee a super-resolved output consistent with the pixel intensity value range of the LR input image.
As a result, the final stage is a post-processing step based on the single-image iterative back-projection (IBP) scheme [5], which iteratively projects the error between the LR input image and a degraded version of the current super-resolved estimate back into the SR result (Eq. (20)).
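The IBP post-processing can be sketched as follows. This is an illustrative implementation assuming a Gaussian-blur-plus-decimation imaging model; the σ and iteration-count defaults follow the experimental settings reported later:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def back_project(sr, lr, scale, sigma=0.6, n_iter=100):
    """Iterative back-projection (IBP) post-processing -- a sketch.

    sr: initial super-resolved image I*_SR, lr: LR input image,
    scale: magnification factor (HR size must be scale * LR size).
    """
    for _ in range(n_iter):
        # Simulate the LR image implied by the current SR estimate.
        simulated = zoom(gaussian_filter(sr, sigma), 1.0 / scale, order=3)
        # Back-project the LR residual error into the SR estimate.
        error = zoom(lr - simulated, scale, order=3)
        sr = sr + gaussian_filter(error, sigma)
    return sr
```

Each iteration reduces the discrepancy between the LR input and the degraded SR estimate, which is what anchors the super-resolved output to the intensity range of the LR image.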

Computational complexity
Regarding the computational cost of the proposed TSR framework, we have to take into account two different complexities: the training cost (Sec. 3.1) and the test computational burden (Sec. 3.2-3.4). Since the latter is the actual cost required to super-resolve LR input images, we focus this analysis just on the test cost. In particular, three different operations are involved: the document projection (sMpLSA-tst), the topic-based SR (Eq. (19)) and the post-processing (Eq. (20)).
According to the standard pLSA model complexity [38], the sMpLSA-tst cost is of the same order as that of pLSA, i.e., it grows linearly with the number of EM iterations, the number of topics K and the number of non-zero word-document entries; the topic-based SR and post-processing operations involve a fixed number of linear passes over the image pixels, so the document projection dominates the test cost.

Experiments
The experiments presented here are aimed at validating the proposed approach over three different image datasets: (a) Kodak-20, (b) L-20, the dataset proposed in [37], and (c) PNOA-20, the remote sensing dataset proposed in [40]. We have considered a HR image size of 512 × 512 pixels; therefore, the datasets have been pre-processed accordingly.
Specifically, Kodak-20 images have been cropped to 512 × 512 pixels, L-20 images have been down-scaled to the considered HR size via the Matlab R2016b imresize 1 function and images from PNOA-20 dataset do not require any kind of pre-processing. Once the datasets' HR images have been created, the Matlab R2016b imresize function has been also used to generate the corresponding LR images according to the considered scaling factors.

Experimental settings
The proposed approach has been validated against 7 different reference LE-based SR methods selected from the literature. In particular, we have chosen for comparison purposes one sparse coding method, VSR [10]; two neighbourhood embedding approaches, ANR+ [41] and GLR [11]; and four mapping methods, namely CNN [42], JOR [17], SRF [43] and LKR [44]. Additionally, we use the bi-cubic interpolation kernel (BCI) as the baseline assessment method.
All these reference methods have been selected because their implementations are publicly available and, besides, they tend to introduce some kind of image semantics along the SR process [16]. With the exception of VSR and CNN, which represent the most classical sparse coding and deep learning-based approaches, each one of the tested methods uses a particular scheme to take advantage of the image semantics when super-resolving images. ANR+ and GLR use a correlation-based clustering process over trained dictionary atoms to learn multiple patch embeddings. JOR performs an EM clustering over training patches to learn a different mapping function for each cluster. SRF introduces an ℓ2-based regularisation term when learning the tree structure in order to guarantee similar patches on leaves. LKR uses the k-means algorithm over dictionary atoms to train several kernel regressors.
Experiments have been conducted considering two different scaling factors, 2× and 4×, in order to achieve a super-resolved output with a size of 512 × 512 pixels. For each one of the three considered datasets (Fig. 4), the first sixteen images (from 01 to 16) have been used as the training set and the last four images (from 17 to 20) have been employed as the test set. Note that four-fifths of the data are used for training, which is a common scenario for hold-out validation of machine learning algorithms. Besides, the use of this configuration over the three considered datasets also guarantees a high data diversity when validating the considered learning-based models. In particular, all the SR methods have been trained for each dataset using a subset of 100,000 patches and the corresponding default settings for their algorithm parameters.
Regarding the proposed approach (TSR), we have followed settings similar to the ones presented in [34]: in particular, a patch size s = 15, a number of topics K = 1000, a post-processing step with a Gaussian blurring operator (σ = 0.6) together with 100 back-projection iterations, and a sparsity factor λ = 1. Note that we use the λ factor to control the entropy of the p(z|d) and p(z|d_tst) probability distributions. Specifically, the second term of Eqs. (11) and (15) deactivates the topic-document components (i.e. the image patterns associated to a given image patch) whose probability value is lower than λ/K.
As a result, λ = 1 allows neglecting the components below the probability of the uniform distribution, 1/K, which is the most uninformative configuration.
In order to perform the comparison as fair as possible, the number of atoms in sparse coding and neighbourhood embedding methods have been fixed to K = 1000 due to the fact that the number of topics plays a similar role. That is, the K parameter in TSR represents the amount of hidden patterns used to represent the data. Therefore, this value is comparable to the number of dictionary atoms considered in a sparse coding-based approach or to the number of neighbours used in a neighbourhood embedding method mainly because they all define the number of different components considered when super-resolving patches.
In this work, two reference metrics are used to assess the quality of the super-resolved images: PSNR (Peak Signal-to-Noise Ratio) [2] and SSIM (Structural SIMilarity) [45]. On the one hand, PSNR measures the ratio between the maximum power of the ground-truth image and the noise appearing in the super-resolved result. On the other hand, SSIM evaluates the correlation, intensity and contrast of the super-resolved image with respect to its ground-truth counterpart. Note that the higher the PSNR and SSIM values, the better the quality of the super-resolved result. Finally, it should be mentioned that a 7-pixel security border has been discarded when computing these metrics, since patch overlapping at the image borders is imprecise because partial neighbour information is not available.
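The PSNR computation with the discarded security border can be sketched as follows (a minimal implementation; the `peak` default assumes 8-bit images):

```python
import numpy as np

def psnr(gt, sr, border=7, peak=255.0):
    """PSNR between ground truth and SR result, discarding a 7-pixel
    security border as described in the text."""
    gt = np.asarray(gt, dtype=np.float64)[border:-border, border:-border]
    sr = np.asarray(sr, dtype=np.float64)[border:-border, border:-border]
    mse = np.mean((gt - sr) ** 2)
    # Higher is better; identical images yield an infinite PSNR.
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

SSIM follows the same border-cropping convention but involves local means, variances and covariances, so a library implementation is normally preferred.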

Tables 1-2 present the assessment of the super-resolved test images for the Kodak-20, L-20 and PNOA-20 datasets in terms of the PSNR and SSIM metrics. Specifically, Table 1 contains the results when considering a 2× scaling factor and Table 2 the corresponding results for a 4× factor.
The super-resolution methods used in this work are shown in columns, that is, first the BCI baseline interpolation, subsequently the seven LE-based SR methods extracted from the literature (VSR, ANR+, GLR, CNN, JOR, SRF and LKR) and finally the proposed approach (TSR). In rows, we show for each test image of each database its corresponding SR assessment in terms of the PSNR and SSIM metrics. Note that the last row provides the methods' average computational time.
In addition to the quantitative evaluation provided by the PSNR and SSIM metrics, some visual results are provided as a qualitative evaluation of the tested SR methods. Specifically, Figures 5-6 show the super-resolved results obtained for the K19 and P20 test images considering a 2× scaling factor. Besides, Figure 7 and Figure 8 present the results for the K20 and L17 test images with a 4× scaling factor.

Discussion
The quantitative assessment reported in Tables 1-2 shows how the proposed approach is able to achieve a competitive performance on the three considered datasets for both the 2× (Table 1) and 4× (Table 2) scaling factors. Beyond these quantitative figures, it is worth analysing the qualitative results to find out the methods' singularities.
According to the visual results presented in Figures 5-8, each SR method tends to foster a particular kind of visual features on the super-resolved output.
Some methods, like JOR or LKR, are able to obtain sharper edges, while others, like VSR or ANR+, seem more robust to noise by generating smoother super-resolved textures.
In terms of visually perceived quality, the proposed approach (TSR) achieves a remarkable performance. For instance, the fence detail in Fig. 5(j) is certainly the most similar to its HR counterpart in Fig. 5(a). Even though the result provided by JOR (Fig. 5(g)) seems to obtain a slightly better contrast on some parts of the image, the proposed approach is able to introduce more high-frequency information in the fence structure. Another illustrative example can be found in Fig. 6, where it is possible to see that the proposed approach introduces some fine details in the vegetation which are not present in the other methods' results.
When considering a 4× scaling factor, the proposed approach also shows its capability to recover high-frequency details; however, some other SR methods seem to generate more image contrast. This is the case of the result provided by JOR in Fig. 7(g). In the TSR result shown in Fig. 7(j), we can see that the edges are not so contrasted, but the aliasing distortion is slightly reduced while new high-frequency information of the stripe pattern is recovered. A similar behaviour can be observed in the window detail of Fig. 8. In this case, the proposed approach (Fig. 8(j)) seems to recover the vertical pattern of the window better than JOR (Fig. 8(g)).
Regarding the computational time, we can observe important differences among the tested methods. In particular, four algorithm groups can be identified when super-resolving LR input images: (i) BCI and ANR+, with an average time consumption per image under 3 seconds; (ii) GLR, CNN and SRF, with a time between 10 and 60 seconds; (iii) LKR, which requires between 60 and 120 seconds; and (iv) VSR, JOR and TSR, with a computational time between 300 and 500 seconds. The proposed approach is definitely not one of the most computationally efficient methods; however, its computational cost is similar to that of JOR, which has shown to be one of the best-performing reference methods.
Finally, another noteworthy point concerns the post-processing step. As mentioned in Sec. 3.4, the proposed approach takes advantage of the IBP process to relieve possible pixel-value deviations generated in the n(w_H|d_tst) estimation. In particular, the PSNR gain obtained by TSR when using this post-processing step is, on average, 0.05 dB. Additionally, we have verified that the JOR method does not obtain, on average, a performance improvement when using such a process.
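The IBP correction mentioned above follows the classic iterative back-projection scheme: the current SR estimate is repeatedly degraded to the LR scale, compared against the LR input, and corrected by the up-projected residual. The following NumPy sketch illustrates the idea under the assumption of simple block-averaging degradation and nearest-neighbour back-projection operators; the actual operators used in the framework may differ.

```python
import numpy as np

def downscale(img, s):
    """Degrade HR -> LR by s x s block averaging (assumed degradation model)."""
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upscale(img, s):
    """Project an LR residual back to the HR scale by nearest-neighbour replication."""
    return np.kron(img, np.ones((s, s)))

def iterative_back_projection(sr, lr, scale, n_iters=10, step=1.0):
    """Refine an SR estimate so that its degraded version matches the LR input."""
    sr = sr.copy()
    for _ in range(n_iters):
        residual = lr - downscale(sr, scale)   # consistency error in the LR domain
        sr += step * upscale(residual, scale)  # back-project the residual
    return sr
```

After convergence, the degraded SR estimate closely reproduces the LR input, which corrects small pixel-value deviations without discarding the recovered high-frequency content.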

Proposed approach advantages and limitations
When comparing the TSR results to those obtained by the other semantic-based SR methods, we can observe the proposed approach's potential. Even though the straightforward clustering approach is the most extended way to introduce image semantics into the SR process, its effectiveness is limited by the intra-cluster semantic variability. Note that a clustering process naturally tends to group similar patches within the same cluster; however, the inherent information loss in the LR domain may cause two patches related to two completely different semantic concepts to fall into the same cluster.
In order to overcome the above-mentioned limitation, the proposed approach super-resolves latent patterns instead of image patches themselves. That is, the SR process is driven by the mixture of latent patterns appearing in the LR input image, which allows TSR to recover a richer variety of high-frequency patterns for a given LR patch. In a sense, the proposed method provides a more flexible scheme than current semantic-based SR techniques because each LR patch is allowed to follow multiple SR paths simultaneously through the latent patterns defined by topics, and therefore more HR patterns can be involved in the SR process.
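This patch-to-topic-to-patch scheme can be sketched in a hypothetical minimal form. Assuming LR-topic distributions P(w_L|z) and HR-topic distributions P(w_H|z) have been learned jointly, super-resolving a test patch amounts to inferring its topic mixture P(z|d) from its LR words and projecting that mixture through the HR word distributions. The sketch below uses standard pLSA folding-in EM as a stand-in for the full sMpLSA inference; all function names are illustrative.

```python
import numpy as np

def infer_topic_mixture(lr_counts, p_wl_given_z, n_iters=50):
    """EM estimation of P(z|d) for a test LR patch (folding-in), with P(w_L|z) fixed."""
    n_words, n_topics = p_wl_given_z.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iters):
        # E-step: responsibilities P(z|w_L, d) for each LR word
        joint = p_wl_given_z * p_z                      # (n_words, n_topics)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate the patch's topic mixture
        p_z = (lr_counts[:, None] * resp).sum(axis=0)
        p_z /= p_z.sum()
    return p_z

def super_resolve_patch(lr_counts, p_wl_given_z, p_wh_given_z):
    """Project the inferred LR topic mixture through the HR word distributions."""
    p_z = infer_topic_mixture(lr_counts, p_wl_given_z)
    return p_wh_given_z @ p_z   # P(w_H|d) = sum_z P(w_H|z) P(z|d)
```

Because the returned HR distribution is a weighted combination over all topics, a single LR patch can draw on several HR patterns at once, which is precisely the flexibility discussed above.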
However, this higher flexibility has one main implication: a blurring effect may appear if too many HR patterns are involved. To reduce this effect, we introduce the λ sparsity constraint, which controls the number of HR patterns considered when super-resolving LR images. Even so, it may be difficult to find the ideal sparsity factor because it logically depends on the input image features as well as the considered scaling factor. In this work, we assume a constant λ factor to define the sMpLSA model; further research could be directed to this extent.
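One simple way to realise such a sparsity constraint, shown here only as an illustrative assumption rather than the sMpLSA formulation itself, is to retain only the topics whose mixture weight exceeds a λ threshold and renormalise before projecting to the HR domain:

```python
import numpy as np

def sparsify_mixture(p_z, lam):
    """Zero out topics with weight below lam and renormalise.

    Illustrative stand-in for the sparsity constraint: lam bounds how many
    HR patterns can contribute, trading blur reduction against flexibility.
    """
    sparse = np.where(p_z >= lam, p_z, 0.0)
    if sparse.sum() == 0.0:                  # keep at least the dominant topic
        sparse = np.zeros_like(p_z)
        sparse[np.argmax(p_z)] = 1.0
    return sparse / sparse.sum()
```

A larger λ involves fewer HR patterns in the reconstruction, which mitigates the blurring effect at the risk of discarding useful patterns; this trade-off is why a single constant λ is unlikely to be optimal for every input image and scaling factor.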

Conclusions and future work
In this work, we presented a topic-based SR framework that super-resolves LR images according to the semantic patterns encapsulated by the latent topic space. Specifically, we first defined the sMpLSA model and then used it to super-resolve LR images by super-resolving latent topics instead of image patches themselves. Finally, we conducted an experimental comparison over three different image datasets to assess the proposed approach's performance with respect to different reference LE-based SR methods available in the literature.
One of the main conclusions arising from this work is the potential of topic models to cope with the SR problem, thanks to their capability to manage data semantics. Whereas the common SR trend relies on a clustering-based process over the image patch representation to define the image semantics, we proposed to transform this classical perspective into a new probabilistic approach where the SR process is performed using the semantics encapsulated by the sMpLSA model in the latent topic space.
According to the conducted experiments, the proposed approach obtains a competitive performance over the three considered databases in terms of both quantitative and qualitative results. Regarding the SSIM and PSNR metrics, the SR framework proposed in this work obtains, on average, a performance similar to that of the mapping approach JOR, while outperforming the rest of the tested methods. Considering the visual results, the proposed approach has shown to be one of the most effective methods, especially for a 2× scaling factor.
Although the proposed approach's results are encouraging for a semantic-based SR technique, it still has some limitations which leave room for improvement and motivate further research on topic-based SR. Specifically, future work is aimed at the following directions: (i) an sMpLSA extension to estimate the ideal sparsity factor for each input patch, (ii) automatic procedures to set the most appropriate number of topics, and (iii) extending the proposed SR framework to a hybrid approach by exploiting the redundancy property over image scales.