Sentinel-2 and Sentinel-3 Intersensor Vegetation Estimation via Constrained Topic Modeling

This letter presents a novel intersensor vegetation estimation framework, which aims at combining Sentinel-2 (S2) spatial resolution with Sentinel-3 (S3) spectral characteristics in order to generate fused vegetation maps. On the one hand, the multispectral instrument (MSI), carried by S2, provides high spatial resolution images. On the other hand, the Ocean and Land Color Instrument (OLCI), one of the instruments of S3, captures the Earth's surface at a substantially coarser spatial resolution but using smaller spectral bandwidths, which makes the OLCI data better suited to highlighting specific spectral features and motivates the development of synergetic fusion products. In this scenario, the approach presented here takes advantage of the proposed constrained probabilistic latent semantic analysis (CpLSA) model to produce intersensor vegetation estimations, which aim at synergistically exploiting MSI's spatial resolution and OLCI's spectral characteristics. Initially, CpLSA is used to uncover the MSI reflectance patterns that are able to represent the OLCI-derived vegetation. Then, the original MSI data are projected onto this higher abstraction-level representation space in order to generate a high-resolution version of the vegetation captured in the OLCI domain. Our experimental comparison, conducted using four data sets, three different regression algorithms, and two vegetation indices, reveals that the proposed framework provides a competitive advantage in terms of quantitative and qualitative vegetation estimation results.

Regarding the spatial resolution of the sensor, OLCI has a global resolution requirement of 300 m.
Although the S2 and S3 missions have been designed to provide global data products of vegetation, soil and water cover, inland waterways, and coastal areas, the spectral and spatial differences between the MSI and OLCI sensors make each satellite more suitable for a particular application field. Whereas the higher spatial resolution of S2 enables the use of its products for characterization tasks that require a high level of spatial detail, such as soil mapping or land-use classification [5], S3 is able to capture imagery using smaller spectral bandwidths, which makes the OLCI data better suited to highlighting the specific spectral responses that represent different features over the Earth's surface. Vegetation cover exemplifies this point [6]. In general, vegetation indices, such as the Normalized Difference Vegetation Index (NDVI) [7] and the Soil-Adjusted Vegetation Index (SAVI) [8], seek to exploit the contrast between the maximum chlorophyll absorption wavelength and the red-edge region of the electromagnetic spectrum.
As a result, the smaller VNIR spectral bandwidths of the OLCI sensor mean that fewer wavelengths are involved in the NDVI and SAVI computations. This generates an enhanced response for plant surfaces, which eventually increases the instrument's sensitivity to image areas covered by certain types of vegetation [4]. Precisely these inter-sensor differences motivate the development of fused vegetation products that exploit the MSI spatial resolution together with the OLCI spectral features.
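The two indices discussed above can be sketched as follows. This is a minimal illustration, not the letter's processing chain; the toy reflectance values and the soil-brightness factor L = 0.5 are assumptions for the example (for S2 MSI, the red and NIR reflectances would typically come from bands B4 and B8, while OLCI uses its narrower red and NIR channels):

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + 1e-12)  # epsilon avoids division by zero

def savi(nir: np.ndarray, red: np.ndarray, L: float = 0.5) -> np.ndarray:
    """Soil-Adjusted Vegetation Index with soil-brightness factor L [8]."""
    return (1.0 + L) * (nir - red) / (nir + red + L)

# Toy reflectances: a vegetated pixel and a bare-soil pixel (assumed values).
nir = np.array([0.45, 0.30])
red = np.array([0.05, 0.25])
print(ndvi(nir, red))  # high value for the vegetated pixel, near zero for soil
print(savi(nir, red))
```

Because both indices are ratios of the red/NIR contrast, narrower bandwidths around these wavelengths sharpen the vegetation response, which is the OLCI advantage the letter exploits.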
In the literature, different kinds of regression algorithms have been successfully applied to biophysical parameter estimation within the context of the Sentinel missions. Specifically, Verrelst et al. [9] review several state-of-the-art machine learning regression algorithms for the S2 and S3 satellites, and Caicedo et al. [10] assess multiple linear and nonlinear regression algorithms over a range of remotely sensed data. Despite the value of these and other related works, the regression process is often conducted from a single-sensor perspective and usually considers only simulated Sentinel data [11]. This letter is focused on a more general objective, where S2 and S3 operational products are combined to generate fused vegetation maps with MSI spatial resolution and OLCI spectral characteristics. Specifically, the proposed model (Fig. 1) considers two diverging hidden random variables, i.e., c and z, to represent constrained topics and standard topics, respectively.
Note that N_d is the number of words in d, M is the total number of documents in the collection, and shaded nodes represent the observable variables in the model, by analogy with the document analysis application field [13]. For the M-step, we calculate the CpLSA likelihood partial derivatives, set them equal to zero, and solve the resulting equations to obtain Eqs. (3)-(6), where n(w, d) represents the number of times the word w appears in the document d. The EM process is performed as follows. First, p(w|c), p(w|z), p(c|d), and p(z|d) are randomly initialized. Then, the E-step and the M-step are alternated until convergence.
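The EM alternation described above can be sketched for a plain pLSA model (the CpLSA constraint tying some topics to the S3 vegetation map is omitted here for brevity, so this is a simplified sketch rather than the letter's Eqs. (3)-(6); the array names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa_em(n_wd: np.ndarray, K: int, iters: int = 50):
    """EM for a basic pLSA model. n_wd[w, d] counts how often word w
    (here: a quantized reflectance value) occurs in document d (a pixel)."""
    W, D = n_wd.shape
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(axis=0)   # p(w|z)
    p_z_d = rng.random((K, D)); p_z_d /= p_z_d.sum(axis=0)   # p(z|d)
    for _ in range(iters):
        # E-step: posterior p(z|w,d) proportional to p(w|z) * p(z|d)
        post = p_w_z[:, :, None] * p_z_d[None, :, :]          # shape (W, K, D)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from the expected counts
        expected = n_wd[:, None, :] * post                    # shape (W, K, D)
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```

In CpLSA, the same alternation would additionally carry the constrained topics c, whose document mixture p(c|d) is anchored to the OLCI-derived vegetation during training.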

B. Inter-Sensor Vegetation Estimation Framework
The proposed S2 and S3 inter-sensor vegetation estimation framework is made up of a two-step process. Note that the conditional probability distribution p(c|d) defines how image pixels are described by the target S3 vegetation map, p(w|c) represents the reflectance patterns that generate this map, and p(w|z) contains the rest of the patterns, which can be considered noise from a vegetation-based perspective. 2) CpLSA-tst: Once the Φ parameter has been estimated, the proposed model is applied again to infer the output vegetation map at S2 spatial resolution with the S3 spectral properties. That is, CpLSA is used over I_2 by fixing the Φ parameter in order to generate Θ_1 ∼ p(c|d) and the resulting vegetation map as E_2 = p(c|d).
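The CpLSA-tst inference step above amounts to a "folding-in" pass: the word-topic matrix Φ is frozen and only the per-pixel topic mixtures are re-estimated. A minimal sketch, assuming a generic topic matrix `p_w_t` standing in for Φ (the constrained/standard topic split and the function name are assumptions of this example):

```python
import numpy as np

def fold_in(n_wd: np.ndarray, p_w_t: np.ndarray, iters: int = 50) -> np.ndarray:
    """Infer per-pixel topic proportions p(t|d) with the word-topic matrix
    p(w|t) held fixed. The rows tied to constrained topics c would give
    the fused vegetation map E_2 = p(c|d)."""
    W, D = n_wd.shape
    T = p_w_t.shape[1]
    rng = np.random.default_rng(0)
    p_t_d = rng.random((T, D)); p_t_d /= p_t_d.sum(axis=0)
    for _ in range(iters):
        # E-step over the new image; Phi (p_w_t) is NOT updated
        post = p_w_t[:, :, None] * p_t_d[None, :, :]          # (W, T, D)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step only for the document mixtures
        p_t_d = (n_wd[:, None, :] * post).sum(axis=0)
        p_t_d /= p_t_d.sum(axis=0, keepdims=True) + 1e-12
    return p_t_d
```

Because Φ was learned to explain the OLCI-derived vegetation, projecting the full-resolution MSI counts through it yields vegetation proportions at S2 spatial resolution.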

A. Datasets
In this work, four pairs of S2 MSI and S3 OLCI data products have been selected (Table I). The resulting vegetation estimations are normalized, as Eqs. (7)-(8) show, to unify their corresponding value ranges for assessment purposes.

These results are supported by the conducted statistical analysis. In particular, the Friedman test (Table IIIa) ranks