Optimization of electronic nose drift correction applied to tomato volatile profiling

E-noses can be routinely used to evaluate the volatile profile of tomato samples once the sensor drift and standardization issues are adequately solved. Short-term drift can be corrected using a strategy based on a multiplicative drift correction procedure coupled with a PLS adaptation of the component correction. It must be performed specifically for each sequence, using all sequence signals data. With this procedure, a drastic reduction of sensor signal %RSD can be obtained, ranging between 91.5 and 99.7% for long sequences and between 75.7 and 98.8% for short sequences. On the other hand, long-term drift can be fixed up using a synthetic reference standard mix (with a representation of main aroma volatiles of the species) to be included in each sequence that would enable sequence standardization. With this integral strategy, a high number of samples can be analyzed in different sequences, with a 94.4% success in the aggrupation of the same materials in PLS-DA two-dimensional graphical representations. Using this graphical interface, e-noses can be used to developed expandable maps of volatile profile similitudes, which will be useful to select the materials that most resemble breeding objectives or to analyze which preharvest and postharvest procedures have a lower impact on the volatile profile, avoiding the costs and sample limitations of gas chromatography.


Introduction
The objective evaluation of flavor in crops such as tomato is expensive and time-consuming; consequently, this trait has been usually disregarded. Today, it is known that one of the main factors under the loss of flavor relies on the loss of alleles related to the contents of aroma volatiles [1], and the use of delayed ripening genes that alter the aroma profile, an effect that depends on the genetic background [2]. Additionally, tomato flavor can also be altered by the preharvest and postharvest management of the crop that also alter the production of volatiles [3][4][5][6].
In order to satisfy the demands of high-quality markets, it would be necessary to include flavor evaluation, and especially the volatile profile, during the development of breeding programs [7], cultivation, and postharvest procedures. In this context, the use of trained panelists or the precise volatile quantifications by gas chromatography-mass spectrometry is discarded considering that these evaluations are too expensive and time-consuming and, consequently, not adequate to evaluate a high number of samples.
As an alternative, electronic noses (e-noses) were designed to evaluate the volatile profiles of agricultural products [8]. For this purpose, they have been usually applied to classify materials considering their quality characteristics, their origin, the variety or the presence of diseases, additives, adulterations, and off-flavors in different fruits and vegetables (tomato, kiwifruits, peach, nectarine, apple, banana, persimmon, grape, watermelon, strawberry, blackberry, onion, potato, pumpkin, broccoli, etc.), grains (wheat, rice, maize, peanuts, etc.), aromatic and medicinal plants (tea, coffee, saffron, cocoa, oregano, ginseng, etc.), processed products (oils, juices), livestock and poultry meat, and fish [8][9][10][11][12]. Most of these applications were modeled and tested in a short-term scenario, using a limited number of samples. However, the application of this technology to the evaluation of materials in breeding programs and food industry makes it necessary to assure the capability to process a high number of samples in the same day, as well as being able to compare them with data obtained in previous assays. By doing so, it would be possible to apply e-noses to selection and quality control programs, in which each new sample is compared with reference values or fingerprints obtained in previous assays with elite materials grown and handled in ideal conditions. From this point of view, the objective would not be centered on classifying a new sample, but to have an idea of its distance to elite reference samples. Consequently, it would be possible to select the best individuals or those preharvest or postharvest procedures that minimize their impact on the volatile profile.
In order to take advantage of the capabilities of e-noses, it would be necessary to overcome the effects of sensor drift. This phenomenon is defined as temporary or gradual changes in one or some sensor properties which causes distorted response measures and reduces the validity of the electronic fingerprints. It is inevitable and caused by complex and dynamic processes, such as changes in room environmental conditions (temperature or humidity), changes in the composition of measured samples (component interactions), and instrument operational disturbances (sensors thermal and memory effects, aging or poisoning) [13,14]. These changes can be noticed in both signals within a work sequence (short-term drift) and signals obtained in different work sequences (long-term drift). The improvement of sensor technology at the manufacturing stage to enhance its stability over time has contributed to reduce these problems. However, despite the advances obtained, a regular calibration is still required to limit the effects of sensor drift. It can be performed using external standards and statistical multivariate calibration models. Nonetheless, multivariate calibration requires a large number of samples and frequent re-calibrations of the sensor arrays and this would limit the number of new samples analyzed. Therefore, a new model calibration transfer or update and signal standardization using only a small number of reference samples would represent an interesting solution to keep the system operative for long periods [14].
In the last two decades, an enormous research effort has been made on different methodologies aimed to properly process signals and data from e-noses (reviewed by [13][14][15]). Nevertheless, it seems clear that, despite the high amount of research on drift correction and calibration update methods developed, these proposals were not routinely used, except for component correction and directed standardization methods. The best solutions proposed up to now rely on the analysis of a high number of samples to develop robust models or use simple volatile mixes. These approaches are distant from the real context of tomato evaluation. This species has a complex volatile profile with more than 400 compounds, with nearly 30 of them playing an important role in tomato aroma perception [2]. On the other hand, the need to develop models with a high number of samples would not be realistic in high-throughput evaluations, as the models would have to be recalculated each time a sensor has to be changed.
In this context, although the use of commercial electronic noses for the evaluation of volatile profiles has a huge potential, it is necessary to develop an operating methodology enabling the routine evaluation of wide collections of real samples. This is, in fact, the aim of this paper, to propose a practical methodology to correct drift within and between sequences, using a minimum number of samples to calibrate the models and a tomato-like complex synthetic reference mix to standardize sequences. Finally, the development of long-term expandable partial least squares discriminant analysis (PLS-DA) graphical maps of e-nose volatile profiles is proposed as a valuable tool to enable the routine evaluation of the volatile profile of new samples, analyzing the relative distance to reference points.

Plant material and tomato-like synthetic standards
Tomato-like synthetic standards were developed to obtain a synthetic mixture of main volatile compounds of an average real tomato sample, but with higher stability and reproducibility. For this purpose, a high-concentration standard mixture was prepared (TomSSt_4), containing 30 individual volatile compounds at concentrations (Table 1) corresponding to the mean values of representative tomato cultivars with different aromatic profiles [16]. Three alternative standards were obtained diluting TomSSt_4 to 70% (TomSSt_3), 50% (TomSSt_2), and 30% (TomSSt_1). The dilutions were obtained to cover a wide range of volatile sample concentrations. TomSSt_2 was employed as a reference sample for intersequence standardization in long-term drift correction. These working solutions were prepared by volume dilution from more concentrated stock solutions which were stored in a freezer at −30°C in sealed vials. They have an established stability of 1 year for the main stock solutions (around 500 ppm or higher) and of 1 month for the ppb to sub ppm solutions. As preparation of synthetic standards is carried out by dilution in volume of stock standards, this process can be reliably and reproducibly performed producing adequate standard solutions in the routine laboratory. For sequences run in different months, the specific standard mixtures were prepared de novo to provide restrictive conditions.
Tomato varieties evaluated in this work represented a wide diversity of fruit shapes, colors, genotypic structures (commercial hybrids and landraces), and origins ( Table 2). The plant material included four commercial hybrids, "Zayno RZ", "Divyne RZ", "Vinchy RZ" (Rijk Zwaan Iberica, Almería, Spain), and "Caramba" (De Ruiter Seeds, Almería, Spain); four experimental tomato breeding lines (UJI008, UJI011, UJI014, and UJI028) with different fruit sizes; one cherry tomato type accession (BGV004587); and five accessions of local landraces, UJI023 of "de penjar" landrace, BGV005477, accession of a "Morado" landrace, BGV005651, an accession of "Muchamiel" landrace, BGV005718, an accession of "Amarillo" landrace, and BGV005655, an accession belonging to the "Valenciano". The "de penjar" landrace carries with alcobaça, alç, longlife mutation allelic to the nor gene [17] and it results in a very specific aroma volatile evolution [18], "Morado" landrace has external pink color due to the transparent peel typical of the yellow, y, mutation which alters the synthesis of polyphenols, and "Amarillo" has yellow flesh color typical of the impairment of carotenoid synthesis resulting from the presence of the yellow-flesh, r, mutation (reviewed by [19]) and it, therefore, affects the synthesis of apocarotenoid volatiles.
UJI accessions were obtained from Universitat Jaume I and BGV accessions from the genebank of the Instituto

Experimental design
Three different assays were performed. In the first assay, 18 samples with different compositions were used. These samples included real tomato samples from 14 varieties obtained homogenizing whole fruits (Table 2) and the four tomato-like synthetic standards (TomSSt) with variable volatile composition. Three sequences were run on different days. Each sequence included four specific varieties (that were included only in one sequence) and two varieties that were included as controls in the three sequences. The 4 tomato-like synthetic standards were also included in all the sequences. Tomato samples were replicated 7 times and tomato-like synthetic standards 4 times in each sequence. All the samples were randomly distributed in each working sequence.
For a deeper study of the short-term drift, a second assay was designed to include a higher number of repetitions (12) per sample. Two consecutive long work sequences (22 h each) were planned to test seven tomato and one tomato-like synthetic standard (TomSSt_2). All the samples were also randomly distributed within the first replicate of each sequence, and the order was maintained in the rest of the replicates. This design provided data to compare the performance in a whole sequence (12 repetitions/sample in 22 h) or a short sequence (4 repetitions/ sample in 8 h approximately) to test the performance of the drift correction strategy proposed in different scenarios.
Finally, a third assay was performed to analyze the effect of long-term drift. To ensure the inclusion of long-term drift in the signal responses, the sequences of this trial were carried out in a 3-month period (one sequence per month) included in the normal routine usage of the equipment. During this period, other samples from tomato and other vegetable crops were analyzed in the equipment. The short-term drift correction was applied before analyzing the results.
In a first step, the effect of long-term drift was analyzed using the four tomato-like standard solutions in three sequences. Then, the effect of long-term drift was also checked adding two tomato varieties analyzed in three sequences. Long-term drift correction via sequence standardization was then applied and its validity checked. The independent study of each one of these three sequences was used to test and correct short-term drift within a work-day sequence. The joint data of all these sequences were used to test the performance of the long-term drift correction between sequences and standardization strategies proposed in this work.
Once the reliability of the long-term drift correction had been checked, it was applied to analyze the data obtained from the analysis of the 14 tomato varieties distributed in three sequences, using the data from TomSST2 for sequence standardization.

Electronic nose and data acquisition
A FOX 4000 (Alpha MOS, Toulouse, France) e-nose system was used. The system included 18 metal oxide semiconductor sensors (MOS) installed in three chambers, an autosampler system (CombiPAL HS100, CTC Analytics, Zwingen, Switzerland), and a software package (AlphaSoft v11) to control and process initial data. The sensor response in MOS sensors is a resistance variation due to a reaction caused by the chemical species on the surface of the active layer of the sensor. As usual for MOS sensors, the signal was expressed as normalized resistance variation of the signal highest point ((R i -R max )/R i ), where R i is resistance at time zero and R max is resistance in the signal highest point of the sensor [20].
The analysis parameters related to general aspects of equipment operation were fixed following manufacturer recommendations, while those that directly determine the response quality (influence headspace generation) were established from previous tests based on the methodology for the analysis of tomato aroma developed by the group [16]. For each sample, 2 g of homogenate (2 mL in the case of tomato synthetic standards) was introduced into a 1 0 -m L v i a l a n d s e a l e d . E a c h s a m p l e r e p l i c a t e corresponded to an independent vial. Samples were incubated in the autosampler at 45°C for 10 min to generate the headspace and then 2 mL of it was injected into the sensor chambers for analysis. The sensors' response was recorded over 2 min with 18 min between each measurement to allow the baseline recovery. Between samples, dry clean synthetic air flowed over the sensor array for 2 min to remove residues of the previous sample, following manufacturer recommendations. The gas flow rate was 150 mL min −1 . Instrument maintenance (daily auto test and 2-week diagnosis) was routinely performed following supplier protocols to ensure proper operation. Drift correction and inter-sequence standardization A multivariate adaptation of the multiplicative drift correction procedure proposed by Salit and Turk [21], combined with a partial least squares (PLS) adaptation of the component correction strategy [22] to model time-dependent drift, was used both to remove intra-sequence short-term drift and to perform inter-sequence standardization to counteract long-term drift. The Salit and Turk method is based on an interpolative projection of sample signal onto a smooth function defined by fitting to signals from regularly interspersed standards. Component correction strategy is based on the assumption that there is a subspace direction that captures only the drift variance and can be modeled (they use principal component analysis) and subtracted from the measurement matrix X to provide drift-corrected signals. Two assumptions were considered: (i) drift, regardless of its type, is a function of time, and (ii) drift for our electronic nose instrument is multiplicative (i.e., the magnitude of the perturbations is dependent on the signal level). Additionally, it had to be considered that the nature of the samples being analyzed could not be contemplated by the model, as they were unpredictable. A practical guide of our proposed intra-sequence drift correction methodology is included in Fig. S1 (see Supplementary Information (ESM)). According to [21], when multiplicative drift appears, the signal measured in a sample i evaluated with j repetitions in each of the k sensors of the system (S i j ð Þ ;k measured ) could be decomposed as: being S i truth the true signal for sample i, E drift (t) the drift estimation as a function of time, and E noise the estimation of the background noise (independent of time). S i truth can be estimated using the mean of all b S i;k measured: Then, the multiplicative deviation pretreatment for each measured signal ( To estimate time-dependent drift, a multivariate PLS regression between the pretreated signal measurements for all system sensors as independent variables (X matrix) and the time of analysis as a dependent variable (Y vector) was performed. As the PLS drift model finds latent variables that explain the variability in the deviation of electronic signals due only to time evolution, this model function provides the estimate of E i, k drift (t) and the residuals of these model provide the estimation of E i, k noise . Accordingly, as proposed by [22], after the drift model was fitted, the matrix product of resulting loadings and scores of the model was used to calculate the matrix of E drift (t) components. Then, the initial signal measured values were corrected for multiplicative drift using the following equation from [21]: A similar strategy was used to perform inter-sequence standardization to correct long-term drift. A practical guide is included in Fig. S2 (see ESM). For different work sequences, a generalization of Eq. (1) was considered to decompose signal measured in a sample i evaluated with j repetitions in each of the k sensors of the system. This generalization assumes that in this case the truth signal can be estimated using two components, the mean of all S i j ð Þ repetitions and an inter-sequence standardization coefficient. To calculate this inter-sequence standardization coefficient, the difference of the signals of the same reference sample measured in two different sequences was used. The tomato-like standard TomSSt_2 was used as a reference sample in all work sequences.
Consequently, the multiplicative deviation pretreatment used for each measured signal was: where b S TomSSt1;k and b S TomSStn;k are the signal means of all repetitions for the tomato-like synthetic standard reference sample in sequences 1 and n, respectively, for each k sensor. The generalization of Eq. (1) also assumes that, when considering several work sequences, the time-dependent drift can be decomposed in two components: where E short (t) represents the short-term (between-sample within-run) signal drift and E long (t) the long-term (betweenrun) drift. Inter-sequence standardization was applied to all sequences after short-drift correction. Doing that, timedependent drift would be equivalent to the long-term drift that appears between sequences. Consequently, after applying pretreatment of Eq. (3) when drift was modeled by PLS regression as explained previously, it was possible to calculate the matrix of E long components and to use it to standardize sequence signals applying Eq. (4). The PLS regressions were performed using venetian blinds (with as many groups as samples evaluated) as a resampling procedure, in order to calculate error models and to select the number of latent variables used in the model. Outliers were detected and removed, using Hotelling T 2 and Q residuals [23].

Graphical maps and data analysis tools
Drift-corrected sensor signals were graphically plotted in a 2D PLS-DA scatterplot map; as with this dimensional reduction representation technique, the distance between projected points preserves sample similarities [24]. Confidence ellipsoids (p = 0.05) were calculated and plotted for samples with more than four replicates. In some cases, after removing outliers, there were not enough points to calculate these intervals, and data points were just linked with lines to provide rapid identification of groups. The closer the points, the higher the similarity between signals. This procedure enables the comparison of sample volatile profile similarities, for example, for selection purposes. The objective was not to classify samples in predefined groups. This would be a typical objective in a quality assurance control, but in breeding programs, the objective is to select those materials closer to specific volatile profile targets. Nevertheless, to assess the performance of the proposed drift correction strategy, classification results were compared with those obtained using other reputed drift correction methods: the original method proposed by Salit and Turk [21], independent component a n a l y s i s ( I C A ) , a n d p a r a l l e l f a c t o r a n a l y s i s 2 (PARAFAC2) [25]. ICA is a signal processing method that separates a multivariate signal into additive subcomponents assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. PARAFAC methods are generalizations of the principal component analysis (PCA) to higher order arrays. PARAFAC2 is an improvement of the original PARAFAC method in which the strict trilinearity is no longer required. Compared with PCA methods, PARAFAC methods have the advantages of no rotation problem, as in PCA, easier to interpret, and higher statistical robustness.
Once the correction was obtained, three frequent classification techniques were applied. K nearest neighbors (KNN) classification, soft independent modeling of class analogy (SIMCA), and discriminant analysis based on partial least square regression (PLS-DA) [24]. KNN is a non-parametric classification method in which a data point is assigned to the class most common among its k nearest neighbors. SIMCA classification is mainly based on principal component analysis and an object is assigned to a class if its residual distance is below the statistical limit for the class. In PLS-DA, the predictive modeling comprises two main procedures, a PLS component development (i.e., dimension reduction for selecting variables for classification) and a prediction model construction (i.e., discriminant analysis) to predict class assignment for the data.
KNN, SIMCA, PLS and PLS-DA, analysis, and graphics were performed using PLS_Toolbox v 8.6 (Eigenvector Research Inc., Wenatchee, WA, USA) for Matlab v 9 (MathWorks Inc., Natick, MA, USA). ICA models [26] were calculated with the FastICA toolbox for Matlab developed at the Helsinki University of Technology. PARAFAC2 models were performed using a graphical user interface, SENSABLE [20].
To justify the need for standardization procedures, tests for significant differences between the same sample signals in different sequence work using the MANOVA and Roy test were used [24]. These analyses were performed using IBM SPSS v.24 (IBM Corp., Armonk, NY, USA).

Short-term drift correction
In the first assay, high levels of short-term drift were observed leading to a high variation in the position of each sample replicate in the two-dimensional representation of the PLS analysis obtained with raw signals (Fig. 1a-c). This variation could be related to a possible lack of homogeneity of real tomato samples, but a considerable variability was also detected in tomato-like standards which are highly homogeneous. As a consequence, despite having a different aroma volatile profile, the confidence ellipsoids of each variety overlapped. Thus, it was impossible to discriminate the materials. This effect of short-term drift was detected in the three independent sequences tested, but it affected each sequence differentially. As an example, the confidence ellipsoid of the tomato-like standard TomSSt_2 was small and data points plotted close in the first sequence (Fig. 1a), while the ellipsoid was considerably wider in the second (Fig. 1b) and third (Fig. 1c) sequences. The contrary was observed in the case of TomSSt1, with higher variability in the first sequence and lower in the second and third. As the samples were randomly distributed for each replicate in the sequence, the differences observed in confidence ellipsoids suggest that the effect of drift changes between sequences. This spurious trend confirmed the difficulty of extrapolating short-term drift effects on different analysis sessions.
The effect of sequence duration on short-term drift was analyzed in-depth comparing the performance of long (22-h) and short (8-h) sequences using 8 samples, including 7 tomato varieties and one tomato-like standard ( Table 2). This time, samples were randomized in the first replicate, but the order was maintained in the rest of the replicates to enable comparisons between varieties. The long sequence (22 h), typical of situations where a high number of samples is to be analyzed, was obtained increasing the number of repetitions per sample up to 12. Raw sensor data from these analyses revealed, for all the samples and in all the sequences, the presence of an important drift effect that affected all the sensors. The drift affected differentially each variety, with the highest effects detected in the samples of the variety "Caramba" (Fig. 2).
These drift effects were more evident and important at the end of the sequence (Fig. 2a), showing a complex non-linear time-dependent variation, with positive and negative signals tending to converge to 0. In the case of "Caramba" samples, raw signals (12 repetitions distributed in a sequence of 60 analysis) showed a very high relative standard deviation (%RSD) for the complete sequence for all the sensors (Fig.  2b, first data in parenthesis).

a b
c d e f Fig. 1 Similarity PLS-DA maps of volatile electronic profiles from raw signals (on the left) and short-term drift-corrected signals (on the right) data from samples tested in three different work sequences assayed in the first assay. Sample codes as indicated in Table 2. TomSSt, tomato-like synthetic standard. Ellipsoids represent confidence intervals (p = 0.05) for samples with more than three replicates. For samples with a lower number of replicates, confidence intervals cannot be calculated and sample points are only connected with lines to easily identify them In order to provide a reference, these values obtained with "Caramba", were compared to those obtained by Xu et al. [27] corresponding to 6 analyses with the same apparatus equipped with the same sensors (Fig. 2b in square brackets). %RSD values obtained in the present work were considerably higher. Thus, the use of long sequences such as these would be unacceptable. It should be considered though that the material used by [27] was Semen arecae, a dried seed preparation from Areca catechu L. Therefore, differences in %RSD would be explained by changes both in the sample matrix and in the number of hours of work of the sensors per sequence.
When shorter sequences were considered (8 working hours, i.e., 4 "Caramba" samples distributed in a sequence of 18 injections), the drift levels were lower, but they continued to be excessive (Fig. 2d, first data in parenthesis).
The main factors contributing to e-nose drift effects in sensor performance are usually due to differences in temperature, humidity, changes in samples analyzed due to components interactions, or other uncontrolled effects. In the long term, the stability of MOS sensors could progressively be affected by sensor aging or poisoning affecting their performance. This includes changes in the morphology of the sensing layer and irreversibly binding of some sample compounds to metal oxides which diminish the catalytic oxidation of sample volatiles and affecting the sensors' resistance response [14]. In practice, the data distortion caused by sensor drift in short time scenarios (one or few work sequences) has many times been avoided when the use of the data collected was strictly for classification purposes. In these cases, the use of advanced multivariate statistical classification methodologies makes it possible to obtain subjacent information from the raw signal characteristic of each sample group, discarding the rest of the signal information and thus diminishing the drift distortions problems (see examples in [28][29][30]). Unconsciously, when using a multivariate classifying technique, the analysis identifies and, to preserve sample group characteristic information, it discards the "non-characteristic" part of the raw sensor signal which is normally related to noise, drift, and other nonrelevant information. Nevertheless, this "signal cleaning" is a collateral effect (unwanted effect) and, consequently, the success of this strategy is variable since the characteristic subjacent information of the group is highly dependent on the samples and the number of latent variables used to build up the classification model. When a reduced number of samples with important differences between them are evaluated or when the volatile composition of the samples is not complex, the "signal cleaning" effect would work well, making it possible to classify the samples in a quite satisfactory way [31].  Legends (b, d) show the evolution of sensor signals %RSD before and after applying intra-sequence drift correction (first and second value in parenthesis). External reference %RSD values using the same equipment and 6 injections are provided in square brackets [27] But, with this approach, it is not always possible to completely avoid drift distortion effects. It would be the case of complex samples (complex matrix and/or very complex mixtures of volatiles) or collections of samples with similar volatile profiles. Consequently, a drift correction strategy would be more convenient in those cases. In order to correct short-term drift effects, sensor drift of the second assay was modeled and subtracted from the raw signals. To do that, a multivariate adaptation of the multiplicative drift correction procedure proposed by Salit and Turk [21] combined with a PLS adaptation of the highly used component correction strategy [22] to specifically model each drift present in each sequence was performed. The following assumptions were considered: (i) sensors of the array have similar drift behavior, (ii) this drift has a specific direction in the data hyperspace which allows its modelization by regression, and (iii) this drift is time-dependent. After modeling short-term drift for each sequence, drift components for each signal in the data matrix were calculated. Later, matrix subtraction was performed in a Matlab environment to remove drift from the raw sensor signal data, thus providing a corrected sensor data matrix, which was used to plot the data (Fig. 2e). Compared with the raw sensor signals (Fig. 2a), the corrected signals were much more stable during the whole sequence for all sensors, even those with higher %RSD. Accordingly, an impressive %RSD decrease was observed for all the sensors (Fig. 2b), ranging from between 91.5 and 99.7% for long sequences and between 75.7 and 98.8% for short sequences. Maximum %RSD values were 0.65% for long sequences and 0.72% for short sequences. Those values were between 1 (T40/2 sensor) and 27 (LY2/LG sensor) times lower than those reported by Xu et al. [27] with a lower number of injections. As expected, the use of shorter work sequences (18 injections in 8-h sequence) resulted in better performance after drift correction (Fig. 2d, f), as it avoided the higher levels of drift detected at the end of long sequences.
It should be considered though that the increase in stability entailed a small decrease in the absolute value of signals after correction. This side effect mainly affects long sequences (Fig. 2a vs. Fig. 2e), while this decrease is imperceptible in shorter sequences (Fig. 2c vs. Fig. 2f). Consequently, despite the powerful short-term drift correction capabilities obtained, it would be preferable to use short (8 h) work sequences.
When this drift correction strategy was applied to the signals of the first assay, an impressive reduction of the sample signal variability was attained, enabling a clear comparison of similitudes between samples in the new PLS-DA similarity map obtained (corrected: Fig. 1d-f vs. raw: Fig. 1a-c). Indeed, after this correction, it was easy to ascertain similarities in the volatile signal profile between samples, and the confidence intervals did not overlap as it had happened with the raw data.
The use of similitude maps to compare volatile profiles is a novel alternative. Therefore, in order to compare this strategy with previous works, it was necessary to assess its performance using classification methodologies, which are rather popular in e-nose preceding literature. Consequently, using the data of the second assay, the new drift correction strategy was compared with alternative drift correction methods including the original approach by Salit and Turk, [21], ICA [25], using KNN, SIMCA, and PLS-DA as classifying methods. In general, SIMCA outstood in the classifications. KNN and PLS-DA had a similar performance, which varied depending on the variety considered (Table 3). Considering different alternatives, the new correction proposed in this work offered the best results (classification effectiveness) compared to the alternatives evaluated independently of the classification method. In fact, SIMCA and KNN classification with the proposed short-term drift correction allowed to classify correctly a 100% of the samples, assigning them to the variety to which they belonged.

Long-term drift and sequence standardization
Once the problem of short-term drift was solved, the focus was set on the effects of the variability detected among sequences. This variability, as stated above, can be generated by different causes originating a long-term drift effect. A solution to this effect is critical when a high number of samples are to be analyzed, as samples have to be distributed in different sequences that would be run on several days.
Regardless of the cause of inter-sequence variability, the effects can be considerable and unpredictable, as was pointed out in the comparison of the three sequences of the first assay. Consequently, it seemed clear that some reference samples should be included in each sequence to assess how longterm drift affected the signal. At this point, it would not be advisable to use real tomato samples as references. The storage capability of these samples would be limited, and longterm evolution in a freezer would introduce an undesirable noise in the system, thus increasing long-term drift. Accordingly, it was decided to include tomato-like synthetic standard volatile solutions, which were designed and created for this purpose. As tomato volatile profile is highly complex, with more than 400 volatiles being involved, it was decided to focus on a group of compounds ( Table 1) that had been suggested to hold a prominent role in the aroma perception [32,33]. Standards were created from stock solutions for each session. Nonetheless, in the future and for practical reasons, standards can be created and stored in sealed vials at −30°C for 1 month with a high stability. In this case, over a 3-month span, the standards were created specifically for each session, thus providing more restrictive conditions.
In a first step, three different sequences with the tomatolike standards at different concentrations were run. After applying the proposed short-term drift correction, a PLS-2D map was obtained (Fig. 3a). Samples from the same tomatolike standard tended to group together, but still, a considerable level of variation was detected. In some cases, the confidence intervals of the same samples run in different sequences did not overlap and intervals of different standards did overlap in one case. Considering the homogeneous nature of these standards, this variability would not be mainly related with the nature of the sample. To check this point, the analysis was repeated including samples from two tomato varieties "Rayno RZ" and "Amarillo". Again, wide variability was detected, which was not specifically higher in the real tomato samples than in the standard solutions, despite their more complex nature (Fig. 3b).
This time, even in the case of the control with lower variability (TomSSt_1), the fluctuations of signal values were rather high for some sensors, reaching RSD values above 20% (e.g., LY2/gCTl and LY2/GH sensors) or very close to this threshold (e.g., LY2/G sensor). In fact, a MANOVA for TomSSt_1 using the data from the three sequences showed significant inter-sequence differences (Roy test α < 0.03). Higher levels of variation were found in the rest of the controls. Consequently, despite the use of the routine instrument calibration recommended by the equipment manufacturer, the unacceptable inter-sequence variance for each sample caused important bias in the graphs constructed joining the data from several sequences. Therefore, it makes necessary the use of a data standardization step before merging data from different sessions.
In order to tackle this long-term drift effect, the data from the tomato-like synthetic standard TomSSt_2 was selected to standardize sequence signals. The use of a real sample as reference had to be discarded, as its volatile profile would evolve during their conservation and it would also have a finite nature. On the opposite, a homogeneous synthetic standard including main tomato volatiles, representing the complex nature of its aroma, can be generated expressly for each sequence.
Following this premise, in order to standardize sequences, sensor signals from each sample after short-term drift correction were transformed using the deviation observed between the corrected signals of the synthetic standard in the different sequences. Once the signals were transformed, they were related to a time vector using PLS regression. Time vector values were obtained adding the time of each analysis, including the different sequences consecutively.
New PLS-DA 2D maps were then obtained, and the efficiency of correction was evident (Fig. 3a vs. Fig. 3b). For five of the six controls, no significant inter-sequence differences were found, and the confidence ellipsoids overlapped. Only in the case of the samples of the tomato landrace "Amarillo" (coded 2_1 in Fig. 3) significant differences (Roy test α < 0.001) were found between the first sequence and the Table 3 Percentage of samples correctly classified using KNN (K = 8), SIMCA, and PLS-DA classification methods for seven tomato cultivars and the tomato-like synthetic standard 2, before (raw data) and after intra-sequence drift correction using the proposed correction based on an adaptation of [21] and PLS component correction method, the Salit and Turk [21], ICA [25], and PARAFAC2 [20] methods. Average data of three work sequences is provided (variation range in brackets)

Sample
Raw data  remaining two. Nonetheless, the three samples plotted at a short distance. The standardization procedure showed a grouping correction efficiency of 94.4%, as 17 of the 18 sample groups were correctly ascribed with their equals ran in different sequences and their confidence intervals overlapped. This result represents a similar efficiency compared to other strategies regarding long-term drift counteraction methods [15,[34][35][36][37][38][39] or better [40,41]. It was confirmed, then, that data from different sequences could be pooled in order to work with a high number of samples. Considering the good performance obtained with these controls, the sequence standardization procedure was applied to the data obtained with three sequences, with 14 tomato varieties and TomSSt_2 as a reference. When both shortterm drift correction and sequence standardization were applied (Fig. 4b), the variation observed per sample was highly reduced compared to the use of raw data (Fig. 4a). Again, the replicates analyzed in different sequences tended to overlap their confidence intervals, and only one of the replicates of the "Amarillo" landrace could not be grouped with the rest of the corresponding replicates (coded 2.1 in Fig. 4b). Therefore, this procedure enables a realistic comparison of similitudes and differences in the volatile signal profile between samples run in different sequences.
Other works [15,34,40] deal with adaptations of the component correction strategy applied to a long-term drift counteraction. These works use a group of training samples to model the drift using different regression methodologies (PLS, OSC, or CPCA) and, then subtract the drift modeled from the signals of new samples. These strategies assume that with a good training set, the calibration model can be useful for a long time for practical purposes. However, it is obvious that to extend the period of use, large training sets are needed. Gutierrez-Osuna [42] used a training set of 5 to 10 samples for a drift correction period of 3 months in samples of 4 very different spices. Padilla et al. [34] used training sets higher than 100 samples for a drift correction period of 10 months in samples of individual chemical compounds at different concentrations. A similar application was tested by Ziyatdinov [40] with training sets higher than 1000 samples for a drift correction period of 7 months. Nevertheless, it seems also obvious that when sensor degradation increases, the usefulness of these calibration models will decrease and, at any moment, they would need an update. Additionally, training sets have been used with mixes of a Fig. 3 PLS-DA similitude maps from electronic nose fingerprints obtained in three different sequences and applying shortterm drift correction using (a) the 4 synthetic tomato-like standards and two tomato samples, and (b) the standards and two real tomato samples applying sequence standardization. Confidence ellipsoids (p = 0.05) are represented for samples with more than 4 repetitions. TomSSt, tomato-like synthetic standards. 1, "Zayno RZ"; 2, BGV005718 (real tomato samples used as controls). _1, _2, _3, sequence number few volatiles, and real tomato samples consist of more than 400 volatiles [33].
In the present study, specific training set samples were not used. Instead, the information of the samples evaluated in each sequence was used to calculate the specific drift correction model. Four injections per sample would be enough to model short-term drift and at the same time providing a reliable confidence interval. By doing so, each sequence would have its proper model and, consequently, it would always be up to date. The unpredictable nature of short-term drift in different sequences using tomato matrices would limit the efficiency of other alternatives.
On the other hand, the use of one reference synthetic standard has proven to be highly efficient to standardize sequences in order to reduce inter-sequence variability, enabling the comparison of samples analyzed in different sequences. This strategy would also be useful when a replacement of sensors is performed or when different instruments are used to enlarge the processing capabilities of the lab. Tomic et al. [36] tried a similar component correction strategy based on PCA and complemented with multiplicative drift correction to accomplish a successful calibration transfer between instruments. Other calibration transfer strategies which use sophisticated correction methods and algorithms have been also applied to the expansion of calibration update models [37,38,43] but they need a higher number of training samples (10 to more than 400 depending on the methodology) and were tested only for the detection of individual chemical compounds, so the efficiency in more complex samples still needs to be tested [14].
Combining short-term and sequence standardization and PLS-DA 2D similitude maps, it is possible to easily identify differences in the volatile signal profiles of the samples. It is then possible to make rapid identification of those samples with a volatile profile more similar to high-quality reference materials.  Fig. 4 PLS-DA similitude map obtained using 14 tomato varieties evaluated in three different work sequences: a using raw data, b using short-term intrasequence drift correction + sequence standardization. Confidence ellipsoids (p = 0.05). 1, "Zayno RZ"; 2, BGV005718; 3, "Caramba"; 4, UJI011; 5, "Divyne RZ"; 6, "Vinchy RZ"; 7, UJI023; 8, BGV005477; 9, BGV005651; 10, BGV005655; 11, BGV004587; 12, UJI008; 13, UJI014; 14, UJI028. _1, _2, _3, samples analyzed in different sequences This procedure would enable the use of e-noses for example in breeding programs. It would be possible to select which genetic backgrounds have a lower negative impact on the aroma profile. From an agronomic point of view, it would also enable a rapid identification of which preharvest and postharvest procedures have the lowest impact on the volatile profile. These maps would be expandable, offering the possibility of including new reference points. In fact, when Figs. 3c and 4b are compared, the relative position of the real tomato samples of "Zayno RZ" (coded 1 in the figures) and the "Amarillo" landrace (coded 2 in the figures) was not altered.
In the present work, this strategy has been successfully applied to a combination of different tomato materials, selected to represent a wide variability of volatile profiles, especially in the case of tomato landraces. The landraces included in the study had already shown a clearly different volatile profile [44], and especially important as they are frequently commercialized in quality markets in which consumers are willing to pay a price premium for excellent flavor [45]. Interestingly, "Muchamiel", which had previously shown a less intense volatile profile in gas chromatography analysis compared to "Valenciano" and "Morada", plotted in the PLS-DA 2D map in an area corresponding to materials with lower volatile concentration (Fig. 4b). The next step in future works will be centered on the comparison of the volatile profile obtained with the e-nose and GC-MS data in order to confirm this trend.

Conclusions
Short-and long-term drift compromises the application of enoses to the evaluation of volatile profiles. These effects are variable and unpredictable. Consequently, general models are not useful, and the performance registered in each sequence has to be used in order to model drift effects. The distribution of 4 replicates per sample and sequence enables the development of an effective and sequence-specific short-term drift correction. On the other hand, the unpredictable nature of the variation between sessions makes it necessary to use reference materials to standardize sequences. By doing so, it would be possible to analyze a high number of samples distributed in different sequences. The use of a tomato-like synthetic has proven to be for this purpose. The two-step correction methodology proposed here, combined with PLS-DA two-dimensional similitude maps, will enable rapid and reliable identification of samples with a volatile signal profile similar to references selected as ideal targets.
Data availability Data available upon request to the authors.

Declarations
Ethics approval Not applicable Consent to participate All the authors have consented to participate on the research. The research does not include humans nor animals as subjects of research.
Consent for publication All the authors consent the publication of the research.

Conflict of interest
The authors declare no competing interests.