Evaluation of uncertainty sources in the determination of testosterone in urine by calibration-based and isotope dilution quantification using ultra high performance liquid chromatography tandem mass spectrometry

Three quantification methodologies, namely calibration with internal standard (Cal-IS, non-weighted), weighted calibration with internal standard (wCal-IS) and isotope pattern deconvolution (IPD), have been used for the determination of testosterone in urine by LC-MS/MS. Uncertainty has been calculated and compared for the three methodologies through intra- and inter-laboratory reproducibility assays. IPD showed the best performance for intra-laboratory reproducibility, with RSD and combined uncertainty values below 4% and 9%, respectively. wCal-IS showed similar performance, whereas Cal-IS results were not constant and were clearly worse at the lowest concentration assayed (2 ng/mL), reaching RSD values of up to 16%. The inter-laboratory assay indicated similar trends. However, the RSD of wCal-IS (20%) was higher than that of IPD (10%), and Cal-IS worsened further, with RSD above 40% at the lowest concentration level. The uncertainty budgets calculated for the three procedures revealed that the intercept and slope were the most important contributors to uncertainty for Cal-IS. For wCal-IS and IPD, the main contributors were the measured volumes of sample and/or standard.

INTRODUCTION

The use of drugs to enhance performance in sports is a well-known and documented issue. Despite the continuous introduction of new compounds, endogenous androgenic anabolic steroids (EAAS) are among the most popular doping agents [1–3]. EAAS determination still represents an important challenge because of the difficulty of distinguishing the exogenous administration of endogenous substances from their natural production. This goal requires collaborative efforts as well as advanced methodologies [1–7]. Monitoring longitudinal fluctuations for a given athlete is nowadays regarded as the most effective approach to detect suspected EAAS misuse. In this way, the steroidal profile of the Athlete Biological Passport (ABP) represents a powerful tool to reveal doping with endogenous compounds [1,3,6].
For most drugs, urine is the matrix of choice. Sampling is non-invasive, large volumes are easily obtained, detection windows are wide and concentrations are high enough [1,6,7]. However, sample preparation is mandatory to attenuate matrix effects and to ensure good sensitivity and selectivity. Common treatment techniques include solid phase extraction (SPE), liquid-liquid extraction (LLE) and simple matrix dilution. Owing to its simplicity, efficiency and low cost, LLE at basic pH is still widely used for EAAS determination in urine samples [5,6].

Concerning identification and quantification, LC-MS based techniques equipped with an electrospray ionization (ESI) source tend to replace GC-MS(/MS), which the World Anti-Doping Agency (WADA) considers the gold standard for quantification [8]. The former offers suitable sensitivity and faster instrumental run times. In particular, UHPLC-MS/MS, with its demonstrated separation efficiency, is considered the method of choice in doping analysis [1,5,6,9].

A relevant problem with the use of ESI sources is signal alteration due to matrix effects [10–12]. Matrix effects can drastically affect the sensitivity, precision and accuracy of the analytical results. The most robust approach to minimize matrix effects relies on the use of stable isotope labeled internal standards (SIL-IS) [11,12]. Thus, matrix effects associated with complex matrices can be properly overcome using a quantification methodology based on isotope dilution mass spectrometry (IDMS). Classical IDMS is based on the preparation of methodological calibration curves, which is time-consuming. An alternative quantification method, based on the measurement of isotopic abundances in the spiked sample by multiple linear regression, can also be used.
This method, known as isotope pattern deconvolution (IPD), does not require the construction of any calibration graph. It has been tested satisfactorily for rapid quantification in different complex matrices [13–16]. IDMS combined with IPD is a fast and reliable methodology, which provides one result per injection with high accuracy and corrected for matrix effects.

In the field of doping analysis, improving the reliability and robustness of analytical results remains a continuous requirement [1,2,5,6]. WADA highlights the need for good inter-laboratory precision, which is particularly relevant in ABP profiling [5]. ABP results for the same athlete are obtained in different laboratories. Improving inter-laboratory precision is therefore of the utmost importance to allow the universal application of any developed methodology. Accordingly, the calculation and minimization of measurement uncertainty deserve thorough treatment [2,17,18].

In the present work, a previously developed method has been applied to assess the uncertainty of the testosterone concentration determined in several synthetic urine samples. Testosterone concentration has been calculated using three different methodologies: weighted and non-weighted calibration with IS (wCal-IS and Cal-IS, respectively) and IPD. To evaluate the associated uncertainty in more depth, an inter-laboratory comparison among five laboratories has been performed. For all three methodologies, intra- and inter-laboratory measurements have been conducted. Combined uncertainties (uc) and full uncertainty budgets have been obtained and compared.

EXPERIMENTAL

Reagents and materials

Testosterone (T, purity 99%) was provided by Sigma-Aldrich Co. (Madrid, Spain). 13C2-testosterone (13C2-T, purity 98% and 13C2 enrichment 98%) was provided by Cambridge Isotope Laboratories (Andover, MA, USA). Methanol (MeOH, HPLC quality) and methyl tert-butyl ether (MTBE, GC quality) were provided by Scharlau (Barcelona, Spain).
For sample hydrolysis, β-glucuronidase from E. coli K12 provided by Roche (Indianapolis, IN, USA) was employed. A 1 M phosphate buffer was prepared by dissolving the appropriate amount of (NH4)2HPO4 (Merck, Darmstadt, Germany) in Milli-Q water. It was adjusted to pH 7 with 37% HCl from Scharlau (Barcelona, Spain). A NaHCO3/Na2CO3 (1:2, w/w) solid buffer (Sigma-Aldrich Co., Madrid, Spain) was also prepared. Formic acid (LC additive quality) and a 500 mM solution of NH4HCOO (Scharlau, Barcelona, Spain) in HPLC-grade methanol were used for mobile phase preparation.

A 250 μg/mL stock solution of T was prepared by dissolving 25 mg of accurately weighed solid standard in 100 mL of methanol. The 13C2-T stock solution was prepared by dissolving 10 mg of the purchased material in 50 mL of methanol. Its concentration, determined by reverse isotope dilution against the natural compound, was 237 μg/mL. Individual 10 μg/mL and 1 μg/mL working solutions of the natural and labelled compounds were prepared by diluting the stock solutions with methanol. All standard solutions were stored in amber glass bottles in a freezer. The water purification system used was a Milli-Q gradient A10 from Millipore (Bedford, MA, USA).

Instrumentation

All participants in the inter-laboratory comparison determined testosterone by LC-MS/MS. Additionally, some laboratories used other methodologies (see the inter-laboratory comparison section). This section describes the instrumentation used at the Research Institute for Pesticides and Water (IUPA) laboratory, where the intra-laboratory measurements and all calculations were performed. An Acquity UPLC system coupled to a TQD triple quadrupole mass spectrometer from Waters Corp. (Milford, MA, USA) was employed for sample analysis. Chromatographic separation was performed with an Acquity UPLC BEH C18 column (1.7 μm, 2.1 mm × 100 mm), also from Waters Corp., at a flow rate of 0.3 mL/min with an injection volume of 10 μL.
The column oven was kept at 55 °C and the sample manager at 10 °C. Mobile phase A was purified water and mobile phase B was HPLC-grade MeOH, both containing 0.01% formic acid and 1 mM NH4HCOO as modifiers. The gradient applied was: 45% B (0-1 min), linear increase to 77.5% B in 6.5 min, 95% B (7.51-8 min) and 45% B (8.5-11.5 min). Chromatograms of a blank and a selected sample can be seen in Figure S.8 in the supplementary material.

Table 1. Chemical structure and experimental conditions of the LC-(ESI)-MS/MS for testosterone and labeled testosterone
Compound | Rt (min) | Precursor ion | Cone voltage (V) | SRM transitions
T | 5.7 | [M+H]+ | 30 | 289.2 > 96.9 (25)
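The following is a minimal sketch of the gradient program above written as a piecewise-linear function of time. It is useful, for example, to estimate the mobile-phase composition at a given retention time. `GRADIENT` and `percent_b` are hypothetical names. The breakpoints are transcribed from the text. The assumption is that %B varies linearly between consecutive breakpoints, because the text does not state the shape of the 8-8.5 min return to 45% B.

```python
import bisect

# Breakpoints (time in min, %B) transcribed from the gradient description.
# Assumption: %B varies linearly between consecutive breakpoints, including
# the 8-8.5 min return to 45% B (its shape is not stated in the text).
GRADIENT = [
    (0.0, 45.0),
    (1.0, 45.0),
    (7.5, 77.5),   # linear increase to 77.5% B over 6.5 min (1 -> 7.5 min)
    (7.51, 95.0),  # near-step to 95% B
    (8.0, 95.0),
    (8.5, 45.0),   # return to initial conditions
    (11.5, 45.0),  # re-equilibration until the end of the run
]


def percent_b(t_min: float) -> float:
    """Return the %B of the mobile phase at time t_min (minutes)."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    # Index of the first breakpoint strictly after t_min.
    i = bisect.bisect_right(times, t_min)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    # Approximate composition at the testosterone retention time (Table 1).
    print(f"{percent_b(5.7):.1f}% B at 5.7 min")
```

Under these assumptions, `percent_b(5.7)` evaluates to 68.5% B, an estimate of the eluent composition at which T elutes.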


Electrospray ionization in the mass spectrometer was performed in positive ion mode with the following settings:
- source and desolvation temperatures of 120 °C and 350 °C, respectively;
- cone gas and desolvation gas flows of 80 and 800 L/h, respectively;
- capillary voltage of 3.5 kV.

MS/MS experimental conditions for T and 13C2-T are listed in Table 1.

The aim of the study was explained to 15 healthy volunteers (8 men and 7 women aged between 16 and 59 years). Consent was obtained after confirming that they fully understood the experiment. Urine samples were collected and stored at −20 °C until use. The testosterone concentration of all samples was approximately determined by IPD. Twelve samples were selected and mixed in pairs at approximately 1:1 (v/v) ratios to obtain 6 synthetic urine samples, A to F, with increasing concentrations across the 2 ng/mL to 75 ng/mL testosterone range.

Additionally, at the IUPA laboratory, standard addition was also employed for the inter-laboratory experiment. For this purpose, 2.5 mL aliquots of each sample were spiked with 0, 0.5, 2 and 3.5 times the original approximate concentration of T and adjusted to a final volume of 2720 µL with water. The described sample treatment was applied without the addition of internal standard.

For all participating laboratories, freshly prepared calibration curves consisted of 6 points between 0 and 100 ng/mL of T in 2.5 mL of water. Weighted calibration calculations were applied to the same data acquired for calibration with IS, as described by Garcia-Alonso and Rodríguez-González [19]. The weighting factor used was the common choice of the inverse of the variance (1/SD²).

The isotope dilution quantification methodology employed is based on multiple linear regression and on the spiking of samples with an isotopically enriched analog of the analytes of interest.
This produces an intentional alteration of the natural isotopic composition of the analyte in the mixture. Briefly, the altered isotopic composition measured in the mixture, A_mix, is a linear combination of the abundances of the natural analyte, A_nat, and of the isotopically enriched spike, A_lab. For a single isotopically enriched spike and n measured transitions, this can be expressed in matrix notation as:

$$\begin{pmatrix} A_{mix}^{1} \\ \vdots \\ A_{mix}^{n} \end{pmatrix} = \begin{pmatrix} A_{nat}^{1} & A_{lab}^{1} \\ \vdots & \vdots \\ A_{nat}^{n} & A_{lab}^{n} \end{pmatrix} \begin{pmatrix} x_{nat} \\ x_{lab} \end{pmatrix}$$

To solve the system by multiple linear regression, an error vector (e^1, …, e^n) must be added to the right-hand side. The solutions are the molar fractions of the natural and labelled compounds (x_nat and x_lab, respectively). These can be obtained with any spreadsheet software that includes a linear regression function (LINEST in Microsoft Excel) by entering the data in matrix form. Since the added amount of labelled compound, N_lab, is known, the amount of natural compound in the sample is then readily calculated as N_nat = N_lab · x_nat / x_lab […]

[…] arrival at the selected laboratories. Samples were processed and all the required measurements were performed in order to apply the calibration and IPD calculations at our laboratory. In addition, laboratories were also asked to apply any other routine quantification method they had implemented (Table 2). Taking those extra quantification methods into account, 19 analytical results were obtained for each sample. These results were used to calculate a consensus value for the concentration of each sample, Cref, and its associated uncertainty, uref.

[…]

$$u_c = \sqrt{u_{SD}^2 + u_{bias}^2}$$

In this expression, u_SD is the intra-laboratory reproducibility standard deviation for the five replicates obtained over five consecutive weeks at the IUPA laboratory. u_bias is the uncertainty associated with any source of bias. It accounts for the method and laboratory bias, including the uncertainty associated with the consensus reference value.
To that purpose, a short inter-laboratory comparison was conducted, and a total of 19 quantification results were obtained for each sample A to F (see the inter-laboratory experiment section). Thus, u_bias was calculated as

$$u_{bias} = \sqrt{RMS_{bias}^2 + u_{ref}^2}$$

where u_ref is the uncertainty associated with the consensus concentration value for each sample, Cref, obtained by:

$$u_{ref} = \frac{S_R}{\sqrt{n}}$$

where S_R is the mean standard deviation for the inter-laboratory reproducibility and n is the number of results for each sample. A value of n = 17 was employed instead of 19, because outliers identified by the Hampel test were excluded (see Results, Table 3 and Table S.7 in the Supplementary Information).

RMS_bias is the root mean square bias for each quantification method used in the intra-laboratory reproducibility assessment conducted at the IUPA laboratory (for examples of calculations see Table S.5 in the supplementary material). For each sample (A to F), a mean bias has been calculated from the intra-laboratory reproducibility study (n = 5). These mean biases have been used to calculate RMS_bias as:

$$RMS_{bias} = \sqrt{\frac{\sum_{j=1}^{m} bias_j^2}{m}}$$

where bias_j is the mean bias for sample j and m is the number of samples (m = 6).

On the other hand, the contribution of each source of uncertainty to a given measurement, known as the full uncertainty budget, can be calculated using the Kragten […] are sequentially altered by their SD to obtain the deviation (Δ_i²) produced in the analytical result relative to the unchanged value. This deviation constitutes the magnitude of the contribution to the total uncertainty of the analytical procedure. It is readily calculated for each parameter i as:

$$\Delta_i^2 = (x_i - x)^2$$

where x is the unchanged value and x_i is the new value with parameter i altered. Then, the total uncertainty of the procedure, U(x), can be obtained using:

$$U(x) = \sqrt{\sum_i \Delta_i^2}$$

Examples of complete uncertainty calculations can be consulted in the Supplementary Information (Table S.6).
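The IPD calculation described in this section amounts to a zero-intercept least-squares fit of the mixture abundances on the natural and labelled abundance patterns. The following is a minimal sketch of that fit using NumPy's `lstsq` in place of the spreadsheet LINEST function. It is assumed to be equivalent to a LINEST fit with the constant term disabled. The function names and the abundance values are hypothetical and only illustrate the calculation. They are not the measured patterns of T and 13C2-T.

```python
import numpy as np


def ipd_molar_fractions(a_mix, a_nat, a_lab):
    """Solve A_mix = [A_nat A_lab] @ [x_nat, x_lab] + e by least squares.

    Each argument holds the relative abundances for the same n measured
    transitions (n >= 2). No intercept is fitted, as in the matrix model.
    """
    design = np.column_stack([np.asarray(a_nat, float), np.asarray(a_lab, float)])
    coeffs, _residuals, _rank, _sv = np.linalg.lstsq(
        design, np.asarray(a_mix, float), rcond=None
    )
    x_nat, x_lab = coeffs
    return float(x_nat), float(x_lab)


def ipd_natural_amount(n_lab_added, x_nat, x_lab):
    """Amount of natural analyte from the known amount of labelled spike."""
    return n_lab_added * x_nat / x_lab


if __name__ == "__main__":
    # Hypothetical abundance patterns over three transitions (not measured data).
    a_nat = [0.80, 0.18, 0.02]
    a_lab = [0.01, 0.05, 0.94]
    # Simulated blend containing 40% natural and 60% labelled molecules.
    a_mix = [0.4 * nat + 0.6 * lab for nat, lab in zip(a_nat, a_lab)]
    x_nat, x_lab = ipd_molar_fractions(a_mix, a_nat, a_lab)
    print(round(x_nat, 6), round(x_lab, 6))                   # 0.4 0.6
    print(round(ipd_natural_amount(30.0, x_nat, x_lab), 6))   # 20.0
```

Each pattern in the example is normalised to sum to 1, so the fitted fractions also sum to about 1. The final amount, however, depends only on the ratio x_nat/x_lab, so one result is obtained per injection without any calibration curve.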

IPD measurements
As explained above, IPD calculations rely on the relative abundance distributions of the natural and labelled compounds and, therefore, on their accuracy. For this purpose, the most abundant SRM transitions for each compound were selected with IsoPatrn software. Relative abundances were then determined experimentally by preparing individual 100 µg/L standards in MeOH/H2O 1:1 (v/v) and injecting each of them five times (Tables S.1 and S.2 in the supplementary information). Mean values of the experimental abundances were used in the subsequent quantification procedure. Their standard deviations were used to build the uncertainty budgets.

IPD calculation also requires knowing the exact amount of labelled compound added to the samples. The exact concentration of the 13C2-T working standard solution was calculated by reverse isotope dilution (RID) against the natural T solution, resulting in 12.20 ± 0.10 mg/L (Table S […]).

[…] In addition to intra-laboratory reproducibility, combined uncertainties, uc, were calculated in order to estimate the measurement uncertainty of the three quantification methods. To this end, method and laboratory bias were estimated according to the Nordtest guide [25] (see Experimental). The bias was taken as the square root of the sum of squares of two components:
- the root mean square of the mean percentage differences (RMS_bias) from a reference value (Cref);
- the uncertainty of this reference value, uref.

The final value of uc accounts for the method and laboratory bias together with the reproducibility standard deviation at each concentration assayed (samples A-F).
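The Nordtest combination described above can be sketched in a few lines. In this sketch, all quantities are relative (%) standard uncertainties, u_ref is S_R/√n, RMS_bias is the root mean square of the per-sample mean biases, and u_c combines the reproducibility term with the bias term. The function names are hypothetical. The consensus statistics in the usage example (mean RSD 12.9%, n = 17) are those reported in the results. The per-sample biases and the reproducibility SD are invented for illustration.

```python
import math


def rms_bias(mean_biases_pct):
    """Root mean square of the per-sample mean biases (all in %)."""
    return math.sqrt(sum(b * b for b in mean_biases_pct) / len(mean_biases_pct))


def u_ref(s_r_pct, n_results):
    """Uncertainty of the consensus value: S_R / sqrt(n)."""
    return s_r_pct / math.sqrt(n_results)


def combined_uncertainty(u_sd_pct, rms_bias_pct, u_ref_pct):
    """u_c = sqrt(u_SD^2 + u_bias^2), with u_bias = sqrt(RMS_bias^2 + u_ref^2)."""
    u_bias = math.sqrt(rms_bias_pct ** 2 + u_ref_pct ** 2)
    return math.sqrt(u_sd_pct ** 2 + u_bias ** 2)


if __name__ == "__main__":
    # Consensus statistics reported in the results: mean RSD 12.9 %, n = 17.
    print(f"u_ref = {u_ref(12.9, 17):.1f} %")  # 3.1 %
    # Hypothetical per-sample mean biases (%) and reproducibility SD (%).
    biases = [4.0, -2.5, 3.0, 1.5, -1.0, 2.0]
    uc = combined_uncertainty(5.0, rms_bias(biases), u_ref(12.9, 17))
    print(f"u_c = {uc:.1f} %")
```

Because the terms add in quadrature, the largest of u_SD, RMS_bias and u_ref dominates u_c. A method with a poor reproducibility SD at one concentration level therefore shows a correspondingly higher combined uncertainty at that level.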

The consensus values obtained from the inter-laboratory comparison were adopted as reference values (Table 3). The consensus values are not intended to be used as certified values. They were accepted as reference values to calculate the bias uncertainty for each quantification methodology and to assess the bias of each method relative to that reference value. A uref of 3.1% was obtained from the mean RSD value (12.9%) and n = 17, that is, 12.9%/√17 ≈ 3.1%. Here n corresponds to the 19 quantification procedures applied minus the outlier values (see the Experimental section and Supplementary Information for details).

Cal-IS and wCal-IS require exactly the same data. The difference in combined uncertainty (11.2%-17.9% versus 9.1%-10.0%) therefore shows that the quality of the analytical results improved solely through the data treatment.

Along with wCal-IS, IPD stands out in comparison with more extensively used methods such as Cal-IS. Furthermore, IPD provided combined uncertainties below 8.4% over the whole concentration range. It also reduced analysis time, since no calibration curve had to be prepared.

Again, the results showed that Cal-IS performs poorly at low concentrations, being the worst method over the whole concentration range studied. In comparison, wCal-IS provided constant combined uncertainty values along the concentration range, although higher than those of IPD. IPD produced the lowest combined uncertainties of the three methods at every concentration assayed. This is in accordance with the high metrological quality of analytical results provided by isotope dilution mass spectrometry determinations [21].

Finally, full uncertainty budgets were obtained for the three selected quantification methods according to the Kragten approach (Table S.6 in the supplementary material).
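The Kragten spreadsheet approach used for these budgets can be illustrated with a short sketch. Each input parameter is shifted by its standard uncertainty in turn, and the squared change in the result is recorded. The total is the square root of the sum of these terms. The function names and the numbers are hypothetical. The model is a deliberately simplified calibration equation, C = (Rm − intercept)/slope. The actual Cal-IS budget also includes Vs, Vt and Cn, which are not reproduced here.

```python
import math


def kragten_budget(model, values, std_uncertainties):
    """Kragten spreadsheet approach.

    Each parameter is shifted by its standard uncertainty, one at a time, and
    the squared change in the result (Delta_i^2) is recorded. Returns the
    unchanged result x, U(x) = sqrt(sum Delta_i^2) and each parameter's
    fractional contribution to sum Delta_i^2.
    """
    x = model(values)
    delta_sq = {}
    for name, u in std_uncertainties.items():
        shifted = dict(values)  # copy so the other parameters stay unchanged
        shifted[name] = values[name] + u
        delta_sq[name] = (model(shifted) - x) ** 2
    total = sum(delta_sq.values())
    shares = {k: (v / total if total > 0 else 0.0) for k, v in delta_sq.items()}
    return x, math.sqrt(total), shares


def cal_is_concentration(p):
    """Hypothetical, simplified Cal-IS model: C = (Rm - intercept) / slope."""
    return (p["Rm"] - p["intercept"]) / p["slope"]


if __name__ == "__main__":
    values = {"intercept": 0.02, "slope": 0.05, "Rm": 0.52}  # invented numbers
    uncertainties = {"intercept": 0.01, "slope": 0.001, "Rm": 0.005}
    x, u, shares = kragten_budget(cal_is_concentration, values, uncertainties)
    print(f"C = {x:.2f}, U(C) = {u:.3f}")
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:>9}: {100 * share:.1f} %")
```

With these invented inputs, the intercept term gives the largest share of the budget. The sketch only shows how such a ranking is obtained. It is not a reproduction of the budgets in Table S.6.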
For both the wCal-IS and Cal-IS methods, the same 6 parameters were considered:
- the intercept of the linear regression;
- the slope of the linear regression;
- the measured area ratio between the natural and labelled compound chromatographic peaks in the sample (Rm);
- the volume of sample (Vs);
- the volume of internal standard (Vt);
- the concentration of the natural standard (Cn).

The contribution of each parameter to the total procedure uncertainty was calculated for the five replicates, and the average values were obtained.

As can be seen in Figure 2, for Cal-IS the uncertainty contribution from the regression intercept is predominant at low concentrations (Sample A). At high concentrations (Sample F), the slope is the largest contributor to the final method uncertainty. Thus, the uncertainty of a Cal-IS method can hardly be improved experimentally. An alternative way to correct the bias and its associated uncertainty at low concentrations could be the use of a single external calibration point forced through the origin, an approach not tested in the present work. In contrast, with weighted calibration the major contributors to uncertainty were the measured sample and internal standard volumes. One simple way to reduce uncertainty could therefore be to measure masses instead of volumes.

On the other hand, the parameters considered for uncertainty calculations in IPD quantification were the following: determination of the abundances of natural testosterone (natT-1, natT-2) and 13C2-testosterone […]

In this work, three analytical approaches for the determination of testosterone in urine have been compared from an uncertainty evaluation point of view.
Firstly, the method uncertainty derived from the procedure itself was evaluated at our laboratory. Weighted and non-weighted calibration with internal standard and IPD quantification were applied to 6 synthetic urine samples, composed of mixed human urine samples, in five different weeks. Inter-day combined uncertainties for each sample and method were obtained by the Nordtest calculation method. Weighted calibration and IPD showed similar values, below or equal to 10% in all cases. Non-weighted calibration yielded uncertainties ranging from 11.2% to 17.9%.

Secondly, an inter-laboratory experiment was carried out in order to set a reference value for the samples and to further evaluate the inter-laboratory RSD of the three methods. As in the intra-laboratory experiment, non-weighted calibration presented much higher uncertainty at low concentrations (43%) than at medium and high concentrations (12%-24%). At those medium and high levels, it performed better than weighted calibration (18%-21% along the whole range). In contrast, the combined uncertainty associated with the IPD method was lower than that of the other two in all 6 samples, ranging from 7.8% to 13%.

In addition, the Kragten method was applied to the intra-laboratory data to obtain the uncertainty budgets for the considered quantification methods. The linear regression parameters, slope and intercept, were found to be the major contributors to uncertainty in non-weighted calibration, varying along the concentration range. In contrast, the weighted calibration and IPD methods were more stable in terms of relative contributions to procedure uncertainty.

Hence, it has been shown that weighted calibration can be more precise than classical calibration with internal standard. In intra-laboratory reproducibility studies, it provided uncertainties and standard deviations similar to those of isotope dilution methodologies.
Moreover, the present IPD methodology yielded lower inter-laboratory variability, so a higher metrological quality of the analytical results can be expected.

The results presented in this work for testosterone as a model compound highlight IPD as a rapid, robust and reliable method. IDMS-based methodologies add the benefits of reduced analysis time and matrix effect correction. Taking into account the lower uncertainty of the present analytical approach, IPD appears to be a promising alternative to improve the monitoring of longitudinal fluctuations in steroid profiling.