Development of a 50-item Abridged form of the Junior Spanish version of the NEO questionnaire (JS NEO-A50)

The aim of this psychometric study was to construct an abridged 50-item form of the Junior Spanish version of the NEO-PI-R (JS NEO-A50), with 10 items for each of the five personality factors. Two separate studies were conducted. In study 1, 400 high school students completed two personality scales to examine the factor structure (Exploratory Factor Analysis), convergent validity and reliability of the JS NEO-A50. In study 2, an independent sample of 385 adolescents completed the JS NEO-A50 and several outcome measures to replicate the factor structure (Exploratory Structural Equation Model) and examine criterion validity, respectively. The five-factor structure found in study 1 was satisfactorily replicated in the second, independent sample. Sources of reliability (internal consistency and test-retest) and validity (convergent) were adequate. Also, the outcome measures assessed in study 2 were related to personality traits in the expected direction. Life satisfaction was significantly predicted by emotional stability; symptoms of behavioral problems were predicted by low scores on both agreeableness and conscientiousness, while internalizing emotional symptoms were mainly predicted by emotional instability; finally, academic performance was mainly predicted by conscientiousness. We conclude that the JS NEO-A50 is a sound inventory for measuring the five broad personality domains in Spanish-speaking adolescents.


Introduction
Personality constitutes an important and core construct in people's lives. Evidence has shown robust associations between personality and a wide array of life outcomes, such as interpersonal relationships, political and religious beliefs, occupational performance, happiness or longevity (Ozer & Benet-Martínez, 2006; Soto, 2019). Personality also plays an important role in common mental disorders (Jeronimus et al., 2016; Kotov et al., 2010), particularly when considering the internalizing, externalizing and general (aka p) factors of psychopathology (Caspi et al., 2014; Mezquita et al., 2015). Personality traits are likewise relevant to important areas of adolescents' lives, such as academic achievement (Poropat, 2009), resilience and coping behaviors (Oshio et al., 2018) or happiness (Gale et al., 2013; Suldo et al., 2015), as well as to more negative outcomes such as antisocial behavior (Durán-Bonavila et al., 2017; Mann et al., 2016), substance use (Stautz & Cooper, 2013; Ibáñez et al., 2015) or psychopathology (Castellanos-Ryan et al., 2016; De Bolle et al., 2012; Etkin et al., 2020). Thus, psychometrically sound instruments for assessing personality traits are crucial in this key period of the life span.
There is now wide consensus that the broad personality domains proposed by the Big Five (BF; Goldberg, 1992) and Five-Factor models (FFM; McCrae & Costa, 2008) constitute the main framework for personality taxonomy. Among personality assessment instruments, the NEO Personality Inventories (McCrae & Costa, 2010) are the most widely used questionnaires for assessing the FFM.
Although there are short Big Five questionnaires that can be applied to adolescents in our sociocultural context (BFQ-C, Barbaranelli et al., 2003; BFPTSQ, Ortet et al., 2017; OPERAS, Vigil-Colet et al., 2013), their coverage of the NEO facets is limited.
Thus, there is no brief tool available specifically for Spanish adolescents that offers adequate FFM bandwidth, a key validity issue (Soto & John, 2017).
As far as we know, there is only one questionnaire based on the NEO personality inventories developed for Spanish adolescents: the JS NEO (Ortet et al., 2012). The JS NEO is an adaptation of the adult NEO-PI-R (McCrae & Costa, 2010) for Spanish adolescents that uses easy-to-understand language and age-appropriate content. To develop it, half of the NEO-PI-R items had to be modified to suit Spanish adolescents between 12 and 17 years of age (Ortet et al., 2012). The JS NEO has a complete 240-item form (Ortet et al., 2012) and a 150-item short version (JS NEO-S; Ortet et al., 2010). Both versions yield reliable and valid scores for the five broad dimensions and for the six facets per dimension proposed in the NEO-PI-R in Spanish adolescents aged 12 to 17.
However, questionnaire length may be a relevant issue in certain circumstances, especially in research contexts where a wide battery of questionnaires has to be administered, and schools are time-limited settings that restrict the available administration time. For these reasons, the main aim of this study was to develop an abridged 50-item questionnaire, the JS NEO-A50, to save time and resources when assessing the NEO personality traits in Spanish adolescents, while preserving adequate bandwidth, reliability and validity. To this end, two studies were conducted with non-clinical samples to explore the factor structure of the JS NEO-A50 and the reliability and validity of its scores. Specifically, following recommendations for the development of short versions (Widaman et al., 2011), we: a) began with a strong instrument, the JS NEO-S; b) attempted to preserve the content of the five domains and their facets; c) retained the same factor structure as the original form; and d) examined reliability and validity indices in an independent sample.
We report how we obtained our samples, all data exclusions, all measures in the study, and all analyses including all tested models. Data inclusion/exclusion criteria were not necessary. For our inferential tests, we report p values, effect sizes (in the Electronic Supplementary Material; ESM), and 99% confidence intervals. All data, materials, and ESM are accessible at https://osf.io/yrhc9/?view_only=5b95283d710147528b6ed41e51f36b59.

Participants and procedure
All students from the first to the fourth year at a convenience-sampled high school in a city in eastern Spain were invited to participate. Four hundred and twenty-eight students returned signed written parental consent forms and responded to the questionnaires, but 28 of them did not complete all of the scales. Thus, the final sample consisted of 400 students aged 12 to 17 (mean age = 14.31 years; SD = 1.57; 51% males, 49% females). To study test-retest reliability, the JS NEO-A50 was readministered one month later to a subsample of 227 students from 10 classes selected by convenience from the second to the fourth year, aged 13 to 17 (mean age = 15.16 years; SD = 1.00; 47.1% males, 52.9% females). There were no significant differences between the full sample and the subsample in gender distribution: χ²(1) = .87, p > .05. The subsample had a significantly higher mean age, albeit by less than a year: t(625) = 8.24, p < .001.
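The gender comparison between the full sample and the retest subsample is a chi-square test of independence on a 2 × 2 table. The sketch below is illustrative only: the counts are reconstructed from the rounded percentages reported above (51% of 400 and 47.1% of 227 are males), so they are an assumption and may differ slightly from the raw data, and the use of SciPy's `chi2_contingency` without Yates' correction is our assumption about how the statistic was obtained.

```python
from scipy.stats import chi2_contingency

# Counts reconstructed (assumption) from the rounded percentages in the text:
# full sample: 51% of 400 males -> 204 males, 196 females
# retest subsample: 47.1% of 227 males -> 107 males, 120 females
table = [
    [204, 196],  # full sample: males, females
    [107, 120],  # retest subsample: males, females
]

# correction=False gives the plain Pearson chi-square (no Yates continuity correction)
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```

With these reconstructed counts the statistic falls below 1 with p well above .05, consistent with the value reported in the text.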
This study was part of broader research into psychosocial risk and protective factors affecting mental health (see Moya-Higueras et al., 2018 for more details). After detailed information about the research and the consent documents had been given to teachers and to parents or legal guardians, trained research assistants administered the questionnaire battery to each class in two 1-hour tutorial sessions held 1 week apart. A convenience subsample completed a third session one month later to assess test-retest reliability. The questionnaires were completed voluntarily by students authorized by their parents or legal guardians, and promotional items from our university, such as notebooks and pens, were given to the adolescents who completed all the questionnaires to encourage participation. Participants took around one hour to complete the questionnaires.

Ethics
This research was approved by the ethics committee of the authors' university and authorized by the high schools' boards as well as by the regional education authorities. The parents or legal guardians of the participants gave written informed consent in accordance with the Declaration of Helsinki and the European Parliament General Data Protection Regulation (GDPR; European Parliament 2016/679) guidelines, and were assured that all personal details, including identification data, would remain completely confidential.

Measures
JS NEO-S (Ortet et al., 2010). This 150-item scale is the short form of the Spanish adaptation of the NEO-PI-R personality questionnaire for adolescents aged 12 to 17 (JS NEO; Ortet et al., 2012). Its statements are answered on a 5-point Likert-type scale (0 = Strongly disagree; 4 = Strongly agree) and assess the five higher-order dimensions of the FFM (neuroticism, extraversion, openness, agreeableness and conscientiousness) as well as their 30 lower-order facets (6 facets per dimension and 5 items per facet). Reliability estimates for the domains in the present sample ranged from .78 (openness) to .90 (conscientiousness) (see Table 2). Facet reliabilities in the present sample ranged from α = .50 to .77 for all but three of the 30 scales: Anxiety (.40) and Impulsiveness (.33) from the neuroticism domain, and Feelings (.48) from the openness domain. These lower reliabilities were expected given the small number of items per facet, and they were similar to those reported in the original validation article (Ortet et al., 2010) and in subsequent studies employing the JS NEO-S (e.g., Romero & Alonso, 2017).
Big Five Personality Trait Short Questionnaire (BFPTSQ; Morizot, 2014), Spanish version (Ortet et al., 2017). This short, 50-item questionnaire assesses the broad FFM dimensions and was designed for both adolescent and adult populations. Items are answered on a 5-point Likert-type scale (0 = Disagree strongly; 4 = Agree strongly).
The alpha reliabilities of the BFPTSQ in the present sample were: openness .80, extraversion .77, agreeableness .74, conscientiousness .77 and emotional stability (low neuroticism) .80.

Data analyses
The JS NEO-S was used to extract the 50 items for the JS NEO-A50. To select 10 items per scale, we sought a balance between maximizing facet representation and achieving an adequate factor structure and reliability for the new 50-item scale. Specifically, and following a recommended strategy for constructing

Descriptives
Independent-samples Student's t-tests revealed that females scored higher than males on all FFM domains, although the difference was not significant for extraversion (see ESM Table 1).
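A minimal sketch of this kind of gender comparison: a classic Student's t-test (SciPy's `ttest_ind` with `equal_var=True`) plus Cohen's d with a pooled standard deviation as one possible effect size. The effect-size formula is our assumption, since the text does not state which variant the ESM reports, and the toy scores are illustrative, not study data.

```python
import numpy as np
from scipy.stats import ttest_ind


def cohens_d(a, b):
    """Cohen's d for two independent groups using the pooled SD (ddof=1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Illustrative domain scores (not study data)
males = [1, 2, 3, 4, 5]
females = [2, 3, 4, 5, 6]

# equal_var=True gives the classic Student's t-test named in the text
t_stat, p_value = ttest_ind(males, females, equal_var=True)
d = cohens_d(males, females)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {d:.2f}")
```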

EFA
The EFA for the entire scale yielded adequate loadings (≥ .30) for the selected items on their respective dimensions (see Table 1). Of the 30 JS NEO-S facets, 26 were represented by at least one item in the JS NEO-A50; only four facets were not represented, because their items showed insufficient psychometric fit: Impulsiveness, from the neuroticism domain; Actions and Values, from the openness domain; and Trust, from the agreeableness domain. The five factors explained 34.2% of the total variance.
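The two EFA criteria used here, a salient loading of at least .30 on the intended domain and the share of total variance explained by the retained factors, can both be computed from a rotated loading matrix. The sketch assumes an orthogonal (Varimax) solution, as stated in the Table 1 note, in which the variance explained by all factors equals the sum of squared loadings divided by the number of items. The function name and the tiny loading matrix are hypothetical.

```python
import numpy as np


def summarize_loadings(loadings, target, cutoff=0.30):
    """Summarize an orthogonally rotated loading matrix (items x factors).

    target[i] is the index of the factor that item i is meant to load on.
    Returns (proportion of total variance explained, list of items whose
    absolute loading on their target factor falls below the cutoff).
    """
    L = np.asarray(loadings, dtype=float)
    n_items = L.shape[0]
    # With orthogonal factors, explained variance = sum of squared loadings / n_items
    prop_var = float((L ** 2).sum() / n_items)
    weak_items = [i for i, f in enumerate(target) if abs(L[i, f]) < cutoff]
    return prop_var, weak_items


# Hypothetical 4-item, 2-factor example (not the study's loadings)
loadings = [[0.6, 0.1],
            [0.5, 0.2],
            [0.1, 0.7],
            [0.2, 0.4]]
target = [0, 0, 1, 1]
prop, weak = summarize_loadings(loadings, target)
print(f"variance explained = {prop:.1%}, weak items = {weak}")
```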

Reliability
Internal consistency reliabilities were adequate for the five JS NEO-A50 scales, with Cronbach's alpha ranging from .73 to .83 and Omega from .74 to .83. The alpha values were very similar to those obtained for the JS NEO-S in this study. Temporal stability was also satisfactory, with 1-month test-retest correlations ranging from .74 to .81 (see Table 1).
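The three reliability indices in this section can be sketched with their textbook definitions: Cronbach's alpha from raw item scores, McDonald's omega (total) from the loadings and residual variances of a one-factor model per domain, and a test-retest Pearson correlation with a 99% confidence interval via the Fisher z transformation. This is a minimal illustration, not the study's SPSS or Mplus computation; the function names and input values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)


def omega_total(loadings, residual_vars):
    """McDonald's omega from one-factor loadings and residual (unique) variances."""
    lam_sum = float(np.sum(loadings))
    return lam_sum ** 2 / (lam_sum ** 2 + float(np.sum(residual_vars)))


def retest_r_ci(x, y, level=0.99):
    """Pearson r between two administrations with a Fisher-z confidence interval."""
    r = float(np.corrcoef(x, y)[0, 1])
    n = len(x)
    z = np.arctanh(r)
    half_width = norm.ppf(1 - (1 - level) / 2) / np.sqrt(n - 3)
    return r, float(np.tanh(z - half_width)), float(np.tanh(z + half_width))


# Hypothetical inputs (not study data)
items = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5], [2, 3, 2]]
lam = [0.7, 0.7, 0.7]
theta = [1 - loading ** 2 for loading in lam]  # standardized items: residual = 1 - loading^2
time1 = [10, 12, 15, 18, 20, 22, 25, 27]
time2 = [11, 13, 14, 19, 19, 24, 24, 28]

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"omega = {omega_total(lam, theta):.2f}")
r, lo, hi = retest_r_ci(time1, time2)
print(f"retest r = {r:.2f}, 99% CI [{lo:.2f}, {hi:.2f}]")
```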
Insert Table 1 about here

Convergent validity
Corresponding personality domains of the JS NEO-A50 and the BFPTSQ showed high to very high correlations, ranging from .55 to .69 (see Table 2). The abridged instrument's domains were also highly associated with the corresponding JS NEO-S domains, although these correlations may be overestimated because the two forms share items.
Thus, we mainly relied on the associations with the JS NEO-S ad-hoc domains as an additional source of convergent validity. These correlations were large to very large (from .63 to .83), and lower for openness (.48) (see Table 2). At the facet level, JS NEO-A50 traits even showed small to moderate correlations (from .20 to .34) with the facets not represented by items in the abbreviated questionnaire (see ESM Table 2).
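Correlating a short form with its long form inflates the correlation because the two scores share items. One common remedy, which we assume (the text does not define them) is close to what the JS NEO-S ad-hoc domains do, is to correlate the short-form score with a long-form score built only from the items the two forms do not share. The simulation uses pure-noise items, so item overlap is the only source of correlation; the 30-item domain, the 10 selected items and the function name are hypothetical.

```python
import numpy as np


def overlap_vs_residual_r(long_items, short_idx):
    """Compare r(short, full long form) with r(short, non-overlapping long-form items).

    long_items: respondents x items matrix for one long-form domain.
    short_idx: column indices of the items retained in the short form.
    """
    X = np.asarray(long_items, dtype=float)
    short_set = set(short_idx)
    rest_idx = [j for j in range(X.shape[1]) if j not in short_set]
    short_score = X[:, short_idx].sum(axis=1)
    full_score = X.sum(axis=1)
    rest_score = X[:, rest_idx].sum(axis=1)
    r_full = np.corrcoef(short_score, full_score)[0, 1]
    r_rest = np.corrcoef(short_score, rest_score)[0, 1]
    return float(r_full), float(r_rest)


rng = np.random.default_rng(0)
# 400 simulated respondents, 30 independent items scored 0-4 (no true trait at all)
noise_items = rng.integers(0, 5, size=(400, 30))
short_items = list(range(10))  # hypothetical "short form": the first 10 items
r_full, r_rest = overlap_vs_residual_r(noise_items, short_items)
print(f"r with full long form = {r_full:.2f}; r with non-overlapping items = {r_rest:.2f}")
```

Even with no trait present, the full-form correlation sits roughly at √(10/30) ≈ .58, while the correlation with the non-overlapping items stays near zero.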
Insert Table 2 about here

Participants and procedure
Following the same protocol as in study 1, we obtained data from another convenience-sampled high school in the same city in eastern Spain. Four hundred and one students returned signed written parental consent forms and responded to the questionnaires, but 16 of them did not complete all the questionnaires. Thus, the final sample for study 2 consisted of 385 high school students aged 12 to 17 (mean age = 14.29, SD = 1.49; 47.5% males, 52.5% females). Participants took around one hour to complete the questionnaires.

Measures
JS NEO-A50. In this study, the 50 items from study 1 were used to fit an Exploratory Structural Equation Model (ESEM) to further validate the abridged scale's structure (see item content in ESM Table 3).
SENA (Sánchez-Sánchez et al., 2016). The SENA is a questionnaire that assesses a wide range of common emotional and behavioral problems in children and adolescents. Two externalizing scales (aggression, 7 items; antisocial behavior, 8 items) and two internalizing scales (anxiety, 10 items; depression, 14 items) were selected. Internal consistencies in the present study were .77 for aggression, .67 for antisocial behavior, .89 for anxiety, and .90 for depression.
Students' Life Satisfaction Scale (SLSS; Huebner et al., 1998), Spanish version (Galindez & Casas, 2010). The SLSS is a brief 7-item questionnaire that assesses self-reported life satisfaction in youngsters between the ages of 8 and 18 (Huebner et al., 1998). Its internal consistency in the present study was .73.
Single item assessing academic performance. The item asked, 'What grades did you obtain last school year?' The response format was a 5-point scale ranging from 0 = Normally failed to 4 = Normally outstanding. Note that grades in the Spanish educational system range from 0 to 10 points (0-4 = Fail; 5 = Sufficient pass; 6 = Pass; 7-8 = Mention; 9-10 = Outstanding/Honors).
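The academic-performance item uses a 5-point response scale, while Spanish grades run from 0 to 10. The sketch maps a numeric grade onto the five grade bands listed above. It assumes, which the text does not state explicitly, that response options 0-4 correspond to those bands in order (Fail, Sufficient pass, Pass, Mention, Outstanding/Honors), and it places fractional grades by the band thresholds; the function name is hypothetical.

```python
def grade_to_item_score(grade):
    """Map a Spanish 0-10 grade to an assumed 0-4 response category.

    Assumed band order: 0 = Fail (0-4), 1 = Sufficient pass (5), 2 = Pass (6),
    3 = Mention (7-8), 4 = Outstanding/Honors (9-10).
    """
    if not 0 <= grade <= 10:
        raise ValueError("Spanish grades range from 0 to 10")
    if grade < 5:
        return 0
    if grade < 6:
        return 1
    if grade < 7:
        return 2
    if grade < 9:
        return 3
    return 4


print([grade_to_item_score(g) for g in (3, 5, 6, 7.5, 9.5)])
```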

Data analyses
In order to investigate internal consistency reliability in the present sample, Alphas and Omegas were calculated for the JS NEO-A50 domains.
We employed ESEM to confirm the factor structure described in study 1. ESEM has been shown to reflect personality structure more adequately than other procedures (e.g. …). A non-significant chi-square suggests a good-fitting model. However, because this test is known to be overly sensitive to sample size, to minor departures from multivariate normality and to minor (substantively irrelevant) model misspecifications, additional fit indices were considered (Bentler, 1990). Thus, acceptable model fit is suggested by values of .90 or above for the comparative fit index (CFI) and the Tucker-Lewis index (TLI), of .08 or below for the root mean square error of approximation (RMSEA), and of .10 or below for the standardized root mean square residual (SRMR). A chi-square/degrees of freedom ratio of 2 or below also indicates acceptable model fit (Bentler, 1990; Marsh et al., 2004; Jöreskog, 1969). For the RMSEA 90% CI, values below .05 for the lower bound and below .08 for the upper bound suggest acceptable fit (MacCallum et al., 1996). Confidence intervals (99%) were calculated and reported.
The ESEM was estimated with target rotation. A factor model was estimated with 3 a priori correlated uniquenesses (CUs; see ESM Figure 1). CUs are used in FFM models to reflect the fact that some items share similar content or a common word, or relate to the same domain (see Morizot, 2014 for a similar procedure). In the present study, a priori CUs were specified only for items that shared a common word, belonged to the same original facet and had very similar content (e.g., items 101r and 132r both refer to manipulation tactics to "get them [others] to do what I want").
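The fit cutoffs above can be made concrete with the standard formulas for CFI, TLI and RMSEA, computed from the model and baseline (independence) chi-square values, plus the χ²/df ratio. This sketch uses textbook definitions only; software differs in details (for example, whether RMSEA divides by N or N - 1, and the scaled statistics used with robust estimators), so it is not a reimplementation of Mplus. SRMR requires the residual correlation matrix and is omitted, and the input values are hypothetical.

```python
import math


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """CFI, TLI, RMSEA and chi2/df from model (m) and baseline (b) chi-square values."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # = max(chi2_b - df_b, chi2_m - df_m, 0)
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))  # N - 1 convention; some programs use N
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea, "chi2/df": chi2_m / df_m}


def acceptable(ix):
    """Apply the cutoffs stated in the text (SRMR not computed here)."""
    return {
        "CFI": ix["CFI"] >= 0.90,
        "TLI": ix["TLI"] >= 0.90,
        "RMSEA": ix["RMSEA"] <= 0.08,
        "chi2/df": ix["chi2/df"] <= 2.0,
    }


# Hypothetical values, not the study's results
ix = fit_indices(chi2_m=150.0, df_m=100, chi2_b=1000.0, df_b=120, n=385)
print({k: round(v, 3) for k, v in ix.items()})
print(acceptable(ix))
```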
A single-covariate Multiple Indicators Multiple Causes (MIMIC) model within ESEM was used to explore differential item functioning (DIF; Jones, 2006) across gender and age separately. This approach was chosen because it is appropriate for ordinal indicators and can be used within the ESEM strategy (Marsh et al., 2014). For the age comparison, the sample was divided into two groups spanning equal age ranges (Group 1: 12-14 years, n = 213; Group 2: 15-17 years, n = 172). The stepwise procedure for both age and gender DIF involved (1) testing the model without any direct effects, (2) inspecting whether the modification indices showed a significant direct effect of the covariate (age or gender) on any item, and (3) if significant direct effects were found, performing a subsequent DIF test to ascertain whether model fit improved. Both ESEM and DIF analyses, along with JS NEO-A50 domain Omegas, were computed with Mplus software, version 8.4 (Muthén & Muthén, 2017).
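Two mechanical pieces of this DIF procedure can be sketched outside Mplus: assigning participants to the two equal-range age groups, and screening modification indices for candidate direct effects. The screening threshold is our assumption, since the text does not report the cutoff: a modification index approximates a 1-df chi-square difference, so the sketch flags values above the .05 critical value (about 3.84). Item labels and modification indices are hypothetical; the MIMIC models themselves were estimated in Mplus.

```python
from scipy.stats import chi2


def age_group(age):
    """Equal-range age groups used for the DIF analysis: 1 = 12-14, 2 = 15-17."""
    if not 12 <= age <= 17:
        raise ValueError("sample ages range from 12 to 17")
    return 1 if age <= 14 else 2


def flag_direct_effects(mod_indices, alpha=0.05):
    """Return items whose modification index for a covariate -> item direct effect
    exceeds the chi-square(1) critical value (assumed screening rule)."""
    critical = chi2.ppf(1 - alpha, df=1)
    return sorted(item for item, mi in mod_indices.items() if mi > critical)


# Hypothetical modification indices for the gender covariate
gender_mis = {"item_a": 1.2, "item_b": 6.5, "item_c": 0.4}
print(age_group(13), age_group(16))
print(flag_direct_effects(gender_mis))
```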
Last, to examine criterion validity, we performed two-step multiple linear regression analyses predicting antisocial behavior, aggression, anxiety problems, depressive symptoms, life satisfaction and academic performance. Age and gender were entered in the first step (males coded as "1" and females as "2"), and the JS NEO-A50 traits were entered as predictors in the second step. Reliability and validity indices, as well as gender differences, were calculated with SPSS software, version 26.
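The two-step regression amounts to comparing two nested ordinary least-squares models and reporting the change in R² when the five traits are added to age and gender. The NumPy sketch below shows only that logic on simulated data; it does not reproduce SPSS output (no standard errors or F-change test), and the variable names and data are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


rng = np.random.default_rng(1)
n = 385
age = rng.integers(12, 18, size=n).astype(float)     # ages 12-17
gender = rng.integers(1, 3, size=n).astype(float)    # 1 = male, 2 = female, as in the text
traits = rng.normal(size=(n, 5))                     # N, E, O, A, C (simulated)
# Simulated outcome driven by the fifth column ("conscientiousness") plus noise
outcome = 0.5 * traits[:, 4] + rng.normal(size=n)

step1 = np.column_stack([age, gender])
step2 = np.column_stack([age, gender, traits])
r2_step1 = r_squared(step1, outcome)
r2_step2 = r_squared(step2, outcome)
print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}, delta R2 = {r2_step2 - r2_step1:.3f}")
```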

Results
Correlations among all study 2 variables can be found in ESM Table 4.

Descriptives
Gender differences in mean domain scores were very similar to those in study 1, with females scoring higher on all FFM traits (see ESM Table 5).

ESEM
The ESEM showed acceptable goodness-of-fit values (χ²/df = 1.53), although the chi-square test was significant. This result was expected given the test's sensitivity to sample size. Only one index, the TLI, fell slightly below the value recommended for acceptable fit (≥ .90). The DIF analysis for gender revealed no direct effects, except for one on openness item 8r; model fit was unchanged when this direct effect was accounted for. The DIF analysis for age yielded no direct effects (see Table 3).
Insert Table 3 about here
All target loadings were statistically significant and above .30, with reverse-scored items loading negatively on their respective domains (except for agreeableness, because all of its items are reverse-scored) (see Table 4).

Reliability
Internal consistency reliabilities for domains in the present study were acceptable, ranging between .70 and .81 for alphas, and between .71 and .82 for Omegas (see Table 4), and very similar to those reported in study 1.

Criterion validity
In the first step of the regression, boys reported more externalizing behaviors and higher life satisfaction, whereas girls scored higher on internalizing symptoms and reported slightly higher grades. In the second step, the externalizing scales of aggressive and antisocial behavior were predicted by (low) agreeableness and (low) conscientiousness. Anxiety symptoms were positively and significantly linked to neuroticism, while depressive symptoms were predicted by neuroticism, introversion and (low) conscientiousness. Life satisfaction was significantly associated with emotional stability, extraversion and, to a lesser extent, conscientiousness. Finally, academic performance was positively and significantly associated with conscientiousness and openness. An unexpected but small association was also found between aggressive behavior and (low) openness (see Table 5).
Insert Table 5 about here

Discussion
Nowadays, the most accepted and useful framework in personality psychology is the FFM (Soto, 2019), with the NEO inventories being the most widely used questionnaires. Thus, the main aim of the present two-study research was to develop a 50-item version of the JS NEO-S (JS NEO-A50). Our results indicate that the JS NEO-A50 has adequate psychometric properties despite its brevity. Specifically, the EFA conducted in study 1 showed that the abridged 50-item form adequately covered most facets embraced by the FFM. This structure was replicated in an independent sample, as reflected in the ESEM performed in study 2, with items showing adequate loadings on their respective domains and acceptable goodness-of-fit indices. Thus, the questionnaire had reasonably adequate FFM bandwidth, with 26 of the 30 facets covered by at least one item. Only four facets were not represented: Impulsiveness, from neuroticism; Trust, from agreeableness; and Actions and Values, from openness. Impulsiveness and Trust usually show substantial secondary loadings on other dimensions (McCrae et al., 2010; Ortet et al., 2012), suggesting that these facets reflect a blend of two or more dimensions and are thus not core traits of neuroticism and agreeableness, respectively. The DIF analysis revealed virtually no differential functioning of JS NEO-A50 items across age and gender: only one direct effect, of gender, on a single item out of 50 was found.
The openness domain was the most problematic to fit correctly in the abridged scale, in line with previous research (Mervielde et al., 1995; Soto et al., 2008; Soto & Tackett, 2015). It has been argued that openness comprises the most maturity-dependent facets, with more sophisticated manifestations arising as children age (Tackett et al., 2012). Thus, this dimension may not fully emerge until late adolescence (Allik et al., 2004; Tackett et al., 2012).
Regarding the reliability and validity of the JS NEO-A50, our findings were mostly in line with expectations. The moderate to high reliability indices (internal consistency and temporal stability) found in both studies were satisfactory and comparable to those of the scale's longer counterparts, the JS NEO and JS NEO-S (Ortet et al., 2012; Ortet et al., 2010), and of other short questionnaires (McCrae & Costa, 2010; Morizot, 2014; Soto & John, 2017). The convergent validity analysis in study 1 yielded adequate correlations between analogous FFM personality traits, of similar magnitude to those found in other studies (Ortet et al., 2012; Morizot, 2014).
The criterion validity analyses in study 2 showed most of the expected results after controlling for age and gender. In line with previous studies in adults (Jones et al., 2011; Kotov et al., 2010), our second study showed that more disagreeable and less conscientious adolescents scored higher on the externalizing behavior scales: antisocial and aggressive behavior. Unexpectedly, neuroticism was unrelated to aggression, contrary to what is typically found (Jones et al., 2011). This may be partly explained by the absence of Impulsiveness items from the neuroticism scale, as impulsiveness is a relevant trait, especially for reactive aggression (Miller & Lynam, 2006). Nor did we anticipate (low) openness being significantly linked to aggressive behavior, although previous studies have reported similar small associations (Jones et al., 2011).
Scores on the internalizing scales (depression and anxiety) were especially high among highly neurotic youngsters, in line with previous research in adults (Jeronimus et al., 2016; Kotov et al., 2010). We also found that introverts tended to report depressive symptoms, supporting the theoretical notion that low positive emotionality (closely associated with introversion) constitutes a personality-related vulnerability to depression (Khazanov & Ruscio, 2016). Also, in line with previous research (Kotov et al., 2010), depressive symptoms were associated with low conscientiousness, probably reflecting the role of impulsivity and disinhibition in depression (Berg et al., 2015).
As expected, emotionally stable and extraverted adolescents reported the highest life satisfaction, similar to findings in adults (Steel et al., 2019) and in other studies of adolescents (Suldo et al., 2015; Weber & Huebner, 2015).
Last, academic performance was strongly predicted by conscientiousness and, to a lesser degree, openness, the two most relevant personality dimensions for academic performance at all levels of education, from primary to tertiary education (Poropat, 2009;Richardson et al., 2012;Vedel, 2014).
One of the main limitations of this study was its sample sizes, which were clearly smaller than those of other similar studies (e.g., Ortet et al., 2012). Also, the use of convenience samples may limit the generalizability of the results. Further, method variance may have been an issue, as only self-report instruments were used; for instance, the association between internalizing symptoms and emotional stability may have been slightly overestimated. Therefore, replicating these findings in larger and more representative samples of the adolescent population, additionally using parent and/or teacher reports, would be desirable. It would also be helpful to employ

Open Science
Open Data: We confirm that there is sufficient information for an independent researcher to reproduce all of the reported results (https://osf.io/yrhc9/?view_only=5b95283d710147528b6ed41e51f36b59).
Open Materials: We confirm that there is sufficient information for an independent researcher to reproduce all of the reported methodology (https://osf.io/yrhc9/?view_only=5b95283d710147528b6ed41e51f36b59).

Preregistration of Studies and Analysis Plans:
This study was not preregistered.
Note. Five-factor extraction with Varimax rotation of the 50 items selected (10 per dimension) from the JS NEO-S. All loadings are given in absolute values. Item numbers correspond to the JS NEO-S. Item numbers followed by r are reverse-scored. Shaded entries are the item loadings on their respective domains. Loadings ≥ .30 are shown in bold. N = Neuroticism; E = Extraversion; O = Openness; A = Agreeableness; C = Conscientiousness. 99% CI = 99% confidence interval. ᵃ n = 400. ᵇ n = 227.