Long-term brain effects of N-back training: an fMRI study

Neurobehavioral effects of cognitive training have become a popular research topic. Behavioral studies have demonstrated the long-term efficacy of working memory training, but its neural basis has been studied only in the short term. Using fMRI, we investigated the cerebral changes produced by brief single n-back training, both immediately after training and 5 weeks after it ended. We used data from a sample of 52 participants who were assigned to either an experimental condition (training group) or a no-contact control condition. Both groups completed three fMRI sessions with the same n-back task. Behavioral and brain effects were studied by comparing conditions and sessions across both groups. Our results showed that n-back training improved accuracy and response speed in the trained group compared to the control group. These behavioral changes in trained participants were associated with decreased activation in several brain areas related to working memory, specifically the superior/middle frontal cortex, inferior parietal cortex, anterior cingulate cortex, and middle temporal cortex. Five weeks after training, the behavioral and brain changes remained stable. We conclude that cognitive training was associated with improved behavioral performance and decreased brain activation, suggesting improved neural efficiency that persists over time.


Introduction
Working memory is necessary for a wide range of cognitive processes. It is important in everyday life because it is a key determinant of reasoning and of guiding decision-making and behavior (Diamond 2013). Working memory was once regarded as a fixed attribute, but it is now known that it can be improved with adequate training programs (Klingberg 2010; Morrison and Chein 2011; von Bastian and Oberauer 2014). These behavioral studies have demonstrated both the immediate effects of such training and its long-term (2-12 months) efficacy after training has ended. The cerebral changes produced by cognitive training have been studied, but only in the short term, so the stability over time of training-induced neural changes remains unknown. Thus, the overall goal of the present study was to investigate the behavioral and cerebral changes produced by working memory training in the short and long term.
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11682-018-9925-x) contains supplementary material, which is available to authorized users.
The simple response format and the easy control of difficulty make the n-back task an appropriate tool for monitoring working memory processes. A large number of cognitive training studies have used n-back as the main task. All of them suggest that adequate n-back training improves task performance in terms of accuracy and reaction times, even with relatively short training (e.g. Buschkuehl et al. 2014; Hempel et al. 2004; Jaeggi et al. 2008; Küper and Karbach 2016; Li et al. 2008; Salminen et al. 2012; Schneiders et al. 2011; Takeuchi et al. 2010; Thompson et al. 2016; Yamashita et al. 2015). Generally, participants double or triple their pre-training performance levels (Kundu et al. 2013; Thompson et al. 2013; Jaeggi et al. 2008). Previous studies used training programs ranging from 60 to 1500 min, with no major differences in improvement (Anguera et al. 2012; Jaeggi et al. 2010a, b; Schneiders et al. 2012; Vartanian et al. 2013), but little research has focused on the effects of brief n-back training. Most studies used an adaptive n-back task during training, adjusting the level of difficulty to the participant's performance. Both single and dual n-back tasks have been used in training studies; the latter is more widely used, although both have been shown to improve working memory capacity (Jaeggi et al. 2010a, b). In a recent study, Küper and Karbach (2016) compared brief single n-back and dual n-back training and found equivalent improvement for both. Moreover, the authors concluded that for short training periods, single n-back training can be more effective than dual n-back training (Küper and Karbach 2016). Despite this literature, only a few studies have tested the long-term (2-8 months) effects of n-back training. They found that the behavioral changes remained stable (Jaeggi et al. 2011; Katz et al. 2017; Li et al. 2008), although Thompson et al. (2013) observed a decrease in performance between the post-training and follow-up sessions.
N-back is one of the most common experimental paradigms in functional magnetic resonance imaging (fMRI) studies of working memory (Dobbs and Rule 1989; Jaeggi et al. 2010a; Owen et al. 2005; Redick and Lindsey 2013; Wager and Smith 2003). In a meta-analysis, Owen et al. (2005) analyzed 24 fMRI studies of healthy subjects performing the n-back task in order to identify the cerebral regions involved, and examined how activation depended on the type of stimulus used. Their results showed six cortical and two subcortical regions activated by verbal stimuli: cortically, the lateral premotor cortex, dorsal anterior cingulate and supplementary motor area, dorsolateral and ventrolateral prefrontal cortex, frontal pole, and bilateral and medial posterior parietal cortex; subcortically, the medial and lateral cerebellum and the thalamus (Owen et al. 2005).
Few studies have examined the cerebral changes produced by working memory training (see the review by Buschkuehl et al. 2012). That review concluded that there was evidence of activation changes in specific brain areas, but no agreement about whether activation increased, decreased, or was redistributed, or whether networks were reorganized. First, regarding increases in brain activation after working memory training, Buschkuehl et al. found limited evidence. For example, Westerberg and Klingberg (2007) evaluated the cerebral changes in only three young volunteers after 5 weeks of working memory training and found significantly increased activation in the middle or inferior frontal gyrus and parietal cortex. This finding is consistent with a similar earlier experiment by the same group (Olesen et al. 2004), which found increased brain activity in the middle frontal gyrus and superior and inferior parietal cortices after 5 weeks of cognitive training. Second, among studies that observed both increased and decreased activation (activation redistribution) after working memory training, one study by Dahlin et al. (2008; Experiment 1) stands out. Its participants completed 5 weeks of computer-based updating training on a working memory task, and pre- versus post-training fMRI comparisons showed increased activation in the left striatum and in temporal and occipital regions, but decreased activity in frontal and parietal areas. Finally, Buschkuehl et al. found no noteworthy working memory training studies showing network reorganization.
Among fMRI studies reporting decreased cerebral activation, most used the n-back task for training (Hempel et al. 2004; Schneiders et al. 2011, 2012; Schweizer et al. 2013; Thompson et al. 2016). Hempel et al. (2004) carried out 4 weeks of n-back training and used fMRI to examine cerebral activation before training, after 2 weeks, and at the end of training; there was no control group. Activation in the right inferior frontal gyrus (BA 45) and right intraparietal sulcus (BA 39/40) increased after 2 weeks of training but decreased in the same areas after 4 weeks, forming an inverted U-shaped activation pattern. Schneiders et al. (2011) used 8-10 days of adaptive n-back training (400-500 min in total), with fMRI pretest and posttest sessions, two training groups (visual or auditory n-back), and a no-contact control group. They observed decreased activation in the right superior/middle frontal gyrus (BA 6/9/46) and right posterior parietal lobule (BA 40). Schneiders et al. (2012) reported the same activation pattern after training participants for approximately the same length of time, but only on an adaptive auditory n-back task. Schweizer et al. (2013) also used n-back for training, but on an adaptive affective n-back task, for 20-30 min per day over 20 days, with an active control group for comparison. They found activation decreases at the 3-back load level in the left dorsolateral prefrontal cortex, right superior frontal gyrus, bilateral supramarginal gyrus, bilateral middle temporal gyrus, and bilateral middle occipital lobe. In a recent fMRI study (Thompson et al. 2016), participants were trained on an adaptive dual n-back task in 20 sessions over 4 weeks.
Before and after training, volunteers were scanned on the non-adaptive version of the trained task. The authors used both an active and a no-contact control group. Again, they reported reduced activation in the bilateral inferior and middle frontal gyrus, insular cortex, and intraparietal sulcus. A study by Vartanian et al. (2013) showed that very brief working memory training can also produce cerebral changes: after only 60 min of single n-back training, they reported decreased activation in prefrontal areas (BA 46 and 47). Overall, these studies largely agree on the brain regions, mainly frontal and parietal, in which activation decreased.
On the other hand, Buschkuehl et al. (2014) used arterial spin labeling (ASL) to test whether brief n-back training (less than 3 h) increased task-related activation while participants performed a difficult level of the n-back task (4-back). They found that activation increased in prefrontal (BA 6) and occipital (BA 19) areas after training. Because 4-back places a high demand on cerebral resources, it would be expected to increase the magnitude of perfusion. This activation pattern agrees with the compensation-related utilization of neural circuits hypothesis (CRUNCH) (Reuter-Lorenz and Cappell 2008), which postulates that people recruit more cortical regions as task load or resource demands increase. Previous results showed that at lower levels of task demand, older adults activate task-related brain areas more than younger adults in order to achieve similar performance, whereas at harder levels, older adults show reduced task efficiency and less activation than young adults (Heinzel et al. 2014, 2016). Thus, according to this hypothesis, activation in task-related areas would increase or decrease depending on task difficulty. Given this literature, it is difficult to make predictions about the cerebral changes related to working memory training: decreased activation is the most frequent result after n-back training, but activation has been observed to increase under high task demands.
In the present study, our main goal was to examine long-term cerebral changes after working memory training because, to our knowledge, no previous study has included a follow-up session to evaluate the stability of these cerebral changes over time. In a longitudinal fMRI study, we examined behavioral and functional data before brief n-back training, immediately after it, and 5 weeks after the training ended. Participants trained for a total of 200 min on an adaptive version of the single n-back task at the 1-back, 2-back and 3-back levels. We chose this training because, based on the previous findings mentioned above, single n-back training may be more effective than dual n-back training over short training periods (Küper and Karbach 2016). We hypothesized that: 1) training would decrease activation, in the short term, in the frontal and parietal areas already involved in working memory; 2) participants trained on the adaptive n-back would respond faster and more accurately than non-trained participants immediately after training, and this advantage would be maintained 5 weeks later; 3) after 5 weeks without training, activation in task-related brain areas would increase to compensate for the lack of training.

Participants
Fifty-two healthy right-handed participants (21 male), aged 21-26 years (mean age = 22.60 ± 1.45), took part in this study. They were recruited from the student population of the University Jaume I, and none reported a previous psychiatric or neurological diagnosis. Informed consent was obtained from each participant before participation, and participants received monetary compensation for their time and effort. Participants were randomly assigned to either an experimental condition (training group; N = 25, mean age = 22.77 ± 1.5, 9 men) or a control condition (control group; N = 27, mean age = 22.44 ± 1.4, 12 men). Intellectual level was assessed with the Matrix Reasoning Test (WAIS-III-R) (training group: mean = 21.04 ± 3.42; control group: mean = 21.81 ± 2.02). The two groups differed only in the training: the control group received no training (no-contact control condition). The Ethical Committee of the Universitat Jaume I approved the research project.

Experimental paradigm
Both groups completed three fMRI sessions with the same adapted block-design n-back task (Zou et al. 2013). A schematic description of the experimental design is shown in Fig. 1. The pre-training, post-training, and follow-up sessions correspond to Session 1 (S1), Session 2 (S2), and Session 3 (S3), respectively. Visual stimuli (letters) were presented using E-Prime software, professional version 2.0 (Psychology Software Tools, Pittsburgh, PA), installed on a Hewlett-Packard portable workstation (screen resolution 800 × 600, refresh rate 60 Hz).
Participants viewed the laptop screen through MRI-compatible goggles (VisuaStim, Resonance Technology, Inc., Northridge, CA, USA), and their responses were collected via MRI-compatible response grips (NordicNeuroLab, Bergen, Norway). The E-Prime log file recorded each participant's accuracy and reaction time (RT) for each stimulus.
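The behavioral variables analyzed later are per-condition summaries of these per-stimulus records. A minimal Python sketch of that reduction follows. It assumes a hypothetical `Trial` record (the actual E-Prime export format is not described here), that missed responses count as errors, and that mean RT is computed over correct trials only; the text does not specify how RTs were aggregated.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import nan
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Trial:
    """Hypothetical per-stimulus record (not the actual E-Prime export format)."""
    load: str                # "0-back", "2-back" or "3-back"
    correct: bool            # True if the response matched the expected "yes"/"no"
    rt_ms: Optional[float]   # None when no button was pressed


def summarize(trials: List[Trial]) -> Dict[str, Tuple[float, float]]:
    """Return {load: (percent correct, mean RT in ms)}.

    Assumptions: a missed response counts as an error, and mean RT is
    computed over correct trials with a recorded response only.
    """
    by_load: Dict[str, List[Trial]] = defaultdict(list)
    for t in trials:
        by_load[t.load].append(t)

    summary = {}
    for load, ts in by_load.items():
        accuracy = 100.0 * sum(t.correct for t in ts) / len(ts)
        rts = [t.rt_ms for t in ts if t.correct and t.rt_ms is not None]
        summary[load] = (accuracy, mean(rts) if rts else nan)
    return summary


trials = [
    Trial("2-back", True, 600.0),
    Trial("2-back", False, 700.0),
    Trial("2-back", True, 800.0),
    Trial("2-back", False, None),
]
print(summarize(trials))  # {'2-back': (50.0, 700.0)}
```

Excluding error trials from the RT mean is a common convention, but including them would change the RT values, so this choice should be matched to the original analysis if it is known.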

N-back fMRI task
The task was presented at three load levels: two working memory levels (2-back and 3-back) and a baseline control task (0-back). In 0-back, subjects pressed the "yes" button when the target (the letter X) appeared on the screen and responded "no" to any other letter. In the 2-back and 3-back load levels, participants pressed the "yes" button when the current letter matched the one presented 2 or 3 items back, and pressed "no" when it did not (see Fig. 2a). Subjects responded manually with the right hand only, using the thumb for targets and the forefinger for non-targets.
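The response rule for each load level can be stated compactly. The Python sketch below labels every position of a letter sequence with the expected button; the function names and the finger mapping dictionary are illustrative and not part of the original task software.

```python
from typing import List

# Illustrative mapping of the manual responses described above (right hand only).
RESPONSE_FINGER = {"yes": "thumb", "no": "forefinger"}


def is_target(letters: List[str], i: int, n: int) -> bool:
    """Return True if letters[i] is a target at load level n.

    n == 0: only the fixed letter "X" is a target (0-back baseline).
    n >= 1: a letter is a target if it matches the letter n positions earlier.
    """
    if n == 0:
        return letters[i] == "X"
    return i >= n and letters[i] == letters[i - n]


def expected_responses(letters: List[str], n: int) -> List[str]:
    """Expected button ("yes"/"no") for every position of a sequence."""
    return ["yes" if is_target(letters, i, n) else "no" for i in range(len(letters))]


print(expected_responses(list("BCBDB"), 2))  # ['no', 'no', 'yes', 'no', 'yes']
print([RESPONSE_FINGER[r] for r in expected_responses(list("BXC"), 0)])
# ['forefinger', 'thumb', 'forefinger']
```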
With nine blocks in total, three for each load level, the entire task lasted 11 min. Each block lasted 60.7 s and consisted of 200 ms of blank screen, followed by 30 consecutive single-letter trials (6 targets; 500 ms stimulus duration, 1500 ms inter-stimulus interval), and 500 ms of blank screen at the end. In addition, 8000 ms of a fixation cross and 2000 ms of an instruction display indicating task difficulty (0-back, 2-back or 3-back) preceded each block (see Fig. 2b). There were 270 stimuli in total, 54 of which were targets. The stimulus sequence was pseudo-randomized. The visual material comprised 15 different capital letters (B, C, D, F, G, H, J, L, N, P, Q, R, S, T and V). Any letter could be a target in 2- and 3-back, but in 0-back only the letter "X" was the target. The letters, instructions, and fixation point were presented in the center of the screen on a white background, in black 54-point Arial font. The task did not contain any lures.
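The block timing can be checked arithmetically. The Python sketch below assumes that the 1500 ms inter-stimulus interval starts at stimulus offset (a 2000 ms trial), which is the reading that reproduces the stated 60.7 s block length; adding the 10 s of fixation and instruction per block, the nine blocks come to about 10.6 min, consistent with the reported 11 min.

```python
# Timing constants (ms) taken from the block description above.
LEAD_BLANK_MS = 200        # blank screen at the start of each block
TRIALS_PER_BLOCK = 30
TARGETS_PER_BLOCK = 6
STIMULUS_MS = 500
ISI_MS = 1500              # assumed to run from stimulus offset, i.e. a 2000 ms trial
TRAIL_BLANK_MS = 500       # blank screen at the end of each block
FIXATION_MS = 8000
INSTRUCTION_MS = 2000
N_BLOCKS = 9               # three blocks each of 0-, 2- and 3-back

block_ms = LEAD_BLANK_MS + TRIALS_PER_BLOCK * (STIMULUS_MS + ISI_MS) + TRAIL_BLANK_MS
total_ms = N_BLOCKS * (FIXATION_MS + INSTRUCTION_MS + block_ms)
n_stimuli = N_BLOCKS * TRIALS_PER_BLOCK
n_targets = N_BLOCKS * TARGETS_PER_BLOCK

print(block_ms / 1000)       # 60.7 (s per block)
print(total_ms / 60000)      # 10.605 (min for the whole task)
print(n_stimuli, n_targets)  # 270 54
```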
Subjects received oral instructions about the task and performed a 5-min practice task consisting of three blocks, one per load level, each with only 15 trials (3 targets), in order to become familiar with the stimulus presentation and the response buttons. The practice was run outside the scanner on a similar laptop with the same display features and the same response hardware. Participants were asked to respond accurately and as quickly as possible.
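The text states that sequences were pseudo-randomized and lure-free but does not describe the procedure. The Python sketch below shows one possible way to build such blocks (30 trials with 6 targets, or 15 trials with 3 targets for practice) by rejection sampling. It assumes that a lure is a letter matching the one n-1 or n+1 positions back, a common definition that the original does not spell out; all function names are hypothetical. It covers n ≥ 1 only, since 0-back blocks just need the letter "X" placed at the target positions.

```python
import random
from typing import List, Optional, Set

LETTERS = list("BCDFGHJLNPQRSTV")  # the 15 letters listed in the task description


def lure_letters(seq: List[str], i: int, n: int) -> Set[str]:
    """Letters n-1 and n+1 positions before i.

    Assumption: a lure is a match at n-1 or n+1 back (a common definition;
    the text only states that the task contained no lures).
    """
    return {seq[i - d] for d in (n - 1, n + 1) if d >= 1 and i - d >= 0}


def make_block(n: int, n_trials: int = 30, n_targets: int = 6,
               rng: Optional[random.Random] = None,
               max_tries: int = 1000) -> List[str]:
    """Pseudo-random n-back block (n >= 1) with exactly n_targets targets and no lures."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        target_pos = set(rng.sample(range(n, n_trials), n_targets))
        seq: List[str] = []
        for i in range(n_trials):
            lures = lure_letters(seq, i, n)
            if i in target_pos:
                letter = seq[i - n]          # a target repeats the letter n back
                if letter in lures:          # it would also be a lure: start over
                    break
            else:
                banned = set(lures)
                if i >= n:
                    banned.add(seq[i - n])   # a non-target must not match n back
                letter = rng.choice([c for c in LETTERS if c not in banned])
            seq.append(letter)
        else:
            return seq                       # inner loop finished without a break
    raise RuntimeError("could not build a lure-free block")


def count_targets(seq: List[str], n: int) -> int:
    return sum(seq[i] == seq[i - n] for i in range(n, len(seq)))


def has_lure(seq: List[str], n: int) -> bool:
    return any(seq[i] in lure_letters(seq, i, n) for i in range(len(seq)))


rng = random.Random(42)
block = make_block(3, rng=rng)
print(len(block), count_targets(block, 3), has_lure(block, 3))  # 30 6 False
practice = make_block(2, n_trials=15, n_targets=3, rng=rng)
```

Rejection sampling keeps the code short; because an attempt is discarded whenever a copied target letter would also form a lure, the target count is exact by construction rather than approximate.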

N-back training task
The training group carried out four consecutive sessions of single n-back training after fMRI S1 in our laboratory at the University. Each training session lasted 60 min and was divided into two parts: a learning part and a test part. In the learning part, participants performed an adaptive n-back paradigm adapted from Jaeggi et al. (2008) for 50 min, whereas in the test part they performed a simple n-back task lasting 10 min. Therefore, the total training time was approximately 200 min, plus 40 min of testing. We used the same laptop as in the fMRI sessions, with the same display features and the same response hardware. Participants performed only one training session per day. As in our fMRI n-back task, no lures were present in the training task. For the adaptive n-back task, we used the same stimuli and block timing as in the n-back fMRI task, with three changes: the 0-back load level was removed, a new load level (1-back) was introduced, and participants received feedback about their performance after each stimulus and at the end of each block. In 1-back, participants pressed the "yes" button when the current letter matched the one presented immediately before, and pressed "no" for any other letter. We lengthened the task to approximately 16 min, and subjects performed three runs per training session. Again, participants were asked to respond accurately and as quickly as possible.
In this task, we manipulated difficulty by changing the level of n (1, 2 or 3) in order to motivate participants to improve. After each block, the participant's performance was analyzed and the n-back level was adjusted automatically: if the participant had at least 90% correct answers, n was increased by one in the next block; if accuracy was below 80%, n was decreased by one; otherwise, n remained constant (Salminen et al. 2012). In the last run, both thresholds were raised by 5 percentage points to make the task more difficult: n was increased by one with at least 95% correct answers and decreased by one with accuracy below 85%. Each run started at the minimum level (n = 1) for motivational reasons (Schneiders et al. 2012). Feedback was given after each response: a colored circle appeared for a few seconds at the corner of the screen, green if the answer was correct, red if it was an error, and blue if the participant did not press any button. In addition, at the end of each block, subjects received information about their performance: percentage of correct responses and mean reaction time.
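The adaptive rule above is a simple block-wise staircase. The Python sketch below implements it under one stated assumption: n is kept within 1..3, which the text implies by listing those levels but does not state explicitly. Function names are illustrative.

```python
from typing import List, Optional

MIN_N, MAX_N = 1, 3  # levels used in training


def next_level(level: int, accuracy: float, last_run: bool = False) -> int:
    """Apply the block-wise rule described above.

    Regular runs: up one level at >= 90% correct, down one below 80%.
    Last run: thresholds raised by 5 points (>= 95% up, < 85% down).
    Assumption: n is kept within 1..3, which the text implies but does not state.
    """
    up, down = (95.0, 85.0) if last_run else (90.0, 80.0)
    if accuracy >= up:
        level += 1
    elif accuracy < down:
        level -= 1
    return max(MIN_N, min(MAX_N, level))


def levels_played(block_accuracies: List[float], last_run: bool = False) -> List[int]:
    """n-level used in each block of one run; every run starts at n = 1."""
    level, played = MIN_N, []
    for acc in block_accuracies:
        played.append(level)
        level = next_level(level, acc, last_run)
    return played


def feedback_colour(response: Optional[str], expected: str) -> str:
    """Per-response feedback: blue = no response, green = correct, red = error."""
    if response is None:
        return "blue"
    return "green" if response == expected else "red"


print(levels_played([95, 95, 70, 85]))             # [1, 2, 3, 2]
print(levels_played([92, 92, 92], last_run=True))  # [1, 1, 1]
```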
In the test part, participants performed an eight-block n-back task with the same stimuli and block timing as the n-back fMRI task, but without the 0-back load level. Subjects received no feedback in this part. Their test results were used to evaluate their progress on the n-back task.

Neuroimaging data acquisition
Functional MRI data were collected on a 1.5 T Siemens Symphony scanner (Erlangen, Germany), using the same sequences in all three sessions. Participants lay supine in the scanner, and their heads were immobilized with cushions to reduce motion artifacts. For task fMRI, a gradient-echo T2*-weighted echo-planar sequence covering the entire brain was used (TR/TE = 2500/49 ms, matrix = 64 × 64 × 28, flip angle = 90°, voxel size = 3.5 × 3.5 × 4.48 mm; slice thickness = 4 mm; slice gap = 0.48 mm). A total of 270 volumes were recorded, with slices acquired parallel to the anterior commissure-posterior commissure (AC-PC) plane. Before the functional sequences, a high-resolution structural T1-weighted MPRAGE sequence was acquired (TR = 2200 ms; TE = 3 ms; flip angle 90°, matrix = 256 × 256 × 160; voxel size = 1 × 1 × 1 mm).
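The third voxel dimension (4.48 mm) is the slice thickness plus the inter-slice gap, and the length of the functional run follows from TR × number of volumes. A small Python check of these relationships, using only the parameters listed above (the field of view is derived here from matrix × voxel size and is not stated in the text):

```python
TR_S = 2.5
N_VOLUMES = 270
MATRIX = (64, 64, 28)       # in-plane matrix and number of slices
IN_PLANE_MM = 3.5
SLICE_THICKNESS_MM = 4.0
SLICE_GAP_MM = 0.48

slice_spacing_mm = SLICE_THICKNESS_MM + SLICE_GAP_MM  # centre-to-centre slice distance
fov_mm = MATRIX[0] * IN_PLANE_MM                      # derived in-plane field of view
z_coverage_mm = MATRIX[2] * slice_spacing_mm          # extent covered by the 28 slices
scan_s = N_VOLUMES * TR_S                             # duration of the functional run

print(f"{slice_spacing_mm:.2f} mm spacing, {fov_mm:.0f} mm FOV, "
      f"{z_coverage_mm:.1f} mm along z, {scan_s / 60:.2f} min")
# 4.48 mm spacing, 224 mm FOV, 125.4 mm along z, 11.25 min
```

The 11.25-min run is slightly longer than the roughly 10.6 min of task blocks, which leaves the final volumes without task stimulation.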

Behavioral analysis
IBM SPSS Statistics software (Version 22; Armonk, New York, USA) was used to process the behavioral data (accuracy and RTs). A 2x3x3 mixed-model repeated-measures ANOVA was conducted for each variable, using Group (training vs control) as the between-subjects factor and Load Level (0-

Neuroimaging analysis
Preprocessing
Preprocessing and statistical analysis of fMRI data were conducted with SPM12 (Wellcome Trust Centre for Neuroimaging, London, UK). Each subject's fMRI data were aligned to the AC-PC plane using their anatomical image. Preprocessing included head motion correction, in which the functional images were realigned and resliced to the mean functional image. No participant showed more than 2.5 mm of maximum displacement in any direction or 2.5° of angular motion throughout the scan. The anatomical (T1-weighted) image was then coregistered to the mean functional image and segmented. The functional images were spatially normalized to MNI (Montreal Neurological Institute, Montreal, Canada) space at 3 mm³ resolution and spatially smoothed with an isotropic Gaussian kernel of 8 mm FWHM (full width at half maximum).
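The motion criterion can be expressed as a per-participant check. The Python sketch below assumes the six-column realignment parameter files written by SPM's realignment step (three translations in mm, then three rotations in radians) and interprets "maximum displacement" as the largest absolute parameter value across the run; the text does not say exactly how displacement was computed, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def read_realignment_params(path: str) -> List[List[float]]:
    """Read a whitespace-separated file with one row of six parameters per volume."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]


def exceeds_motion_limits(rows: Sequence[Sequence[float]],
                          max_mm: float = 2.5, max_deg: float = 2.5) -> bool:
    """True if any translation exceeds max_mm or any rotation exceeds max_deg.

    Each row is assumed to follow SPM's rp_*.txt layout: x, y, z translations
    in mm followed by three rotations in radians, relative to the reference
    image used for realignment.
    """
    max_trans = max(abs(v) for row in rows for v in row[:3])
    max_rot_deg = max(abs(math.degrees(v)) for row in rows for v in row[3:6])
    return max_trans > max_mm or max_rot_deg > max_deg


ok_run = [[0.0] * 6, [1.0, -2.0, 0.5, 0.01, 0.0, 0.0]]
print(exceeds_motion_limits(ok_run))  # False: at most 2.0 mm and about 0.57 degrees
```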

First level of analysis
Statistical analyses were performed for each participant and each time point within the General Linear Model framework (Friston et al. 1995) using SPM12. In the first-level analysis, we modeled the load levels and defined the contrasts of interest: 2-back > 0-back, 3-back > 0-back, and 2- and 3-back > 0-back. The BOLD signal was modeled by convolving the stimulus onsets with the canonical hemodynamic response function. The six realignment parameters were included as covariates of no interest to account for signal variations due to head motion. A high-pass filter (128 s) was applied to the functional data to remove low-frequency components. Contrast images were then computed to compare the load levels of interest directly. For the cross-sectional analysis, these contrasts were compared within the first session (S1) to assess differences between n-back load levels before training.
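To make the first-level model concrete, the Python/NumPy sketch below builds one HRF-convolved boxcar regressor per load level, adds a constant, and applies the three contrasts. It is a simplified illustration, not the SPM12 implementation: the double-gamma HRF mirrors the shape of SPM's canonical HRF but is sampled once per TR, the block order `[0, 2, 3] * 3` is hypothetical (the actual order is not given here), and the motion covariates and 128 s high-pass filter are omitted.

```python
import math

import numpy as np

TR = 2.5            # s, from the acquisition parameters
N_SCANS = 270
PRE_BLOCK_S = 10.0  # 8 s fixation + 2 s instruction before each block
BLOCK_S = 60.7      # task period of each block


def gamma_pdf(t: np.ndarray, shape: float) -> np.ndarray:
    """Gamma density with unit scale."""
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)


def canonical_hrf(tr: float, length_s: float = 32.0) -> np.ndarray:
    """Double-gamma HRF (peak near 5 s, undershoot ratio 1/6), sampled once per TR."""
    t = np.arange(0.0, length_s, tr)
    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def design_matrix(order, tr=TR, n_scans=N_SCANS):
    """One HRF-convolved boxcar per load level plus a constant column."""
    frame_times = np.arange(n_scans) * tr
    levels = sorted(set(order))
    X = np.zeros((n_scans, len(levels) + 1))
    t0 = 0.0
    for level in order:
        start = t0 + PRE_BLOCK_S
        stop = start + BLOCK_S
        X[(frame_times >= start) & (frame_times < stop), levels.index(level)] = 1.0
        t0 = stop
    hrf = canonical_hrf(tr)
    for j in range(len(levels)):
        X[:, j] = np.convolve(X[:, j], hrf)[:n_scans]
    X[:, -1] = 1.0
    return X, levels


# Hypothetical block order; the real sequence of load levels is not given here.
X, levels = design_matrix([0, 2, 3] * 3)

# Contrast weights over the columns [0-back, 2-back, 3-back, constant].
contrasts = {
    "2-back > 0-back": np.array([-1.0, 1.0, 0.0, 0.0]),
    "3-back > 0-back": np.array([-1.0, 0.0, 1.0, 0.0]),
    "2 and 3-back > 0-back": np.array([-1.0, 0.5, 0.5, 0.0]),
}

# Noise-free toy voxel: betas 1, 2, 3 for 0-, 2-, 3-back on a baseline of 5.
y = X @ np.array([1.0, 2.0, 3.0, 5.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, c in contrasts.items():
    print(name, round(float(c @ beta), 3))
```

On this noise-free toy voxel, ordinary least squares recovers the simulated betas, so the three contrasts evaluate to 1, 2 and 1.5.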

Statistical analysis
In the cross-sectional analysis, a whole-brain one-sample t test was conducted on the S1 fMRI data to identify the brain regions involved in the n-back task (2-back and 3-back load levels > 0-back load level). In addition, S1 data were used in a two-sample t test to examine whether brain responses were equivalent in the two groups at baseline, so that between-group brain differences in subsequent sessions could be attributed to training. Test-retest reliability analyses (one-sample t tests) of the control group's imaging data are provided in the Supplementary Information.
In the second-level analysis, longitudinal analyses were performed separately for 2-back and 3-back, using Group × Session interaction analyses to evaluate: 1) the immediate effect of training, comparing S2 to S1; 2) the long-term effect of training, comparing S3 to S1; and 3) differences between the immediate and long-term effects, comparing S3 to S2. To avoid false positives (Woo et al. 2014), the statistical criterion was set at p < 0.05, family-wise error (FWE) corrected for multiple comparisons at the cluster level (voxel-level uncorrected threshold of p < 0.001; cluster sizes are given with each result).
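Cluster-level correction works in two stages: voxels are first thresholded at the uncorrected level (here p < 0.001), and only connected clusters of at least k voxels are then reported. The Python sketch below shows the second stage only. The extent k is taken as given (for example, the k reported with each result); deriving the k that yields cluster-level FWE p < 0.05 is done in SPM with random field theory and is not reproduced here. Face (6-) connectivity is an assumption of this sketch, and SPM's own cluster-forming connectivity may differ.

```python
from collections import deque
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
# Face-connected neighbours (6-connectivity), an assumption made for this sketch.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surviving_clusters(pvals: Dict[Voxel, float], voxel_p: float = 0.001,
                       k: int = 125) -> List[List[Voxel]]:
    """Clusters of supra-threshold voxels that contain at least k voxels.

    pvals maps voxel coordinates to uncorrected voxel-level p-values; the
    extent threshold k is supplied by the caller, not computed here.
    """
    supra = {v for v, p in pvals.items() if p < voxel_p}
    seen, kept = set(), []
    for seed in supra:
        if seed in seen:
            continue
        seen.add(seed)
        cluster, queue = [], deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(cluster) >= k:
            kept.append(sorted(cluster))
    return kept


toy = {(x, 0, 0): 1e-4 for x in range(5)}  # a 5-voxel cluster
toy[(10, 10, 10)] = 1e-4                    # an isolated supra-threshold voxel
toy[(20, 0, 0)] = 0.01                      # below the voxel threshold
print([len(c) for c in surviving_clusters(toy, k=3)])  # [5]
```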

Behavioral fMRI results
The 2x3x3 repeated-measures mixed-model ANOVA for accuracy yielded main effects of Session (F(2,50) = 34.66, p < .001) and Load Level (F(2,50) = 42.85, p < .001), indicating that participants overall made fewer errors in the post-training and follow-up sessions than in S1, and that accuracy was highest in 0-back. These main effects were qualified by significant Group x Session (F(2,50) = 7.77, p = .001), Load Level x Session (F(4,48) = 13.07, p < .001) and Load Level x Group (F(2,50) = 7.23, p = .002) interactions. The first interaction indicated that trained participants were more accurate than controls in the post-training and follow-up sessions; the second, that differences between load levels were greater at pre-training; and the third, that the training group performed better than the control group on 2-back and 3-back. As expected, the Load Level x Session x Group interaction was significant (F(4,48) = 4.01, p = .007): the trained group became more accurate than the control group in the post-training and follow-up sessions when performing the 2-back and 3-back load levels (see Fig. 3a). Post-hoc analyses revealed that these differences were significant for 3-back vs 0-back (p = .002) but did not reach significance for 2-back vs 0-back (p = .13).
Analysis of RTs revealed a pattern similar to that for accuracy. The 2x3x3 ANOVA yielded significant main effects of Session (F(2,50) = 51.59, p < .001) and Load Level (F(2,50) = 75.37, p < .001). Both groups responded faster in the post-training and follow-up sessions than in the pre-training session, faster at 0-back than at 2-back, and faster at 2-back than at 3-back. Significant two-way interactions were obtained for Group x Session (F(2,50) = 28.14, p < .001) and Load Level x Session (F(4,48) = 28.23, p < .001). These interactions can be interpreted as for accuracy: trained participants were faster than controls in the post-training and follow-up sessions, and differences between load levels were greater at pre-training. Importantly, all these effects were qualified by a highly significant three-way Load Level x Session x Group interaction (F(4,48) = 11.34, p < .001). As expected, this interaction showed that the training group, compared to controls, was faster after training and in the follow-up session at the 2-back and 3-back load levels (see Fig. 3b). Post-hoc analyses revealed that this effect was significant for both 2-back vs 0-back and 3-back vs 0-back (p < .001).
In sum, these results show that there were greater improvements in the 2-back and 3-back load levels after cognitive training, and that these improvements remained stable after 5 weeks.

Behavioral training results
For the training group, a 2×4 repeated-measures ANOVA was conducted on the results of the test part of the training to evaluate progress on the n-back task. For accuracy, main effects of Training Session (F(3,27) = 6.49, p < .05) and Load Level (F(1,29) = 11.99, p < .05) were found, indicating that correct answers increased from one training session to the next and that errors decreased on both 2-back and 3-back. For RTs, there was a significant effect of Training Session (F(3,23) = 10.35, p < .001), indicating that RTs decreased from one training session to the next (see Fig. 4 for more values). As expected, these results confirm the substantial progress of the training group on n-back performance after 200 min of training.

Cross-sectional analysis: Task effects at baseline
A whole-brain one-sample t test on the S1 fMRI data was conducted to identify the brain regions involved in the n-back task (2-back and 3-back load levels > 0-back load level). This analysis showed significant cortical and subcortical activations in brain areas related to working memory. When the task effects were studied separately for each load level (2-back > 0-back and 3-back > 0-back; see Fig. 5), the same areas were activated: bilateral superior, middle and inferior frontal cortex (BA 6/8-11/32/45-48), including the supplementary motor area/anterior cingulate gyrus (SMA/ACC) (BA 6/32) and the insula (BA 47); bilateral superior and inferior parietal cortex (BA 7/40), including the precuneus; and bilateral cerebellum (crus I). Subcortical areas (thalamus and globus pallidus) were significantly activated in 2-back but not in 3-back. Results were p < .05 FWE cluster-corrected using a threshold of p < .001 at the uncorrected voxel level, with a cluster extent of k = 2504 voxels for 2-back and k = 143 voxels for 3-back.
The two-sample t test comparing the groups' brain responses in S1 yielded no significant differences (threshold p < 0.001 uncorrected at the voxel level). Therefore, brain differences found between groups in subsequent sessions can be attributed to training rather than to pre-existing group differences.

Learning effects
Fig. 3 Results of the behavioral analysis. (a) Correct-response percentage and (b) mean reaction times (ms) per session, plotted as a function of load level and time. The pre-training, post-training and follow-up sessions correspond to Session 1, Session 2 and Session 3, respectively. Training group data are shown as dark broken lines (circles) and control group data as light solid lines (squares). RT = reaction time. Error bars represent standard error.
To study the effects of training on the brain, interaction analyses were conducted: a 2×2 ANOVA (Group x Session) was carried out separately for each load level (2-back and 3-back). When studying the training effects by comparing S1 vs S2 and S1 vs S3 at the 2-back load level, we found similar results. These interaction analyses showed larger activation decreases in the trained group than in the control group in the bilateral superior frontal cortex (BA 8-9), including the SMA/ACC (BA 6/32), dorsolateral prefrontal cortex (BA 9/46), inferior frontal cortex, and right inferior parietal cortex (IPC) (BA 39-40) (see Fig. 6 and Table 1). The reverse contrast yielded no significant effects. Results were p < .05 FWE cluster-corrected using a threshold of p < .001 at the uncorrected voxel level and a cluster extent of k = 125 voxels and k = 87 voxels, respectively.
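For a 2×2 design with one between-subjects factor (Group) and one within-subjects factor (Session), the interaction can be computed as a two-sample t test on within-subject difference scores, and the interaction F equals the square of that t. The Python sketch below illustrates this for a single voxel and a contrast such as Trained (S1 > S2) > Control (S1 > S2). The text does not state whether this was implemented in SPM as a flexible factorial model or via difference images, so this is an equivalent formulation for illustration, not the authors' pipeline. With the study's group sizes (25 trained, 27 controls), the test would have 50 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def interaction_t(trained_s1: Sequence[float], trained_s2: Sequence[float],
                  control_s1: Sequence[float], control_s2: Sequence[float]
                  ) -> Tuple[float, int]:
    """Group x Session interaction for one voxel, as a two-sample t on S1 - S2.

    Inputs are per-participant contrast estimates (e.g. 2-back > 0-back) in
    each session. A positive t means the trained group's S1 - S2 decrease was
    larger than the control group's, i.e. Trained (S1 > S2) > Control (S1 > S2).
    """
    d_trained = [a - b for a, b in zip(trained_s1, trained_s2)]
    d_control = [a - b for a, b in zip(control_s1, control_s2)]
    n1, n2 = len(d_trained), len(d_control)
    pooled_var = ((n1 - 1) * variance(d_trained)
                  + (n2 - 1) * variance(d_control)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(d_trained) - mean(d_control)) / se, n1 + n2 - 2


t, df = interaction_t([3, 4, 5], [1, 2, 2], [3, 4, 5], [3, 3, 5])
print(round(t, 3), df)  # 4.243 4
```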
For the 3-back load level, the affected areas differed depending on the sessions compared. In the Trained group (S1 > S2) > Control group (S1 > S2) contrast, the trained group showed larger activation decreases than the control group in the bilateral superior/middle frontal cortex (BA 8-11/46), including the SMA/ACC (BA 6/32), left insula, bilateral IPC, and left middle temporal cortex (BA 21). In the Trained group (S1 > S3) > Control group (S1 > S3) contrast, differences were found in the right IPC (BA 40), bilateral insula, SMA/ACC (BA 6/32), bilateral inferior frontal cortex, and dorsolateral prefrontal cortex (BA 9). The reverse contrasts yielded no significant differences. The results and values of these comparisons are given in Fig. 7 and Table 2. The threshold was p < .05 FWE cluster-corrected using an auxiliary threshold of p < .001 at the uncorrected voxel level and a cluster extent of k = 89 voxels and k = 67 voxels, respectively.
To study the stability of the working memory training effects over time, an interaction analysis (Trained group (S2 > S3) > Control group (S2 > S3)) was also conducted separately for 2-back and 3-back. No significant effects were found at either load level or in any comparison. The threshold was p < .001 uncorrected at the voxel level.
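Each interaction contrast above has the form Trained group (Sx > Sy) > Control group (Sx > Sy), which is a difference of differences: (Trained Sx - Trained Sy) - (Control Sx - Control Sy). The Python sketch below builds the corresponding weight vector over the six Group × Session cells, assuming a hypothetical column order (Trained S1, S2, S3, then Control S1, S2, S3). The names `interaction_weights` and `apply_contrast` are illustrative and are not taken from the authors' analysis software; the design matrix of an actual fMRI package may order its columns differently.

```python
# Hypothetical cell ordering for a 2 (group) x 3 (session) factorial model;
# a real fMRI design matrix may order its columns differently.
GROUPS = ("Trained", "Control")
SESSIONS = ("S1", "S2", "S3")


def interaction_weights(first, second):
    """Weights for the contrast Trained(first > second) > Control(first > second)."""
    if first == second or first not in SESSIONS or second not in SESSIONS:
        raise ValueError("need two different session labels from SESSIONS")
    weights = []
    for group in GROUPS:
        # +1/-1 for the trained group, mirrored (-1/+1) for the control group
        sign = 1 if group == "Trained" else -1
        for session in SESSIONS:
            if session == first:
                weights.append(sign)
            elif session == second:
                weights.append(-sign)
            else:
                weights.append(0)  # session not involved in this contrast
    return weights


def apply_contrast(weights, cell_means):
    """Difference of differences: (T_first - T_second) - (C_first - C_second)."""
    if len(weights) != len(cell_means):
        raise ValueError("one weight per cell is required")
    return sum(w * m for w, m in zip(weights, cell_means))


if __name__ == "__main__":
    for pair in [("S1", "S2"), ("S1", "S3"), ("S2", "S3")]:
        print(pair, interaction_weights(*pair))
```

Running the script prints [1, -1, 0, -1, 1, 0] for S1 vs S2, [1, 0, -1, -1, 0, 1] for S1 vs S3, and [0, 1, -1, 0, -1, 1] for the S2 vs S3 stability contrast. The reverse contrasts mentioned in the text correspond to multiplying these weights by -1. Because the weights sum to zero within each group and within each session, the main effects of Group and Session cancel and only the interaction remains.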
In sum, when the pre-training session was compared with the post-training and follow-up sessions, decreased activation was found in working memory brain areas at both the 2-back and 3-back load levels in trained participants compared to controls. However, no differences were found between the post-training and follow-up sessions, which indicates that the effects of training remained stable after 5 weeks.

Fig. 6 Results of the adaptive n-back post-training effects for the 2-back load level: (a) Trained group (S1 > S2) > Control group (S1 > S2); (b) Trained group (S1 > S3) > Control group (S1 > S3). Results were p < .05 FWE cluster-corrected using a threshold of p < .001 at the uncorrected voxel level and a cluster extension of k = 125 voxels and k = 87 voxels, respectively. L, left; R, right. Coordinates are in MNI space. Color bars express t-scores.

Table 1 List of brain activations in the between-group comparison at the 2-back load level for the post-training and follow-up sessions: (a) Session 1 compared with Session 2 and (b) Session 1 compared with Session 3. Columns: MNI coordinates (x, y, z), Z-value, BA, cluster extent. Footnotes: Results were p < 0.05 FWE cluster-corrected using a threshold of p < 0.001 at the uncorrected voxel level, and a cluster extension of k = 125 voxels and k = 87 voxels, respectively. L, left; R, right; BA, Brodmann area; SMA, supplementary motor area.

Discussion
The present fMRI study focused on the behavioral and neural changes associated with working memory training and their stability over time. To this end, we randomly assigned participants to two groups (training or control), and both groups completed three fMRI sessions performing the same n-back task. The training group was trained outside the scanner on an adaptive version of the single n-back task for 200 min across four training sessions between the pre-training and post-training sessions. A follow-up session was held after 5 weeks without training. Our results showed significant behavioral and functional differences between groups related to working memory training. N-back training improved task performance, and these behavioral changes were accompanied by decreased activation in diverse brain areas related to working memory, specifically in the superior/middle frontal cortex, inferior parietal cortex, anterior cingulate cortex, and middle temporal cortex. Importantly, 5 weeks after the training, the behavioral and brain changes remained stable. Compared with a no-contact control group, our cognitive training program improved behavioral performance and caused cerebral modifications that persisted over time.

Training effects were observed in terms of accuracy and RTs. In general, participants in both groups improved their performance from the pre-training session to the post-training session. The control group's improvement could be explained by retest effects due to task repetition, as reported in previous cognitive studies (Jaeggi et al. 2008; Schneiders et al. 2011). However, the training group reduced their errors and RTs significantly more than the control group at both working memory load levels. As expected, 200 min of training on our adaptive single n-back task improved performance in terms of both accuracy and reaction times.
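The statement that the training group improved significantly more than the control group is a Group × Session interaction, and it is this comparison of gains, rather than the within-group improvement, that separates a training effect from a retest effect. As an illustration only (not the authors' analysis), the Python sketch below uses made-up reaction times for four hypothetical participants per group: it computes each participant's change from S1 to S2 and compares the two groups' changes with a pooled-variance independent-samples t-test. With two groups and two sessions, the Group × Session interaction F(1, N - 2) of a mixed ANOVA equals the square of this t statistic. The function names `change_scores` and `pooled_t` are hypothetical.

```python
import math
from statistics import mean, variance


def change_scores(pre, post):
    """Per-participant change from the pre-training (S1) to the post-training (S2) session."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired per participant")
    return [b - a for a, b in zip(pre, post)]


def pooled_t(x, y):
    """Independent-samples t statistic with pooled variance (Student's t)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


# Hypothetical reaction times (ms) for four participants per group.
trained_pre, trained_post = [700, 720, 680, 710], [600, 630, 590, 610]
control_pre, control_post = [705, 715, 690, 700], [690, 705, 670, 690]

d_trained = change_scores(trained_pre, trained_post)
d_control = change_scores(control_pre, control_post)
t = pooled_t(d_trained, d_control)
df = len(d_trained) + len(d_control) - 2
# For a 2 (group) x 2 (session) mixed design, the interaction F equals t squared.
print(f"t({df}) = {t:.2f}; interaction F(1, {df}) = {t * t:.2f}")
```

With these illustrative values the script prints `t(6) = -21.67; interaction F(1, 6) = 469.44`; the negative t indicates a larger RT decrease in the hypothetical trained group. The same logic applies to accuracy, or to the S1 vs S3 comparison, by substituting the corresponding scores.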
Our follow-up findings also showed that these behavioral changes remained stable 5 weeks after completing the training. A non-significant decrease in performance was observed from the post-training session to the follow-up session, which may reflect the absence of additional training. These results agree with previous n-back training studies (e.g., Jaeggi et al. 2011; Thompson et al. 2013, 2016).

The n-back task activation pattern reported here, which includes frontal, parietal, cerebellar, and subcortical areas, coincides with previous neuroimaging studies (Owen et al. 2005). All the activations found were bilateral and located specifically in the SMA/ACC (BA 6/32); the superior, middle, and inferior frontal cortex, including the anterior insula; the superior and inferior parietal cortex (BA 7/40), including the precuneus; the cerebellum (crus I); and the thalamus.

Fig. 7 Results of the adaptive n-back post-training effects for the 3-back load level: (a) Trained group (S1 > S2) > Control group (S1 > S2); (b) Trained group (S1 > S3) > Control group (S1 > S3). Results were p < .05 FWE cluster-corrected using a threshold of p < .001 at the uncorrected voxel level and a cluster extension of k = 89 voxels and k = 67 voxels, respectively. L, left; R, right. Coordinates are in MNI space. Color bars express t-scores.

Table 2 List of brain activations in the between-group comparison at the 3-back load level for the post-training and follow-up sessions: (a) Session 1 compared with Session 2 and (b) Session 1 compared with Session 3. Columns: MNI coordinates (x, y, z), Z-value, BA, cluster extent. Footnotes: Results were p < 0.05 FWE cluster-corrected using a threshold of p < 0.001 at the uncorrected voxel level, and a cluster extension of k = 89 voxels and k = 67 voxels, respectively. L, left; R, right; BA, Brodmann area.
Early accounts of working memory considered the prefrontal cortex a store of information (Smith and Jonides 1999), but current views attribute to it the function of controlling the cognitive processing of information, selecting stimuli, and producing adequate responses (Postle 2006). There is increasing evidence supporting this view (Lara and Wallis 2015). The parietal lobe, in turn, has been associated with the executive manipulation of acquired facts (Koenigs et al. 2009), with the storage function of working memory (Owen et al. 2005), and with attentional processes in working memory (Berryhill et al. 2011). Regarding subcortical areas, the cerebellum contributes to cognitive information processing through its connections with the prefrontal cortex (Hayter et al. 2007; Vandervert 2009), and the thalamus, owing to its attentional role in filtering relevant information, supports the prefrontal cortex in its working memory function (Watanabe and Funahashi 2012).

Our findings are generally consistent with previous n-back functional neuroimaging studies that report decreased activation after training (Schneiders et al. 2011, 2012; Schweizer et al. 2013; Thompson et al. 2016). Regarding training-related activation changes, our imaging data revealed that participants in the training group showed decreased activation in various cerebral areas related to working memory. Decreased activation has been interpreted as an indication of better neural efficiency in these areas, that is, of improved function. This decline in cerebral activation may allow participants to respond more quickly and make fewer mistakes. Kelly et al. (2006) noted that this decrease in activation is typically observed after training on higher cognitive tasks, and they stated that lower activation is associated with increased neural efficiency, meaning that fewer neurons are needed to give a fast and accurate response to the task.
However, some authors have criticized the neural efficiency explanation of activation decreases as overly simple and unclear (Constantinidis and Klingberg 2016; Poldrack 2015). In his review, Poldrack (2015) viewed efficiency as the inverse of the energy required to transmit information in brain networks. He highlighted the need for new studies and models to examine neural changes, and he suggested that identifying potential activation effects may lead to future mechanistic explanations. Therefore, although a decrease in activation is often interpreted in the literature as an increase in neural efficiency, our data do not reveal the underlying cellular mechanism; instead, they point to the areas that change after working memory training. The bilateral superior frontal cortex (BA 8-9), IPC (BA 40), and SMA/ACC (BA 6/32) showed this activation reduction at both 2-back and 3-back. During 3-back performance, we also found decreased activation in the left insula and left middle temporal cortex (BA 21). That these specific areas showed the effect only at 3-back may be due to the greater demands of this load level compared with 2-back (Thompson et al. 2016).
The activation decreases in the superior frontal cortex of both hemispheres were expected because the dorsolateral prefrontal cortex is strongly involved in working memory processes (Lara and Wallis 2015). It is essential for continuous updating, focusing attention, and ordering and selecting stimuli, which are fundamental processes for performing the n-back task successfully. Other working memory areas where this effect was found were the IPC, SMA, and ACC. The IPC supports the phonological store, as demonstrated in studies of patients with lesions in this area (Baldo and Dronkers 2006). This storage of verbal information is necessary to carry out our n-back task because we used letters as stimuli. Moreover, the IPC is typically activated when an attention-demanding maintenance strategy is used (Berryhill et al. 2011). The SMA has been related to the planning of movement sequences, motor learning, and motor activation of the hand. In our task, participants responded by pressing a button with their right hand, and the activation decreases in these areas were accompanied by a decrease in RTs. The ACC, on the other hand, has been related to error detection (Bush et al. 2000), which is crucial to carrying out our working memory task. Menon and Uddin (2010) proposed that the ACC and the insula work together to detect salient stimuli and initiate attentional control signals. Regarding the middle temporal cortex, further investigation is needed to determine its exact relationship with working memory.
One of the novel goals of the present study was to investigate the long-term effects of cognitive training. We did not find any longitudinal fMRI research on the stability of brain changes produced by working memory training, so we cannot compare our functional results with previous work. Our fMRI analysis showed no significant changes between the post-training and follow-up sessions, which suggests that the changes due to n-back training remained stable after the training ended. The main effect observed between the pre-training and post-training sessions (a decrease in activation) was present in largely the same areas when the pre-training session was compared with the follow-up session. Thus, our results indicate that the behavioral and cerebral changes produced by working memory training remain stable after 5 weeks without training. Because neither the behavioral nor the fMRI analysis showed differences between S2 and S3, the stability of these brain changes after 5 weeks could suggest improved efficiency in these areas. A follow-up session seems to be a necessary component of any working memory training paradigm designed to create enduring improvements (Thompson et al. 2013).
Overall, the results are partially consistent with the Compensation-Related Utilization of Neural Circuits Hypothesis (CRUNCH; Reuter-Lorenz and Cappell 2008), which posits that more neural resources are recruited as task demands increase. The reduction in activation after training in the training group may be explained by this theory because training reduced the task demands. However, we also expected that 5 weeks without training would increase activation during the task, but this was not the case, indicating that the positive effects of training were maintained without loss for at least 5 weeks, as both the behavioral and neural data suggest.
This study has a few limitations. We used a no-contact control group that did not receive any training. The training group came to our laboratory on four consecutive days and had more contact with the experimenters than the control group did, which may have produced motivational differences between the two groups in terms of task efficiency. Nonetheless, the control group improved their performance from S1 to S2 and from S2 to S3. Although this may be attributed to the retest effect, such an improvement would be unlikely if the control group had lacked motivation. In any case, future studies should include active control groups because the observed gains may be due not to working memory training per se but to training in general. Another limitation may be the short training period (200 min), although some studies have used the same training time or less and showed behavioral improvements and cerebral changes (Jaeggi et al. 2008; Küper and Karbach 2016; Vartanian et al. 2013; Yamashita et al. 2015). In addition, we chose a brief single n-back training with future clinical interventions in mind, because a long training protocol might be difficult and costly for patients and institutions. We therefore wanted to evaluate the effects of this kind of short working memory training regime in healthy participants to allow comparisons with clinical populations in future studies. Practice was limited to the 1-, 2-, and 3-back levels, which may not seem challenging; however, 3-back is considered a highly demanding task, and participants reported that they always tried to improve their results because they could see their percentage of correct responses and their mean reaction time. Finally, the expression "long-term" should be used with care, because 5 weeks is short compared with other studies that investigated long-term effects over 3 months to 1 year (e.g., Jaeggi et al. 2014; Katz et al. 2017; Thompson et al. 2013).
Nevertheless, to our knowledge, this is the first study to investigate brain reorganization weeks after training has ended. Future studies should determine whether this stability holds over longer retest intervals.
In conclusion, n-back training not only improves behavioral performance but also causes cerebral modifications, as signaled by decreased activation in various brain areas related to working memory. These behavioral and neural changes are stable and persist after weeks with no training on the task. The future challenge is to determine whether this kind of training has the same effects in clinical populations and could be translated into beneficial and long-lasting treatments, and whether these changes last longer than 5 weeks.