Peer tutoring in middle school mathematics: academic and psychological effects and moderators

Abstract
In this study, reciprocal peer tutoring was implemented in middle school mathematics classes for six months. The effects of this methodology on students' mathematics achievement, mathematics anxiety level, attitude towards mathematics, and mathematics self-concept were examined. Participants were 768 students in grades 7–9 (12–15 years old). Statistically significant differences and positive effect sizes were found for mathematics achievement, mathematics anxiety level, and mathematics self-concept (average Hedges' g = 0.28). No statistically significant differences were found for attitude towards mathematics. Students' repeating condition (grade repetition) acted as a significant moderator of mathematics anxiety. The main conclusion is that reciprocal peer tutoring can be academically and psychologically beneficial for middle school mathematics students.


Introduction
Peer tutoring (PT), commonly referred to as classwide peer tutoring (CWPT) (Heron et al., 2006), is an active learning methodology in which pairs of students teach each other and, in doing so, learn by teaching (Guill et al., 2020). During PT, the more competent student (tutor) helps a less competent peer (tutee) to understand curricular content (Hoffman et al., 2020). Different types of PT can be implemented, depending on students' ages and roles. One type is reciprocal PT, in which students switch roles, alternating between tutor and tutee. Fixed PT takes place when students do not switch roles (Hajar, 2020). Depending on the students' ages, same-age tutoring (Moliner & Alegre, 2020a) or cross-age tutoring (Zeneli et al., 2016) may be arranged. Although PT can take many forms, and organisational settings (length of PT sessions, number of sessions per week, duration of the PT program, and so on) vary widely, most PT initiatives report positive outcomes for academic, psychological, social, and behavioural variables (Malone & Fuchs, 2014; Chen et al., 2021; Culver et al., 2022; Sortkaer & Reimer, 2022). While researchers study PT outcomes for a range of academic subjects (McMaster & Fuchs, 2016; Bäckström, 2022; Patchan et al., 2022), mathematics is one of the disciplines in which PT has been documented most often over the last four decades (Webb et al., 2019).

Peer tutoring in mathematics
The academic benefits of PT in mathematics have been documented in both recent literature reviews and meta-analyses (Alegre et al., 2019a; Alegre et al., 2019b). Nevertheless, several authors (Tymms et al., 2011; Shenderovich et al., 2016; Leung, 2019a) have noted that rigorous randomised controlled trials with large samples are still scarce in the literature. Moreover, according to Leung (2019b), few studies in the field address the interactions between academic and psychological variables in mathematics PT interventions. Additionally, most studies refer to primary education, while research at higher educational levels, such as middle school, high school, or college, is less abundant (Moliner & Alegre, 2020b; Mills et al., 2022).

Anxiety, attitudes and self-concept in peer tutoring
Many variables have been studied in peer tutoring interventions since the 1980s. Several studies have assessed the effects of peer tutoring on mathematics anxiety, attitudes towards mathematics, or mathematics self-concept (Topping et al., 2017). According to Powell and Fuchs (2015), students who participate in peer tutoring interventions may be expected to show an increase in mathematics self-concept. Moreover, receiving regular help from a peer has often been linked to a reduction in students' anxiety. Given the closeness and mutual confidence between classmates, organised academic support through peer learning interventions is usually regarded as a potential tool to reduce students' anxiety. Peer tutoring interventions are therefore also expected to have a positive impact on students' attitudes towards learning. As Tymms et al. (2018) and others indicate, peers may play a vital role in shaping other students' attitudes in class, so structured and supervised helping behaviour in educational contexts may foster positive attitudes towards learning. Although previous researchers have analysed one or even two of these variables, studies that include moderation analyses involving these variables or other student-related variables are still scarce (Johns & Mills, 2021; Xu et al., 2022).

Aim and research questions
Given these gaps in the literature, the aims of this study were to determine: 1) the effects of reciprocal PT in middle school mathematics on several academic and psychological variables, and 2) whether repeating condition moderates one or more of these effects.
The research questions that guided this study were as follows:
1. What are the effects of reciprocal PT on middle school students' mathematics achievement, mathematics anxiety level, attitude towards mathematics, and mathematics self-concept?
2. Does repeating condition moderate the effects of the reciprocal PT intervention on mathematics achievement, mathematics anxiety level, attitude towards mathematics, or mathematics self-concept?

Sample
In Spain, special authorisation is required to administer questionnaires to students under 18 years old for any type of research (Berasategui Sancho et al., 2021). To this end, authorisation was requested from the Valencian Ministry of Education. A middle school in the Valencian region was randomly selected and authorised to undertake this study. In addition to institutional authorisation, parental authorisation was required for each student. Moreover, the School Council had to approve the research, and the ethical requirements of the Ethics Committee of the Spanish National Research Council had to be followed. Once all administrative procedures had been completed, full legal authorisation was secured for 770 students from grades 7 through 9 (12–15 years old). As two students moved to another school during the study period, the final sample comprised 768 students, equally distributed by grade (one third of participants from each grade) and by condition (one half randomly assigned to the experimental condition and the other half to the control condition). Of these 768 students, 388 were female and 380 were male. Students' demographic distribution was as follows: 451 (58.72%) were Hispanic, 174 (22.66%) were Romanian, 136 (17.71%) were African, and the remaining 7 (0.91%) were from other ethnic groups. A total of 96 students (32 in each grade) were repeating a grade (repeaters).
Power of the sample
StudySize 3.0 software (Creostat HB) was used to determine statistical power. For a sample of 768 participants, a power of .80 was obtained for detecting a correlation difference of 0.1 with Pearson's correlation test at a significance level of .05.
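StudySize's internal method is not described here. As a rough cross-check, the sketch below assumes the reported figure corresponds to a two-sided test of a population correlation of 0.1 against zero, approximated with Fisher's z transformation; the function name and this interpretation are assumptions, not the authors' procedure. Under them, it gives a power of about 0.79, close to the .80 reported.

```python
import math
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a population correlation r
    (against H0: rho = 0) with n participants, via Fisher's z transformation.

    Assumption: this is a generic normal approximation, not StudySize's method.
    """
    std_normal = NormalDist()
    # Fisher z of r, scaled by its standard error 1 / sqrt(n - 3)
    z_effect = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided critical value, e.g. 1.96 for alpha = .05
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands in either rejection region
    upper = 1 - std_normal.cdf(z_crit - z_effect)
    lower = std_normal.cdf(-z_crit - z_effect)
    return upper + lower


print(f"{correlation_power(r=0.1, n=768):.2f}")  # 0.79
```

Setting r = 0 returns the significance level itself (0.05), which is a useful sanity check on the two-tailed formula.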

Intervention
Experimental design
Authors such as Zeneli et al. (2018) recommend including control groups in studies of PT interventions, as omitting a control group may lead to over- or underestimation of outcomes. Following this recommendation, students were randomly assigned to experimental or control conditions in accordance with the standards of a randomised controlled trial. Block randomisation was used (Ye et al., 2022), a method that allocates participants in blocks so that the resulting groups have equal sample sizes (Xia et al., 2021). All students were pre-tested at the end of the first trimester of the school year, before the reciprocal PT intervention began, and post-tested at the end of the third trimester, after the last PT session.
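The sketch below illustrates permuted-block randomisation for two arms. The block size of 4, the student identifiers, and the seed are hypothetical; the source does not state the block size or whether blocks were formed within each grade (stratifying by grade would mean calling the function once per grade).

```python
import random


def block_randomise(student_ids, block_size=4, seed=None):
    """Assign students to 'experimental' or 'control' using permuted blocks.

    Each complete block holds equal numbers of both labels in random order,
    so the two groups stay balanced after every complete block. A final,
    incomplete block (if any) may be unbalanced.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)
    assignments = {}
    for start in range(0, len(student_ids), block_size):
        block = student_ids[start:start + block_size]
        labels = ["experimental", "control"] * (block_size // 2)
        rng.shuffle(labels)  # random order of labels within this block
        for student, label in zip(block, labels):
            assignments[student] = label
    return assignments


# Hypothetical identifiers for 768 students
students = [f"S{i:03d}" for i in range(768)]
groups = block_randomise(students, block_size=4, seed=2018)
print(sum(1 for g in groups.values() if g == "experimental"))  # 384
```

Because 768 is a multiple of the block size, every block is complete and the arms end up exactly equal.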

Mathematical content covered
Students were taught different mathematical content during the intervention, depending on their grade level. The main content covered during the last two trimesters of the school year (the tutoring period) was as follows. Students in 7th grade worked with simple and compound rules of three with direct and inverse proportionality, first-degree equations without fractions, percentages, direct and inverse proportional distributions, linear functions, descriptive statistics, Laplace's rule, and basic surface area and volume calculations. Students in 8th grade worked with 2 × 2 systems of equations, equations with fractions, second-degree equations, compound probability, an introduction to inferential statistics, and surface areas and volumes of irregular prisms. Students in 9th grade worked with 3 × 3 systems of equations, equations of degree greater than two, complex surface areas and volumes, tree diagrams, percentiles, and box plots.
Organisation, schedule, frequency, and duration of tutoring sessions
During the first trimester of the school year, students in both the experimental and the control groups received teacher-centered instruction (Kyttälä et al., 2022); that is, all teachers used a one-way instructional method. Teacher-centered pedagogy is based on a model of an active teacher and a passive student (Baker et al., 2022). Students worked individually; they could ask the teacher questions at any time, but interaction among students was limited. After the first trimester, the reciprocal PT program was implemented with students in the experimental group. The program lasted six months, covering the last two trimesters of the school year. Throughout this period, four reciprocal PT sessions were held each week during mathematics classes, each lasting approximately 25 min. During the same period, students in the control group continued with the teacher-centered method.
Type of tutoring and distribution of students
Reciprocal, same-age PT was implemented; this type of tutoring was selected for organisational reasons. Students in the same grade tutored each other during the intervention. Pairs were formed following the suggestions of experienced researchers in the field (Topping, 2020). Students were ranked from highest to lowest by their first-trimester mathematics grade; the first student was then paired with the second, the third with the fourth, and so on. In this way, each student was paired with a peer of similar mathematical competence (Thurston et al., 2019).
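The pairing rule described above (rank by first-trimester mark, then pair neighbours) can be sketched as follows. The student names and marks are hypothetical, and raising an error for an odd-sized group is an assumption, since the source does not say how an unpaired student was handled.

```python
def pair_by_rank(marks: dict[str, float]) -> list[tuple[str, str]]:
    """Pair students of similar competence.

    Students are ranked from highest to lowest first-trimester mark, then the
    1st is paired with the 2nd, the 3rd with the 4th, and so on. Ties keep
    their original insertion order because Python's sort is stable.
    """
    ranked = sorted(marks, key=marks.get, reverse=True)
    if len(ranked) % 2 != 0:
        # Assumption: the source does not specify how an odd student is handled
        raise ValueError("odd group size: handling of the extra student is unspecified")
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked), 2)]


# Hypothetical first-trimester marks (out of 10)
marks = {"Ana": 9.1, "Bo": 5.4, "Carla": 8.7, "Dani": 6.0}
print(pair_by_rank(marks))  # [('Ana', 'Carla'), ('Dani', 'Bo')]
```

Pairing adjacent ranks keeps the competence gap within each pair small, which suits reciprocal tutoring where both students alternate between tutor and tutee.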

Teacher training
Teachers whose students were assigned to the experimental condition received two training sessions of about one hour each. During the first session, they were instructed in the fundamentals of reciprocal PT (De Backer et al., 2016). Teachers watched videos presented by researchers with expertise in the specific topics covered by the class intervention. Elements such as correcting mistakes, patience, and positive feedback were highlighted, and issues such as differences in the speed at which students complete tasks and incompatible pairs due to behavioural problems were addressed. During the second session, a trial intervention was conducted in the teachers' own classes, and teachers received feedback during it. Questions that arose during the trial were answered afterwards in a joint meeting of all teachers. The training sessions were delivered by two qualified instructors who were also researchers in the field.

Student training
Students in the experimental group received two training sessions on tutoring procedures two weeks before the start of the reciprocal PT program. These sessions were conducted by the students' own mathematics teachers under the supervision of the researchers. Through active participation, students identified the qualities that competent tutors and tutees need in order to tutor and be tutored successfully. They were also told that they all shared a common goal: ensuring that every classmate understood and finished the exercises and problems by the end of each tutoring session. "Pause, Prompt, and Praise" techniques were discussed, and the importance of patience and respect was emphasised (Duran et al., 2020). Students were also told that their interactions had to focus on mathematical content, so they should not discuss unrelated topics during PT sessions. The importance of explaining mathematical content and procedures in different ways was also highlighted (De Backer et al., 2022).

Teachers' role and distribution
Teachers facilitated the reciprocal PT sessions and supervised interactions between students. They were responsible for ensuring that these interactions took place in a respectful environment, that work was limited to the exercises and problems covered in each session, and that effective academic help was being provided. To minimise teacher effects, each teacher was assigned approximately the same number of experimental-group and control-group students within the same grade.
Materials used by students during the intervention
Students in both the experimental and the control groups used the same materials throughout the school year, including during the tutoring sessions. Each grade had its own textbook, and teachers provided worksheets and online exercises. Students were allowed to use calculators and other instruments (compass, set square, and so on) to solve exercises and problems.

Classroom and tutoring dynamic
The CWPT method employed in this research is based on the DUOLOG method created by Keith Topping and colleagues. This method was developed for working on mathematical problems and has been used over the last two decades (Thurston et al., 2020). It consists of eight structured steps and includes discussion between students as well as summarising and generalising strategies. As Tymms et al. (2011) indicate, the tutor has to encourage the tutee to solve the problems, while strong emphasis is placed on developing metacognitive awareness of the strategies used. The approximate time allocated to each task was as follows.
At the beginning of each session, the teacher spent approximately 15 min explaining new content. Students then had approximately 15 min to work individually on a series of exercises and/or problems related to that content. During this time, students could ask the teacher questions about how to do an exercise or solve a problem. Meanwhile, the teacher checked the students' procedures and results, providing feedback and making sure that at least one student in each pair had the correct answers. After these 15 min, a reciprocal PT session of about 25 min was held. Working in their assigned pairs, students had to check the work they had done individually by comparing results, sharing procedures, and asking each other questions, and to work together on any tasks they had not been able to finish individually. Even if both students in a pair had solved all tasks correctly, so that no tutoring was needed, they were still required to share the procedures they had used. When the two students arrived at different answers, they tried to identify the mistake together, and the student with the correct answer had to help the other understand how to solve the problem correctly. Although students could ask questions about the exercises and problems during tutoring, perseverance and individual effort were required. If a pair was unable to solve a task correctly after working together, the teacher provided assistance. By the end of each tutoring session, all students had to be able to solve the exercises and problems on their own. Pairs who finished early were given extra exercises and problems. During the last 5–10 min of the session, the teacher answered any remaining questions.
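As a quick arithmetic check of the session structure described above, the sketch below sums the approximate phase durations and the weekly tutoring time. Treating the "approximate" durations as exact bounds, and the phase labels themselves, are assumptions for illustration; the source does not state the length of a class period.

```python
# Approximate phase durations (in minutes) for one session, as described in the text.
# Each entry is (phase, shortest, longest); "approximately" is treated as exact here.
SESSION_PHASES = [
    ("teacher explains new content", 15, 15),
    ("individual practice", 15, 15),
    ("reciprocal peer tutoring", 25, 25),
    ("whole-class questions", 5, 10),
]

PT_SESSIONS_PER_WEEK = 4  # four reciprocal PT sessions per week


def session_length(phases):
    """Return the (shortest, longest) total session length in minutes."""
    shortest = sum(low for _, low, _ in phases)
    longest = sum(high for _, _, high in phases)
    return shortest, longest


def weekly_tutoring_minutes(phases, sessions_per_week):
    """Minutes per week spent in the peer tutoring phase only."""
    tutoring = next(low for name, low, _ in phases if name == "reciprocal peer tutoring")
    return tutoring * sessions_per_week


print(session_length(SESSION_PHASES))                                 # (60, 65)
print(weekly_tutoring_minutes(SESSION_PHASES, PT_SESSIONS_PER_WEEK))  # 100
```

Under these assumptions, each session takes roughly 60 to 65 min, of which about 100 min per week are spent in reciprocal tutoring.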
Data collection and missing data
Data were collected in November 2018 (pre-test, before the intervention started) and June 2019 (post-test, after the intervention). Students completed the tests during regular mathematics classes, and the tests were administered by the authors. The SPSS Missing Value Analysis package was used to detect missing values. Missing data were classified as missing completely at random (MCAR), as the probability of a value being missing was independent of any observation in the data set (Gachau et al., 2022). Missing data were then handled as follows. The researchers identified, for each student, any items left unanswered, and incomplete tests were returned to the student for completion. This procedure ensured that every item was answered, so there were no missing data in the final data set. Students were assigned an identification code to enter on their tests instead of their names and were informed that their answers would be completely anonymous.

Measures against dropout
The main measure against dropout was a prize draw for an iPad (2018) and two 50-euro vouchers for school materials. Students were told before the pre-test that they could only enter the draw if they completed all tests. No students dropped out during the research.
Measures against the Hawthorne effect
Students in the experimental group were not told that their tutoring sessions were linked to the tests or that their participation in the research differed from that of other students at the school. This was done to prevent a Hawthorne effect, that is, the possibility that students modify their behaviour in response to being aware that they are observed. As documented in the literature, this effect can undermine the integrity of research (Tadesse et al., 2020).

Instruments
Four variables were measured in this study: mathematics achievement, mathematics anxiety level, attitude towards mathematics, and mathematics self-concept. The selected instruments have been validated, their reliability has been repeatedly tested (House, 2009; Wilkerson, 2021; Arens et al., 2022), and they are widely used internationally (Taut & Rakoczy, 2016). Mathematics achievement was measured using the mathematics achievement test of the Trends in International Mathematics and Science Study (TIMSS) of the International Association for the Evaluation of Educational Achievement (IEA) (Herber et al., 2017), adapted to the Spanish curriculum. The maximum score was 10 points, with higher scores indicating higher mathematics achievement. Recent studies report a Cronbach's alpha of .93 for this instrument (Vesić et al., 2021; Wardat et al., 2022).
Mathematics anxiety level was measured using the Mathematics Anxiety Scale for Children (MASC) developed by Chiu and Henry (1990), a four-point Likert scale on which higher scores indicate higher mathematics anxiety. Studies report a Cronbach's alpha of approximately .85 for this instrument (Keshavarzi & Ahmadi, 2013; Hoorfar & Taleb, 2015).
Attitude towards mathematics was measured using the Fennema-Sherman Mathematics Attitudes Scale (FSMAS) developed by Fennema and Sherman (1976), a five-point Likert scale on which higher scores indicate a more positive attitude towards mathematics. Studies report a Cronbach's alpha of approximately .92 for this instrument (Jong et al., 2015; Ren et al., 2016). Finally, mathematics self-concept was measured using the eight-point Likert mathematics self-concept scale developed by Marsh and Shavelson (1985), on which higher scores indicate a more positive mathematics self-concept. Studies report a Cronbach's alpha of approximately .88 for this instrument (Wohlkinger et al., 2016; Yanhong et al., 2021).
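The reliability coefficients cited above are Cronbach's alpha values. A minimal sketch of the standard formula, alpha = k/(k − 1) · (1 − Σ item variances / variance of total scores), is shown below; the five respondents' four-point Likert answers are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances throughout (the ratio is the same with population ones).
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of five students to three four-point Likert items
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 4],
    [2, 3, 2],
]
print(f"{cronbach_alpha(responses):.2f}")  # 0.92
```

Alpha rises when items co-vary strongly, because the variance of the total score then grows faster than the sum of the individual item variances.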

Statistical analysis
SPSS version 25 was used for all analyses and calculations. The Kolmogorov-Smirnov test was used to check the normality of pre-test scores in the experimental and control groups (Xiao, 2017). A chi-squared test was used to compare the proportions of repeaters and non-repeaters across groups. Repeating condition (also known as "grade retention" or "non-promotion") refers to a student having repeated one or two school years at some point in his or her academic career, thereby joining a class of younger students (Snead et al., 2022). Means and standard deviations were calculated for all variables. Cronbach's alpha was used to measure the reliability of the instruments (Kalkbrenner, 2021). Student's t-tests (95% confidence level) were used to analyse differences in pre-test scores between the control and experimental groups. ANCOVAs were used to assess the effectiveness of the intervention on the four measured variables; in these analyses, post-test scores were the dependent variable, and the corresponding pre-test scores were used as covariates when comparing the control and experimental groups. Effect sizes were reported for each variable and condition (group and repeating condition), using Hedges' g. Moderation analyses were performed with regression using the PROCESS macro for SPSS, version 3.4 (Hayes & Rockwood, 2020).
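Hedges' g is Cohen's d (mean difference divided by the pooled standard deviation) multiplied by a small-sample correction J = 1 − 3/(4·df − 1). The sketch below implements that textbook formula; the group summaries in the usage line are hypothetical, and which scores (post-test or gain) the authors entered is not stated. For anxiety, where lower scores are better, a beneficial effect shows up as a negative g unless the sign is reversed.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Hedges' g for the difference between two independent group means.

    g = J * d, where d uses the pooled standard deviation and
    J = 1 - 3 / (4 * df - 1) corrects d's small-sample bias (df = n1 + n2 - 2).
    """
    df = n_exp + n_ctl - 2
    pooled_sd = math.sqrt(((n_exp - 1) * sd_exp**2 + (n_ctl - 1) * sd_ctl**2) / df)
    d = (mean_exp - mean_ctl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical post-test summaries for two groups of 384 students
print(round(hedges_g(6.0, 1.8, 384, 5.5, 1.8, 384), 3))
```

With groups this large the correction factor is close to 1, so g and d differ only in the third decimal place.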

Treatment fidelity
Treatment fidelity was monitored in several ways. First, students' attendance and behaviour were recorded through Itaca, an online platform that all teachers in the Valencian Region must use; each teacher receives a unique username and password on first entering the regional teaching system. Among other functions, this system allows teachers to record student attendance at each session and to enter comments on students' behaviour in class. Participants in both the experimental and the control groups had to attend at least 90% of the scheduled sessions. If a teacher entered a negative comment about the behaviour of an experimental-group student during PT sessions (that is, showing absolute reluctance to help a peer), that student was excluded from the study. Second, one of the researchers supervised teacher performance during the tutoring interventions: each teacher with students in the experimental group was observed during at least two tutoring sessions. No significant violations of the intervention protocol were reported regarding student attendance or behaviour or teacher performance.

Adequacy of randomisation
The Kolmogorov-Smirnov test showed that students' scores in both the experimental and the control groups followed a normal distribution for each of the four variables (p > .95 in all cases). The overall proportions of females/males were .52/.48 in the experimental group and .51/.49 in the control group. These proportions did not differ significantly (χ² = 0.02, df = 1, p = .99).

Results
Descriptive results are reported in Tables 1 and 2. Means (X̄) and standard deviations (SDs) are given for each group and each phase of the study. Gain scores for each group and variable, and effect sizes for each variable, are also reported in Table 1. Cronbach's alpha values by group and phase are reported for each variable in Table 2. Results for each research question are reported in the following sections.

Results for research question 1: effects of the reciprocal PT intervention
Overall, the intervention effect was significant for mathematics achievement (F(2,384) = 98.23, p < .01), mathematics anxiety level (F(2,384) = 87.45, p < .01), and mathematics self-concept (F(2,384) = 56.81, p < .01) when comparing control and experimental group students. No statistically significant difference was found for attitude towards mathematics (F(2,384) = 0.07, p = .79). Effect sizes by repeating condition are shown for each variable in Table 3.
Results for research question 2: moderation effects of repeating condition
Repeating condition acted as a significant moderator of mathematics anxiety level for students in the experimental group. No other significant moderation effects were found, as shown in Table 4.
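An ANCOVA of this kind (post-test as outcome, pre-test as covariate, group as factor) can be expressed as a comparison of two regression models. The sketch below is a generic illustration using NumPy least squares; it is not the authors' SPSS procedure, and the data are simulated rather than taken from the study. In this set-up the F statistic has (1, n − 3) degrees of freedom.

```python
import numpy as np


def ancova_group_f(pre, post, group):
    """F test for the group effect on post-test scores, adjusting for pre-test.

    Compares a full model (post ~ 1 + pre + group) with a reduced model
    (post ~ 1 + pre); group is coded 1 = experimental, 0 = control.
    """
    pre, post, group = (np.asarray(a, dtype=float) for a in (pre, post, group))
    n = post.size
    ones = np.ones(n)
    x_full = np.column_stack([ones, pre, group])
    x_reduced = np.column_stack([ones, pre])

    def sse(x):
        # Residual sum of squares of the least-squares fit
        beta, *_ = np.linalg.lstsq(x, post, rcond=None)
        resid = post - x @ beta
        return float(resid @ resid)

    sse_full, sse_reduced = sse(x_full), sse(x_reduced)
    df_error = n - x_full.shape[1]
    f_stat = (sse_reduced - sse_full) / (sse_full / df_error)
    return f_stat, (1, df_error)


# Simulated example (not study data): 384 experimental and 384 control students
rng = np.random.default_rng(0)
group = np.repeat([1, 0], 384)
pre = rng.normal(5.0, 1.5, group.size)
post = 0.8 * pre + 1.0 * group + rng.normal(0.0, 1.0, group.size)
f_stat, df = ancova_group_f(pre, post, group)
print(f"F{df} = {f_stat:.1f}")
```

The p-value would then come from the F distribution with those degrees of freedom (for example via SciPy), which is omitted here to keep the sketch dependency-light.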

Effect sizes and statistical improvements for each variable
The global effect sizes found in this study are consistent with the findings of recent meta-analyses in the field. Leung (2019a, 2019b) and Alegre et al. (2019c) found similar effect sizes for PT interventions on mathematics achievement. Although not specific to mathematics, the PT effects on anxiety level and self-concept were also similar to those reported in the meta-analyses by Bowman-Perrott et al. (2014) and Lavrijsen and Verschueren (2020). Likewise, the statistical improvements and the percentage of experimental-group students whose scores improved from pre-test to post-test on these variables are consistent with recent studies by Alegre Ansuategui and Moliner Miravet (2017), Connor et al. (2018), Neu and Greer (2019), Alegre et al. (2020), Alegre et al. (2021), and Arnándiz et al. (2022). By contrast, the global effect size for attitude towards mathematics may be considered null (Polanin et al., 2016), and no evidence of improvement was found for this variable. Recent studies by Bergstrom and Zhang (2016) and Song et al. (2018) likewise did not report significant improvements in attitude towards mathematics. As McGowen and Davis (2019) indicate, ambitious pedagogical practices do not seem to have a reliable positive impact on students' attitudes towards mathematics, although they may be effective in certain contexts. These authors argued that attitude towards mathematics may be one of the most difficult psychological attributes to improve through learning innovations for middle school, high school, and college students.

Repeaters vs non-repeaters
The finding that repeating students showed lower effect sizes than their non-repeating peers is also consistent with recent studies in the field, including research by Hickey and Flynn (2019), Losinski et al. (2019), and Moeyaert et al. (2021). Indeed, repeating condition acted as a significant moderator of mathematics anxiety level and approached significance as a moderator of mathematics achievement. According to these authors, mathematics PT interventions usually show lower (although mostly positive) effect sizes for students who were already struggling academically before the tutoring program began.

Moderation analysis
The only significant moderation finding in this study was that repeating condition moderated mathematics anxiety. Although not specifically for peer tutoring, authors such as Lazarides and Buchholz (2019) support the finding that repeating condition may act as a moderator of mathematics anxiety in educational interventions that require cooperation among students.
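Moderation in the PROCESS sense (a single moderator) amounts to adding a product term to a regression: y = b0 + b1·x + b2·w + b3·(x·w), where b3 measures how the effect of x changes with the moderator w. The sketch below is a hypothetical illustration with x = experimental group, w = repeater, and noise-free simulated anxiety scores; it gives point estimates only (PROCESS also reports standard errors, p-values, and conditional effects), and it is not the authors' analysis.

```python
import numpy as np


def moderation_model(y, x, w, covariate=None):
    """OLS estimates for y = b0 + b1*x + b2*w + b3*(x*w) [+ b4*covariate].

    b3 is the interaction (moderation) term: the change in the effect of x
    on y when the moderator w increases by one unit.
    """
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    columns = [np.ones_like(y), x, w, x * w]
    names = ["intercept", "x", "w", "x:w"]
    if covariate is not None:
        columns.append(np.asarray(covariate, dtype=float))
        names.append("covariate")
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(names, beta))


# Hypothetical, noise-free example: x = 1 experimental group, w = 1 repeater
x = np.tile([0, 0, 1, 1], 25)
w = np.tile([0, 1, 0, 1], 25)
anxiety = 20 - 2 * x + 1 * w + 1.5 * x * w
est = moderation_model(anxiety, x, w)
# Conditional effect of the intervention for non-repeaters (b1) and repeaters (b1 + b3)
print(round(est["x"], 2), round(est["x"] + est["x:w"], 2))  # approximately -2.0 and -0.5
```

In this made-up example the intervention lowers anxiety by 2 points for non-repeaters but only by 0.5 points for repeaters, which is the pattern a positive interaction term encodes.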

Usefulness of the intervention
Although statistically significant improvements were found in three of the four variables analysed, it must be noted that, in percentage terms, the overall improvements for the experimental group were slight (about 2% for each variable). The costs and time of the intervention were considerable, and although the intervention led to clinically significant outcomes, such a slight overall change is not a strong justification for demanding changes in middle school policies on the basis of this intervention (Floden, 2020). For instance, an increase of 2.08% in TIMSS mathematics achievement scores meant little to the school principal or the teachers. However, this peer tutoring intervention proved to be an enjoyable and worthwhile alternative way of learning for middle school students. Students improved slightly on one academic and two psychological variables while learning in a different way, helping their peers and creating an inclusive environment. Teachers and students considered the intervention worthwhile for several reasons. On the one hand, teachers noted that many students were much more enthusiastic and curious during the peer tutoring sessions. Teachers also considered the inclusive character of the intervention an important factor, given the current legal requirements for inclusive education in the Valencian Region. On the other hand, many students said they enjoyed helping their peers, while others stated that receiving help from a peer had been very useful to them. Most teachers and students would be willing to participate in similar interventions in the future.

Limitations and future research
Although we made every effort to follow the highest standards of a randomised controlled trial, the present study has some limitations that must be considered. First, because parental authorisation was necessary for students to participate, selection effects are possible (Puza & Bonfrer, 2018). Second, 6.25% of the students in the experimental group (24 out of 384) achieved a perfect score on the mathematics achievement pre-test; ceiling effects may therefore have led to an underestimation of the full effect of the reciprocal PT intervention on mathematics achievement (Christofides & Manoli, 2020). Third, the findings are limited to mathematics and to middle school students. This research focused on mathematics because it is considered one of the most important academic subjects worldwide, regardless of grade level, and because students tend to find it difficult to master (Rittle-Johnson et al., 2021; Marbán et al., 2021). Finally, the sample is not representative of the population of middle school students in Spain. A more representative sample, with students from different geographic and cultural contexts, would have strengthened the results (Arnold & Versluis, 2019; Eliyahu-Levi & Chen, 2022; Tan et al., 2021). Clustering effects must also be considered in any educational intervention (Hutchison, 2009); they represent a major challenge and are often difficult to control in educational settings (Chan et al., 2022). Given these sample limitations, readers should bear in mind that undesirable clustering could have significantly affected the results reported here.
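One common way to gauge how much clustering can matter is Kish's design effect for equal-sized clusters, DEFF = 1 + (m − 1)·ICC, where m is the cluster (class) size and ICC is the intraclass correlation; dividing n by DEFF gives an effective sample size. The class size of 24 and the ICC of 0.10 below are purely hypothetical, as the study reports neither.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly 'worth'."""
    return n / design_effect(cluster_size, icc)


# Hypothetical values: classes of 24 students and an ICC of 0.10 (not reported in the study)
print(round(design_effect(24, 0.10), 2))            # 3.3
print(round(effective_sample_size(768, 24, 0.10)))  # 233
```

Even a modest ICC can shrink the effective sample considerably, which is why unmodelled clustering tends to make standard errors look smaller than they really are.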
Further research is needed to investigate the effects of reciprocal PT in mathematics at lower or higher educational stages (primary school, high school, or college) and in other subjects. Given the scientific character of mathematics and the fact that mathematical fundamentals are needed to succeed in subjects such as chemistry, physics, technology, or biology, comparable effect sizes and statistical improvements in academic and psychological variables might also be expected in those subjects. As mentioned, further research in different contexts and cultures is necessary to confirm the promising results of this study. Since reciprocal PT does not require any specific technological support, its implementation in other contexts, such as developing countries or with students from low socioeconomic backgrounds, merits investigation.

Conclusions
Reciprocal PT in mathematics may be academically and psychologically beneficial for middle school students. Significant improvements and positive effect sizes were found for students' mathematics achievement, mathematics anxiety level, and mathematics self-concept. However, the effect of reciprocal PT on students' attitude towards mathematics was null. Benefits were greater for non-repeating students than for repeating students, and repeating condition may act as a significant moderator of students' mathematics anxiety level in this type of intervention. Mathematics PT effects were very similar across all middle school grades.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
All data regarding this research may be accessed using the link below: https://osf.io/ac57w/?view_only=68e60dd3886e486ea89a52e4c1c37497