Selection and validation of a measurement instrument for readability calculations in patient information leaflets for oncological patients in Spain

This article presents findings on text readability obtained in a research project sponsored by the Spanish Ministerio de Economía y Competitividad. The main objective of the project was to improve the quality of written texts used to convey information to oncological patients in hospitals in Spain. Among other measurement instruments, it was proposed to use a readability index that could assess the quality of the original texts considered (written in Spanish) and that would also make it possible to evaluate the improvement in readability achieved as a consequence of the research. A review of the literature on readability indices for the Spanish language identified three possible candidates. Statistical analysis guided the selection and validation processes carried out for the indices in the case of patient information leaflets addressed to oncological patients in two Spanish hospitals.


Affiliation
Departament de Traducció i Comunicació, Universitat Jaume I, Av. de Vicent Sos Baynat, s/n, 12071 Castelló de la Plana, Spain. email: martij@uji.es

Introduction
The use of readability indices as a tool to assess information exchange and content comprehension has been an object of research for many years. In general, the use of these indices and their corresponding formulas, although successful, has been subject to controversy (DuBay, 2004: 2-3), in that they may not provide a complete picture of the ease and success of information transfer, which is normally also evaluated by means of additional methods, such as questionnaires and interviews (focus groups).
The main objective of the project from which the data presented in this article were gathered was to improve the quality of written texts used to convey information to oncological patients in hospitals in the Valencian Community (Spain). In this particular case, the approach followed was also the one mentioned above: the readability index selected and later validated was only one source of information, to be completed and contextualized by other means.
For this reason, the information presented in this paper is not intended to give an overview of the project as a whole (for example, its different phases and conclusions), nor does it try to explain in detail how the readability index selection and validation processes were used in combination with other tools or methodologies during the project. Only the index itself, as a tool, is described here. The scope of this article is therefore limited to explaining the process and reasoning followed as far as the readability index was concerned, and to making them clear by describing the data and statistical tools involved.

Literature review
The literature on readability and on readability indices and formulas is vast, and no attempt is made here to cover it all. Within this field one can find, among many other topics, definitions of the concept of readability which date back many decades. Authors who provided definitions of readability include, for example, Dale and Chall (1949), Klare (1963), Selzer (1983), Samson (1993), and Hargis et al. (1998). DuBay (2004: 3) claimed that readability is 'What makes a text easier to read than others. It is often confused with legibility, which concerns typeface and layout'. This terminological confusion is also present in Spanish, where both readability and legibility are termed 'legibilidad': while the former is often referred to as 'legibilidad lingüística', the latter is named 'legibilidad tipográfica'. However, when the hypernym 'legibilidad' is used (as in some of the publications cited below), it is 'readability' that is usually meant.
In this paper, most of the sources mentioned below belong to the literature which covers studies in Spanish. Still, some additional references are also pointed out, because they help to understand the considerable impact this topic has recently had, especially in medical research. Since so much literature is available, a limited selection of sources considered relevant for this specific study is presented in four main groups, ordered from more general to more particular. These four groups are: (1) publications that deal with the topic of readability in general; (2) international articles which present specific readability applications for several disease-specific texts; (3) documents which propose guidelines for the writing of information addressed to patients in Spain; and finally, (4) empirical studies on readability for the particular case of patient information leaflets, or PILs (Montalt and González-Davies, 2007: 68-72), in Spain.
Of these four groups, the third and the fourth deal with the topics that best match the research described here. Most of the publications in them propose improvements in readability for PILs written in Spanish. In this sense, the research described here strives to provide additional insight into the use of readability indices for the particular case of patient information leaflets in Spanish.
Some general references of the first group, on readability and patient information, are, for example, Pilegaard and Havn (2012), Mayor Serrano (2010) or Gröne (2009). A search on PubMed for recent articles related to the readability of information for specific applications (the second group mentioned above) offers some examples for orthopaedics (Badarudeen and Sabharwal, 2010), paediatric patient information materials (Swartz, 2010), education material related to implantable cardioverter defibrillators (Strachan et al., 2012), or web-based cancer information (Friedman and Hoffman-Goetz, 2006). As for guidelines, or best practices, for patient information written in Spain (the third group mentioned above), we find the two by Mayor Serrano (2008) and the one by Barrio et al. (2011). The fourth of the above-mentioned groups includes some empirical studies related to the application under research here (the combination of readability and patient information leaflets in Spanish): the two by Barrio Cantalejo and Simón Lorda (2003), Barrio Cantalejo et al. (2008), Barrio Cantalejo et al. (2008) and the one by March Cerdá et al. (2010). All of them use the readability indices also considered in this study.
As far as readability formulas are concerned, and according to the review by DuBay (2004: 21-22), some popular ones are the Flesch Reading Ease formula (1948), Dale-Chall (1948), Gunning's 'FOG' (1952), FORCAST (Caylor et al., 1973), and the Flesch-Kincaid Grade Level, which, according to DuBay (2004: 52), is the one used by Microsoft® Word. The parameters included in these formulae are usually the number of words, the number of sentences and the number of syllables in a given text. Some of the formulae include additional or alternative parameters, such as the 'number of difficult words' (Dale-Chall), the number of words with three or more syllables (Gunning's FOG), and the number of words with just one syllable (FORCAST).
With all this general information in mind, but adopting a more practical perspective focused on readability studies devoted specifically to the Spanish language, it was decided to use a free-access tool to calculate the readability indices for the corpus considered in the research. This tool was the one employed in similar studies in Spain, whose main publications have been cited in the third and fourth groups mentioned above in this section.
The tool in question is the programme Inflesz v1.0 ('INFLESZ' from now on), a user-friendly application which also includes useful information about the three indices considered for the project, all of them adapted from the original Flesch Reading Ease formula. The following descriptive information can be found in Spanish both in the programme documentation and in Barrio Cantalejo (2007: 291-294). It has been translated into English here for the sake of understanding.
The three indices involved are:
1. Flesch-Szigriszt index: INFLESZ gives this name to the validation of the Flesch Reading Ease formula which Francisco Szigriszt Pazos carried out for his PhD thesis (1993). The Flesch-Szigriszt index is thus an application of the Flesch formula to the particular case of the Spanish language, and it is calculated by means of the following formula:
FLESCH-SZIGRISZT Index = 206.835 − (62.3 * (S/P)) − (P/F)
where 'P' is the number of words in the text, 'S' is the number of syllables and 'F' is the number of sentences. The degree of difficulty of a text is measured on the so-called Inflesz scale, which establishes five levels of difficulty, presented in Table 1.
2. Fernández Huerta index: the adaptation of the Flesch formula to Spanish proposed by José Fernández Huerta (1959), which uses the same factors with a different weighting. Its formula is as follows:
FERNÁNDEZ HUERTA Index = 206.84 − (60 * (S/P)) − (1.02 * (P/F))
where 'P' is the number of words in the text, 'S' is the number of syllables and 'F' is the number of sentences.
3. Word® Correlation: the use of the Flesch formula became widespread, and it was then added to the utilities provided by this word processor (Microsoft Office® 2000). This version of the word processor included an option which, when activated, automatically calculated the index after a spell-check. INFLESZ produces the result 'Word correlation', the numerical value that Microsoft Office® 2000 would have calculated for the text under study. This value is obtained as a function of the Flesch-Szigriszt index.
As will be explained in more detail in the following sections, these three indices were calculated with the programme INFLESZ for all the texts which belonged to the corpus of study.
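The two formulas quoted in this section can be illustrated with a minimal sketch. It assumes that the word, syllable and sentence counts are already available (INFLESZ obtains them from the text itself, which is not reproduced here); the function names and the sample counts are hypothetical and are not taken from the programme or from the corpus.

```python
def flesch_szigriszt(words: int, syllables: int, sentences: int) -> float:
    """Flesch-Szigriszt index: 206.835 - 62.3 * (S / P) - (P / F)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 62.3 * (syllables / words) - (words / sentences)


def fernandez_huerta(words: int, syllables: int, sentences: int) -> float:
    """Fernández Huerta index: 206.84 - 60 * (S / P) - 1.02 * (P / F)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.84 - 60 * (syllables / words) - 1.02 * (words / sentences)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the corpus):
    # P = 100 words, S = 200 syllables, F = 10 sentences.
    p, s, f = 100, 200, 10
    print(round(flesch_szigriszt(p, s, f), 3))
    print(round(fernandez_huerta(p, s, f), 3))
```

Both functions take the same three counts, which reflects the fact that the two indices differ only in their weighting.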

Methods and materials
As was pointed out in the Introduction, the main objective of the research project was to improve the quality of written texts used to convey information to oncological patients in hospitals in the Valencian Community (Spain). The two hospitals involved were Hospital Clínico Universitario (HCV, Valencia, Spain) and Hospital Provincial (HPC, Castellón, Spain). Staff from these institutions (doctors, nurses, psychologists) were contacted, and they kindly agreed to provide the research group with texts which were used in their facilities. The researchers and the hospital staff held several meetings, where the latter described their daily working environment, one characterized by the lack of written information for patients. According to their experience, most of the information given to patients was oral. Still, they provided texts (13 from HCV and 14 from HPC, for a total of 27 texts) to the researchers. In most cases these texts had been written by nurses as part of their daily routine, in an attempt to supply patients and relatives with additional information they could take home with them.
As far as genre (here understood as a form of conventionalized text) is concerned, 25 of the texts supplied (all of them except two: HPC01 and HPC02) could be included in the medical genre known as PIL (Patient Information Leaflet), as described in Montalt and González-Davies (2007: 68-72). They dealt with side effects associated with the medication used in the treatment of breast cancer, as well as with the administration devices used for this medication. The two exceptions mentioned were examples of IC forms (Informed Consent forms), as described in Montalt and González-Davies (2007: 64-68).
The texts were supplied on paper, as printed leaflets. As a consequence, they had to be digitized so that they could be copied and pasted into the interface window of the INFLESZ programme. A few leaflets had images (iconic information), some of which also included small pieces of text inside the icons. These pieces of written information had to be discarded, as they could not be retrieved in the right format during the digitization process. This may be considered a drawback, but the digitization process handled each image as a whole (including the text within it), and there was no alternative but to discard these pieces of text. However, these texts were associated with images which would not have been considered by the readability formula in any case. Besides, presenting such a piece of text without its associated intersemiotic relationship (the iconic context) would have been a potential source of incorrect readability calculations.
From the options on the INFLESZ programme interface, one can select the basic analysis ('Análisis Básico'), which provides the user with the number of words, sentences and syllables in the text, as well as the value of the Flesch-Szigriszt index and the degree of difficulty of the text (Table 1). The option for additional analysis ('Análisis Adicional') returns the results for the other two indices: the Fernández Huerta index and the Word® Correlation. Readability calculations (expressed by means of the three indices) were then made for the 27 texts. The results obtained can be found in the next section.

Results
The values obtained were stored in an Excel table and plotted by means of an Excel graph. The table and the corresponding graph are shown in Table 2 and Figure 1.
Visual inspection of the values obtained for the 27 texts in the graph shows that the values for the Flesch-Szigriszt and Fernández Huerta indices vary between 50 and 80, approximately (they could be termed 'normal' or 'quite easy', according to Table 1, as far as the first of these two indices is concerned), while the values obtained for the Word® Correlation oscillate between 0 and 30 (INFLESZ does not provide an equivalent to Table 1 for the Word® Correlation). Still, some qualitative 'parallelism' can be observed, in the sense that the three indices show similar behaviour at a glance, independently of the actual quantitative values recorded for each of them.
The main criterion used to decide which readability index should be used in the project was the sensitivity of the index. 'Sensitivity' is to be understood here as the capacity of the index to detect variations in readability between different texts. Therefore, a sensitivity analysis was performed based on the values shown in Table 2. In order to do so, the above-mentioned Excel table, which included all the values, was expanded. For each of the three columns which contained the three sets of 27 values, additional calculations were performed. These included the maximum and minimum values for each column, as well as the mean and the standard deviation. As an indication of the sensitivity of the indices, the ratio of the standard deviation to the mean, expressed as a percentage, was used. The results obtained are shown in Table 3. The normalized sensitivity value in the last row of the table clearly indicates that the Word® Correlation shows the highest result, and it was thus initially selected as a candidate index for the readability calculations to be made in the project.
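The sensitivity measure described above (standard deviation over mean, as a percentage) can be sketched as follows. This is only an illustration under stated assumptions: the sample standard deviation is used, since the text does not say which variant was calculated in Excel, and the two series are hypothetical values, not the data of Table 2.

```python
from statistics import mean, stdev


def sensitivity_percent(values):
    """Ratio of the standard deviation to the mean, as a percentage.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the source does not state which variant was applied.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return 100 * stdev(values) / m


if __name__ == "__main__":
    # Hypothetical series, for illustration only.
    series_a = [55.0, 62.0, 70.0, 78.0]   # values in the 50-80 range
    series_b = [5.0, 12.0, 20.0, 28.0]    # values in the 0-30 range
    print(round(sensitivity_percent(series_a), 1))
    print(round(sensitivity_percent(series_b), 1))
```

With this measure, two series with the same spread yield a larger ratio for the one with the smaller mean.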
However, the fact that the Word® Correlation value obtained by INFLESZ is itself derived from other calculations (the Flesch-Szigriszt index and the correlation formula mentioned above) concerned the members of the research group. The exact value depends not only on the number of sentences, words and syllables in the text and on the formula used to calculate the Flesch-Szigriszt index, but also on an additional correlation which has to be trusted. As a consequence, an independent and dedicated validation process for the Word® Correlation was also necessary before a final decision on the use of this index could be reached.
In order to carry out the validation process, the most obvious option available was to let Microsoft Office® 2000 calculate the Word® Correlation directly. This was implemented by finding this older version of the programme and installing it on a dedicated computer. The 27 texts, now digitized (with the exception of the small pieces of text included in the images which some of them contained, as explained above), were input directly into the word processor, and the spell-check was run with the readability calculation functionality activated.
The values thus obtained were pasted into the original Excel table, and a new graph with the four series (the three original indices and the new one, obtained directly from Microsoft Word® 2000) was plotted. The new table and graph are shown in Table 4 and Figure 2. The square dots in the graph represent the values of the Word® Correlation obtained by INFLESZ, while the 'X' ones were calculated directly by the readability functionality of Word® 2000. As it turns out, the trends seem parallel in some cases, but there are also a few discrepancies.
In order to quantify these preliminary observations, the next step was to obtain the correlation between these two series by using the corresponding Microsoft® Excel 2007 functionality, which produced a value of 0.855. At first sight this might seem acceptable, since it indicates that there is some correlation between the two series. As an additional exercise, the two sets of values were plotted in an X-Y plot (again with Microsoft® Excel 2007) in an attempt to carry out a regression, which should theoretically reproduce a linear behaviour with no independent term (of the type y = m*x) for reasons of physical meaning, since both series are meant to represent the same quantity. For this representation, the X axis was used for the Word® 2000 values, while the Y axis was used for the Word® Correlation.
The X-Y plot is shown in Figure 3, with three points located directly on the Y axis and one outlier. The coefficient of determination thus obtained (R²) had a poor value of 0.732, and the slope of the linear regression was 1.18, above the ideal value of 1. The above-mentioned values on the Y axis and the outlier in the far right part of the graph seemed to disturb the regression calculation, and a decision needed to be taken about them.
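The two calculations involved in this step, a correlation between two series and a linear regression with no independent term (y = m*x), can be sketched as below. This is a generic least-squares illustration, not a reproduction of the Excel 2007 functionality used in the project; the function names are hypothetical and the sample values are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope_through_origin(xs, ys):
    """Least-squares slope m of y = m * x (no independent term)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    # Hypothetical values: x plays the role of the Word 2000 results,
    # y the role of the Word Correlation produced by INFLESZ.
    x = [4.0, 9.0, 15.0, 21.0, 26.0]
    y = [6.0, 11.0, 19.0, 30.0, 33.0]
    print(round(pearson_r(x, y), 3))
    print(round(slope_through_origin(x, y), 3))
```

A slope above 1 in such a fit means that the values on the Y axis are, on average, larger than those on the X axis.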
It was decided to eliminate the three values located on the Y axis. The three texts on the Y axis were HCV007, HCV13 and HPC01 (one of the examples of IC), which had values of 7, 11 and 5, respectively, for the Word® Correlation, while Word® 2000 produced a value of 0 for all three of them. On the one hand, it was considered that the direct calculation made by Word® 2000 had to have physical meaning, in the sense that it was based only on the characteristics of the texts (number of words, sentences, syllables, etc.) and on no additional calculations. On the other hand, although the three texts were different, the result was identical for all of them. As a consequence, it was not considered appropriate to include in the regression identical Word® 2000 values for different texts (even if they were similar, as indicated by the Word® Correlation index), since the physical meaning of a value which treats different objects as equal is questionable.
The outlier corresponded to text HCV11, which consisted mainly of images including text that was discarded during the digitization process, thus leaving a short and easy text. It was therefore decided to remove the outlier too.
After this filtering process, a new X-Y plot was produced, and a new correlation and a new regression were calculated with just 23 of the 27 texts. The results are shown in Figure 4. After the data filtering, the correlation factor improved to 0.929 and the coefficient of determination (R²) reached an acceptable value of 0.863, while the slope of the linear regression increased to 1.35, also above the ideal theoretical value of 1.
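The filtering step can be sketched as a simple selection over the pairs of values. The text identifiers and the values for HCV007, HCV13 and HPC01 come from the description above; every other number, including the pair for HCV11 and the texts HCV01 and HCV02, is a hypothetical placeholder, as is the function name.

```python
def filter_pairs(pairs, outliers=()):
    """Drop texts whose Word 2000 value is 0 and any text listed as an outlier.

    pairs maps a text identifier to (word_2000_value, word_correlation_value).
    """
    return {
        text_id: (x, y)
        for text_id, (x, y) in pairs.items()
        if x != 0 and text_id not in outliers
    }


if __name__ == "__main__":
    sample = {
        "HCV007": (0, 7),    # Word 2000 returned 0 (values from the text)
        "HCV13": (0, 11),    # Word 2000 returned 0 (values from the text)
        "HPC01": (0, 5),     # Word 2000 returned 0 (values from the text)
        "HCV11": (60, 30),   # outlier; both numbers are hypothetical
        "HCV01": (10, 14),   # hypothetical
        "HCV02": (18, 25),   # hypothetical
    }
    kept = filter_pairs(sample, outliers={"HCV11"})
    print(sorted(kept))
```

Applied to the full set of 27 texts, the same selection leaves the 23 texts used for the second regression.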

Discussion
It was considered that the validation carried out after the filtering process could be described as acceptable, because the correlation factor and the coefficient of determination of the linear regression (0.929 and 0.863, respectively, for the data set consisting of 23 of the 27 texts) were good enough for the evaluation of the texts under study. However, it became obvious that the Word® Correlation values calculated by INFLESZ were some 35% above the ones that Word® 2000 itself had produced (the slope of the regression calculated in the validation process was 1.35), for the particular case of the studied corpus. In other words, the validation performed for the research project had shown that INFLESZ calculated values which were too high for the Word® Correlation.
Information on how the Word® Correlation formula used by the INFLESZ programme was developed (how many data were used, for example) was not available to the research group. As a consequence, it seemed more reasonable to rely on the conclusions of the analysis and validation performed for the project, whose methodology and steps were known, than to accept the correlation values as such.
It was then necessary to go back to Table 1 and to express the degree of difficulty of texts (in terms of readability) by using the Word® Correlation rather than the Flesch-Szigriszt index. This would be a new contribution of the analysis and validation exercises carried out. In principle, the limit values used by the Flesch-Szigriszt index to specify the different degrees of difficulty can be transformed into Word® Correlation terms by means of the conversion formula mentioned in the Literature review section. This transformation is shown in Table 5.
Based on the findings of this research, some adjustments to the table were necessary if INFLESZ was still to be used to calculate the readability of the texts belonging to the corpus (after the improvement process was finalized) and the results of the validation process were to be taken into account. The validation process had pointed out two main facts: negative values were not calculated by the Validated Word® Correlation (due to the form of the equation used for the regression analysis, y = m*x, with no independent term), and the results provided by INFLESZ turned out to be, based on this regression, 35% higher than the real values calculated by Word® 2000.
As a consequence of these two facts, a simpler, more intuitive and easier-to-use grading table for text readability was introduced. This proposal is specifically dedicated to the particular kind of texts used in the project, in terms of both their genre (PIL) and their language (Spanish). The proposal is shown in Table 6.
In practical terms, and based on the outcome of the process described in this study, using the Validated Word® Correlation in the research project would mean continuing to use the INFLESZ programme to quantify the readability of the texts in the corpus once they had been improved. However, the values obtained from the programme would have to be divided by 1.35 in order to obtain validated Word® 2000 values. Their grading (in terms of difficulty of reading) would be reduced to only three levels (difficult, normal and easy), and the limit values between categories would be the ones shown in Table 6.
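The rescaling described above amounts to a single division by the regression slope. A minimal sketch follows; the function and constant names are hypothetical, the input value is invented for the example, and the limit values of Table 6 are deliberately not reproduced here.

```python
# Slope of the regression y = m * x obtained in the validation (from the text).
REGRESSION_SLOPE = 1.35


def validated_word_value(inflesz_word_correlation, slope=REGRESSION_SLOPE):
    """Rescale an INFLESZ Word Correlation value to a validated Word 2000 value."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return inflesz_word_correlation / slope


if __name__ == "__main__":
    # Hypothetical INFLESZ output for one improved text.
    print(round(validated_word_value(27.0), 2))
```

Keeping the slope as a parameter makes it straightforward to repeat the calculation if the validation were rerun on a different corpus.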
To sum up, the outcome of the study is the proposal of a simpler grading of the readability calculations of PILs written in Spanish, based on the Validated Word® Correlation and obtained as a result of the methodology and statistical calculations described in this paper. The applicability of the proposal is therefore somewhat limited, since it focuses only on documentation for patients which belongs to a specific genre and is written in a given language. However, from a wider perspective, the main contribution of the paper may lie in the methodology followed to approach a research problem such as the one described, and in showing that a rigorous, data-driven approach can be followed in research decision-making processes.

About the author
José Luis Martí Ferriol is a full-time lecturer at Universitat Jaume I, Castelló (Spain), where he teaches audiovisual, scientific and medical translation at both undergraduate and master's level. He is the author of the books Cine independiente y Traducción (Valencia, Tirant Lo Blanch, 2010) and El método de traducción: doblaje y subtitulación frente a frente (Castellón, Servei de Publicacions de la Universitat Jaume I, 2013), as well as of several articles in national and international journals, mainly related to audiovisual translation.

Figure 1: Graphical representation of the values of the three readability indices for the 27 selected texts

Figure 2: Values of the three readability indices plus the Word® 2000 results for the 27 selected texts

Figure 3: X-Y plot of the Word® Correlation index versus the Word® 2000 values for the 27 selected texts

Figure 4: X-Y plot of the Word® Correlation index versus the Word® 2000 values for the 23 remaining texts

Table 1: Degree of difficulty (in terms of readability) as expressed by the Flesch-Szigriszt index

Fernández Huerta index: as proposed by José Fernández Huerta (1959), a Spanish teacher, pedagogue and specialist in the field of experimental didactics. He proposed an adaptation of the Flesch formula to Spanish, using the same factors but changing the weighting, probably as a result of a multiple regression analysis (not specifically explained in his work). INFLESZ calls this the 'Fernández Huerta index', whose formula is as follows:
FERNÁNDEZ HUERTA Index = 206.84 − (60 * (S/P)) − (1.02 * (P/F))

Table 2: Values of the three readability indices for the 27 selected texts

Table 3: Sensitivity analysis of the three readability indices

Table 4: Values of the three readability indices plus the Word® 2000 figures for the 27 selected texts

Table 5: Degree of difficulty (in terms of readability) as expressed by the Flesch-Szigriszt index and the Word® Correlation

Table 6: Simplified degree of difficulty (in terms of readability) as expressed by the Flesch-Szigriszt index and the Validated Word® Correlation for PILs written in Spanish