Competing modal periphrases in Spanish between the 16th and the 18th centuries: A diachronic variationist approach

The history of Spanish modal constructions has been widely discussed in the literature, focusing primarily on the semantic differences between the available alternatives. This paper offers an innovative analysis of the evolution of these constructions by adopting a diachronic variationist approach that takes into account a wider range of semantic, syntactic, morphological and stylistic factors that influence the choice between the competing modal periphrases during two key stages in the evolution of Spanish. The data is drawn from a diachronic corpus of personal correspondence, reflecting actual language usage during the respective periods as closely as possible. Particular attention is paid to the question of whether the influence of different factor groups remains stable over time or not, and it is shown that the most frequent form-context pairings are particularly resistant to innovation. This can be explained by cognitive entrenchment of the respective variant in specific linguistic environments.


Introduction
It is generally accepted that most linguistic changes proceed gradually and over an extensive period of time, during which different forms compete with each other, vying for dominance and spreading to different areas of the grammatical system at different rates and to a different extent.
A better understanding of the details of this type of process can be gained by means of a variationist approach which makes it possible to obtain genuine insights into what is going on within the grammatical system at different stages during the period in which two or more variants compete for dominance in a specific area of grammar (Poplack & Tagliamonte 2001; Poplack 2011; Torres Cacoullos 2009). This approach is based on the idea that the patterns and principles underlying such changes can be identified by means of a quantitative analysis of the competing variants in the different environments that, together, constitute the VARIABLE CONTEXT of these forms (Poplack 2011: 212).
While the variationist approach has become indispensable for the analysis of recent and ongoing language change, it has so far only made fairly small inroads into traditional historical linguistics, though its methodological innovations are equally important for the study of changes that lie further in the past. In particular, the PRINCIPLE OF ACCOUNTABILITY, which requires us to establish not only how the variable context conditions the use of a particular linguistic variant, but also how it conditions the alternative forms within a given subsystem of grammar (cf. Labov 1972: 72), is essential for a complete analysis, as without taking all available alternatives into account, an apparent link between a particular variant and a particular function might cause us to draw incomplete or even incorrect conclusions.
The application of this principle is a key element of the study presented in this article, which offers a comparative analysis of the usage, in the 16th and 18th centuries, of the modal periphrasis [haber de + infinitive] and of the two constructions with which it alternates in the same semantic domain: [deber (de) + infinitive] and [tener de/que + infinitive]. While these competing constructions are not entirely identical in semantic terms, there is a great deal of overlap, and they have, over the centuries, been used as alternative variants to express similar modal meanings, though their usage frequencies have changed over time. In line with Sankoff's (1988) hypothesis of the neutralisation of potential semantic differences between different forms in discourse, we can identify the periphrases exemplified in (1)-(3) as variants of the same syntactic variable, used to express verbal modality:
(1) Hija, en cuanto a mis cuidados, doite aviso, de lo que has de hacer, con Pedro Salvador, ya va informado de lo que has de hacer. (Cartas desde América, 1728) "Daughter, regarding my affairs, I am giving you notice of what you must do, with Pedro Salvador, [who] has already been informed of what you must do."
(2) Le buscarás en casa de Don José Nolasco, que allí asiste, y si no ha venido de Vizcaya aguardarás que venga, que te dirá lo que tienes que hacer. (Cartas desde América, 1787) "Look for him in Don José Nolasco's house, where he usually is, and if he hasn't returned from Biscay, wait for him to come, and he will tell you what you must do."
(3) … te encargo que consultes con nuestro rector o el doctor Cathalano -el que nos casó, que por entonces era vicario-, pues tengo hecha la súplica a dicho señor para que te dirija lo que debes hacer a mi favor. (El hilo que une, 1771) "I put you in charge of asking our rector, Dr Cathalano -who married us, though back then he was a vicar-, for I have requested this gentleman to let you know what you must do for me."
It can be seen that in all three sentences, taken from a corpus of 18th-century epistolary texts, the modal periphrases (underlined) have a lot in common: in all cases, (a) the main verb is hacer "to do"; (b) the type of modality they express is deontic, describing an obligation imposed by someone else (command/order); (c) the grammatical person and number (2SG) is the same; (d) the tense and mood (present indicative) is also the same; (e) the clauses containing the periphrasis are affirmative in terms of polarity and (f) active in terms of voice; and (g) the relationship between writer and addressee, as well as the subject matter of the letter, can be classified as personal, since we are dealing with correspondence between family members.
Despite the fact that there is much functional similarity and overlap between these three periphrases in discourse, existing studies examining their diachronic development tend to leave some important questions unanswered. Most quantitative analyses tend to be limited to the variation between the periphrases with haber and tener (e.g., Martínez Díaz 2003; López Izquierdo 2008), overlooking that the construction with deber is a further competitor in the area of verbal modality. As a result, the observations and conclusions regarding the changes in the usage of modal constructions provide an incomplete picture, according to which [haber de + infinitive] is practically the only variant used until the beginning of the 20th century. However, as will be shown in this study, quite a different picture emerges if [deber de + infinitive] is included in the analysis; though [haber de + infinitive] is still the numerically dominant option in the 18th century, there is evidence of strong competition from the alternative variants, especially in certain structural and stylistic contexts.
Furthermore, the existing studies generally provide a straightforward frequency analysis that does not take into account the variable context of these constructions. For instance, [haber de + infinitive] appears both in modal and temporal (future) contexts, while the use of the alternative periphrases is limited almost entirely to modal contexts, as we shall see below. As far as we are aware, there are no detailed grammatical studies of the diffusional pathway of this change beyond the simple analysis of global usage frequencies.
One of the objectives of this paper is, thus, to compare the patterns of variation in the two periods, the 16th and the 18th centuries, in order to identify language-internal and socio-stylistic factors that either favour or disfavour the progressive replacement of [haber de + infinitive] by its competitors, and to examine the explanatory hierarchy among the most relevant factor groups, as well as the direction of the effect within these factor groups. It is important to establish to what extent the underlying grammar changes between these two periods, which are generally considered to be key stages in the evolution of the Spanish language: the 16th century represents the beginning of Classical or Golden Age Spanish, with significant differences to the medieval language, while the 18th century marks the transition from Classical to Modern Spanish. As will be shown in §6, contrary to what is suggested in some descriptive studies (e.g., López Izquierdo 2008), the periphrasis [haber de + infinitive], which is the dominant modal construction during earlier stages of the language, has already lost considerable ground to the variants [deber (de) + infinitive] and, to a lesser extent, [tener que + infinitive] in 18th-century Spanish.
From a more theoretical perspective, this paper provides data in support of a usage-based approach to language change in which cognitive processes such as entrenchment have a decisive role (cf. Schmid 2012; Croft 2001: 28); this is particularly apparent from the fact that, in many cases, the most frequent contexts tend to favour the use of the older, more frequent and more established variant [haber de + infinitive], whereas low-frequency contexts are typically the point of entry and expansion for the newly emerging alternative.
Before presenting the data, a brief historical overview of the origin and evolution of the Spanish auxiliary constructions with the verbs haber, deber and tener is provided in §2, followed by a description of the corpus in §3. §4 examines the general distribution of the three periphrases in the corpus, which is followed by a discussion of some important methodological issues in §5. After the presentation and analysis of the data in §6, the most important conclusions of this study are summarised and some theoretical implications discussed in §7.

The origin and evolution of the periphrases with haber, tener and deber as auxiliary verbs
In Latin, [HABĒRE + infinitive] is already used to express various types of deontic modality in the first half of the 1st century, for example in the works of Seneca the Elder (Hertzenberg 2012). While there is, initially, no linking particle between the auxiliary and the main verb of this construction, as early as the Late Latin and Early Romance periods the prepositions a and de appear in this periphrasis. In medieval Spanish, the three variants [aver + infinitive], [aver a + infinitive] and [aver de + infinitive] 3 can be found in all text types and registers; the meaning of all three constructions ranges from clearly temporal (future) to clearly deontic (obligation), though in some cases such a clear distinction is not possible, given the strong semantic link between obligation and future reference. The construction without a prepositional linker, [aver + infinitive], was infrequent even in medieval times, used rarely in the 12th and 13th centuries and even less from the 14th century onwards. Its demise appears to be linked to the emergence of the synthetic future, the outcome of the grammaticalization of [infinitive + aver], at a time when the prepositional variants [aver a + infinitive] and [aver de + infinitive] were extremely common. Of these, [aver a + infinitive] is used more frequently until the 14th century, after which [aver de + infinitive] gradually takes over as the dominant variant. By the 15th century, the variant with the prepositional linker a had all but disappeared, except in some markedly dialectal texts (Stengaard 2003: 1151). A further set of obligational periphrases, with tener "to have, to hold" as their auxiliary verb, emerges between the 13th and the 15th centuries, as part of a gradual process in which tener replaces aver in an increasing number of contexts.
In a recent study, Garachana and Rosemeyer (2011) show that this is a clear example of how a grammatical change can have its origin in an initially purely lexical substitution, arguing that the rise of [tener de/a + infinitive] 4 "is based on a process of conceptual identification, in which speakers do not distinguish between the lexical and the grammatical level once the equivalence between the verbs has been established" (Garachana & Rosemeyer 2011: 39, our translation). In the same vein, Yllera (1980: 110) observes that "in the 13th century tener can be found in a variety of contexts that previously only admitted aver, both as an independent verb indicating possession and when used with an adjective or participle. In addition, though with a certain delay and less frequently, tener begins to substitute haver in the modal periphrases" (our translation). A further variant of the modal construction with tener, [tener que + infinitive], begins to gain ground from the 16th and 17th centuries onwards, at the expense of [haber de + infinitive] and [tener de + infinitive], which have all but disappeared from the present-day language (Blas Arroyo & González, 2014).
Finally, the Latin verb DĒBĒRE "to owe" takes on the meanings of obligation and necessity at an early stage, leading to the emergence of deontic modal periphrases in almost all Romance languages (Yllera 1980: 92), including Spanish. According to Beardsley (1921: 150), [deber + infinitive], without a linking preposition, is by far the most common variant of the deber-periphrasis in early Medieval Spanish, with a few sporadic cases of [deber a + infinitive] also attested in the 13th century, whereas "[t]here seems to be no trace of deuer 5 de in the early texts" (Beardsley 1921: 31). While the prepositionless construction is the more frequent variant throughout the entire history of Spanish, [deber de + infinitive] is also relatively common during the classical period from the second half of the 16th century onwards (Beardsley 1921: 31; Blas Arroyo & Porcar 2016).
3 A fourth variant, [aver que + infinitive], is also occasionally documented, approximately between the 14th and the 16th centuries, but it does not turn into a serious competitor. Over the subsequent centuries, it acquires the specialised function that it has to the present day: the default impersonal modal periphrasis of necessity/obligation [hay que + infinitive] ('one must, it is necessary to' + inf.).
4 The variant [tener de + infinitive] is the first to appear, with a few cases documented as early as the 13th century; the short-lived variant [tener a + infinitive] appears in the 14th century but always remains rare and disappears together with its counterpart [aver a + infinitive] before the end of the 15th century, whilst the frequency of [tener de + infinitive] rises continually until the 16th century (Blas Arroyo & González 2014).
5 Deuer, as well as dever, are common historical spellings of modern-day deber.

The corpus
The corpus used in this study, compiled as part of a wider project on diachronic variation in Spanish, consists entirely of documents of a personal nature, primarily personal letters and private diaries. Such documents, characterised by a high degree of communicative proximity (Nähesprache; cf. Koch & Oesterreicher 1985; Oesterreicher 2004), tend to reflect natural, spoken language more closely than more formal texts (literary, legal, etc.) that have traditionally been used in diachronic linguistic studies. The documents selected to be included in the corpus represent different regional varieties of Spanish as well as different registers and degrees of formality, ranging from correspondence between close family members to letters sent by private individuals to the authorities.
The private nature of many of these texts, together with the relatively basic level of formal education of many of their writers, makes them a valuable resource for historical linguistics, as language change almost always has its origin in the spoken language, and the more formal or official a text is, the less likely it is to reflect the way in which speakers really used the language in everyday situations during the respective period. As observed by Oesterreicher (1996: 325), personal letters are a fertile ground for textual production within the category of written documents with oral features; as they were not written with the intention of their ever being published, there is little reason for the authors to avoid vernacular features. Elspass (2012) notes that there are also other reasons why diachronic linguistic studies using data from private correspondence have become more numerous in recent years; one advantage is that these collections of letters tend to contain information about the relationships of power and solidarity between senders and addressees, as well as their social status (Okulska 2010) and their geographical origin, which allows the linguist to draw detailed conclusions about the impact of diastratic and diatopic factors on linguistic choices. Furthermore, the fact that personal letters frequently lack a preplanned structure and are often emotionally charged means that they are likely to employ linguistic strategies aimed at increasing their expressive force (Danilova 2012), allowing us to identify potential conditioning environments (favouring one variant over another) such as emphasis, intensification or attenuation.
The 16th-century corpus used in this study contains 1,935 letters, a number of official statements recorded, in direct speech, by officials of the Inquisition (Eberenz & de la Torre 2002), as well as several diaries and chronicles authored by individuals with limited formal education (Stoll 2002; Stoll & Vázquez 2011); all in all, the corpus consists of texts by more than 700 Spanish speakers from a variety of social and regional backgrounds, totalling 842,658 words. The 18th-century corpus, meanwhile, contains 1,263 letters by approximately 500 different individuals, as well as two diaries and one account book, with a total word count of 624,456. The majority of the letters are written (or dictated in some cases in the 16th-century corpus) by individuals from a variety of social strata, ranging from farm labourers and craftsmen to members of the aristocracy who had emigrated to the Spanish New World colonies, and the topics dealt with in their correspondence (as well as the relationship between writer and addressee) range from intimate or familiar on one end of the spectrum to formal on the other end.

Distribution of the periphrases in the corpus
4.1. General observations
Table 1 shows the overall distribution of the periphrases examined in this study during the two periods analysed. The most frequent construction in both periods is [haber de + infinitive], but its proportion decreases from 76% in the 16th century to 61.3% two hundred years later. As seen in Figure 1, this decrease becomes more pronounced as the 18th century progresses, from 64.7% in the first half of the century to 43.8% between 1750 and 1800 (p>0.05); by the end of the 18th century, [haber de + infinitive] is, for the first time, no longer the most frequently used modal periphrasis.
The proportion of periphrases with deber, on the other hand, more than doubles from 15.5% in the 16th century to 32.2% in the 18th century. This development becomes even clearer in the second half of the 18th century, when the variants with deber account for almost half of all modal periphrases, used with approximately the same frequency as [haber de + infinitive]. This stands in stark contrast to the periphrases with the auxiliary verb tener, which remain relatively infrequent, though a certain increase can be observed towards the end of the 18th century, as seen in Figure 1. [...] showing that this is the key stage in the shift from the variant with de, which was more common in the medieval and classical language, to the variant with que, which has, in the modern language, entirely supplanted the former, except in a few regional varieties (cf. Blas Arroyo & González 2014).

Semantic values
As mentioned above, the periphrases examined here can express a range of modal notions, among them those exemplified in sentences (4)-(9):
(4) … y ansí pienso gastar poco porque bale tanto allá el bestir y calçar que dizen a de trabajar para sólo esto (Vida y fortuna del emigrante navarro, 1596) "... and thus I intend to spend little, as clothes and shoes are so expensive there that it is said that one has to work just for that."
(5) … aunque estoy con la cabeza como un cántaro por lo mucho que he tenido que trabajar a causa de estar sólo y haverse juntado tantas cosas de una vez… (Correspondencia extraoficial de Ignacio de Heredia con Manuel de Roda, 1774) "... though my head is spinning as I've had to work so hard because I'm alone and so many things have happened at the same time..."
(6) …se cree que alguno dellos deve ser nicuesa capitan quel catolico Rey don fernando de gloriosa memoria mando yr a tierra fyrme (Textos del Caribe, 1519) "... it is believed that one of them must be Captain Nicuesa, whom King Ferdinand the Catholic, of glorious memory, sent to the Province of Tierra Firme."
(7) En el día se halla este renglón hasta 10 pesos pero esperamos suba en junio, agosto y setiembre, por la mucha escasez de granos que ha de aver este año por las malas cosechas (Al recibo de esta, 1793). "At present, the price is around 10 pesos, but we expect it to rise in June, August and September because of the shortage of cereals that will probably come about this year due to the poor harvests."
(8) Y que le respondyo la dicha Françisca: ¿Como lo tengo de yr a dezir que lo vido Juan Xymenes y negalo y no tengo con quien provarlo? (Conversaciones estrechamente vigiladas, 1515) "And Francisca answered him: Why should I go and say that Juan Jiménez saw it, and deny it, without having anyone to confirm it?"
(9) … siendo mi fin buscar la vida para nuestra vejez, que es mi ánimo éste y lo ha sido, hallarme en lo mejor de mi edad para poderlo trabajar, y si este tiempo lo pierdo, después qué había de ser de nosotros (Cartas desde América, 1723) "... as my objective is to prepare for our old age, which is and has been my motivation, to be of the best age to be able to work on it, and if I lose this time, what shall become of us later?"
Examples (4) and (5) are typical cases of deontic modality, expressing obligation or necessity, as are the majority of tokens in our corpus: in the 16th century, 61% of all modal periphrases are deontic, a figure that rises to 74.1% in the 18th century. (6) and (7), on the other hand, are examples of epistemic modality in which the speaker refers to probable, presumed or approximate events or states of affairs. There are far fewer cases of epistemic modality in the corpus, and their proportion further diminishes between the 16th and the 18th century, from 13.2% to 6.5%. Examples (8) and (9), finally, are instances of much rarer modal usages with an expressive force implying surprise, indignation, reproach, etc. (Gómez Torrego 1999: 3356). Added together, these modal values account for a large percentage of the tokens found in the corpus, rising from 75% in the 16th to 81.5% in the 18th century.
This increase in the proportion of modal values is, in fact, primarily due to a decrease of the non-modal use of [haber de + infinitive] and [tener de/que + infinitive] expressing purely temporal future reference, a function already documented in medieval times (Yllera 1980; Lapesa 2000; Hernández Díaz 2006). It should be kept in mind that the modal usages of these periphrases frequently have an implicit prospective value; this is due to the fact that especially deontic modality often coincides with future reference, as the notion of obligation implies futurity (Sinner 2003: 200). In a considerable number of tokens, this implicit futurity has been reanalysed as the main meaning of the periphrasis, and the respective construction is used to refer to events that are predicted to take place in the future, without any hint or shade of modality; in these cases, the periphrases are in direct competition with other future constructions such as the synthetic and the periphrastic future tense, as exemplified in (10) and (11), where the writers alternate between the synthetic future and the periphrastic constructions.
(10) quiero dezir ansí que pienso que tarde a de ser mi venida, por eso ésta será primera y postrera que tengo de escribir… (Vida y fortuna del emigrante navarro, 1557) "So I want to say that my arrival will be late, which is why this [letter] will be the first and last that I will write."
(11) Espero que toparé la orden de Vms. para entregar lo que les devo en España. Vms. [ ... ] por quien son, espero que me han de mirar con caridad (Al recibo de esta, 1776) "I hope that I will receive your order to send what I owe you in Spain.... because of who you are, I hope you will consider me with charity."
While a quarter of all tokens of these periphrases have non-modal future values in the 16th century, this proportion decreases to 18.5% by the 18th century.

Distribution of the periphrases according to their semantic values
As shown in Table 2, the periphrasis haber de remains the most frequently used variant until the 18th century, not only in terms of overall frequency, but also with each and every one of the semantic values examined. Nevertheless, some significant shifts in the effect of the different modal values on the choice between the available variants are also apparent. For instance, for the expression of epistemic modality, deber (46.4%) comes close to drawing equal with haber de (53.6%) in the 18th century, up from 41.4% in the 16th century. A much stronger shift can be observed in the area of deontic modality, where the proportion of haber de decreases from 74.4% in the 16th century to 53.2% two centuries later. The reduction in frequency of [haber de + infinitive] is primarily due to the rise of its main competitor, [deber (de) + infinitive], which more than doubles its proportion in deontic contexts, rising from 16.1% in the 16th to 39.2% in the 18th century; [tener de/que + infinitive], on the other hand, remains rare, and its percentage in obligational contexts hardly changes between the two centuries examined.
In contrast, the percentages of non-modal future values hardly differ between the 16th and the 18th century: haber de accounts for 93.4% of all purely temporal future values in the 16th and for 94.4% in the 18th century. The only (limited) competition comes from the periphrases with tener, whilst [deber + inf.] never occurs in such non-modal contexts. In deontic modal contexts, on the other hand, the three periphrases examined here are in real competition, which is why the ENVELOPE OF VARIATION for the present multivariate analysis has been limited to these contexts.

Coding and methodology
The variationist approach aims to establish how certain contexts favour or disfavour the choice of one linguistic form or structure over alternative forms or structures that have the same referential meaning or function. To achieve this, it is necessary to test a series of hypotheses regarding the influence of certain constraints, which are the potential conditioning factors in a multivariate statistical analysis.
In a first step, all tokens of the variable to be examined, i.e., all instances of modal periphrases, were extracted from the corpus using the concordancer Wordsmith 6.0. Subsequently, it was determined for each token whether specific linguistic and extralinguistic factors were present in the particular instance or not, and this information was encoded for the statistical analysis. The factor groups taken into account in this study, listed below, are those that have been shown to be significant in previous work on modal periphrases (Balasch 2008, 2012; Blas Arroyo & Porcar 2014; Blas Arroyo et al. 2013); examples illustrating each of the (potential) factors are given in Table 3. More specific details about the factors found to be statistically significant in this study will be presented in the corresponding parts of §6.
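The coding step described above can be pictured as turning each extracted token into a record with one value per factor group. The following Python sketch is only an illustration under assumed names: the `Token` schema, the factor labels and the three toy tokens are hypothetical and are not taken from the corpus or from the coding scheme used in this study. It shows how the share of the application value per factor level could be tabulated before any multivariate analysis is run.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    """One coded occurrence of a modal periphrasis (hypothetical schema)."""
    variant: str   # e.g. "haber de", "deber (de)", "tener de/que"
    century: int   # 16 or 18
    factors: dict  # factor group -> factor level


def application_rate(tokens, group, application_value="haber de"):
    """Return {level: (hits, total, percentage)} for one factor group."""
    totals = Counter()
    hits = Counter()
    for tok in tokens:
        level = tok.factors.get(group)
        if level is None:
            continue  # token not coded for this factor group
        totals[level] += 1
        if tok.variant == application_value:
            hits[level] += 1
    return {lvl: (hits[lvl], n, 100.0 * hits[lvl] / n) for lvl, n in totals.items()}


# Invented toy data, for illustration only
toy_tokens = [
    Token("haber de", 18, {"polarity": "affirmative"}),
    Token("deber (de)", 18, {"polarity": "affirmative"}),
    Token("haber de", 18, {"polarity": "negative"}),
]
rates = application_rate(toy_tokens, "polarity")
```

Raw percentages of this kind only describe each factor group in isolation; they do not show which groups remain significant once all of them are considered together.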

Extralinguistic factors:
For the classification of letters along the stylistic continuum, two basic criteria are taken into account: (a) the main topic, and (b) the closeness of the relationship between writer and addressee (see §6.1.5. for further details).
Table 3 (fragment)
"... and as it is a task that we must all do, we must accept God's will"
Other modal periphrasis: Y me parece que también se le debe hacer cargo del crédito de este dinero que injustamente ha retenido. Si la guerra permanece habré de remitir los reales asegurados (Al recibo de esta, 1795) "And I believe that he should also be charged for this credit that he has retained without justification. If the war goes on I shall have to pay the fixed amount of money"
Applying the principles of the sociolinguistic comparative method (Poplack & Tagliamonte 2001), our quantitative analysis consists of two independent multivariate analyses with identical factor groups, one for each of the historical periods examined. By comparing the results of these two analyses, it is possible to trace the path along which the emerging and the receding variants gradually enter or leave the system, focusing on the trajectory of their functions (Poplack 2011: 215). These analyses were carried out using Goldvarb X, with the periphrasis [haber de + infinitive] as the APPLICATION VALUE.
While the examination of overall frequencies and percentages can provide certain insights, the multivariate analysis shows not only the differences in usage frequency of the competing variants for each of the contexts considered, but also, more importantly, the degree of statistical significance and the order of relevance of the different factor groups when all potentially significant conditioning factors are considered simultaneously (Tagliamonte 2006: 235-245). The individual conditioning factors are arranged along a probabilistic scale between 0 and 1; favouring factors have factor weights greater than 0.5, while disfavouring ones have factor weights lower than 0.5. The further away from 0.5 this figure is, the greater the weight of the respective factor. The relative strength of each factor group is shown by the RANGE value, which is obtained by calculating the difference between the largest and the smallest factor weight in each factor group (Walker 2010); "[t]he higher this number is, the greater the contribution of that factor to the probability of the form" (Tagliamonte 2012: 127).
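The arithmetic behind these two measures can be sketched in a few lines of Python. The factor weights below are invented purely for illustration and do not reproduce any figure from the tables in this study; the function and variable names are likewise hypothetical. The sketch classifies a factor weight relative to the 0.5 threshold and computes the range of a factor group as the difference between its largest and smallest factor weight, as described above.

```python
def effect(weight):
    """Classify a factor weight relative to the neutral value 0.5."""
    if weight > 0.5:
        return "favouring"
    if weight < 0.5:
        return "disfavouring"
    return "neutral"


def factor_range(weights):
    """Range of a factor group: largest minus smallest factor weight."""
    values = list(weights.values())
    return round(max(values) - min(values), 3)


def rank_groups(groups):
    """Order factor groups from the strongest to the weakest by range."""
    return sorted(groups, key=lambda name: factor_range(groups[name]), reverse=True)


# Invented weights, for illustration only
hypothetical_groups = {
    "polarity": {"affirmative": 0.52, "negative": 0.45},
    "type of obligation": {"external": 0.62, "self-imposed": 0.31},
}
ranking = rank_groups(hypothetical_groups)
```

With these toy values, the group with the wider spread of weights is ranked first, which mirrors the way the range is used to order factor groups by relative strength.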

Results and analysis
As mentioned above, during the periods examined here [haber de+infinitive] is in competition with both [deber (de) + infinitive] and [tener de/que + infinitive] only when it is used to express deontic modality, which is why the ENVELOPE OF VARIATION for the present multivariate analysis has been limited to deontic contexts.
As will be seen in this section, despite a general decrease in frequency in the majority of contexts, a number of variation patterns remain the same between the 16th and the 18th century, both in terms of significant factor groups and the direction of the effect of individual factors within these groups (§6.1). On the other hand, there are also some conditioning factors that are not yet significant in the 16th century but gain importance by the 18th, thus revealing new paths along which the emerging variants spread (§6.2).
While the different patterns of variation will be discussed separately, it should be pointed out that the data in the corresponding tables is drawn from the same statistical run of Goldvarb. Table 4 shows the effect of some of the factors outlined in §5 on the selection of [haber de + infinitive] during the two periods examined in this study.

Patterns of continuity
In this subsection, we discuss those factor groups that are statistically significant, with the same direction of the effect, in both periods examined.

Type of deontic modality
As mentioned in 4.2. above, the overwhelming majority of tokens in both centuries express deontic modality (necessity or obligation), which makes it particularly interesting to determine whether there are any significant differences regarding the effect of different semantic subtypes within this modal range. Distinguishing and classifying such different types or 'shades' of deontic meaning is a difficult task that has been attempted by numerous scholars (Keniston 1937; Yllera 1980; Olbertz 1998; Gómez Torrego 1999; Fernández de Castro 1999; García Fernández 2006; López Izquierdo 2008; Martínez Díaz 2008). The statistical analysis carried out in this study reveals a clear difference between two basic types: externally imposed obligation on the one hand, and self-imposed as well as subjectively perceived obligation on the other. As seen in Table 4, contexts involving the former type are favourable to the use of haber de, whereas the latter type of deontic modality has a disfavouring effect on the choice of this periphrasis. The category of external obligation includes obligations imposed by written or unwritten rules, agreements, social conventions, laws, etc. (12), obligations as the result of a command or order (13), obligations imposed by external circumstances (14), and inevitability (15).
(12) … y logré el 22 de agosto sacar el documento de sus manos con motivo de cierta rebaja que havía de hacerse de 500 pesos… (Al recibo de esta, 1789)
"... and on the 22nd of August I managed to take the document out of his hands because of a certain reduction of 500 pesos that had to be applied..."
(13) te avisso que no as de venir aca sin traer tu madre y hermanas. (Cartas de particulares en Indias del siglo XVI, 1594)
"I inform you that you must not come here without bringing your mother and sisters."
(14) Si la guerra permanece habré de remitir los reales asegurados como V ms. me dicen en su apreciable de 4 de diciembre del año anterior (Al recibo de esta, 1795)
"If the war continues, I will have to send the money insured, as you advise me to do in your much appreciated letter written on the 4th of December last year."
A more fine-grained analysis shows that each of these types of externally imposed deontic modality has a different degree of likelihood to trigger the choice of [haber de + infinitive] in both centuries, with inevitability being the most favourable factor and obligation due to external circumstances the least favourable one for the choice of this variant. While the percentage of haber de decreases for all sub-types of deontic modality between the two centuries, in line with the general development, Table 5 reveals that there is a greater decline of the preference for this periphrasis in those semantic environments that are already less likely to trigger its use in the 16th century (rules and agreements, external circumstances) than in the environments most closely associated with haber de (inevitability, order/command). As will be discussed below in more detail, this is a clear example of more strongly ENTRENCHED form-meaning pairings (e.g., [haber de + inf.]-inevitability) resisting change and replacement for longer than less entrenched ones.
Furthermore, there appears to be a correlation between the degree of inevitability or coercive force and the preference for haber de: the more predictable it is that the obligation will be carried out, the more likely it is that haber de will be used. Thus, if something must inevitably be done, the subject has no choice whatsoever; an order or command must normally be followed, but there is at least a theoretical possibility of ignoring it; rules and agreements must also be followed but are occasionally flouted, while external circumstances can oblige us to act in a certain way, but there may be an alternative course of action available.
In addition to these types of externally imposed obligation, there are also self-imposed obligations caused by the subject's internal convictions (e.g., religious, ethical or philosophical persuasions, gratitude or respect) 16 (16) and a subjective sense of necessity (17). The obligation or subjective necessity can be perceived either by the subject of the sentence containing the modal construction, or on behalf of that subject by the person uttering/writing the sentence. 17
(16) … yo siempre he de cumplir con mi ocupación, pues mi mayor deseo es darte gusto en todo para que conozcas lo mucho que te estimo y venero (Cartas desde América, 1717)
"... I must always fulfil my duty, for my greatest wish is to indulge you in everything, so that you understand how much I love and cherish you."
Subjective necessity or advisability perceived by the speaker has a lower degree of coercive force than obligations imposed by external circumstances. In contrast to the external circumstances causing necessity in (14), the modal periphrasis in (17) …
As seen in Table 4 above, subjective necessity and self-imposed obligation have a disfavouring effect on the choice of haber de. Table 6 shows that internal (moral) obligation is already least likely to trigger the use of this periphrasis in the 16th century, and even less so two centuries later.
16 The fact that the need to meet the obligation is primarily subjective means that this type of deontic modality is, semantically, often very similar to volition (Yllera 1980: 114).
17 Following Martínez Díaz (2008: 1285), we understand modality as "the expression of the subjectivity of the (enunciated) statement, meaning that the subject of the enunciation may or may not be the same as that of the enunciated statement" (our translation). Otherwise, only 1SG statements would be eligible for certain categories such as internal obligation.
18 This category also includes cases in which the periphrasis is used as a phatic device (he de decirle "I must tell you", has de saber "you must know", etc.), which Gómez Manzano (1992: 160) describes as a "kind of crutch" and Gómez Torrego (1999: 3354) analyses as a manifestation of the speaker's desire "to enter into communication"; it is particularly common in the epistolary genre that most of the texts in our corpus belong to. Though this is a conventionalised and pragmaticalised usage, it nevertheless retains the notion of beneficial necessity or advisability perceived by the speaker, as in "it would be beneficial if I told you" or "in my opinion it would be beneficial for you to know".
Subjective necessity/advisability is somewhat less self-imposed than purely internal obligation, as external circumstances frequently contribute to the sense of necessity. Considering the correlation between the degree of external coercion and the use of [haber de + inf.] identified in Table 5 above, the slightly higher percentage of haber de in contexts of subjective necessity does not come as a complete surprise.

Person and number
'Person and number' was divided into two factors because of a clear difference, in the 16th century, between haber de in 1SG contexts, where only 37.2% of tokens contain this periphrasis, and the other person contexts, in which the occurrence of this periphrasis is more than twice as likely (79.9%), as shown in Table 7. This translates into a statistically significant association of non-1SG contexts with haber de (FW .57), while 1SG contexts have a clearly negative effect on the use of this periphrasis (FW .17).
The disfavouring effect of 1st person singular contexts on the selection of haber de is also evident in the 18th century (34.1%). The main difference when compared with the 16th century is that this reluctance to use haber de has spread from the 1st person singular to the 1st person plural, where the most drastic change between the two periods can be observed, with a decline of haber de from 82.9% in the 16th to 23.4% in the 18th century. In fact, a separate run of the statistical analysis in which 1st person singular and plural are considered together shows that this context is among the least favourable environments for haber de (FW .23) and thus a prime route for the expansion of alternative variants. 19

Tense/mood
A pattern of continuity between the 16th and the 18th century can also be identified for the factor group tense/mood. In terms of representation in the corpus, the present indicative is by far the most frequent in both periods, with figures consistently above 70%. This is followed, at a considerable distance, by the imperfect indicative (16th c.: 16.6%; 18th c.: 11.2%). All other tenses and moods occur at such a low frequency in both subcorpora that they were grouped together. 21 The results of the logistic regression analysis show that, despite a general decrease in the selection of haber de in the 18th century in comparison with two hundred years earlier, the constraint hierarchy remains the same. Thus, the imperfect indicative is the most likely to select this periphrasis in both centuries (16th c.: 85%, FW .71; 18th c.: 67.9%, FW .67), followed by the present indicative (16th c.: 73.6%, FW .48; 18th c.: 56.1%, FW .51), whilst the less commonly occurring tenses are less favourable contexts for haber de (16th c.: 53.6%, FW .28; 18th c.: 36.1%, FW .34). The fact that the vast majority of modal constructions appear in the present or imperfect indicative and that haber de retains a strong link with these tenses goes some way to explaining why this periphrasis remains the numerically dominant modal construction in the 18th century, making it a clear example of how the most frequent contexts tend to support the continued use of the older, more established variant due to their greater degree of entrenchment (cf. Bybee 2006; Rosemeyer 2015).
19 Log-likelihood: -412.075; significance: 0.003
20 The considerable difference between the number of singular and plural 2nd-person contexts can be explained by the differences in the 'semantics of solidarity' (Brown & Gilman 1960) between the two centuries: while the 2PL verb forms (and the pronoun vos) were commonly used to address a single interlocutor, even in contexts of solidarity, in the 16th century, this had changed by the 18th century, when 2SG verb forms (and the pronoun tú) had come into general use in those contexts. It is, indeed, remarkable that there is only a single case of a 2PL modal periphrasis in the 18th-century corpus.
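The claim that the constraint hierarchy for tense/mood is the same in both centuries can be checked directly from the factor weights quoted above. The following is only an illustrative sketch: the dictionary layout, the factor labels and the function name `constraint_hierarchy` are our own assumptions and are not part of the Goldvarb output; only the numerical weights come from the text.

```python
# Factor weights (FW) for haber de by tense/mood, as quoted in the text.
# The data structure and labels are illustrative assumptions.
FACTOR_WEIGHTS = {
    "16th c.": {
        "imperfect indicative": 0.71,
        "present indicative": 0.48,
        "other tenses/moods": 0.28,
    },
    "18th c.": {
        "imperfect indicative": 0.67,
        "present indicative": 0.51,
        "other tenses/moods": 0.34,
    },
}


def constraint_hierarchy(weights):
    """Order the factors from most to least favourable to haber de."""
    return sorted(weights, key=weights.get, reverse=True)


hierarchies = {
    period: constraint_hierarchy(weights)
    for period, weights in FACTOR_WEIGHTS.items()
}
same_hierarchy = hierarchies["16th c."] == hierarchies["18th c."]

for period, order in hierarchies.items():
    print(period, " > ".join(order))
print("Same constraint hierarchy:", same_hierarchy)
```

Comparing the ordering rather than the raw weights reflects the point made above: the individual values shift slightly between the two periods, but the ranking of the factors does not.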

Clausal polarity and type
Given the general prevalence of affirmative clauses in the types of documents contained in the corpus, it is not surprising that the proportion of affirmative clauses containing periphrastic modal constructions (approx. 85%) is also far greater than that of negative clauses (approx. 13.5%), while non-declarative clauses (exclamations, exhortations, direct questions) account for a mere 1.7% of all modal periphrases.
Leaving aside the latter because they are heterogeneous and also very infrequent (14 and 8 tokens in our 16th and 18th century corpora, respectively), our multivariate analysis reveals that we are dealing not only with a statistically significant factor group in both centuries, but also with the same direction of the effect: affirmative clauses favour the use of haber de, while the opposite is the case for negative clauses. Though the percentage of haber de in affirmative clauses decreases considerably, from 77% in the 16th to 54.3% in the 18th century, the fact that this clause type accounts for an extremely large proportion of all clauses containing modal periphrases makes it one of the main pillars supporting the use of this periphrasis, not only in the 16th, but also in the 18th century.
On the other hand, the much less frequent negative clause contexts act as an

Stylistic and register variation
A similar pattern of continuity can be identified when considering the factor group 'style', which in this study is understood to consist of a combination of two parameters: (a) the main topic of epistolary texts, and (b) the closeness of the relationship between the sender and the addressee. 22 Based on these two factors, we can establish a stylistic continuum with the following extremes:
a) Personal correspondence of a private or intimate nature, with close ties between sender and addressee; in the majority of cases, these are members of the same family, but correspondence between close friends or lovers also falls into this category.
b) Letters dealing with non-personal matters and those with a clear distance along the axes of familiarity and solidarity (e.g., letters sent by commoners to their superiors, to members of the clergy or nobility, or to other recipients belonging to higher social classes).
The analysis of the 16th-century data reveals that [haber de + infinitive] is, during this period, already more commonly used in familiar and informal contexts (80%, FW .62) than in letters of a more formal nature or with a greater social distance between the correspondents (66.3%, FW .38). Two centuries later, this trend has not changed: haber de occurs less frequently in global terms, but this development is particularly strong in formal letters, with a decrease of more than 50% (16th c.: 66.3%; 18th c.: 30.8%). It is in these contexts that the diffusion of the emerging variants is most notable. This stands in stark contrast to the present-day use of haber de in Peninsular Spanish, which, apart from a few regional varieties, is limited to the most formal registers of the written language (Gómez Torrego 1999;Sinner 2003;Martínez Díaz 2003;NGRALE 2009). On the other hand, the more personal and spontaneous contexts remain the stronghold of the older, established variant [haber de + infinitive] in the 18th century; the relatively high frequency of this periphrasis in everyday communication is likely to be the reason for its survival over so many centuries.
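The "decrease of more than 50%" in formal letters is a relative decrease, not a drop in percentage points. The short calculation below makes the distinction explicit; it is a sketch that uses only the two percentages quoted above, and the variable names are our own.

```python
# Share of haber de (%) in formal letters, as quoted in the text.
formal_16th = 66.3
formal_18th = 30.8

# Absolute drop, measured in percentage points.
point_drop = formal_16th - formal_18th

# Relative decrease: the drop as a proportion of the 16th-century share.
relative_decrease = point_drop / formal_16th * 100

print(f"Drop: {point_drop:.1f} percentage points")
print(f"Relative decrease: {relative_decrease:.1f}%")
```

The relative figure is the one that exceeds 50%, which is the sense in which the decline in formal letters is described above.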

Discourse factors: the priming effect
In both periods, the choice of periphrasis is clearly influenced by a 'priming effect', i.e., a tendency for speakers to repeat linguistic material they have used in previous stretches of their discourse (Labov 1994;Pereira-Scherre & Naro 1992;Travis 2005). Taking into account the potential relevance of 'structural priming' (Pickering & Ferreira 2008) in the selection of haber de, one of the factors considered in this study is the presence of another, preceding modal periphrasis. Thus, all tokens in the corpus were assigned to one of three different groups. The first, exemplified in (18), contains the tokens immediately preceded by the same modal periphrasis. 23 The second group, exemplified in (19), contains the tokens preceded by a DIFFERENT modal periphrasis, whilst in the vast majority of cases there is no preceding periphrasis (group three).
(18) … y como es jornada que todos hemos de hazer emonos de conformar con la voluntad devina (Cartas de particulares en Indias del siglo XVI, 1565)
"... and as it is a task that we must all do, we must accept God's will."
(19) Y me parece que también se le debe hacer cargo del crédito de este dinero que injustamente ha retenido. Si la guerra permanece habré de remitir los reales asegurados (Al recibo de esta, 1795)
"And I believe that he should also be charged for this credit that he has retained without justification. If the war goes on I shall have to pay the fixed amount of money."
Our analysis reveals that this factor is a strong constraint in both centuries, with high RANGE values of 67 and 68 respectively. The explanatory hierarchy within this factor group also remains similar: in the 18th century, [haber de + inf.] is especially likely to be used when preceded by another [haber de + inf.] construction (83.9%, FW .77), and especially unlikely to be used when preceded by a different modal periphrasis (13.1%, FW .11), while the absence of a preceding periphrasis has practically no effect on the choice of periphrasis (54.6%, FW .52). What is important is that the effect of these three contexts hardly changes when compared to the 16th century (haber de: 91.4%, FW .79; others: 17.6%, FW .08; none: 74.9%, FW .50), despite the declining overall usage frequency of the construction.
A reason for this continuity might lie in the type of texts examined here, which are less prone to following the prescriptive stylistic norm, typical of more formal registers, which stigmatises the repetition of identical lexical items in close sequence. While literary, legal, administrative and religious texts, on which much of diachronic research has traditionally been based, tend to avoid repetition if possible, the personal nature and style of the letters and memoirs examined in this study, the limited formal education of many of their writers, as well as the spontaneity and emotion of these documents make them more similar to oral discourse, in which formal stylistic norms such as the avoidance of repetition are not generally followed.
This hypothesis is corroborated by the data presented in Table 8, showing that there is, indeed, interaction between the priming effect and register/style (examined in §6.1.5. above), particularly in the 18th century. The cross tabulation of the two factors confirms that a higher degree of formality reduces the priming effect, whereas a less formal, more spontaneous style appears to favour this cognitive process.

Diverging patterns of variation
The factor groups discussed in this section were selected as significant in one of the centuries but not in the other. In some cases, these factor groups do appear to have a similar effect on the choice of the modal periphrasis in both centuries, but the effect is only statistically significant in one of them.

Syntactic (im)personality
An example of this is the factor group 'degree of (im)personality', in which the active voice ((20) and (21)) is contrasted with non-active constructions, i.e., the passive voice and the impersonal reflexive construction ((22) and (23)).
(20) Creo que con el deseo que tengo de ayudar a vm con ese me a de faboreser dios (Cartas de particulares en Indias, 1594)
"I think that, given my desire to help you in this matter, God must assist me."
(21) … y si no se hacen las paces será más dilatada la ida, porque no me he de poner a riesgo de perder la vida.
"That I finally have good credit and a startup capital of twenty thousand pesos, for which I am very grateful to God, which can probably be said of very few of those that came with this fleet."
As seen in Table 4 above, the preference for haber de in non-active contexts is statistically significant in the 18th century (72.1%, FW .68), whereas the degree of impersonality is not a significant factor group two centuries earlier.

Agentivity
Similarly, the factor 'agentivity', which distinguishes 3rd person 24 human from non-human subjects, is also only significant in the 18th century. According to some scholars, constructions with the verb haber already had a deagentivising effect in Old Spanish; for instance, Stengaard (2003: 1151) observes that "by means of the periphrasis with aver, the subject of the action expressed by the infinitive either loses its possible role as subject-agent, or the role of subject-recipient or patient implied by the respective verbal action". This semantic effect is related to the meaning of non-auxiliary haber, which expresses non-agentive or receptive possession in which the subject does not exert control over the possessed item (Seifert 1930). 25 As seen in Table 4, our data confirms this hypothesis for 3rd person subjects in the 18th but not in the 16th century. Though the differences are not very great and the low RANGE value (12) indicates that we are dealing with a relatively weak factor, the analysis nevertheless confirms that, in the 18th century, non-human subjects favour the use of haber de (70%, FW .57) more than human subjects do (60.1%, FW .45).
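The RANGE value of 12 cited for agentivity can be reproduced from the two factor weights. The sketch below assumes the usual variationist convention that the range of a factor group is the difference between its highest and lowest factor weight, multiplied by 100; the function name `factor_range` and the dictionary are illustrative and are not taken from the original analysis.

```python
def factor_range(weights):
    """Range of a factor group: (highest FW - lowest FW) x 100.

    Assumes the common variationist convention; the result is rounded
    to the nearest whole number.
    """
    values = weights.values()
    return round((max(values) - min(values)) * 100)


# 18th-century factor weights for agentivity, as quoted in the text.
agentivity_18th = {"non-human subject": 0.57, "human subject": 0.45}

agentivity_range = factor_range(agentivity_18th)
print("Range for agentivity (18th c.):", agentivity_range)
```

Under this convention a small range signals a weak constraint, which is why agentivity is described above as a relatively weak factor despite being statistically significant.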

Assertiveness/intensification
The variation observed in the 18th century is also sensitive to the degree of assertiveness or intensification, a factor that has no significant effect two hundred years earlier. Despite the absence of intonation and the typical paralinguistic cues of the spoken language, it is possible to identify tokens with a high degree of assertiveness or emphasis in the written texts examined here; the authors of the letters have a range of morphosyntactic and semantic resources at their disposal that allow them to emphasise or intensify a statement, either with regard to the content of the statement itself or to the recipient of the letter (in some cases these two types of emphasis coincide). Possible indicators of an emphatic or intensified environment include a wide range of linguistic elements such as specific prefixes and suffixes (e.g., the superlative suffix), the use of evaluative lexical items and expressions, certain types of modality (e.g., imperative, exhortative), certain clause types (e.g., comparative clauses), as well as rhetorical devices such as repetition, enumeration, metaphors and hyperboles. 26 Examples (24) and (25), taken from the 16th and the 18th century corpus, respectively, are clear cases of tokens with a high degree of assertiveness: 27
(24) … y esto se a de esforçar muy de veras porque de otra suerte no quedará nada (Vida y fortuna del emigrante navarro, 1596)
"... and this must be pursued very seriously because otherwise there will be nothing left"
"In addition, so as not to pay for salaries, accommodation, expenses and other things, the executor would have to sell the goods for much less than they are worth."
While the degree of assertiveness makes virtually no difference in the 16th century, a statistically significant effect of this factor group can be observed two hundred years later, when haber de has become associated with these intensified contexts (66.2%, FW .64), whereas non-intensified contexts seem to have an almost neutral effect on the choice of this periphrasis (50.8%, FW .46).
A more detailed analysis shows that there is an interesting correlation between this factor group and the stylistic tenor of the respective text. As discussed in §6.1.5., in letters dealing with private, personal or intimate matters, haber de is used more frequently than in less personal letters in both our corpora, but particularly so in the 18th century. When viewed in combination with the degree of assertiveness (see Table 9), it becomes apparent that this preference for haber de increases in intensified contexts in letters of a highly personal nature (72%), while it is disfavoured in those that are less private (54%). An even greater difference is observed in non-intensified contexts, where haber de is selected for 62% of modal periphrases in letters of a more intimate nature, while this construction accounts for only 27% of the tokens in less personal correspondence. These environments, i.e., non-intensified contexts in more formal, less spontaneous texts, are thus an important area for the expansion of the newer, less established variants, especially of [deber (de) + inf.], which accounts for a considerably higher proportion of tokens (66%).

Clause type
Finally, the reverse tendency can be identified with regard to the factor group 'clause type', i.e., whether the periphrasis appears in a main or in a subordinate clause. While subordinate clauses have been identified as more resistant to syntactic change in other processes of variation (Tarallo 1989; Matsuda 1993), for the Spanish modal constructions this is only the case in the 16th-century corpus, whereas there is no significant difference between main and subordinate clauses in the 18th century, as seen in Table 4.

Conclusions
As shown by the data presented here, there is a significant shift in the choice of modal periphrases between the 16th and the 18th centuries. The changes involve a gradual reduction in the use of the most common variant, [haber de + infinitive]. This general shift is characterised by a significant drop in the frequency of haber de over this 200-year period in the majority of contexts examined here. However, a variable rule analysis comparing the two periods shows that the change does not occur with the same intensity in all contexts, and, more importantly, it does not progress along a single, uniform pathway; there are, in fact, at least two different patterns of development.
The first of these patterns, continuity through time, in which the same factor groups have a statistically significant impact with the same direction of effect in both centuries, can be identified for the following factor groups: type of deontic modality, person/number, clausal polarity, tense/mood, structural priming, and the stylistic tenor of the respective text. While the overall proportion of haber de tokens drops across the board in the 18th century, there is little difference in the weight or ordering of factors within these factor groups: in both centuries, low-frequency contexts 28 such as negative clauses, 1st person singular subjects, non-external obligation and tenses other than the present and imperfect indicative favour the newly emerging variants, whereas high-frequency contexts such as affirmative clauses, non-1SG subjects, external obligation, as well as the present and imperfect indicative tenses favour the continued use of established haber de.
The effect of stylistic tenor (i.e., +/-formal) on the choice of the periphrasis also remains similar in the two periods examined, with haber de clearly preferred in less formal contexts. In other words, the replacement of haber de is particularly advanced in more formal contexts, whereas the more spontaneous environments that resemble the oral language to a greater extent are more favourable to the older, established variant. Given that our two corpora consist primarily of letters dealing with personal and private matters, this is also likely to be one of the reasons for the robustness of [haber de + inf.] in the present study.
Contrasting with this predominant pattern of continuity, our comparative analysis has also identified several factor groups that have a significant effect in one century but not in the other. In some cases, they appear to show a pattern of continuity (albeit not statistically significant in both periods), in the sense that an incipient effect in the 16th century turns into a fully-fledged significant factor two centuries later. An example of this is non-active contexts (i.e., passive and impersonal clauses), which, in the 18th century, become a stronghold of haber de. Generally, however, these cases simply show that a factor group can, over time, attain or lose significance regarding its impact on variation, which is to be expected in an area of grammar that is going through a period of change.
In conclusion, the prevalence of [haber de + infinitive] in the 16th century is weakened two centuries later by alternative variants, in particular [deber (de) + infinitive] (see Table 1), and this decline can be observed in the majority of the contexts analysed. Nevertheless, haber de generally retains its dominance in the most frequent environments, while the alternative periphrases spread primarily in the less frequent contexts.
It has long been known that high-frequency forms are more resistant to change, due to their greater degree of cognitive entrenchment: "the more a form is used, the more its representation is strengthened, making it easier to access the next time" (Bybee & Thompson 2000: 380). This 'conserving effect' of high-frequency forms has been demonstrated not only at the lexical level (e.g., Bybee 1985; Langacker 1987), but also with regard to syntactic phenomena (e.g., Givón 1979; Croft 2000; Bybee & Hopper 2001). What the present study has shown is that this entrenchment effect does not only apply to the overall frequency of a form, but crucially also to individual form-context combinations: a variant (i.e., form) that occurs particularly frequently in a specific highly frequent morphosyntactic, semantic, pragmatic or stylistic context tends to be more resistant to being replaced IN THAT CONTEXT, but not necessarily in other, lower-frequency environments, which are more prone to admitting alternative variants at an earlier stage. As pointed out by Bybee (2006: 715), the conserving effect of high token frequency means that the memory representation of specific strings or sequences of morphemes and words is strengthened in the language user's mind, making them more readily accessible and therefore reinforcing their entrenchment; these 'exemplar representations', which form part of clusters of similar settings, furthermore "allow specific information about instances of use to be retained in representation" (Bybee 2006: 717). As constructions are, in this model, understood as the result of a cognitive process in which strings that are both formally and semantically similar are "stored close to one another" (Bybee 2006: 716), the specific information about the use of a particular string that is stored together with individual representations also constitutes part of the information about the construction that these individual representations form part of.
Regarding our study of modal periphrases in Spanish, the relevant specific information associated with the respective (competing) constructions is the context in which their representations, be they semantic, stylistic or morphosyntactic, typically occur. This information is thus an integral part of the construction itself, which to some extent must lead to a perpetuation of its preferential use in these contexts. This does not, however, preclude constructions from expanding to contexts in which they are not initially the most typical choice, as such tokens are merely less prototypical and more distant from the exemplar representation, but still share a number of formal and semantic features with it.
While exemplar theory predicts a general correlation between high frequency and conservation of established linguistic forms, Poplack (2001) observes that this is not universal in ongoing competition between two variants, and some counterexamples have, indeed, been identified in our analysis, such as passive/impersonal contexts which, despite their comparatively low frequency, favour the use of the older, established variant to a greater extent than the far more frequent active construction does (cf. §6.2.). However, in the vast majority of significant factor groups, the most frequent contexts are also the most conservative ones, thus generally confirming the important role of entrenchment in slowing down syntactic change, even at the microlevel of individual contextual factors.
As a result of this split pattern, haber de remains the overall most frequently used variant during the 18th century and beyond; despite the gradually increasing proportion of contexts favouring the variant [deber (de) + inf.], its overall dominance only comes to an end at the beginning of the 20th century (cf. Blas Arroyo & Vellón 2014).
Summing up, the end of the 18th century represents a milestone in the evolution of modal periphrases in Spanish, though there is also evidence that certain patterns of variation continue beyond this period. Future variationist studies examining the more recent history of Spanish will be able to reveal whether the variables identified as significant in this analysis continue to play a role in the further decline of [haber de + inf.] and the consequent rise of its competitors, or whether other factor groups gain importance and begin to affect the language users' choice between the different available modal periphrases, thereby implicitly contributing to long-term language change.