The refuge of a dying variant within the grammar: Patterns of change and continuity in the Spanish verbal periphrasis haber de + infinitive over the past two centuries

Abstract Based on a corpus of ego-documents (private letters, diaries, memoirs) from the 19th and the first half of the 20th centuries, this paper presents a variationist comparative study to determine the fate of the modal periphrasis haber de + infinitive in the history of modern Spanish. Detailed analysis of the envelope of variation enables us to show that, despite an abrupt decline in the selection of haber de relative to tener que, both ‘to have to’, grammatical environments that favor its use remain in the mid-20th century. Many of the factor groups and the hierarchy of constraints during this period are similar to those that operated in previous periods. Nevertheless, a generalized decrease in the explanatory power of these factor groups, as well as some divergent patterns within several of these groups are also observed, mainly as a result of the fact that haber de + infinitive is increasingly relegated to some restricted areas of the grammar and lexicon. Based on these results, some theoretical implications for changing rates and constraints in language change and grammaticalization are discussed.

In the summer of 1936, the father of the man who was to become a leading figure in the Spanish Socialist Party, Victor Manuel Arbeloa, wrote 11 letters to his wife, Josefina. In these personal missives, he shared his exploits on the San Sebastián front, shortly before being seriously wounded in the siege of this Republican stronghold. On reading those letters, one finds fragments such as the following (note 1):
(1) . . . si tengo tiempo voy un rato a la iglesia a la noche mientras los mozos se van al bar, aunque algún día ya les acompaño a echar una copa, todo no ha de ser tampoco estar pensativo y triste. (Once cartas de mi padre, 8-25-1936) '. . . if I have the time I go to church for a while at night while the lads go to the bar, although some days I go with them to have a drink; one doesn't have to be thoughtful and sad all the time, either.'
(2) Ahora que me acuerdo, el domingo, según la Epístola, era el de los ramos y todo aquello. Ahora todos tenemos que ser bravos. (Once cartas de mi padre, 8-25-1936) 'Now that I remember, that Sunday, according to the Epistle, was Palm Sunday and all that. Now we all have to be courageous.'
[. . .] given 3 based on data extracted from the Corpus del Español (Davies, 2002), a 100-million-word corpus drawn mainly from texts belonging to formal discourse traditions, such as literary works, moral and pious books, and administrative and scientific works. As illustrated in Figure 1, the proportions of the periphrastic units indicate a fundamental change in this grammatical paradigm in the past century, confirming that, in the 18th and 19th centuries, haber de was still preferred over tener que in all text types (Martínez Díaz, 2003).
With this in mind, in this study, we shall (a) attempt to establish which factors condition the selection of this periphrasis in the 20th century, considering all potential factors simultaneously and thereby allowing us to compare their explanatory magnitudes and hierarchy; and (b) check the (in)consistencies of those factors against the data from the 19th century, with the aim of analyzing the process of change that affects these periphrases.
As we shall see, despite confirming the generalized extension of tener que to the linguistic contexts that were previously occupied by or shared with haber de, the system still offers some refuge for the latter. Most of the relevant factors are the same as in previous periods, though with some changes in their explanatory power and the hierarchy of constraints.

CORPUS AND METHODOLOGY
Within the framework of a wider research project on historical sociolinguistics (see note 1), we compiled a corpus of written ego-documents for this study. Such materials are considered to be more informal and closer to the vernacular than other, more formal text types (Oesterreicher, 2004). The texts, mainly private letters and (to a lesser extent) several autobiographical works, were written by Spaniards of different social and dialectal origins. Various registers are represented, ranging from documents dealing with more personal matters to others of a less intimate nature.
Source: Davies (2002); López Izquierdo (2008:793).
The private nature of these texts makes them attractive for the study of informal language in earlier periods for which no oral testimonials have survived. This is especially true in the case of personal letters (Elspass, 2012). They contain many autobiographical details, which make it possible to determine the relationships of power and solidarity between senders and addressees, as well as their social status (Okulska, 2010). Likewise, they contain ethnographic information that enables researchers to unravel some of the details of social life in bygone times (Raumolin-Brunberg, 2005). Moreover, the letters were not written with the intention of their ever being published, which ensures that the language employed in them is closer to the vernacular than other types of discourse. For the 20th century, the corpus used here contains 2045 letters and two autobiographical works, amounting to a total of 695,090 words, written by more than 350 different authors. The 19th-century corpus, on the other hand, contains 1389 letters, two autobiographical texts, and one account book, authored by approximately 250 different writers, totaling 490,014 words. In Table 2, the period covered by this diachronic corpus is subdivided further into 33-year subperiods. For a complete list of the sources and the corresponding periods, see the Appendix.
A concordance program (WordSmith version 4; Smith, 2004) was used to locate all the occurrences of the two variants. This method resulted in a total of 1326 tokens: 282 from the 19th-century and 1044 from the 20th-century materials.
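The search itself was run in WordSmith. Purely as an illustration of what such a concordance query involves, the following Python sketch locates candidate tokens of both variants with regular expressions. It is a minimal sketch under stated assumptions, not the procedure used in the study: the lists of auxiliary forms are deliberately incomplete, the infinitive pattern is a simplification, and the names `PATTERNS` and `find_tokens` are hypothetical.

```python
import re

# Assumption: deliberately incomplete lists of auxiliary forms, for illustration only.
HABER = r"(?:he|has|ha|hemos|habéis|han|había|habías|habíamos|habían|habré|habrá|habrán|haya|hayan)"
TENER = r"(?:tengo|tienes|tiene|tenemos|tenéis|tienen|tenía|tenías|teníamos|tenían|tendré|tendrá|tendrán|tenga|tengan)"
# Simplified infinitive: ends in -ar/-er/-ir, optionally followed by enclitic pronouns.
INFINITIVE = r"\w*(?:ar|er|ir)(?:me|te|se|le|les|lo|la|los|las|nos|os)*"

PATTERNS = {
    "haber de": re.compile(rf"\b{HABER}\s+de\s+{INFINITIVE}\b", re.IGNORECASE),
    "tener que": re.compile(rf"\b{TENER}\s+que\s+{INFINITIVE}\b", re.IGNORECASE),
}


def find_tokens(text):
    """Return (variant, matched string) pairs in order of appearance."""
    hits = []
    for variant, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), variant, match.group(0)))
    return [(variant, form) for _, variant, form in sorted(hits)]


# Fragments taken from examples (1) and (2).
sample = ("todo no ha de ser tampoco estar pensativo y triste. "
          "Ahora todos tenemos que ser bravos.")
print(find_tokens(sample))
```

A pattern of this kind only retrieves adjacent auxiliary, particle, and infinitive, so every hit would still need to be read and coded by hand.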
The tokens were then coded on the basis of 14 factor groups, namely: (i) length of the periphrasis; (ii) phonemic context (phoneme following the complementizer); (iii) modal shades of meaning; (iv) sentential modality; (v) tense and mood; (vi) person and number; (vii) agentivity; (viii) level of semantic (im)personality; (ix) lexical aspect of the main verb; (x) clause type; (xi) contextual modalization; (xii) syntax of the main verb; (xiii) subject expression; and (xiv) lexical priming. In Table 3, we show a representative example for every factor group in which the first periphrasis is the instance found in the corpus (sometimes simplified for the sake of brevity) and the second is the same sentence with the competing periphrasis, provided to show the contextual equivalence between them.
For the quantitative analysis, we apply the sociolinguistic comparative method (Poplack & Tagliamonte, 2001), in which two independent multivariate analyses (with identical factor groups) are carried out and then compared. By comparing the data of two historical periods, we can trace not only the fate of the emerging and the receding variants, but also the path along which they enter or leave the system, that is, the trajectory of their functions, which is of particular interest for the study of grammaticalization (Poplack, 2011:215). These analyses are carried out using GoldVarb X (Sankoff, Tagliamonte, & Smith, 2005), with the periphrasis haber de as the application value.
FIGURE 1. Distribution of the periphrases made with auxiliary verbs haber and tener by century (%). (NB. For the sake of a more accurate comparison between the periphrases with one or the other auxiliary verb, tener que and tener de are collapsed in this graph.)
In this study, the variable context is both form- and function-based. On the one hand, we consider tokens of two different constructions, but at the same time, we limit the multivariate analysis exclusively to the expressions that are clearly modal, such as those exemplified in (1) and (2). We do not include purely temporal meanings, expressing future rather than modal senses, most commonly when the speaker or author uses the periphrases to situate events in a more or less defined future, that is, competing with other prospective forms such as the morphological or the periphrastic future tense:
(3) Tú, María, no te muevas de ahí mientras no vaya yo; si no, no he de ir a verte (=iré a / voy a ir a verte) aunque estés en Pamplona. (Once cartas de mi padre, 9-2-1936) 'You, Maria, don't you move from there until I go there; otherwise, I shall not go and see you even though you are in Pamplona'
(4) Así es que se me va la vista escribiendo y no tengo aliento para levantarme. Así es que si llego a salir, no tengo que valer (=valdré / voy a valer) nada. (Solo habremos muerto si vosotros nos olvidáis, 9-1-1940)
The prospective uses of haber de in (3), equivalent to the morphological and the periphrastic future, were far more frequent in medieval and early modern Spanish (NGRALE, 2009:2146). On the other hand, although in the Spanish of times gone by these nonmodal, temporal meanings were also common in the periphrases with tener (Blas Arroyo & González, 2014; Yllera, 1980), five centuries later, their uses as a future variant have practically disappeared, although we still come across the occasional case, as in (4). Consequently, in the pages that follow, we will focus our attention exclusively on the semantic field of modality, the main locus of variation and change between our periphrases. As a result, the number of tokens finally considered in the analysis is reduced to 1206: 224 from the 19th-century and 982 from the 20th-century corpora.

Overall results
In the 19th century, the frequency of haber de + infinitive versus tener que is 46% (n = 104 of 224), but in the 20th century it is 22% (n = 215 of 982) (note 4). These figures confirm that, by the mid-20th century, haber de + infinitive had already lost much of the prevalence it had enjoyed in previous centuries, whereas it still accounted for just under half of all modal periphrases in the 19th century. Figure 2 shows that the rate of haber de remains very steady throughout that century, where the temporal axis is divided into shorter periods of 33 years (compare with Figure 1). In contrast, the same graph shows a sharp decline in the use of haber de as the 20th century advances; from the 1930s onward, the overall frequency of haber de barely reaches 20%. The results of the two independent multivariate analyses, from which all tokens with future nonmodal values are excluded, are presented in Table 4. In the following sections, we will examine the factor groups and constraints that have a statistically significant effect on the choice of haber de.
Table 3 (fragment; preceding rows not recovered):
Aquí cada cual tiene que/ha de atender a su trabajo 'Here everyone has to attend to his work'
Lexical priming
Same periphrasis: Tiene que usar gafas, padece del hígado y tiene que trabajar mucho 'He has to wear glasses, suffers from a liver condition and has to work a lot'
Other modal periphrasis: Pastora tiene que estar bajo órdenes del yerno y siente disgusto. Quien ha de saber bien eso es el marido de Olivia 'Pastora has to be under the orders of her son in law and she is unhappy. The one who must be well aware of this is Olivia's husband'
None: Le queda poco para terminar, pero para el año que viene tiene que/ha de ir al Servicio Militar 'He'll soon be finished, but next year he has to do his military service'

Modal meanings
We begin our analysis by examining the factor group that has virtually monopolized the debate on this case of grammatical change among Spanish linguists, namely modality. Among the modal meanings associated with the use of both periphrases in the literature, we find a very strong connection with deontic modality, which encompasses a range of meanings including obligation, permission, and necessity (Bybee, Perkins, & Pagliuca, 1994; Fernández de Castro, 1999; García Fernández, 2006; Gómez Torrego, 1988; Keniston, 1937; López Izquierdo, 2008; Martínez Díaz, 2008; Olbertz, 1998; Yllera, 1980; among others). Indeed, a clear majority of all tokens in the corpus (88%) express deontic modality. However, other modal meanings conveyed by these periphrases are the emphatic expression of notions such as surprise, indignation, or recrimination (Gómez Torrego, 1999:3356). This is illustrated in the following examples, where the writer verbalizes his surprise (5), or emphatically confirms something obvious (6). These expressive values seem to be more closely associated with haber de (72%, n = 13, combining both centuries due to small raw frequencies):
(5) Su descripción es demasiado lacónica pero a pesar de ello muy favorable por reunir todas las cualidades que prefiero en la mujer. ¿Por qué no ha de mandarme su fotografía? (Madrina de guerra, 2-9-1938). 'Her description is too laconic but nevertheless very favorable, since it includes all the qualities I prefer in a woman. Why wouldn't she send me a photograph of herself?'
(6) … para mí eres mucho más que un sobrino pero porque te ayudé a criar, te tuve muchas veces en mis brazos y entonces cómo no tengo que tenerte cariño. (As cartas do destino, 7-7-1958). '. . . you are far more than a nephew to me, but because I helped bring you up, I held you in my arms many times, so how can I not be fond of you?'
FIGURE 2. Usage frequency (%) of haber de + infinitive relative to tener que + infinitive over the 19th and 20th centuries.
Note: Not selected as significant: lexical priming, following phonological context, length of the verbal group, contextual modalization, syntax of the main verb, and subject expression. Nonsignificant factor groups appear in brackets.
Tokens with an epistemic meaning have also been collapsed with these nondeontic tokens. In these cases, the speaker makes reference to an event or state of affairs that he believes to be probable, presumable, or approximate.

Those uses appeared for haber de in earlier periods (Yllera, 1980), but today they seem more frequent in Latin America than in European Spanish (NGRALE, 2009:2147). As for tener que, according to López Izquierdo (2008:802) ". . . they began to spread from the late 18th century onwards, and above all during the 19th century" (our translation). Examples (7) and (8) illustrate these usages:
(7) Quien ha de saber bien eso es el marido de Olivia (Una familia y un océano de por medio) 'The one who must be well aware of this is Olivia's husband'
(8) . . . el cartero leía las cartas y no faltaban más que tres sin haber aparecido la tuya aún, aunque no perdía la confianza de que tenía que estar allí por ser ya hoy jueves. (Once cartas de mi padre, 9-10-1936). '. . . the postman was reading the letters and when only three were left, yours had still not appeared, but I remained confident that it had to be there because today is Thursday'
In the 20th-century corpus, the preference for haber de (65%; n = 34 of 52) over tener que for the expression of epistemic modality remains. This tendency to use haber de + infinitive in epistemic contexts is reinforced by its growing association with nonhuman subjects (see "Person/number and degree of animacy of the subjects"), which are a long way from the most prototypical sense of obligation, because a command requires a human being who can be expected to obey it. In fact, cross-tabulation of these factors shows that in the epistemic-nonhuman contexts, the uses of haber de exceed (71%; n = 22 of 31) those of tener que by a wide margin. In any case, the proportion of epistemic tokens in the corpus is very low in comparison with deber (de), the default periphrasis for expressing conjecture in the first half of the 20th century (Blas Arroyo & Vellón, 2014).
In the end, Table 4 shows that nondeontic contexts are by far the best allies of haber de, with very high factor weights in both centuries (19th c.: .83; 20th c.: .87). With regard to deontic modality, which is expressed by the vast majority of tokens analyzed here, a number of authors have attempted to pinpoint its different shades of meaning (Bybee et al., 1994; Fernández de Castro, 1999; García Fernández, 2006; Gómez Torrego, 1988; Keniston, 1937; López Izquierdo, 2008; Martínez Díaz, 2008; Olbertz, 1998; Yllera, 1980). However, this is by no means a straightforward task; among other things, the imposition of the analyst's own subjective interpretation can lead to circularity (Tagliamonte & Smith, 2006:345). To prevent this from occurring to the extent possible, we have divided the deontic axis into two categories that can be measured more objectively: (i) the degree of obligation/necessity or advisability imposed (note 5); and (ii) the agent that imposes that obligation/necessity.
Combining these two categories results in the following main values (note 6):
Subjective or internal necessity or obligation. ("Internal obligation" in Table 4.) This refers to duties that are generated by inner conviction, or by the subject's will or intention based on reasons that may be of a religious, ethical, or philosophical nature or that arise from gratitude, respect, or any other internal motivation. It is therefore based on the convictions or the desire of an agent. Thus, the need to fulfil the obligation is felt primarily by the agent, which places these periphrases closer to those of a volitional nature (Roca Pons, 1980:73; Yllera, 1980:114). When the speaker and the subject of the clause do not coincide, it is usually the former who refers to a moral obligation of the latter based on the above-mentioned moral values, as in has de/tienes que obedecer a Dios 'you must obey God'. Examples (9) and (10) are typical of tokens expressing internal obligation.
(9) Creo que la política de ahora no ha de ser de engaños ni es cuestión de forjarnos vanas ilusiones que después la realidad de los hechos ha de desvanecer. (Un catalanófilo de Madrid, 4-27-1930). 'I believe the politics of today must not be about deceiving, nor is it a question of dreaming up illusions in vain, only for the reality of the facts to make them fade away afterward.'
(10) He pensado que ya que tú te habías tomado esta molestia, por educación me la tenía que tomar yo también, para darte las gracias por lo que has hecho por hacerme agradable la vida, escribiéndome al frente. (Madrina de Guerra, 6-9-1939) 'I was thinking that as you had gone to all that trouble, out of courtesy I had to do the same, to thank you for all you have done to make life more pleasant for me by writing to me on the front.'
Agent-oriented or external obligation. ("External obligation.") This is obligation in its most literal sense, that is, the unavoidable necessity or the imperative/coercive advice is of an external nature for the agent of the action described by the verb. Hence, we are dealing with directive statements, among which we can distinguish different shades of meaning, such as obligations imposed by norm, agreement, social convention, legal code, etc., as in (11), or those forced by external circumstances beyond the subject's control, as in (12).
Necessity or advisability. This is felt by the speaker, and therefore carries far less coercive power than that expressed in cases such as those just mentioned. Thus, unlike in (12), in (13) and (14), the senders of the respective letters say that it is advisable to perform certain actions that they themselves will benefit from:
(13) Allí mismo escribí otro volumen que titulo "Cinco hombres", impresiones sobre Pablo Iglesias, Jaime Vera, Tomás Meabe, Largo Caballero y Julián Besteiro.

Son a la vez crítica de un libro de cada uno de ellos. He de completarlo con algo más de lectura. (Dramas de refugiados, 9-6-1946). 'Right there I wrote another volume entitled "Five Men," impressions about Pablo Iglesias, Jaime Vera, Tomás Meabe, Largo Caballero and Julián Besteiro. At the same time they are a review of a book by each one of them. I must finish it with a little more reading.'
(14) Esto que dices de la ropa, si me hace falta, pues no me hace falta que aquí solo se tiene que tener lo más necesario porque si no siempre tienes que ir cargado y con poca ropa hay bastante y no padezcas por mí. (Cartas del iaio, 10-12-1938).
'What you say about clothes, whether I need any, well I don't because here all you need are the bare essentials, otherwise you always have to carry a lot and you can get by with few clothes and don't worry about me.'
Table 4 shows, in addition to the overall reduction in the use of haber de in the passage from the 19th to the 20th century, that some of the factors within the modal senses factor group are ordered in a similar way, with nondeontic meaning in the lead, followed at a considerable distance by internal obligation. However, these cases of persistence in the semantic distribution of the receding variant stand in contrast to changes in the two remaining contexts: In the 20th-century corpus, agent-oriented obligation is the least favorable setting for haber de, whereas in the 19th century, the least favorable context is necessity.
It should be noted that the considerable magnitudes, indicated by the ranges, of the modality factor group in both periods can be explained mainly by the high association of haber de with nondeontic meanings, as previously stated. On the other hand, there is far less difference within the deontic field, especially in the 20th century, which confirms the pervasive extension of the emergent tener que in this modal use.
Summing up, it can be observed that by the middle of the 20th century, haber de has found a special refuge in nondeontic modal use, such as the expression of probability and conjecture, as well as in several expressive and emphatic usages. In contrast, in the far more common domain of deontic modality, the use of this periphrasis decreases considerably. It seems, thus, that the regression of haber de + infinitive as a result of the pressure of its competitor tener que + infinitive took a similar path to that followed in the lexical area. Tener first displaced haber as a possessive verb in the area of prototypical possession, that is to say, in those contexts in which the possessor was [+human, +agentive, +volitive, +controller] and the possessed was [-human, -agentive, -volitive, -controller]. Little by little, tener reached the nonprototypical side of the meaning of possession. In a similar way, tener que + infinitive beat haber de + infinitive in the most prototypical modal meanings, that is to say, deontic senses, whereas haber de + infinitive maintained meanings connected to conjecture and probability (note 8). However, there are also other structural factors that explain the process of variation and change we are dealing with here and which have gone practically unnoticed in the linguistic literature of Spanish. It is on these that we will focus in the following sections.

Sentential modality
As in the case of modal senses, both convergent and divergent tendencies are observed within this factor group from the diachronic perspective, as illustrated in Figure 3.
Clearly, in both the 19th and the 20th centuries, nondeclarative sentences (mainly interrogatives and exclamatives) are associated with haber de (19th c.: 91%; 20th c.: 50%). Notable, however, is the very low number of these syntactic contexts (19th c.: n = 11; 20th c.: n = 22) compared with declarative sentences (19th c.: n = 213; 20th c.: n = 961), not to mention the noteworthy interaction with other factors such as semantic modality. Indeed, the cross-tabulation between these two factor groups shows that a majority of nondeclarative sentences contain periphrases with nondeontic modal meanings, that is, those that are very highly associated with haber de, as already discussed. This explains why we have focused the multivariate analysis exclusively on the two groups of declarative sentences: affirmatives and negatives.
Unlike semantic modality, this factor group is selected as significant only in the 19th century, where negative contexts strongly disfavored the selection of haber de (.18; 17%) (on the role of negation in grammaticalization, see Givón, 1979; Poplack & Dion, 2009:575; Tagliamonte, Durham, & Smith, 2014; Torres Cacoullos & Walker, 2009). In the 20th century, by contrast, the difference between affirmative and negative polarity is completely neutralized.

Tense and mood
The explanatory power of tense and mood, unlike that of sentential modality, is maintained in the 20th century. Haber de manages to hold its own in the more favoring conjugation paradigms already observed in the previous century, although now with a lower magnitude. This is what happens in the case of both the present (19th c.: .64, 58%; 20th c.: .56, 25%) and, to a lesser extent, the imperfect indicative (19th c.: .56, 52%; 20th c.: .50, 21%), the two most favorable contexts for the periphrasis in the 19th century, though with considerably higher frequencies at that time. Among the remaining tense/mood paradigms, some cases of almost categorical avoidance of the use of haber de can be observed, for instance with nonfinite forms of the auxiliary verb (infinitive, gerund, past participle) (19th c.: 8%, n = 1; 20th c.: 2%, n = 1), as well as compound tense forms (haber + past participle) such as the present perfect indicative, with no tokens in the 19th century and only 1 (of 37) in the 20th century. It appears likely that the reason for this avoidance of haber de in combination with the present perfect could be a stylistic one, such as the speakers' reluctance to repeat the same verb twice in a single auxiliary slot (he habido de + infinitive).
The most important change in this factor group affects the future indicative, whose role in the selection of haber de changes significantly from one century to the next. From a moderately unfavorable position in the 19th century (.40), this tense becomes the most favorable context for haber de in the 20th-century corpus (.61), exceeding even the present or imperfect indicative, traditionally the most favorable environments for this periphrasis. One possible hypothesis for this change would be that future uses were connected with the tendency of the periphrasis to be employed in epistemic contexts, given the specialization of this indicative tense in expressing probability and conjectural meanings.
In any event, what can be seen in the 20th-century data is that the future indicative emerges as the strongest factor within this partial reorganization of the variable context. In sum, while frequency drops, haber de becomes favored in a context that previously disfavored its use.
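The comparison across periods can be made explicit with a small Python sketch that takes the factor weights quoted above and checks, for each tense, whether its effect changes direction. It assumes the usual reading of variable-rule weights, where values above .50 favor the application value (here haber de) and values below .50 disfavor it; the names `WEIGHTS`, `direction`, and `compare` are hypothetical, and only the three tenses whose weights are reported in the text are included.

```python
# Factor weights for haber de as quoted in the text (tense and mood group).
WEIGHTS = {
    "present indicative": {"19th": 0.64, "20th": 0.56},
    "imperfect indicative": {"19th": 0.56, "20th": 0.50},
    "future indicative": {"19th": 0.40, "20th": 0.61},
}


def direction(weight, neutral=0.50):
    """Classify a factor weight relative to the neutral point."""
    if weight > neutral:
        return "favors"
    if weight < neutral:
        return "disfavors"
    return "neutral"


def compare(weights):
    """Per factor: direction in each century, size of the shift, and reversal flag."""
    rows = {}
    for factor, w in weights.items():
        before, after = direction(w["19th"]), direction(w["20th"])
        rows[factor] = {
            "19th": before,
            "20th": after,
            "shift": round(w["20th"] - w["19th"], 2),
            # A reversal means moving from favoring to disfavoring or vice versa.
            "reversal": {before, after} == {"favors", "disfavors"},
        }
    return rows


for factor, row in compare(WEIGHTS).items():
    print(factor, row)
```

On these figures only the future indicative reverses direction, while the present and the imperfect keep their ranking but lose strength.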

Subject person/number and degree of animacy
Another factor group selected in both centuries is the grammatical person and number of the periphrasis. Table 5 shows the usage frequencies of haber de, combining three person and two number paradigms, as well as in the nonfinite forms of the main verb (infinitive, gerund, and past participle).
The table shows that in the 20th century, the third-person verb forms, both in the singular (29%) and even more so in the plural (39%), seem to be most likely to trigger the selection of haber de. This tendency is already visible in the 19th-century corpus, although at that time with far higher frequencies. Diametrically opposed to this, nonfinite verb forms have a clearly unfavorable effect on use of the traditional periphrasis (2%), a result also observed in the 19th century (8%). The other person/number paradigms are situated in between these two poles, with a substantial decline in frequency between the 19th and the 20th century (in particular with the first-person plural).
The multivariate analysis confirms the relevance of this factor in both centuries: third-person verb forms continue to be the most favorable for the selection of haber de (.58), when compared to all other contexts (.44). This result relates to the fact that the third person is not the personal paradigm that is most prototypically connected to obligation. Although our interpretation of the deontic modality is a broader one, as emphasized previously (see note 5), it should be recognized that some directive speech acts, such as orders and the like, cannot strictly be given to a third person. This probably explains why in all the deontic senses considered, third persons show systematically lower uses of haber de than other persons do. Conversely, in the nondeontic senses, the proportions are inverted: the uses of haber de are double (67%; n = 31) those of tener que (33%; n = 15) in the third person, surpassing the figures of the other personal paradigms (haber: 57%, n = 8 vs. tener: 43%, n = 6).
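The percentages in this cross-tabulation follow directly from the raw counts. A minimal Python sketch (the helper name `haber_share` is hypothetical) recomputes them from the figures just given for the nondeontic senses:

```python
def haber_share(haber_de, tener_que):
    """Percentage of haber de among all tokens in a cell, rounded to an integer."""
    return round(100 * haber_de / (haber_de + tener_que))


# Nondeontic tokens as reported in the text: (haber de, tener que).
cells = {"third person": (31, 15), "other persons": (8, 6)}
shares = {label: haber_share(*counts) for label, counts in cells.items()}
print(shares)
```

With only 14 tokens in the second cell, the 10-point difference between the two shares rests on very small numbers and is best read as a tendency.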
The continuity of the interaction between third person and the degree of animacy is another revealing fact. According to some authors, in Old Spanish constructions with the verb haber seem to have shared a common deagentivizing factor. As stated by Stengaard (2003:1151): ". . . by means of the periphrasis with aver, the subject of the action expressed by the infinitive either loses its possible role as the subject-agent or reinforces its role as subject-recipient or patient involved in the verbal action in question" (our translation). This semantic effect is related to the meaning of the verb haber, which originally implied nonagentive or receptive possession in which the subject does not exert any control over the possessed object, unlike the verb it competes with: tener (Seifert, 1930). From a cognitive perspective, Garachana (1997) explained this opposition in terms of the prototypicality of the possession, according to which haber experienced semantic bleaching of the figurative control over the possessed object, while this process did not reach tener (see also Garachana & Rosenmeyer, 2011).
In order to verify these hypotheses, we analyzed the influence of animacy of the subject on the selection of haber de, with a distinction being drawn between human and nonhuman subjects of these third-person verb forms (note 9). The data from this analysis confirm that the preference for haber de is somewhat greater with nonhuman subjects, making it one of the few contexts studied here in which this periphrasis reaches frequencies similar to those of tener que in the 20th-century corpus (.61; 50%). Conversely, these numbers are considerably lower with human third-person subjects (.44; 23%). Essentially, these figures are similar to those found in the 19th-century corpus, in which a greater association of haber de with nonhuman subjects (.59; 73%) than with human subjects (.46; 55%) can be identified.

Level of (im)personality of the sentences
A similar trend can also be observed in the area of semantic (im)personality, a factor group in which active sentences on the one hand contrast with passive and impersonal ones on the other. As illustrated in Table 4, it is among the latter, in which the subject is syntactically and/or semantically camouflaged, and consequently less prototypically connected to obligation, that the use of haber de is prevalent (77%), whereas only 20% of active sentences use this periphrasis in the 20th century. By contrast, the differences between these two contexts in the 19th century are considerably smaller, despite the almost categorical use of haber de in passive and impersonal sentences (90%). It should be noted that the overwhelming majority of tokens in the corpus occur in active constructions (95%; n = 944), and it is precisely these structures that disfavor haber de in both the 19th and 20th centuries. Indeed, even as haber de recedes, this factor increases its strength across the corpus, becoming the strongest constraint affecting the selection of this periphrasis in the 20th century.

Lexical aspect and lexicalization
The lexical aspect of the main verb is also a significant predictor of variant choice. In the analysis of this factor group, patterns of both continuity and divergence can be identified, similar to those discussed with reference to other factor groups (e.g., semantic and syntactic modality, polarity).
Some factors reflect a pattern of continuity. For instance, among the different semantic verb types, motion verbs are the least favorable contexts for the selection of haber de in both periods (19th c.: .30, 31%; 20th c.: .29, 7%). In this regard, it is interesting to note that in the 20th-century corpus some of the most frequent motion verbs (volver 'to return', venir 'to come', llevar 'to take', pasar 'to pass', traer 'to bring', salir 'to leave') hardly ever appear with haber de. An exception to this rule is ir 'to go', as shown in Table 6.
The same hierarchy of constraints is also shared by stative verbs, which clearly favor haber de in the 19th century (.62), whereas a century later they have lost some of their favorable influence (.51). Yet some particular stative verbs still show an attachment to this receding periphrasis in the 20th-century corpus, with values that are much higher than the mean for the group. This is mainly the case for ser 'to be', which appears with haber de in 36 of 76 tokens (47%).
However, the main novelty within this factor group can be observed with the speech verbs, the group of verbs most clearly associated with haber de in the 20th century (.71; 31%). This association seems to derive from some kind of lexicalization of the receding periphrasis with specific lexical verb types, in a process that has also been described for other grammatical variables (cf. Poplack & Dion, 2009; Poplack & Malvar, 2007; Poplack & Tagliamonte, 2001). With this in mind, it is revealing that just over half of the top 15 most frequent verbs appearing with haber de in the 20th-century corpus belong to this group of verbs (see Table 6). In descending order of frequency, these verbs are decir 'to say', saber 'to know', confesar 'to confess', reconocer 'to recognize', manifestar 'to express', escribir 'to write', agradecer 'to thank', and juzgar 'to judge'. Especially relevant is decir 'to say', which was already one of the verbs co-occurring most frequently with haber de in the 19th-century data. However, its relative frequency has increased considerably, rising from 13th place in the 19th century to 2nd place a century later. On the other hand, a more detailed analysis of these verbs shows that many of their periphrastic uses appear in what can be called "phatic" contexts. In these periphrases, which are close to fossilized lexical expressions and which are common within the epistolary genre, speakers use this type of verb to "enter into communication" with the addressee or to divide the pieces of information into smaller parts (Gómez Torrego, 1999:3354), as illustrated in (15) and (16).
(15) Por lo que a nuestro querido y llorado José María se refiere, he de decirles que lo he tenido muy presente en la Santa Misa. (Cartas de dos hermanos requetés, 5-4-1937).
'As far as our beloved and lamented José María is concerned, I must tell you that I bore him very much in mind in the Holy Mass.'
(16) . . . pues has de saber que el día 17 Adonis me escribió diciéndome que tenía libre y si quería o podía que le saliera en Vigo . . . (Una familia y un océano, 8-1, 1961)
'. . . because you must know that on the 17th Adonis wrote to me saying that he had time off and if I was able or wanted to I could go to pick him up in Vigo . . . '
Table 6 shows that haber de co-occurs particularly frequently with a small number of main verbs, such as ser 'to be', ir 'to go', dar 'to give' and estar 'to be', which indicates that a process of specialization has taken place.
As noted by Elsig (2009:19), a receding variant's lexicalization in some restricted lexical contexts may be related to the loss of productivity elsewhere. This is supported by the fact that, in our case at least, lexical specialization takes place, primarily, in a specific set of grammatical settings. Thus, of the 36 tokens of haber de ser, 26 (72%) occur with the third-person present indicative, that is, ha de ser. With decir 'to say', on the other hand, this periphrasis is found primarily in first-person singular present indicative contexts (54%; n = 7), that is, he de decir.

Clause type
In spite of a lower explanatory power than the factor groups already discussed, the selection of haber de also seems to be affected by the type of clause that the verbal periphrases appear in, both in the 19th century (range 15) and 20th century (range 10). Haber de was weakly favored in both centuries in subordinate syntactic settings, as opposed to nonsubordinate contexts that exert a slight disfavoring effect. This constraint is thus stable across time, supporting the retentive role of subordination in processes of language change noted elsewhere (Blas Arroyo, 2008; Matsuda, 1993; Tarallo, 1989; but see some counterexamples in Tagliamonte et al., 2014; Torres Cacoullos & Walker, 2009).
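For readers less familiar with variable-rule analysis, the range cited for a factor group is conventionally obtained by subtracting the lowest factor weight in the group from the highest and multiplying by 100. The following minimal Python sketch illustrates that arithmetic only. Because the clause-type weights are not reproduced in this section, it uses the animacy weights reported earlier (.61/.44 for the 20th century, .59/.46 for the 19th) purely as an illustration; the function name factor_range is hypothetical, and the resulting figures should not be read as values reported in the tables of this study.

```python
def factor_range(weights):
    """Return the range of a factor group: (highest weight - lowest weight) x 100.

    `weights` maps factor labels to factor weights between 0 and 1.
    The function name and input layout are illustrative assumptions.
    """
    values = list(weights.values())
    # Round to the nearest integer to avoid floating-point residue.
    return round((max(values) - min(values)) * 100)


# Illustrative input: animacy weights for third-person subjects cited in the text.
animacy_20th = {"nonhuman": 0.61, "human": 0.44}
animacy_19th = {"nonhuman": 0.59, "human": 0.46}

print("20th c. animacy range:", factor_range(animacy_20th))
print("19th c. animacy range:", factor_range(animacy_19th))
```

A larger range indicates that the factor group separates favoring and disfavoring contexts more sharply, which is how the relative strength of factor groups is usually compared across periods.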

DISCUSSION
By the mid-20th century, use of the periphrasis haber de had undergone a striking decline in comparison with previous centuries. This change took place both in the domain of modality, where this periphrasis had been leading for centuries, and in the future temporal domain, in alternation with other verbal variants, such as the synthetic and periphrastic future forms. This decrease is particularly visible when comparing usage to early modern Spanish, but also in comparison to the 19th century, in which haber de + infinitive was still very much alive. Moreover, an analysis of the diachronic axis has confirmed the continuing downward progression of the periphrasis throughout the first six decades of the 20th century. Judging by more contemporary studies, this decline seems to have continued to the point where haber de appears to have been relegated primarily to some formal registers of the written language (Fernández de Castro, 1999; García Fernández, 2006; Gómez Torrego, 1999), as well as a few dialectal uses in some bilingual areas as a consequence of language contact (Blas Sinner, 2003). The loss of prominence of what had been the dominant periphrasis since the Middle Ages occurs in virtually all the linguistic contexts analyzed, resulting in replacement by its competitor, tener que, as the prevalent form nearly across the board. This change affects not only frequencies but also the relative ranking of the factor groups selected across time. As can be seen in Table 7, while some groups keep a similar rank in each period (i.e., person/number, type of clause), the relevance of others clearly changes from one century to another. Thus, if in the 19th century modal meanings appear as the most significant predictor in variation, this role is occupied by the level of (im)personality in the 20th-century data.
On the other hand, both agentivity and the lexical aspect of main verbs take on a more relevant role in the 20th century than in the previous one. Another change concerns the sentential modality group, with negative clausal polarity disfavoring haber de in the 19th century, but the difference between affirmative and negative contexts vanishing in the 20th century.
However, despite all these differences in frequencies as well as in the ordering of factor groups, the data of this study also show continuity with the underlying grammar. Thus, it is of particular interest that most of the factor groups selected as predictors in the 19th century remain significant in the 20th century, thereby confirming the principle of persistence that characterizes many processes of grammaticalization (Hopper & Traugott, 2003). This is the case for semantic modality, tense, person, agentivity, level of (im)personality, lexical aspect, and clause type. And no less significant is the fact that those factor groups not selected by the statistical model in the 19th century (lexical priming, phonological context, length of the periphrasis, contextual modalization, syntax of the main verb, syntax of the subject) remain the same in the following period.
Moreover, a similar hierarchy of constraints is observed in both centuries in at least four of the factor groups selected across time. In other words, there is remarkable consistency in the operation of the variable grammar throughout obsolescence. For example, we have seen that third-person verb forms, both singular and plural, are the most favorable context for haber de, ahead of the remaining finite and particularly the nonfinite forms, the latter occurring almost categorically with tener que in both the periods examined here. Similar patterns in both centuries have also been identified for the factor groups clause type (with subordinate clauses more likely to contain haber de), agentivity (confirming the preference for haber de with nonhuman subjects, as noted in the literature), and the level of (im)personality (maintaining a historical association of passive/impersonal sentences with haber de in the first half of the 20th century).
By contrast, in the remaining factor groups, both convergent and divergent tendencies can be observed in the hierarchy of constraints; a general trend of continuity in the direction of change is broken by some specific factors whose figures change markedly when compared to those factors in the previous century. One such case is the factor group tense/mood. Although the present indicative and, to a lesser extent, the imperfect indicative continue to have a favorable effect on the choice of haber de in the 20th century, it is the future (indicative) that now shows the clearest association with haber de. At the same time, these indicative forms run counter to the continuous development of most of the remaining tense/mood paradigms, which clearly disfavor the receding periphrasis haber de in the 19th and 20th centuries. The lexical aspect of the main verbs also shows a degree of continuity as a factor group, with stative verbs constituting a particularly favorable context for the selection of haber de, whereas motion verbs have a negative effect on the selection of this periphrasis in both periods. However, within the same factor group we also find an important change in the relative ranking of one specific factor, namely speech verbs, which in the 20th century are almost twice as likely to appear with haber de as in the 19th century. This is mainly the result of lexicalization of the use of these verbs with haber de, a process that can, to some extent, already be observed in the 19th century, but that sharply accelerates in the subsequent period examined here. Last but not least, similar patterns can be identified when analyzing semantic modality, the factor group that has almost monopolized the debate about this syntactic variable in the literature. 
Indeed, in both centuries modality appears as one of the most relevant factor groups, with the scarce cases of nondeontic modality being the most favorable environment for haber de, followed by internal obligations imposed by the subject on him or herself due to inner convictions. Conversely, the influence of obligations resulting from necessity or advisability differs between the two centuries examined. Although these types of deontic meaning are clearly the least favorable modal settings for haber de in the 19th century, this is no longer the case in the 20th century. Instead, the lowest proportion of periphrases with haber de is now found in the domain of external obligations (imposed from the outside by forces or circumstances beyond the subject's control). As external obligation is by far the most frequently expressed type of modal meaning in the corpus (both with haber de and tener que) and diffusion of the emergent variant tener que is particularly strong in this frequent environment, our data confirms the often-observed relationship between frequency and grammaticalization (Bybee, 2003). We have observed that, in the 20th century, haber de takes refuge in some very specific and restricted areas of the grammar and lexicon while its frequency of use drops dramatically in all other contexts. This is the case, for instance, for nondeontic modality, expressing notions such as conjecture, as well as expressive meanings such as surprise, indignation, recrimination, etc. In the 20th century, these types of modality, of which there are very few tokens in the corpus, seem to be acting as barriers to tener que taking over entirely in the field of modality. The same pattern can be observed in other comparatively low-frequency areas of the grammatical system, such as nonhuman third-person subjects, or, even more so, in passive/impersonal and nondeclarative sentences, where haber de remains entrenched.
Complementing these cases, we have seen several instances of lexicalization, that is, the preferred use of this periphrasis with a small number of main verbs, such as ser and estar 'to be', dar 'to give', and ir 'to go', as well as certain speech verbs (decir 'to say', saber 'to know', confesar 'to confess', reconocer 'to recognize', etc.), mainly when used in a phatic sense, and in some specific grammatical contexts.
Summing up, many of the contexts that are still favorable to haber de by the mid-20th century belong to areas of the grammar that are comparatively infrequent and/or marked. By contrast, the more frequent/unmarked contexts lead to a significant reduction in the use of the receding periphrasis, in a process that, moreover, accelerates over the course of the century. External obligation, declarative and active sentences, as well as human subjects are among the increasingly favorable environments for the now-dominant periphrasis tener que, paving the way for its generalization.
Other cases of grammatical change in Spanish, in which the use of an older variant becomes increasingly restricted to some specific contexts, while a newer alternative variant replaces it in its more traditional functions, have been discussed in the literature (e.g., Aaron, 2006; Company, 2003; Klein-Andreu, 1991; Torres Cacoullos, 2008). Similar patterns have also been identified for the diachronic development of modal periphrases in Spanish during other historical periods. Thus, in previous studies we have shown that, over the past five centuries, deber de + infinitive has also been favored over the alternative construction without the preposition (deber + infinitive) in more marked (and usually less frequent) contexts (Blas Arroyo & Porcar, in press; Blas Arroyo & Vellón, 2014). This is the case, for instance, for epistemic-conjectural meanings, for which modal periphrases are used far less commonly than for the expression of deontic modality, as already observed. The same is true for negative clauses, which consistently favor the use of deber de + inf. at all times, whereas affirmative clauses, which are far more frequent, are a more favorable environment for deber + inf. Emphasis (which can involve different intensification strategies on the speaker's part) also favors the use of the older variant, deber de + infinitive.
From the theoretical perspective of grammaticalization, it has been noted that linguistic change is not abrupt, but comes about through a series of small adjustments affecting both the emerging and the declining variants. For instance, referring to the evolution of future temporal reference in Portuguese over the past centuries, Poplack (2011:219) noted that: "This change was driven by the gradual expropriation by the incoming P[eriphrastic]F[uture] of the preferred contexts of the older layers, culminating in the contemporary situation in which PF has become the default choice everywhere but in the remaining few bastions of P [the (futurate) present]." Similar patterns can also be observed in this study, where the receding variant for the expression of modality, haber de, becomes entrenched in some specific and restricted contexts, at the same time as the emergent variant, tener que, is gaining a strong foothold in other, more frequent settings, such as affirmative clauses, the expression of agent-oriented obligations, as well as active clauses, especially those with human subjects.

However, our case study also shows some differences from the grammaticalization processes observed, for instance, in the evolution of the way that the future is expressed in several Romance languages (Poplack & Dion, 2009; Poplack & Malvar, 2007; Poplack & Turpin, 1999). In these studies, constraint hierarchies associated with each variant rarely remain constant across time, and relevant factors gain and lose importance as variants grammaticalize. In our case, although some significant changes in the importance and the relative ranking of specific constraints have been identified, it cannot be ignored that there are certain clear similarities between the two periods examined in this study. In fact, many of the factor groups selected as significant, and as not significant, remain the same in both periods. This raises the question of why a higher degree of persistence can be observed in the grammaticalization of the system of modal periphrases in Spanish than in the above-mentioned studies on the future.
The answer to this question is not an easy one, and there may not be a single reason, but rather a set of factors. For instance, it cannot be ruled out that the binary nature of the variable (i.e., the fact that we compare only two competing constructions) causes the results to differ from studies examining the choice between three or more different forms, as is frequently the case in studies of the future tenses. At the same time, it should be noted that the perspective of our analysis also differs significantly from those studies, in the sense that we focus our attention on the outgoing variant (haber de) and not on the incoming one (tener que), as is often the case. It should be taken into account that this change of perspective could have repercussions on analysis and interpretation.
However, despite all these cautions, it has also been observed that old distribution patterns may persist, even into the most advanced stages of grammaticalization (Poplack, 2011:223). This has been seen, for instance, in the distribution of must in the modal system of some dialects of English (Tagliamonte & Smith, 2006:372), except that this form may never have been firmly established in these regional varieties, whereas haber de was the star of the Spanish modal system for centuries. In any event, we must not forget that the period of the 20th century analyzed in this study coincides precisely with the beginning of the sharp increase in the use of tener que, which replaces its previously dominant competitor in environments in which it was previously absent. As already discussed, this change accelerated as the 20th century progressed, gaining momentum just around the time that the period covered by our corpus ends. Our picture of the substitution process is thus incomplete, and more recent data will be needed to gain insights into how the development continues. Nevertheless, there is still a possibility that cannot be excluded: both periphrastic constructions had already grammaticalized by the time considered in this study, so what we are seeing here is a mere change in their distributions. 10
Finally, we would like to highlight the benefits of using a corpus consisting of epistolary and autobiographical texts closely reflecting the spoken language. Indeed, it is possible that certain variation phenomena may vary significantly depending on the genre and text type analyzed. In this regard, elsewhere we have already suggested that certain differences in the distribution of the prepositional and nonprepositional variants of deber (de) + inf. observed in different studies published recently might be due to the different types of documents the data was drawn from (cf. Balasch, 2008, 2012; Blas Arroyo & Porcar, in press).
Likewise, our data on the variation between haber de and tener que in the 19th century differs markedly from some previous studies, in which the surprising prominence of haber de (see Figure 1) may well be related to the text types analyzed, whose formal nature contrasts with those that we have analyzed in these pages (López Izquierdo, 2008;.

NOTES
1. The title and date of the document from which each fragment is taken are given in parentheses (see Appendix).
2. This study is part of the project "Variación y cambio lingüístico a través de textos de inmediatez comunicativa: Un proyecto de sociolingüística histórica [Linguistic variation and change through texts of linguistic immediacy: An historical sociolinguistic research project]," funded by the University Jaume I (Ref. P1·1B2013-01) and the Spanish Ministry of Science and Technology (Ref. FFI2013-44614-P) and carried out by a research team led by the first author (2013-2016).
3. Unlike the original table provided by López Izquierdo, here we have not included occurrences of the impersonal periphrasis haber que, because its impersonality does not allow it to alternate with other modal periphrases in the same contexts.
4. We have included 15 occurrences of tener with the preposition de as a complementizer (tener de + infinitive). Despite its notable vitality in the past (Blas Arroyo & González, 2014; Yllera, 1980), usage of tener de in modern Spanish is now limited to some very specific dialects.
5. In line with Martínez Díaz (2008:1285), we understand obligative modality in a broad sense, "as an expression of the subjectivity of the utterance," which implies that the syntactic subject of a clause does not necessarily have to be identical with the speaker. Otherwise, we could consider only sentences with a first-person singular subject for some modal categories, such as moral, internally motivated obligation.
6. In our case, the task has not been simple either. In any event, each of the examples was encoded by both authors independently. At a later stage, all cases in which there was any discrepancy (<10%) were reviewed jointly or submitted to a third party for evaluation, with the aim of reaching a decision about those tokens on which the authors still could not agree.
7. Other common cases of agent-oriented or external obligation are those of commanding or ordering someone to perform actions (Has de/tienes que entregar este documento 'You have to deliver this document'), as well as those characterized by a sense of inevitability in which the idea expressed by the verb is felt to be so certain that its occurrence is considered necessary and inevitable (Todos hemos de/tenemos que morir algún día 'We all have to die some day').
8. We are grateful to an anonymous reviewer for making this point.
9. We exclude all tokens with first- and second-person verb forms because their subjects are unavoidably human (or viewed as human-like in the respective context).
10. We are grateful to an anonymous reviewer for making this point.