French de and en as expressions of the genitive case: a unified analysis within LFG and computational implementation in XLE

ABSTRACT The French clitic pro-form en represents a wide range of heterogeneous constituents: de-PP complements and adjuncts, partitive objects, and prepositionless objects of cardinals. The main goal of this paper is to formalize this relationship computationally in terms of genitive case. This is apparently the first non-transformational counterpart to Kayne (1975)’s unified analysis, which derives en from a deep structure with de by means of syntactic transformations. Transformational grammars are problematic from the parsing perspective. In order to test our analysis automatically on a large amount of data, we implemented it in a computational grammar of French in the Lexical-Functional Grammar (LFG) formalism using the XLE system. This non-transformational framework is particularly fit for expressing systematic relationships between heterogeneous structures and has successfully been used for the implementation of natural language grammars since the 1980s. We tested the implementation on 320 grammatical sentences and on an equal number of ungrammatical examples. It analyzed all grammatical examples and blocked almost 95% of the ungrammatical ones, showing a high empirical adequacy of the grammar.


Introduction
There is a striking parallelism in French between the forms de and en, see (1)-(8), where the constituent containing de in (a) is anaphorically replaced by the pronominal clitic en in (b).
(1) a. La population dépend [de la forêt].
the population depends DE the:F.SG forest
'The population depends on the forest.'
b. La population en=dépend.
the population EN=depends
'The population depends on it.'
(2) a. […]
This article pursues two goals. First, we propose a formal account of the relationship between these two forms within Lexical-Functional Grammar (henceforth LFG), a framework particularly fit for expressing systematic relationships between heterogeneous structures (Bresnan, 2001). Its adequacy for implementing computational grammars of natural languages has been continually demonstrated for more than 35 years (cf. Müller, 2018, p. 219-220). Second, we implement the proposed analysis computationally in the Xerox Linguistic Environment (XLE) 5 as an extension of FrGramm, an LFG grammar fragment of French developed in this system (Schwarze & Alencar, 2016; Alencar, 2017). A computational implementation enables us to automatically check the empirical validity of a particular approach to a grammatical phenomenon on a large amount of data.
To our knowledge, our proposal is the first non-transformational unified analysis of de and en. It distinguishes itself from previous LFG approaches in two ways. First, it explains en-pronominalization of a wide range of heterogeneous constituents in terms of a single common feature, namely, genitive case. Second, it postulates a single representation for both items. It is a lexicalist counterpart to Kayne (1975)'s transformational analysis, which uniformly relates diverse uses of en to a single deep structure representation with the preposition de.
5. http://ling.uni-konstanz.de/pages/xle/doc/xle_toc.html
The next section presents the basic facts to be modeled. Section 3 then outlines the theoretical framework. After a review of previous directly related approaches in Section 4, Section 5 details the formalization of our analysis. Section 6 deals with the implementation methodology and evaluation results. In the last section we summarize the main conclusions and point out directions for further research.

A closer look at the relationship between de and en
De is a highly ambiguous form. In (1)-(5), it satisfies a subcategorization requirement of a verbal, adjectival, nominal, and numeral head, while in (6) and (7) it introduces an adjunct to a noun in subject and object position, respectively. In (8), however, it does not function as an independent syntactic word, but is instead an element of the multiword determiner de la in the partitive direct object. Diachronically, French partitive determiners du and de la derive from the preposition de and the singular masculine and feminine definite article, respectively (Carlier et al., 2013). The status of de in constructions of the type of (2) is controversial. For Frank (1996, p. 165), it is a semantic preposition. However, Carlier et al. (2013) show it to have semantically bleached (see Section 4). We follow this view here, treating it as a genitive marker in all constructions (1)-(7).
Other usages disallow pronominalization by en, e.g., (9)-(11). In (9), de is a semantic preposition heading a locative adjunct, while in (10) it introduces an infinitival complement pronominalizable by the accusative clitic le (Carlier et al., 2013). In (11), it heads a classificative PP (Fábregas, 2017), which does not denote an event participant capable of being pronominalized.
The systematic relationship between de and en can be described in terms of the shared functional properties in Table 1.
Table 1
[…]
(iii) Adjunct to a noun: (6) and (7)
(iv) Partitive direct object: (8)
Properties (i), (iii), and (iv) use familiar terminology. Property (ii), however, demands a detailed explanation. Quantified terms are expressions made up of a quantifying form (henceforth QForm) and its domain, e.g., five apples or a liter of milk. The QForm types in French that were implemented in our grammar are presented in Table 2. The canonical domain of a QForm is a determiner phrase (DP) or a prepositional phrase headed by de (henceforth de-PP). If the QForm is a cardinal up to 999999, the choice depends on the set designated by the domain. If this set is determined, the domain is a de-PP, otherwise it is a DP, see (5) and (12), respectively (Milner, 1978, apud Hulk, 1983).
In direct object position, the domain can freely be referred to by en. The pro-form is obligatory if the QForm is a cardinal, compare (17) with the pronominalized version of both (5a) and (12) in (5b). By contrast, the grammaticality of en as an OBJ of a QForm in preverbal subject position has generally been denied (Hulk, 1983; Kayne, 1975; Lagae, 1997), but positive evidence can be found with Google, e.g., (18), extracted from a French archeology journal. Although three native-speaker informants were unhappy with this example, we hypothesize that this configuration is optionally licensed for passive or unaccusative verbs and copulas, at least in formal language.

The theoretical framework: Lexical-Functional Grammar
Lexical-Functional Grammar (LFG) is a non-transformational generative model that strictly adheres to the Lexical Integrity Principle, only allowing transformations in the lexicon (Bresnan, 2001). It factors the syntactic analysis of a sentence into two distinct representation levels related by a projection function: f(unctional)-structure and c(onstituent)-structure. These are exemplified in Figure 1 and Figure 2, respectively. These representations were automatically generated by XLE from the grammar in (20)-(26). Numerical indexing indicates the mapping between f-structure and c-structure, i.e., each c-structure node in Figure 2 is labeled with a number that labels the corresponding f-structure in Figure 1.
C-structure expresses precedence and part-whole relations between the constituents of a sentence. For example, the representation in Figure 2 shows that the sentence (S) is made up of a noun phrase (NP) and a verb phrase (VP). The latter phrase, in turn, consists of a verb (V) and an NP made up of a determiner (D) and a noun (N).
F-structure encodes morphosyntactic aspects like agreement, tense, subcategorization, etc. in a format of features. A feature consists of an attribute and a value, e.g., GEND(er)=F, NUM(ber)=SG, and SPEC(ification)=PART(itive) in f-structure 46 of Figure 1 state that de la glace is feminine, singular, and partitive. 6 An attribute must not have divergent values, e.g., NUM=SG and NUM=PL(ural) in f-structure 1, if we substitute l'enfant 'the child' for les enfants 'the children' in (19). A value may be atomic, e.g., PL, or constitute a feature structure, as is the case with the value of grammatical functions, e.g., SUBJ (subject) and OBJ (object). The value of a PRED(icate) attribute is a semantic form, which includes the subcategorization frame of valence-bearing lexemes enclosed in angle brackets.
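The requirement that an attribute must not receive divergent values can be illustrated with a small Python sketch. This is not XLE code; the function name `unify` and the dictionary encoding of f-structures are assumptions made purely for illustration.

```python
def unify(f, g):
    """Merge two feature structures (nested dicts).

    Returns the merged structure, or None if some attribute
    would receive two divergent values.
    """
    result = dict(f)
    for attr, value in g.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            sub = unify(result[attr], value)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != value:
            return None  # e.g. NUM=SG against NUM=PL
    return result


# Features of "de la glace" as described in the text.
de_la_glace = unify({"GEND": "F", "NUM": "SG"}, {"SPEC": "PART"})

# A singular subject combined with plural verb agreement clashes on NUM.
clash = unify({"SUBJ": {"NUM": "SG"}}, {"SUBJ": {"NUM": "PL"}})
```

In this toy encoding, atomic values are strings and the values of grammatical functions such as SUBJ are themselves dictionaries, mirroring the distinction drawn above.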
6. In the f-structures throughout the paper, category names with up to 5 characters (and a few longer names) are not abbreviated. Abbreviations for categories dealt with in the main text or in footnotes are explained as introduced. Otherwise, the following abbreviations are used: ACC=accusative, ATTRIB=attributive, ATYPE=adjective type, AUX=auxiliary […]
For the sake of readability, we adapted XLE's syntax, as far as possible, to the traditional LFG notation. Throughout the paper, the ellipsis indicates omission of code irrelevant to the discussion, e.g., (23). Both constituency rules and lexicon entries are endowed with functional annotations, where "↓" refers to the feature structure of the node the annotation is attached to, while "↑" denotes the feature structure of its mother node. These annotations indicate the f-structures the c-structure nodes project to. A semicolon separates the annotations pertaining to a node from those of its sister category. 7 Functional annotations mostly take the form of equations (f ATTRIBUTE)=VALUE, assigning VALUE to ATTRIBUTE of f, where f is a feature structure. For example, (↑MASS)=+ in (24) assigns MASS=+ to D, blocking an NP with a MASS=- (i.e., count) noun head.
7. In the traditional notation, annotations are written below the respective nodes, demanding much more space.
8. In XLE, ↑=↓, which identifies the f-structures of daughter and mother nodes, is automatically attached to nodes deprived of further equations.
The rules in (20) define the c-structure and f-structure of S, VP, NP, and PP (prepositional phrase), using the categories D, V, P, and N from the lexicon. An annotation of the form (↑GF)=↓, where GF is a grammatical function, states that the constituent in question realizes function GF of the mother category. Thus, in the first rule, NP is the SUBJ of S. In the second rule, NP is the OBJ and PP is the oblique (OBL) of VP. These two categories are connected by the Boolean disjunction "|", marked as optional with #0#1 9 , thereby licensing VPs without any complements.
The completeness and coherence conditions ensure that all and only the governed grammatical functions listed in a predicate subcategorization frame are realized in the syntax. For example, the verb manger 'eat' requires a SUBJ and an OBJ, see (23).
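As a rough illustration of these two conditions, the following Python sketch checks a toy f-structure against a subcategorization frame. The set `GOVERNABLE`, the function names, and the dictionary encoding are assumptions for the sake of the example and do not reproduce XLE's internals.

```python
# Governable grammatical functions considered in this toy example (an assumption).
GOVERNABLE = {"SUBJ", "OBJ", "OBL", "XCOMP"}


def complete(fs, frame):
    """Completeness: every function listed in the frame is realized."""
    return all(gf in fs for gf in frame)


def coherent(fs, frame):
    """Coherence: every governable function present is listed in the frame."""
    return all(gf in frame for gf in fs if gf in GOVERNABLE)


# manger 'eat' requires a SUBJ and an OBJ, as stated in the text.
MANGER = ("SUBJ", "OBJ")

good = {"PRED": "manger", "SUBJ": {"PRED": "enfant"}, "OBJ": {"PRED": "glace"}}
no_obj = {"PRED": "manger", "SUBJ": {"PRED": "enfant"}}
extra = dict(good, OBL={"PRED": "forêt"})
```

Here `no_obj` violates completeness and `extra` violates coherence, while `good` satisfies both.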
Provided with additional entries, e.g., (26) and (27) 10, this small grammar fragment is also capable of generating (1a), producing the representations in Figure 3 and Figure 4.
9. XP#m#n means from m to n repetitions of XP.
10. Both based on Schwarze (1996), see Section 4.2.
Entry (26) states that verb dépendre 'depend' subcategorizes for a SUBJ and an OBL, the prepositional case (PCASE) of which must be DE. This requirement is satisfied by entry (27). Notation "=c" in (26) represents a constraining equation. While defining equations with "=" set the value of an attribute, constraining equations require its value to be defined elsewhere, in the case at hand, by the lexical entry of the preposition.
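The difference between the two equation types can be sketched in Python as a two-pass procedure: defining equations first build the f-structure, and constraining equations are then checked against the result. The function `solve` and the path-tuple encoding are hypothetical illustrations, not a description of how XLE actually solves equations.

```python
def solve(defining, constraining):
    """Build an f-structure from defining equations, then verify
    constraining equations. Returns None if anything fails."""
    fs = {}
    for path, value in defining:
        node = fs
        for attr in path[:-1]:
            node = node.setdefault(attr, {})
        if node.get(path[-1], value) != value:
            return None  # divergent values for one attribute
        node[path[-1]] = value
    for path, value in constraining:
        node = fs
        for attr in path:
            node = node.get(attr) if isinstance(node, dict) else None
        if node != value:
            return None  # required value was never defined elsewhere
    return fs


# The verb only constrains (OBL PCASE) =c DE; the preposition defines it.
verb_constraint = [(("OBL", "PCASE"), "DE")]
with_de = solve([(("PRED",), "dépendre"), (("OBL", "PCASE"), "DE")], verb_constraint)
without_de = solve([(("PRED",), "dépendre")], verb_constraint)
```

With the preposition's defining equation present, the constraint is satisfied; without it, the analysis is rejected rather than the value being silently supplied by the verb.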
The inflectional features of the verb are encoded by means of a template in the last line of (26). In XLE, templates are analogous to functions in a programming language, enabling code reuse, so that the same blocks of commands need not be written over and over again. Template definitions have the general form NAME=definition or NAME(P1 P2… Pn)=definition, where P1, P2, etc. are parameters. According to the definition in (28), V-INFL takes four parameters: T(ense), M(ood), P(erson), and N(umber). "@" is the template call operator, which has the following syntax: @NAME or @(NAME P1 P2… Pn). In the call to V-INFL in (26), the first two parameters are themselves template calls, see definitions in (29). When processing a grammar, XLE substitutes definitions for template calls, instantiating the parameters.
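Taking the function analogy literally, template expansion can be mimicked in Python as follows. Definitions (28) and (29) are not reproduced here, so the template names `PRES` and `IND` and all attribute names in this sketch are assumed placeholders rather than the grammar's actual content.

```python
def PRES():
    """Hypothetical parameterless template contributing a tense equation."""
    return [("TENSE", "PRES")]


def IND():
    """Hypothetical parameterless template contributing a mood equation."""
    return [("MOOD", "IND")]


def V_INFL(t, m, p, n):
    """Template with four parameters: tense, mood, person, number.
    The first two arguments are themselves expanded template calls."""
    return t + m + [("SUBJ PERS", p), ("SUBJ NUM", n)]


# Analogue of a call like @(V-INFL @PRES @IND 3 SG): the result is the
# list of equations obtained after substituting definitions for calls.
equations = V_INFL(PRES(), IND(), "3", "SG")
```

The point of the sketch is only that a single call site stands for a reusable block of equations, with parameters instantiated at expansion time.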

Previous approaches
There is a vast literature on Romance pronominal clitics, within different theoretical frameworks, e.g., Heap et al. (2017). In generative grammar, it seems that much less attention has been paid to grammatical prepositions such as de, despite the overall awareness that en is related to de-PPs. We limit ourselves here to what is immediately relevant to our own analysis.
Due to the historical connections between the different uses of de and en, we first summarize the study by Carlier et al. (2013) on the grammaticalization of these two elements from the Classical Latin period onwards. In the next subsection we mention some studies carried out in the framework of early generative grammar, before dealing with LFG analyses in 4.2.
According to Carlier et al. (2013), the Latin ablative preposition de, whose core meaning was spatial distancing from a source, underwent extension to other domains in the course of time, e.g., origin and lineage, extraction, partition, and inclusion. Two additional parallel developments led to the distribution of de and en in Modern French. First, the Latin genitive case, whose core function was linking two NPs in a possession relation, was progressively substituted for by de-PPs. Second, the pronoun en, derived from the Latin adverb inde 'from there', underwent successive bleaching in Medieval French and spread as a replacement of de-PPs with different non-spatial meanings, e.g., as the domain of a cardinal.

Leonel Figueiredo de Alencar, Christoph Schwarze
Carlier et al. (2013) show that the continuation of these developments produced a threefold result in Modern French. First, de lost semantic content. This, in turn, brought about major changes in its usage. For example, it may introduce the complement of verbs with opposite meanings such as s'approcher de 'to come closer to' and s'éloigner de 'to get further from' and is often used with other elements to reinforce the spatial meaning of a verb complement, see (30). While de retains its status as a semantic preposition in locative adjuncts, typically with perception verbs as in (9), it is reduced to a genitive marker of noun or verb arguments, see (1a). Second, the combination of de and definite article grammaticalized into a full-fledged partitive article, see (8a). Third, en fully desemantized and became a clitic proform for genitive objects and quantified direct objects (i.e., partitive objects), see (1b) and (8b).
(30) Elle revien-t du médecin / de chez le médecin
she come-PRS-3SG of.the doctor / of at the doctor
'She comes from the doctor' (Carlier et al., 2013, p. 43, their translation and glosses)
Two critical remarks on Carlier et al. (2013) are in order. First, they do not deal with de-PPs functioning as adjective complements, domain of a cardinal, or adjunct to a noun. Second, they treat genitive objects and quantified direct objects as two separate functions without any common link. We address these issues in our unified account in Section 5.

Earlier generative research
Kayne (1975, p. 107-110) uniformly categorizes en as a de-PP pro-form even in cases where it does not correspond to an overt PP as in (12). He argues that this pro-form is derived from a deep structure representation with de by means of syntactic transformations.
Hulk (1983) opposes Kayne's unitary solution. For the quantitative construction in (12), she proposes an additional PRO-N' variant, i.e., a pro-form for the intermediary projection N-bar. She argues that this type of quantitative NP is derived from (31), where the Spec(ifier) is marked with +Q, i.e., it is a quantitative determiner. The corresponding constructions with en result from pronominalization of N', e.g., (5b).
37.1 2021
According to Hulk, examples like (5b) are ambiguous, since they also have a "partitive" interpretation, corresponding to (5a). For these "partitive" NPs, she proposes (32), where a represents an empty N head. En pronominalizes the de-PP in this construction.
Hulk motivates the distinction between (31) and (32), among other evidence, with agreement facts, cf. (33) and (34), respectively. (33) is ungrammatical because Spec and head N do not agree in number. In (34), by contrast, Spec and head N need not agree, since an empty N is not marked for number. Note, however, that these two constituents must agree in gender, so that Hulk's analysis fails to predict the ungrammaticality of (35).
In the 1990s, generative linguistics abandoned the transformational frameworks underlying these two approaches. As Klenk (2003, p. 78-80) shows, parsing with transformational grammars is difficult, if not impossible, especially in the case of deletion transformations. Kayne's analysis, however, is still inspiring, in that it tries to capture the systematic relationship between en and de-PPs in a unified way.
In Section 5, we propose a unified lexicalist account of this relationship without resorting to transformations, while at the same time handling the agreement facts in (33)-(35) and also examples like (18), considered ungrammatical by Kayne and Hulk.
Jones (1996) categorizes en as a pro-PP and proposes that the function of grammatical prepositions is to assign Case to an NP. This approach eliminates the need for deriving en from a deep structure with de, preparing the ground for a unified analysis in terms of case. However, Jones (1996) did not undertake such an analysis, which we do in Section 5.

Previous LFG analyses
The grammar of French clitics has been a topic in LFG since the origins of the model. Grimshaw (1982) treats them as members of the clitic category (CL) expanding V to V'. The lexical entries proposed comprise case features. However, en and y are disregarded.
Schwarze (1996) offers one of the first lexical-functional accounts of the systematic relationship between en and de-PPs. He argues that the nonsemantic de has the same function as the genitive suffix in languages with morphological case like German. For constructions (1a) or (3a), he provides de with the feature (↑PCASE)=DE and assigns the same feature to en, see (27) and (36). Accordingly, entries for the corresponding predicators must contain a constraining equation requiring the oblique to have PCASE=DE, see (26), an adaptation of Schwarze's partial entry for parler 'to speak'. The proposed analysis of de-PPs, but not of en, was tested on an LFG parser.
This approach, however, has some drawbacks. First, the notion of PCASE is inappropriate within the system of French pronominal clitics: subject and direct object clitics correspond to noun phrases without a PCASE, so that it seems more reasonable to establish distinctions based on traditional morphological cases (see, e.g., Heap et al., 2017, p. 189-193). Second, it treats en as three-way ambiguous, proposing two additional variants with (↑SPEC)=PARTITIV that only differ from one another in the grammatical function they perform, namely direct object and MOD(ifier) of a direct object. As we will show, such lexical ambiguity can be avoided. Third, the corresponding constructions with de, e.g., (8a) and (4a), are not taken into account, nor is the use of both de and en as adnominal adjuncts in examples like (6) and (7).
Frank (1996) is so far the most complete documentation on a large-coverage grammar of French in the LFG formalism. Lexicon entries of key items and all syntactic rules are described in detail. The complete source code of the syntactic component is made available in an appendix.
Although Frank's analysis has a much wider scope than Schwarze (1996)'s, both share strong similarities in the treatment of the common subset of constructions with de and en. Remarkably, however, these two contemporaneous studies do not cite each other. Frank proposes that heads governing obliques with the nonsemantic de, as in (1a) and (3a), subcategorize for a DE OBJ, where the attribute DE is provided by the PCASE value of the preposition. The clitic en is likewise assigned the feature PCASE=DE, so that the corresponding constructions with en can be analyzed, cf. (1b) and (3b).
Complex annotated c-structure rules handle a large range of different possibilities of clitic placement and of internal arrangement of clitic clusters, undoubtedly one of the most intricate aspects of French syntax, due to the variety of intervening factors. One of the complexities that are successfully dealt with is pronominal clitic climbing, e.g., (5b), where the pronoun moves from its canonical proclitic position in relation to its governing head, as in (8b), to a position left-adjacent to the auxiliary. Causative faire 'make' and the copula behave similarly as far as pronoun climbing is concerned: pronominal clitics are attracted to the leftmost member in a series of verbs of this type, cf. (10) and (37)-(39) from our handcrafted corpus. Negation introduces a further layer of complexity when combined with pronominal clitics: negative particles (e.g., ne and pas) also cliticize to the verb, either enclosing an individual pronoun or a pronominal cluster and its finite host or preceding both pronominal clitics and their infinitive host, cf. (37)-(39).
In LFG, control verbs subcategorize for an XCOMP, a predicative complement whose open SUBJ slot is filled by a grammatical function of the matrix clause (cf. Bresnan, 2001). In Frank's grammar, auxiliaries are control verbs, alongside copulae, causatives, modals, and aspectual verbs. Clitic climbing is licensed by control verbs lacking a complementizer, i.e., copulae, auxiliaries, and causatives. By contrast, control verbs with a complementizer, e.g., COMPL FORM=de in case of décider in (10), disallow climbing. Modals are assigned COMPL FORM=null, so that they are also unable to host climbed clitics, cf. (8b).
Frank proposes different constituency rules for generating clitic clusters with negation and up to two pronouns in the different varieties of finite and infinitive structures. Due to space limitations, the full details of the implementation cannot be presented here; we focus on the sentence type exemplified in (37). Disregarding the functional annotations for now, the complex formed by a clitic cluster and a finite verb in this construction is generated with the rules in (40)-(43), where the brackets-enclosed constituents are optional. The IP category consists of the I2 complex, formed by a finite verb and (optional) clitics, and zero or more complements. NEGAT and NEGP introduce negation ne and negative particle (e.g., pas), respectively. Analogous rules are proposed for the other types of structures containing verb clitics.
[…] (45), which contains a disjunction with two alternatives, corresponding to local and non-local cliticization. The equation in the first disjunct states that the clitic is the DE OBJ of V, which is the case with (1b), while the equation in the second disjunct states that the clitic is the DE OBJ of an embedded VCOMP (a verbal XCOMP in her terminology), as in (37), where en is a complement of parler 'speak'. The other element of the second disjunct is a negative existential constraint specifying that the VCOMP governing the clitic have no COMPL ("¬" symbolizes negation). VCOMP+, where the plus sign symbolizes one or more instances of the preceding string, means that the VCOMP whose verb governs the clitic can be embedded in another VCOMP (which, in turn, can be nested in another VCOMP, and so on).
Frank's grammar accounts for the parallel behavior of de and en in only two constructions of (1)-(8), i.e., as the oblique of a verb or adjective, cf. (1) and (3). The use of en in the other six construction types was not implemented.
Examples like (2b) cannot be analyzed because, according to Frank, this verb type subcategorizes for a thematic oblique PP with PCASE=source, which is incompatible with the proposed entries and c-structure rules, cf. (40)-(45).
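The two-way disjunction described above for (45), local attachment versus attachment through a chain of one or more VCOMPs whose final member lacks a COMPL, can be approximated by the following Python sketch. It is an illustration under assumptions, not Frank's code: the function name, the dictionary encoding, and the placeholder predicate names are hypothetical, and whether a candidate host actually subcategorizes for a DE OBJ is left to completeness and coherence and is not modeled here.

```python
def clitic_hosts(fs):
    """Return the paths to every f-structure that may govern the clitic."""
    hosts = [()]                      # first disjunct: local cliticization
    path, node = (), fs
    while isinstance(node.get("VCOMP"), dict):
        node = node["VCOMP"]          # second disjunct: one more VCOMP step
        path = path + ("VCOMP",)
        if "COMPL" not in node:       # negative existential constraint
            hosts.append(path)
    return hosts


# A host without complementizer over parler: climbing is possible.
climbing = {"PRED": "host-verb", "VCOMP": {"PRED": "parler"}}

# décider takes a complement with COMPL FORM=de: climbing is blocked.
blocked = {"PRED": "décider",
           "VCOMP": {"PRED": "embedded-verb", "COMPL": {"FORM": "de"}}}

# VCOMP+ allows the governing VCOMP to be nested more deeply.
nested = {"PRED": "v1", "VCOMP": {"PRED": "v2", "VCOMP": {"PRED": "parler"}}}
```

Each returned tuple is a path from the clitic's host down to a candidate governing f-structure; the empty tuple stands for the local alternative.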
In constructions of the type in (4a), the PP is analyzed as a "partitive object" (PART OBJ), for which case a variant of de with (↑PCASE)=part is postulated. Only types (ii) and (iv) of Table 2 were implemented, i.e., measure names and collective numerals. According to Frank, the former require singular mass objects, while the latter require plural non-mass nouns. Again, there is no corresponding variant of en.
As regards (5), she categorizes cardinals as NUM. They are optionally generated in an NPDET projection between an optional DET (i.e., article) position and an obligatory NPMOD projection, which comprises head noun and modifiers, see (46) […] (5a) and (5b) cannot, since both constructions lack the noun head required under NPMOD, see (47)-(50). Genitive PPs such as (6a) and (7a) are generated as adjuncts under NPKOMPL (i.e., noun complements), see (47)-(48), and assigned the feature ROLE=obl_poss, provided by an additional semantic variant of de. Partitive determiners, see (8a), are encoded in the lexicon as indefinite determiners; an equation of the form (↑CLASS)=c mass ensures that they are only combined with mass nouns. Since Frank's grammar has no corresponding variants of en, it cannot analyze (6b)-(8b).
Butt et al. (1999) report on the development of large-coverage parallel LFG grammars for English, French, and German. In the French grammar, the clitic y is assigned an f-structure with a PCASE=À feature similar to that of an adverbial à-PP. However, the source code is not publicly available and the implementation details are sparse, without information on how (if at all) examples like (1)-(8) are analyzed by the French grammar.
For Schwarze (2001), en is ambiguous between two functions: "Oblique" in cases like (1b) and "Partitive Modificator of the DIRECT OBJECT" in cases like (4b). By contrast, Schwarze (2012) […]
In sum, Frank (1996) is the most complete LFG implementation available of the systematic relationship between de and en, yet it covers only a small subset of the data in (1)-(8), which are accounted for by the proposed unified analysis, presented in the following section. Another issue deserving improvement in Frank's implementation is the proliferation of entries for en and de. Schwarze (2012) is a precursor to our present account, insofar as it abandons the PCASE-based analysis of en in favor of traditional case distinctions. However, it is only weakly formalized and treats en as three-way ambiguous. By contrast, we propose an implemented (and thereby completely formalized) grammar fragment, which can be tested automatically on a large amount of data. This grammar has just one lexical representation for the clitic en, besides being able to handle the full range of uses in (1)-(8).

A unified account
In this section, we first show how the grammatical preposition de, clitic en, and partitive determiners should be represented in the lexicon in order to account for their systematic relationship. Then, in 5.1-5.3, we detail our analysis of properties (i)-(iii) in Table 1, exemplified by structures generated by the parser. We focus here on the c-structure rules for de-PPs, postponing corresponding rules for en to Section 5.4.
The basic idea of our approach is to assume a CASE=GEN(ITIVE) feature as a means to account for the functional relationship between de and en: […]
In view of this analysis, entry (24) for the partitive determiner de la is unsatisfactory. Since it does not share any feature with (51), en-pronominalization seems fortuitous. To solve this problem, two options are available: (a) replacing the third line of (51) with {(↑CASE)=GEN|(↑SPEC)=PART}, stating that either the case is genitive or the specification is partitive, or (b) appending a genitive case feature to (24), whereby we obtain (53).
Alternative (a) has an undesirable side effect: it creates a potential source of parsing ambiguity, since each disjunct represents a different lexical variant of en. On the other hand, alternative (b) assumes a single lexical representation for en, so we consider it preferable.
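The ambiguity argument can be made concrete with a small Python sketch that expands a lexical entry containing disjunctions into its variants. Entry (51) is not reproduced here, so the PRED value and the overall shape of the entries below are assumptions; only the disjunction {(↑CASE)=GEN|(↑SPEC)=PART} and the single CASE=GEN feature come from the text.

```python
from itertools import product


def variants(entry):
    """Expand an entry given as a list of slots, each slot being a list
    of alternative feature dicts, into fully specified variants."""
    out = []
    for choice in product(*entry):
        fs = {}
        for part in choice:
            fs.update(part)
        out.append(fs)
    return out


# Alternative (a): a disjunctive third line in the entry for en.
en_a = [[{"PRED": "pro"}], [{"CASE": "GEN"}, {"SPEC": "PART"}]]

# Alternative (b): en keeps a single genitive feature; the partitive
# determiner de la additionally carries CASE=GEN.
en_b = [[{"PRED": "pro"}], [{"CASE": "GEN"}]]
de_la_b = {"SPEC": "PART", "CASE": "GEN"}
```

Under (a), the parser has two variants of en to consider for every occurrence; under (b), there is one, and the link to de la is expressed by the shared CASE value.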

Obliques
Oblique (OBL) complements are typically realized by PPs and fall into two subclasses depending on whether the preposition is semantic or nonsemantic, the latter assigning case to a PP, cf. Butt et al. (1999), Bresnan (2001), etc. OBLs with the nonsemantic de are genitive marked and thus pronominalizable by en. Figure 9-Figure 14 exemplify this analysis.
Entries for verbs and adjectives subcategorizing for a nonsemantic de-OBL follow the general pattern of (26), with (↑OBL CASE)=c GEN substituted for (↑OBL PCASE)=c DE. This constraint is satisfied by either en or a PP with the grammatical de. For generating examples with de-OBLs, we adapted Frank's c-structure rules in (40) to our genitive case analysis.

Adnominal adjuncts
In LFG, an ADJ(unct) is a non-subcategorized-for grammatical function (Bresnan, 2001). To generate examples with adnominal adjuncts headed by the nonsemantic de, we adapted Frank's NP-structure rules (47) and (48) to the DP-analysis, see provisional version in (54), to be revised in (59). Rule (54) states that an NP consists of an N head optionally followed by a PP performing one of two functions: ADJ, see (6a) and (7a), or OBJ, see (4a). In the second disjunct, a call to the OT-MARK template defined by King (2004) ensures that the PP is preferably analyzed as an OBJ if the noun subcategorizes for one, as is the case with nominal QForms (see next section). This reduces parsing ambiguity. Figure 15-Figure 18 exemplify our analysis.

Quantified terms
This subsection details the analysis of quantified terms. QForm classes were implemented by means of templates. However, to unburden the reader, we present full entries here.
Following Mittendorf and Sadler (2005)'s proposal for Welsh, we represent French quantified terms like (55)-(57) as numeral phrases (NumPs). Entries for simple cardinals follow the general pattern of (58); complex cardinals, e.g., (13), were not implemented. The second line in (58) encodes subcategorization in the form of a disjunction: the first disjunct states that the numeral requires an OBJ, as in (5) and (12), while the second allows for uses without an OBJ, which are restricted to non-accusative DPs (remember that "¬" represents negation). The third line specifies number. In French, all plural Num forms are underspecified for gender; only the singular forms un and une manifest gender variation. The last line makes a call to the DEFAULT template (King, 2004), specifying that INDEF is the default value of SPEC. This can be overridden, e.g., by SPEC=DEM(onstrative), see (56).
[…] (5b). CHECK features prevent overgeneration, but are not theoretically relevant (King, 2004).
The attentive reader may have noticed that the OBJ is marked with genitive case both in Figure 20 and Figure 24, although it is a bare NP in the former. The main motivation for this assumption is that both OBJs are pronominalizable by en, which we claim to be a proform for genitive-marked grammatical functions. Additional support comes from languages that mark the domain of a low-valued cardinal in constructions analogous to Figure 20 with a preposition (Welsh) or partitive (Finnish) or genitive case (Russian) (Corbett, 1978; Hurford, 2003). In French, this pattern is restricted to "nounier", higher-valued cardinals, see (13), but in Romanian de is required for cardinals from 20 upwards.
To generate nominals containing a NumP, we adapted Mittendorf and Sadler's DP-analysis of Welsh to the grammatical facts of French, see (59) and (60), a simplified version of the actually implemented rules. 12 The first rule states that a DP consists of an optional D followed by an NP or a NumP. The second rule states that a NumP consists of Num followed by either a PP or an NP functioning as OBJ, cf. (57) and (55). These two alternatives are represented in a disjunction. In the first disjunct, a constraining equation ensures that the preposition heading the PP be endowed with genitive case, i.e., it must be de. In the second disjunct, the second equation assigns genitive case to the OBJ and the last two handle head-complement agreement.
The remaining QForm classes of Table 2 are nouns, which we also assume to have a one-place and a zero-place variant. However, the zero-place variant seems not to be constrained to non-OBJ positions, see (61), where the domain (seafood referred to in the previous sentence) is left unexpressed. The templates modeling the different classes include both variants, but, to simplify, we limit ourselves here to the monovalent variants.
(61) Nous avons acheté un kilo que nous avons dégusté […] (Google) we have bought one kilo that we have enjoyed
12. The full rules handle, e.g., *un un livre 'an one book' vs. les/ces/leurs deux livres 'the/these/their two books' or *trois de livres 'three of books'.
Collective numerals require the OBJ to be a plural count noun, see (62). The constraining equation in the last line enforces realization of the OBJ by a genitive-marked element, i.e., a PP with the nonsemantic de or the clitic en. There is no need to impose any constraints on the number of the OBJ of fraction names, since they allow both mass nouns in the singular and count nouns in the singular and plural, see (4a) and (63). Accordingly, in entry (64), the number of the OBJ is underspecified. Entries for measure names follow the same pattern as (64), see (14) and (65). Whether a count noun OBJ of a measure name must be plural, as suggested by Jones (1996, p. 219), is a question we leave for further research.
(65) un kilo de pommes a kilo DE apples 'a kilo of apples'

Implementation of the climbed en
In this section, we detail the implementation of clitic climbing, which affects the genitive en and all other non-nominative clitics. Figure 29 and Figure 30 exemplify the treatment of climbing to non-causative full verb hosts, for which we reimplemented the corresponding c-structure rules proposed by Frank (1996).
In the case of auxiliary constructions, we also departed from Frank's approach. Instead of treating auxiliaries as verbs subcategorizing for a VCOMP, we preferred to implement them as items lacking a PRED attribute. As such, they do not govern any grammatical function and only contribute morphosyntactic features like person, number, tense, etc. to the f-structure of the sentence. For the sake of computational efficiency, however, auxiliaries were assigned a special c-structure category Aux, instead of being differentiated from full verbs by means of features, which would incur higher processing costs (Butt et al., 1999). Analogously to Frank's I1 and I2 categories, see (40), the complexes formed by auxiliary, pronominal, and negation clitics were labeled Aux1 and Aux2, see Figure 31 and Figure 35.
The functional annotations for pronominal clitics follow the general pattern of (45); however, they were adapted to our analysis of en and de, as formulated in (51) and (52), and extended to cover all constructions (1)-(8).
In our implementation, we made extensive use of metacategories, a powerful resource of XLE that is not available in the formalism in which Frank's grammar was implemented. A metacategory is a variable for one or more c-structure non-terminal nodes with the respective annotations. A metacategory can be non-recursively used in the definition of another metacategory. Templates can also be used in these definitions. Analogously to macros in programming languages, metacategories, like templates, allow for code abstraction, enhancing readability and maintainability of a computational grammar. Thanks to metacategories, it was not necessary to stipulate different positional variants for en and the other pronominal clitics, as Frank does.
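The macro-like behavior of metacategories described above can be illustrated with a small Python sketch. It is not XLE code and does not reproduce the grammar's actual definitions: the function expand, the name CL-GEN, and the particular disjunctions are assumptions made purely for illustration. Only the labels CL-IO, CL-OBL, CL-QFORM-DOM, and CL-PRON are taken from the text, and they are treated here as atomic symbols without annotations.

```python
# Toy model of metacategories as non-recursive abbreviations.
# Each name maps to a disjunction of alternatives; each alternative is a
# sequence of symbols. These definitions are illustrative assumptions,
# not the definitions used in the implemented grammar.
METACATEGORIES = {
    "CL-GEN": [["CL-OBL"], ["CL-QFORM-DOM"]],  # hypothetical name
    "CL-PRON": [["CL-IO"], ["CL-GEN"], ["CL-IO", "CL-GEN"]],
}


def expand(symbol, seen=()):
    """Return every sequence of plain categories that `symbol` abbreviates."""
    if symbol not in METACATEGORIES:
        return [[symbol]]  # a plain category stands for itself
    if symbol in seen:
        # Metacategories may refer to one another, but not recursively.
        raise ValueError(f"recursive metacategory: {symbol}")
    results = []
    for alternative in METACATEGORIES[symbol]:
        partials = [[]]
        for part in alternative:
            expansions = expand(part, seen + (symbol,))
            partials = [p + e for p in partials for e in expansions]
        results.extend(partials)
    return results


for sequence in expand("CL-PRON"):
    print(" ".join(sequence))
```

Under these toy definitions, a single reference to CL-PRON in a rule stands for five clitic sequences, which is the kind of abstraction that makes separate positional variants unnecessary.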
Generalizing (45), which is restricted to OBLs with the nonsemantic de, we defined the template (67), which can be used with any governed grammatical function F given as a parameter, see (69)-(71). The constraint on clitic climbing excluding an XCOMP with a complementizer form (CFORM) is encapsulated in template (68). In (69) and (70), we make use of template (67) to define metacategories CL-IO and CL-OBL for dative indirect object (OBJ2) and genitive OBL clitics, respectively, see (72). An OBJ clitic can bear either genitive or accusative case, depending on whether it is a partitive object or not, so we defined template (71) for clitics performing this function, where parameter C is the case of the clitic. This template, in turn, is used to define templates (73) and (74) for accusative-marked and genitive-marked clitic OBJs, respectively. The latter is assigned SPEC=PART, so that it coheres in specification with partitive DP-objects, see Figure 6.
The constraints on en as ADJ(unct) to a clausal SUBJ or OBJ, see (6b) and (7b), are encoded in (76) by means of a disjunction. The constraining equation in the first disjunct requires feature EN to have a positive value. This is provided by verbs enabling en in subject position, e.g., être 'be', whose entries have (↑EN)=+. Analogously, to capture the use of en as OBJ of a QForm in OBJ or SUBJ position, cf. (4b), (5b), and (18), the metacategory CL-QFORM-DOM is defined following the general pattern in (67).
All functions of en are collapsed into the metacategory (77), which is used in (78), together with metacategories for the other clitic types, to disjunctively represent all possible clusters of genitive and third-person accusative and dative clitics. This, in turn, enables us to formalize in a single rule the optional attachment of a varied range of pronominal clitics to a verbal head, as shown in (79).
(79) I1 → (CL-PRON) V.

Implementation methodology and evaluation
An LFG grammar is a declarative model of a language fragment, encoding constraints at different levels. Due to the complexity of these constraints, the implementation of a non-trivial computational grammar fragment must be an incremental process. One starts with a very small grammar capable of analyzing simple examples and progressively extends it to cover an increasingly larger subset of the phenomena to be modeled. These successively more complex fragments must be tested not only on grammatical sentences, but also on examples that violate the postulated constraints. These two data types are labeled positive test set and negative test set. Thanks to the declarative nature of the formalism, an LFG grammar can be used for both analysis and generation. These two dimensions can be evaluated by the positive test set and the negative test set, respectively. 13 In our case, we did not have to start the grammar development from scratch. Two previous works were available to start from. On the one hand, we could reuse large portions of code from FrGramm, which covers basic French syntax phenomena, although it cannot handle nonsubject pronominal clitics, partitive DPs, and quantified expressions. On the other hand, Frank's grammar already handles the relationship between de and en in (1) and (3) and can analyze constructions (2a), (4a) and (6a)-(8a), so we could reimplement the corresponding c-structure rules and annotations in XLE and extend them to cover the other constructions. The implementation of causative faire, however, demanded an extra effort, since Frank (1996) only handles a small subset of these constructions.
13. On the development and testing of LFG grammars with XLE, see Butt et al. (1999).
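The testing regime with positive and negative test sets can be sketched as a small harness. The following Python code is only an illustration under stated assumptions: it does not call XLE, the function evaluate and the lookup table TOY_PARSES are hypothetical, the parse counts are made up, and the two ungrammatical strings are invented for this sketch. Only the two grammatical sentences are taken from example (1).

```python
from collections import Counter


def evaluate(count_parses, positives, negatives):
    """Summarize a test run.

    count_parses is a callable mapping a sentence to its number of valid
    parses; here it is a hypothetical stand-in for a call to a real parser.
    """
    parse_counts = [count_parses(s) for s in positives]
    analyzed = sum(1 for n in parse_counts if n >= 1)
    # A negative example counts as a false positive if it gets any parse.
    false_positives = [s for s in negatives if count_parses(s) >= 1]
    return {
        "coverage": analyzed / len(positives),
        "blocked": 1 - len(false_positives) / len(negatives),
        "ambiguity": Counter(parse_counts),
        "false_positives": false_positives,
    }


# Made-up parse counts standing in for parser output (assumption).
TOY_PARSES = {
    "La population dépend de la forêt.": 2,
    "La population en dépend.": 1,
    "La population dépend en.": 0,   # invented ungrammatical string, blocked
    "En la population dépend.": 1,   # invented string, pretend overgeneration
}

report = evaluate(
    lambda s: TOY_PARSES.get(s, 0),
    positives=["La population dépend de la forêt.", "La population en dépend."],
    negatives=["La population dépend en.", "En la population dépend."],
)
print(report)
```

Keeping the list of false positives, rather than only their count, makes it easier to trace each accepted ungrammatical example back to the constraint that failed to block it.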
The final grammar fragment was tested on a positive test set with 320 grammatical sentences and on a negative test set with an equal number of ungrammatical examples. Figure 37 presents the results. The positive test set includes all grammatical examples of this paper, except for (13) and (30). 14 As Figure 37 shows, all sentences received at least one parse, i.e., a valid f-structure according to the grammar. 65.9% received exactly one parse and 24.1% exactly two. Only 10% of the sentences were assigned between 3 and 8 parses. Thus, the grammar exhibits low ambiguity, which is a desirable feature from a natural language perspective. In Figure 38, we have the parsing results for three sentences from the positive test set treated as ambiguous by the parser. Ambiguity arises lexically or structurally. The preposition de exemplifies the first type. For example, (6a) is assigned two valid f-structures, although only the one with the nonsemantic de seems plausible. The second type results from the functional ambiguity of en, which can realize either a complement or an adjunct, both in object and subject position. Since many complements are optional, sentences such as (80) are assigned two valid f-structures in addition to the one where en functions as complement of the adjective plein 'full'. In the other two, less preferred f-structures, en functions as a complement and as an adjunct of the QForm sac 'bag', respectively. Likewise, (10) receives an additional parse where en is an adjunct to the clitic object. Such a reading is clearly spurious, because pronominal clitics cannot be modified by adjuncts, showing the need to further constrain the annotations of (76). The negative test set includes all invalid constructions exemplified in this paper. It was built by systematically injecting errors into sentences of the positive test set. For example, (82) derives from (81) by pluralizing the noun, thus getting the determiner-noun agreement wrong, cf. (33).
Figure 39 shows the parsing results for analogous ungrammatical variants of example (38), all of which violate ordering constraints between clitics, negation, and modal verb. The negative test results show that the grammar is highly constrained, since it blocks 94.7% of the ungrammatical examples. One of the 17 false positives is (84). This example is assigned a valid f-structure because en is analyzed as an adjunct of the subject clitic, to which en cannot cliticize. There are 6 other similar examples, showing the need to fine-tune the metacategory definition in (76). The other 10 false positives involve the structural ambiguity of en and/or the lexical ambiguity of prepositions, as in (85) and (86). In the former example, en is an adjunct of chambre 'room', which does not seem to make sense. The latter example is assigned a valid (though nonsensical) f-structure where de is a semantic preposition introducing an adjunct.
Summing up, the grammar can be said to be empirically valid, inasmuch as it was tested on a large amount of data. On the one hand, it analyzes all 320 grammatical examples and, on the other hand, blocks most of the 320 ungrammatical examples. As a fragment, however, the grammar probably still has gaps that testing on more data could reveal.
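The reported overgeneration figures can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (320 negative examples; 17 false positives, of which (84) and 6 similar cases involve en as adjunct of a subject clitic and 10 involve ambiguity of en and/or of prepositions); the variable and function names are our own.

```python
# Figures reported in the text.
NEGATIVES = 320
FALSE_POSITIVES = {
    "en parsed as adjunct of a subject clitic": 1 + 6,  # (84) plus 6 similar
    "ambiguity of en and/or of prepositions": 10,
}


def overgeneration_rate(false_positives, negatives):
    """Share of ungrammatical examples that received a valid f-structure."""
    return sum(false_positives.values()) / negatives


rate = overgeneration_rate(FALSE_POSITIVES, NEGATIVES)
print(f"accepted: {rate:.1%}, blocked: {1 - rate:.1%}")
# prints "accepted: 5.3%, blocked: 94.7%"
```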

Conclusion
We reported on an LFG implementation of French en and de in a wide range of constructions. Previous LFG approaches only cover a small subset of these structures and handle their heterogeneity by means of lexical ambiguity. Instead, we proposed a single variant for each of the involved elements, whose entries are linked by the genitive case.
Our proposal is a lexicalist reformulation of the main insight behind Kayne's (1975) analysis, namely that de and en represent a single abstract category. There are, however, two important differences. For Kayne, en is derived from a deep structure with de by means of syntactic transformations, which are problematic from the parsing perspective. By contrast, we claim that both en and de map to a genitive feature in f-structure, dispensing with any transformations. The second difference is that our grammar licenses a QForm in preverbal subject position governing an en OBJ, see (18). This construction does occur in real texts, so a parser should be able to analyze it. However, Kayne (1975) considers it ungrammatical, in which he was followed by the subsequent literature.
The other reviewed approaches abandoned the pursuit of a unified treatment of de and en. We should point out other important features that set our proposal apart. First, we correctly handle both number and gender agreement in constructions like (34), while Hulk limits herself to the former. Second, unlike in Carlier et al. (2003), en in our analysis can represent, outside the verbal domain, not only noun complements, but also adjuncts to nouns and complements of adjectives and cardinal numbers.
The implementation in XLE enabled the grammar to be extensively tested on a large number of examples. The results revealed a high level of coverage and low overgeneration: all grammatical sentences were successfully analyzed, with a low ambiguity rate, while only 5.3% of the ungrammatical examples were assigned valid f-structures.
As opportunities for further research, we suggest: implementing other QForm types; investigating the occurrence of en in preverbal subject position in large corpora and modeling the constraints it is subject to; and reducing ambiguity and overgeneration of the grammar.