Systematising corpus-based definitions in second language lexicography

This study explores the application of a semantic analysis methodology to the creation of definitions for learners' dictionaries. Defining words is a delicate art, traditionally left to the general writing skills of the lexicographer, to introspection, or even to intuition. In recent decades, many efforts have been directed towards systematising the technique, particularly in pedagogical lexicography. In this paper, we try to demonstrate that there is still room for improvement, and that a systematic corpus analysis can be applied to build better explanations of the meaning of words. We explain the theoretical background chosen for our study, the associated methodology and five specific strategies which can guide lexicographers through their task. As a concrete example, we present our work on a Spanish dictionary project for foreign learners which is currently under development and has a core of very frequent Spanish verbs already available on the Internet.


Introduction
It is well established in the lexicographical tradition of monolingual pedagogical dictionaries that these dictionaries must be tools for learning a language, and not only resources for solving specific doubts in a quick, practical way during the process of decoding or encoding (Bogaards 2010; Atkins and Rundell 2008: 406-411). This position has been defended from both theoretical and methodological points of view, and many dictionaries in different languages have been edited following this general consensus. As an example, the Collins Cobuild English Language Dictionary (Sinclair 1987a), the Oxford Advanced Learner's Dictionary of Current English (Hornby and Cowie 1963), and other pedagogical dictionaries of English can be included in this category.
Specifically, there is a fundamental aspect which seems to make monolingual dictionaries useful tools for learners: they include syntactic as well as lexical and semantic information in a way that neither grammars nor traditional dictionaries seem to cover. In this sense, electronic dictionaries can substantially improve on the proposals of paper dictionaries (De Schryver 2003). For example, in Figure 1 we show one of the meanings of the entry jouer ('to play') in the Dictionnaire d'apprentissage du français langue étrangère ou seconde, DAFLES (Verlinde and Selva 2006). The full-sentence definition provides information about meaning (…pratique un sport où l'on utilise une balle, un ballon), syntax (une personne joue au foot… is a transitive pattern that the definition shows) and collocations (jouer au foot, au basket, au rugby…). The examples are rich in syntactic information: in the first one, the transitive pattern of the definition is repeated (jouer au foot), and the second one provides the alternative passive pattern with the particle se (le foot se joue aussi en salle).
As can be observed in the previous example, definitions are one of the main parts of the dictionary entry in which the relation between syntax and lexical units can be shown (Bosque 2006: 47-48). Classically considered as explanations of the meanings of words (Johnson 1755), they have also become patterns of the syntactic and semantic behaviour of the word in context. Specifically, full-sentence definitions, first used in the Cobuild dictionary (Sinclair 1987a), such as the DAFLES definition in Figure 1, are those which include the definiendum in the definition: lorsqu'une personne joue au foot… There are also rhetorical resources to introduce the explanations, such as lorsque in the previous example or If a person plays a sport…, commonly used in the Cobuild dictionary. Another typical beginning for full-sentence definitions of verbs is found in the so-called when-definitions (Lew and Dziemianko 2006; Adamska-Salaciak 2012): When somebody plays a sport... These formulae aim to be a more natural way of explaining the meaning of the word and, as already mentioned, they make it possible to show the syntactic pattern of the verb which is directly related to a specific meaning.
Full-sentence definitions have been upheld and studied from different perspectives. Hanks (1987) explains that this kind of definition makes it possible to provide more precise information related to syntax and collocations. Harvey and Yuill (1997), in a study about dictionary use, concluded that full-sentence definitions were helpful for encoding tasks. As Rundell (2006) points out, these definitions, also called 'folk definitions', reflect the natural way in which a teacher describes what a word means, in order to make the explanation accessible to learners, as they sound natural and also show how the word is used in context. At the same time, they are not without controversy: the same author urges prudence in their use, as the quantity of semantic and syntactic information they provide can diminish their clarity. Full-sentence definitions are indeed longer and more complex than classical ones, and this could be the reason, Rundell (2006) explains, why they are not a generalised lexicographic practice.
The present paper starts from the work undertaken by the lexicographers and linguists briefly reviewed above. We consider that there has not yet been a specific proposal about how full-sentence definitions are to be built, despite the fact that they have been used and considered useful for learners in different ways. The debate about folk definitions is, nevertheless, part of the long discussion, running from Aristotle to the present, about how to explain word meanings. The art of defining a word in a clear, rigorous way is fine and delicate and, traditionally, it has been addressed with the help of the general writing skills of the lexicographer, by introspection or even intuition. In the specific context of language learning, two basic problems must be tackled: a) establishing the criteria for adapting the semantic and syntactic information extracted from the corpus, and b) establishing the criteria for explaining this information in a pedagogical way, in order to make all these data more comprehensible to a foreign user.
In this study, we will focus on verb definitions in a pedagogical dictionary, specifically an online dictionary of Spanish for foreign learners, the Diccionario de aprendizaje del español como lengua extranjera, DAELE (Battaner, in process; cf. Arias-Badia, Bernal and Alonso 2014). We deal with verbs because, for this part of speech, the observation of syntactic features related to a lexical unit is particularly important and tricky. As for the methodology, we adopt Corpus Pattern Analysis (Hanks 2004a), also known as CPA, as the technique for exploiting and analysing our corpus data, as this methodology was specially created for lexical analysis and for lexicographical purposes. Thus, in the following pages, we argue why a systematic corpus analysis is needed as opposed to the traditional introspective model (Section 2) and we give a brief introduction to CPA and its theoretical background (Section 3). Section 4 is devoted to describing our proposal, which connects CPA with the full-sentence definitions of a Spanish dictionary for foreign students. Finally, in Section 5 we draw some conclusions and outline future work.

Definitions in corpus-driven dictionaries vs classical dictionaries
In this section, we present some examples of classical, non-corpus-based definitions, in order to justify why learners' dictionaries could benefit from a corpus analysis procedure and from more systematic definitions. Two key questions are strongly connected: on the one hand, the need for a system for making corpus-driven dictionaries; and, on the other hand, the possibility of offering more rigorous grammatical information to the user, systematically linked to the semantic information offered by the dictionary. As already said, traditional dictionaries were, and in most cases still are, based on introspection and previous dictionaries (Sinclair 1991: 37-41), and the traditional lexicographer's task was to try to discover the inherent meanings of words. However, corpus data have shown that this system is not capable of producing a satisfactory description of the normal patterns of use of words. To put it simply, traditional lexicographers ask themselves 'What does arrive mean?' or 'What are the different meanings of arrive?', in contrast with corpus-driven questions about meanings in language, such as 'What does arrive mean in this context?' (see Section 3 for a review of the theoretical postulates underlying this change of perception).
Since the 'corpus revolution' and its application to lexicography in the first corpus-based dictionary, Cobuild (Sinclair 1987b), the use of a corpus for building dictionaries has become a sine qua non of the state of the art. Even if the task can be done without this type of analysis, corpus analysis has proven to improve lexicographical work in many different ways. See, for example, the meanings of three different verbs in the Diccionario Salamanca de la lengua española (Gutiérrez Cuadrado 1996) (note 1).
1. Despite the lack of precision found in some of the entries, we have considered this dictionary because it is one of the most complete, rigorous traditional Spanish dictionaries for foreign learners currently available. The English versions of the examples are intended to be literal word-for-word translations of the Spanish entries. However, it has not always been possible to adhere to this rule. For instance, in the case of coser, the literal equivalent in English would be 'to sew'. But for sense 5, it is not correct to give 'to sew' as an equivalent. In English, the verb used in this case is 'to riddle'. It must be highlighted, however, that 'to riddle' activates a slightly different conventional metaphor from the one activated by the Spanish verb coser.
In the first case, two different patterns of the verb estallar ('to burst') have been included in the same meaning: estallar + en + noun and estallar + de + noun. If we look these up in a corpus (note 3), two different groups of concordances, according to the complement, can be observed: a) estallar + en + abucheos, gritos, aplausos, blasfemias, cánticos, carcajadas, llanto...; b) estallar + de + alegría, gratitud, furia, ira...
Group a) consists basically of external expressions of feelings which manifest themselves through noise (such as cries, shrieks, or guffaws); group b) refers to intense feelings (such as joy or anger). Thus, in the most frequent use of these two patterns, it would not be accurate to combine the two groups, as in *estallar de llanto or *estallar en alegría. If both structures are combined in the same meaning in a dictionary, a learner cannot predict how to use them.
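The distinction between the two groups can be illustrated with a minimal Python sketch. It only encodes the complements listed above (which are not exhaustive), and the function names `is_attested` and `expected_preposition` are hypothetical helpers introduced for illustration; they are not part of any corpus tool.

```python
# Complement nouns listed above for each preposition after "estallar".
# The lists are copied from the two concordance groups and are not exhaustive.
ATTESTED = {
    "en": {"abucheos", "gritos", "aplausos", "blasfemias",
           "cánticos", "carcajadas", "llanto"},
    "de": {"alegría", "gratitud", "furia", "ira"},
}


def is_attested(preposition: str, noun: str) -> bool:
    """Return True if 'estallar + preposition + noun' is in the listed groups."""
    return noun in ATTESTED.get(preposition, set())


def expected_preposition(noun: str):
    """Return the preposition the noun is listed with, or None if it is unlisted."""
    for preposition, nouns in ATTESTED.items():
        if noun in nouns:
            return preposition
    return None


if __name__ == "__main__":
    # The starred combinations from the text are not found in either group.
    print(is_attested("de", "llanto"))      # combination marked with * above
    print(is_attested("en", "alegría"))     # combination marked with * above
    print(expected_preposition("carcajadas"))
```

A dictionary entry that merges both structures into one meaning loses exactly the information held in this mapping.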
In the case of the verb sorprender ('to surprise'), if we look into the IULA50 corpus (see note 3), we find 167 concordances denoting meaning 1 of Salamanca, but approximately 10% of them are complemented by a clause, as in the following sentence: Me sorprende que la detención de Isabel Pantoja sea objeto de controversia política ('It surprises me that the arrest of Isabel Pantoja is an object of political controversy.'). Thus, a very common use of this meaning is not indicated in the entry, and the learner may have doubts about the correct use of this structure.
Finally, the fifth meaning of the verb coser ('to riddle') is defined as 'to cause injuries'. If we look up this word in the corpus, we find the following group of complements: coser + a + balazos, codazos, patadas, porrazos, puñaladas... ('bullet wounds, elbows, kicks, blows, knife wounds...'). That is, the action is not restricted to 'injuries', but extends to many types of aggression a person can inflict on another, for example with elbows, legs, or truncheons. Furthermore, all complements are in the plural (it is not possible to say ...lo habían cosido a *balazo), and this is not illustrated in the entry.
To summarize this section, we conclude that many aspects of real usage may be omitted from a traditional dictionary entry, and these omissions could cause confusion among students. A corpus-driven approach offers clues for bringing the dictionary closer to the users' needs and makes it possible to offer data connected with normal and real uses of a given word.
3. For the analysis, the IULA50 corpus, consisting of 50 million words of press articles and linked with the Spanish CPA project, has been used (Renau 2012: 185-186).

Theoretical and methodological framework: the Theory of Norms and Exploitations and Corpus Pattern Analysis
In the previous section, we showed that corpus analysis methodology can be used for improving the quality and quantity of the information offered to the learner in a dictionary. The present section is devoted to a general presentation of CPA, the specific methodology for corpus analysis we chose for our proposal. CPA is theoretically supported by the Theory of Norms and Exploitations, henceforth TNE (Hanks 2004b, 2013). We will briefly present the main postulates of TNE and how they are connected to CPA. We will also illustrate some samples of practical work with CPA in English and Spanish.

TNE, a theory for explaining how meaning is created through words
TNE is a lexically based and corpus-driven approach, whose main objective is to describe how speakers use words to make meanings. In this theory, it is postulated that, on the one hand, words are used in normal lexical patterns: in TNE, a pattern is defined as 'a semantically motivated and recurrent piece of phraseology' (Ježek and Hanks 2010: 8). Each normal pattern is associated with a unique meaning. On the other hand, norms may also be 'exploited' for rhetorical or other effect. As Hanks (2013: 211-215) states, 'an exploitation is a dynamic mechanism in language to create new meanings ad hoc and to say old things in new ways'. Anomalous collocations, ellipsis, creative linguistic metaphors and similes, as well as other creative figures of speech, are examples of exploitations. In relation to dictionaries, lexicographers have a duty to describe norms, but to ignore exploitations, though the dividing line between a normal use of a word and exploitations of that norm may be fuzzy (Hanks 2013: 16).
In order to briefly exemplify how this double system of norms and exploitations works, we consider the intransitive verb to arrive. This verb is usually used in the structure subject + arrive + at + complement, but this is not sufficient for establishing the difference between 'He arrived at the house' and 'Jane and I quickly arrived at joint decisions about the project', two sentences that are syntactically identical but semantically different. Thus, it is the semantics of the valency structure of the verb, and not the verb in isolation, which gives evidence of the specific meaning for these instances of to arrive. The two patterns exemplified above can be formalised as follows (note 4):
[Patterns 1 and 2 of to arrive; only the end of the implicature of pattern 2 is preserved: '... after a process of long and careful thought and/or discussion.']
As regards CPA's annotation, semantic types are written in double square brackets and complements between curly brackets; a complement in parentheses is optional. The [NO OBJ] indication blocks the possibility of a direct or indirect object complementing the verb. The implicature, a paraphrase or explanation of the conventional meaning connected to the pattern, is given in italics.
TNE has its origin in the work of a large number of authors who focused on the study of the lexical unit and its connection with the context in which it is used. Especially relevant is the theory of Sinclair (1991, 1999), who was in turn influenced by the work of Firth (1957) and Halliday (1976). The pioneering work of Hornby (1954) on dictionaries for foreign learners was also fundamental, and Hunston and Francis (2000) also contributed to the study of grammar patterns. Specifically, Sinclair (1999) argues that words have their meanings in context and not in isolation, and by 'context' we mean not only the syntactic structure, but also collocations.
Nevertheless, in all the previous approaches there is still no formalised methodology for mapping meaning onto use, despite the established theoretical basis. In TNE, the grammar pattern is populated with semantic and statistical information about collocates. While syntax is analysed according to clause roles, using the SPOCA model [...]. [...] is a semantic type, and 'Professional', 'Footballer', or 'Judge' can be roles. In our example, the semantic type [[Concept]] of pattern 2 is specified by the role 'Considered opinion'. So, if we focus only on patterns 1 and 2, the verb arrive means something similar to 'to come to a place after a journey' if the complement is a location (pattern 1), and 'to adopt an opinion after a process' if the complement is a concept or opinion (pattern 2). Furthermore, only persons and vehicles can be normal subjects of pattern 1, whereas only persons and institutions can be normal subjects of pattern 2. It is not possible to know in advance what arrive or any other word means without taking into consideration its context of occurrence. Let us imagine, then, that we ask again the general (and common) question mentioned above: 'What does to arrive mean?'. Taking into account a contextual analysis, a possible answer would be, according to Firth (1957: 11): 'It depends on the company it [the word] keeps'. The context activates one of the various meanings that exist only virtually in the verb. Thus, a word requires the presence of other words if it is to mean something: 'many, if not most, meanings require the presence of more than one word for their normal realization' (Sinclair 1999: 133).
4. See the PDEV (Hanks, in progress) for the whole analysis of this verb. The PDEV is available at http://www.pdev.org.uk (last access: 27/5/2016).
5. The ontology used in the CPA project (see section 3.2) is in progress, as is the whole PDEV project (Hanks, in progress). The current version is available at http://www.pdev.org.uk/#onto (last access: 27/5/2016). See also Ježek and Hanks (2010).
Some native speakers may point out that pattern 2 can be classified as a conventional metaphor based on the more literal use in pattern 1. This may well be true, but metaphorical status is irrelevant to the reader's or listener's task of decoding the meaning of an utterance. In this sense, the picture is not complete without the concept of 'exploitation', a mechanism for creating unusual meanings for a particular context when the word does not convey the exact meaning the speaker wishes to express. In the sentence 'The plot had arrived at Beirut', the noun plot is being treated as if it were a moving vehicle. With rigorous respect for corpus data, this sentence does not fit pattern 1, but the notion that a plot is something that moves is not frequent enough to be considered a separate pattern. According to TNE, this is an exploitation, a metaphorical, creative modification of an established pattern (exploitations are explained in detail in Hanks 2013: 211-250).
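The way in which the semantic types of the arguments select a meaning of to arrive can be sketched as a small data structure. This is only an illustration in Python under stated assumptions: the `Pattern` class, the `TOY_LEXICON` mapping from words to semantic types and the `match_pattern` function are hypothetical and hand-made, the type labels are simplified from the description above, and none of this reproduces the PDEV entries or the CPA tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    """A simplified pattern: allowed subject and complement types plus a gloss."""
    number: int
    subject_types: frozenset
    complement_types: frozenset
    gloss: str


# Simplified from the prose description of patterns 1 and 2 of "arrive".
ARRIVE_PATTERNS = [
    Pattern(1, frozenset({"Human", "Vehicle"}), frozenset({"Location"}),
            "to come to a place after a journey"),
    Pattern(2, frozenset({"Human", "Institution"}), frozenset({"Concept"}),
            "to adopt an opinion after a process"),
]

# Hypothetical, hand-made assignment of semantic types to a few words.
TOY_LEXICON = {
    "he": "Human",
    "Jane": "Human",
    "house": "Location",
    "Beirut": "Location",
    "decisions": "Concept",
    "plot": "Concept",
}


def match_pattern(subject: str, complement: str) -> Optional[Pattern]:
    """Return the first pattern whose types fit both arguments, else None."""
    subject_type = TOY_LEXICON.get(subject)
    complement_type = TOY_LEXICON.get(complement)
    for pattern in ARRIVE_PATTERNS:
        if (subject_type in pattern.subject_types
                and complement_type in pattern.complement_types):
            return pattern
    return None  # no norm fits: a candidate exploitation, to be checked by hand
```

Under these assumptions, `match_pattern("he", "house")` selects pattern 1 and `match_pattern("Jane", "decisions")` selects pattern 2, while `match_pattern("plot", "Beirut")` returns `None`, which mirrors the treatment of 'The plot had arrived at Beirut' as an exploitation rather than a separate pattern.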

Systematising corpus analysis of lexical patterns with CPA
CPA (Hanks 2004a, 2010) is the procedure for analysing normal patterns of usage of words in context. It is based on the TNE postulates and establishes the procedure for corpus analysis and pattern extraction. The result of an analysis applying CPA is the one shown in the previous section as an example (the verb to arrive).
CPA is inspired mainly by lexicographical needs, but in fact represents an innovative way of doing corpus analysis that could be used for natural language processing (Hanks and Pustejovsky 2005); for instance, for word sense disambiguation (El Maarouf, Baisa, Bradbury and Hanks 2013). It has also been applied to terminology (Alonso 2009; Alonso and Renau 2013) and to pedagogical lexicography (Renau 2012). It is currently still a manual system, though it is supported by computational tools. There are already some preliminary attempts to automate certain parts of the task (Nazar and Renau, in press), but this is still work in progress. Finally, CPA is the basis for compiling the Pattern Dictionary of English Verbs, PDEV (Hanks, in progress), in which the main analysis of CPA patterns is being developed.

Applying CPA to pedagogical full-sentence definitions
In this section, we describe a proposal for adapting CPA patterns to full-sentence definitions of Spanish verbs, in the context of the already mentioned DAELE project. As we stated in previous sections, an appropriate definition for learners must take care of the following aspects:
a) Correspond with real usage, that is, strictly follow corpus data.
b) Offer information about the semantics of the word not in isolation but connected to other lexical units, apart from collocations.
c) Show information about how the word can be used in terms of its most frequent syntactic structures.
d) Finally, offer all these components in a clear and comprehensible way, in order to make the information easy to understand for a non-expert user.
We first give a brief presentation of the DAELE pilot project. Secondly, we illustrate the application of CPA to the construction of full-sentence definitions.

A pilot online dictionary of Spanish for foreign learners
As already mentioned in Section 3.2, DAELE is a pilot dictionary for Spanish learners which, being monolingual, is conceived for intermediate or advanced levels. The project is currently in its first stages of development, and the first grammatical category being treated is the verb, as it is a fundamental part of the sentence and one of the most difficult categories of the dictionary in terms of grammatical complexity. DAELE is based fundamentally on the work developed by British pedagogical lexicography, and it tries to apply the Sinclairian conception of dictionaries, above all as embodied in his major dictionary project, Cobuild. We adopt the conception of the corpus as the origin not only of examples but of the whole analysis of entries; full-sentence definitions are also used, according to the principles set out in Hanks (1987: 116-136). Every definition is supported by examples of real usage. Various aspects of the dictionary are described in Battaner (2010), Renau (2012: 244-245) and Arias-Badia, Bernal and Alonso (2014), among others. On DAELE's website (http://www.daele.eu, last access: 27/5/2016) there are currently around 350 high-frequency Spanish verbs. Of these verbs, a core of 60 was analysed by applying CPA and adapted to the dictionary following the methodology we are describing in this paper. A sample of these verbs can be consulted in a preliminary version of the Spanish CPA database (Renau 2012: 179-242): http://www.tecling.com//cgi-bin/dsele/scpa.pl (last access: 27/5/2016).
The adopted web format is fundamental for offering information about grammar and collocations, because it makes it possible to provide extended explanations and a large amount of data. These data can be connected through hyperlinks, creating a net of semantic and syntactic features. Nevertheless, in web applications it is also necessary to be concise and to devote attention to the user's specific needs (Atkins and Rundell 2008: 20-24). At the same time, space limitations, one of the biggest difficulties in all lexicographical traditions, are no longer a problem, and it is now possible to organise the information with labels that can be either exposed or hidden by the user.
This entry, as a result of intensive corpus research, has two meanings, one labelled as 'tener como coste' ('to cost') and the other as 'ser difícil' ('to be difficult'). In the first case, this wide meaning is divided into two more specific uses, a and b. The second meaning consists of only one use. The difference between the two uses of meaning 1 is that a is devoted to products or other things that have a price, and b describes actions or processes that must take place or that happen by spending time or effort. There are also notes (headed by the word 'nota'), specific notes for examples (in square brackets) and collocates (headed by the label 'combi').

Proposal for the application of CPA to full-sentence definitions of a learners' dictionary
Regarding the application of CPA to the definitions of a Spanish learners' dictionary, two key aspects to take into account were already explained in Section 1: the criteria of adaptation and the pedagogical goal. In this section, we will show the procedure and illustrate how we proceed with some verbs.
The methodology involves the following three basic steps: [...] Figure 3 shows a schema of the whole process. As step a) is the main corpus analysis, it is set aside in the next section in order to focus on the connection between steps b) and c).

CPA-DAELE connection
The process of converting a CPA pattern (step b) into a full-sentence definition (step c) has different implications: it means converting highly encoded information, made to be understood only by specialists (linguists, language teachers or computer scientists), into a pedagogical explanation for non-native students. To sum up, it means carrying out the process we synthesise in Figure 4 (note 6). In this figure, the first part of the definition (underlined) corresponds to the CPA pattern, which contains the semantic and syntactic information about the context in which these specific meanings of these two verbs are used. The part of the definition after the definiendum is the explanation of the meaning, which corresponds to the conventional meaning or implicature.
6. For the whole analysis and the lexicographical proposal of the verbs shown as examples in this section, see Renau (2012). The samples can be found at http://www.tecling.com//index.php?l=dsele (last access: 5/6/2016).
The following strategies can be adopted to adapt lexical patterns such as those of CPA into full-sentence definitions for a learners' dictionary such as DAELE.
a) Convert CPA semantic types into basic vocabulary. The first criterion of adaptation is to change the semantic labels used in the CPA ontology into basic vocabulary. As explained in section 3.1, in terms of semantics, CPA patterns are created mainly with semantic types (concepts) interconnected in a shallow ontology. They are used to characterise the semantics of verb arguments. Examples of semantic types are [[Human]]; [[Artifact]], that is, all the things created by human beings; [[Process]], all things which happen spontaneously or without human intervention; [[Emotion]], all feelings, etc.
In order to adapt these labels, the most obvious step is to keep the same noun, when it is clear enough for the non-expert user. For example, [[Process]] or [[Illness]] are directly adapted to proceso 'process' and enfermedad 'illness'. Nevertheless, in many cases, some partial change is needed. In the case of [[Human]], for instance, it is converted to persona 'person' or alguien 'somebody', because these are the most common options in dictionaries to refer to humans, and are familiar to users. Another example is the more general semantic type [[Physical Object]], which in the CPA ontology refers to 'anything with physical nature', such as a cup or a chair, but also a building or a planet. In our case, [[Physical Object]] is normally adapted to cosa 'thing' or objeto 'object', because these are the most common, natural words to refer to these objects, without further specification. Also, in any natural language, such as English or Spanish, objects are prototypically physical. Figure 5 shows an example of strategy a).
b) Selection of a lexical set to delimit a semantic type. In some cases, the semantic types mentioned in a) can be less informative for a user due to their general scope. For example, it is more informative to define open with Somebody opens a box, bottle, can… when… rather than with Somebody opens a container when… Two reasons may explain this fact: firstly, vocabulary units such as box, bottle or can are more frequent than container, and secondly, related to the previous one, they are more illustrative and informative. This happens more often with arguments in direct object position than in subject position. Another clear example for the same Spanish verb abrir 'to open' is its use with the meaning of 'to make an injury', which is restricted mainly to head and brow. It is not possible to create a clear definition with a strategy such as Alguien abre una parte dura y redondeada del cuerpo a otra persona cuando… ('Somebody causes an injury in a hard, rounded body part to another person when…'). It is clearer and simpler to say Alguien le abre la cabeza o la ceja a otra persona cuando… ('Somebody causes an injury in another person's head or brow when…'). This strategy also makes it possible to include other aspects of usage, such as the expletive (redundant) pronoun le (Alguien le abre la cabeza…).
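Strategies a) and b) can be pictured as a simple lookup with an optional override. The following Python sketch is only an illustration under stated assumptions: the mapping `BASIC_VOCABULARY` contains just the adaptations mentioned above, and the function `render_argument` is a hypothetical helper, not part of the DAELE or CPA tools.

```python
# Strategy a): CPA semantic types adapted to basic vocabulary.
# Only the adaptations mentioned in the text are included.
BASIC_VOCABULARY = {
    "Process": ["proceso"],
    "Illness": ["enfermedad"],
    "Human": ["alguien", "persona"],
    "Physical Object": ["cosa", "objeto"],
}


def render_argument(semantic_type, lexical_set=None):
    """Return the wording of one verb argument for a full-sentence definition.

    Strategy b): when a lexical set is supplied, it replaces the general
    label and is closed with an ellipsis, because lexical sets are open.
    Strategy a): otherwise the first basic-vocabulary option is used.
    """
    if lexical_set:
        return ", ".join(lexical_set) + "…"
    return BASIC_VOCABULARY[semantic_type][0]


if __name__ == "__main__":
    subject = render_argument("Human")
    # "Container" is used here only as the general label from the example above.
    obj = render_argument("Container", ["box", "bottle", "can"])
    print(subject, "/", obj)
```

The override reflects the design choice described in b): a general label is kept only when no more informative lexical set is available for that argument.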
In sum, lexical sets are groups of words which populate the semantic valency structure of the verb and are grouped by a semantic type. They usually do not include all the lexical items that could potentially be included in them, that is, they are open sets. In definitions, this can be solved with an ellipsis (…) or the abbreviation etc. We use frequency criteria to decide which lexical units to select from the set. To obtain frequency and salience data, the Word Sketch tool in Sketch Engine (Kilgarriff, Baisa, Bušta et al. 2014) is used. Figure 6 shows different examples of this option.
[...] b) is also used in many cases, when it is considered that a general semantic label is explanatory enough by itself, but some example of a lexical item may be of help. For instance, in the case of casar ('to marry'), the lexical set cura, juez, etc. ('priest, judge') can be restricted with … or another competent authority, an adaptation of the semantic type [[Human = Civil or Religious Authority]] (Figure 7).
d) Making the syntactic structure explicit. Finally, there are some patterns that are only used in a specific syntactic structure. In this case, this structure is made explicit in the definition, instead of 'hiding' it in a more general pattern. For example, it is relatively common for some senses to be activated only with clauses in direct object position. In this case, the structure is reflected in the definition. In Spanish CPA, clauses are represented by the semantic type [[Eventuality]], which alludes to actions or processes. See Figure 8 for an example with the verb imaginar(se) 'to imagine'.

'Detaching' meanings from the corpus to the dictionary: an example with the verb desprender/se
This section is devoted to explaining in detail the process shown in the previous section, with the verb desprender(se) ('to detach, to give off') as an example.
Both English and Spanish CPA use a random sample of a minimum of 250 concordances for a verb such as desprender(se). Highly frequent and polysemous verbs, however, need larger samples. For creating the sample, and also for labelling each concordance with its respective pattern number, all versions of CPA for each language use a modified version of Sketch Engine. In the case of Spanish, a journalistic corpus of 50 million words is used (IULA50, see note 3).
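The sampling step itself is simple to picture. The sketch below, in Python, assumes that the concordances of a verb are available as a plain list of strings; the function `sample_concordances` and its parameters are hypothetical and do not reproduce the modified Sketch Engine used in the project.

```python
import random


def sample_concordances(concordances, size=250, seed=0):
    """Draw a reproducible random sample of concordance lines.

    If fewer than `size` lines are available, all of them are returned.
    A fixed seed keeps the sample stable, so that pattern numbers
    assigned to each line by the annotator remain valid across sessions.
    """
    lines = list(concordances)
    if len(lines) <= size:
        return lines
    return random.Random(seed).sample(lines, size)


if __name__ == "__main__":
    # Placeholder lines standing in for real corpus concordances.
    fake_lines = [f"concordance {i}" for i in range(1000)]
    sample = sample_concordances(fake_lines)
    print(len(sample))
```

For highly frequent and polysemous verbs, the `size` argument would simply be raised.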
Table I shows the CPA patterns (and implicatures) derived from the analysis of the IULA50 random sample, and the defi nitions of DAELE created from these patterns.
For desprender/se, the following patterns were detected in the IULA50 corpus:

DAELE defi nition
Corpus example 1 (i) Algo o alguien desprende una cosa de otra a la que estaba unida, pegada o en la que estaba sostenida cuando la separa de modo que deje de estar en contacto con ella.

CPA pattern
Para apoyar la causa, el pintor se desprendió de una de sus obras.Prototypically desprender means -shown in pattern 1a) -that an agent like a human or an event (for example, the wind) detaches a thing or part of a thing from another object.This prototypical meaning is infrequent in the corpus, and, on the contrary, pattern 1b is fairly frequent: it is used to express that a thing or a part of it detaches from another object without the intervention of any agent (inchoative structure).The rest of the patterns are fi gurative meanings derived from the fi rst two: pattern 2 is used when a person or institution ceases to possess something else, generally by donating it.Pattern 3 expresses that a physical object gives off smaller parts of itself away.Pattern 4 is, as pattern 1, another case of causative-inchoative alternation, and both denotate the situation in which something or somebody causes certain emotion in people.Finally, pattern 5 is used for ideas that are derived from some piece of information.

With respect to the definitions, every effort must be made to make them easy and quick to understand. As explained in the previous section, it seems impossible to reproduce the implicature exactly as it is formulated in CPA: some semantic types, such as [[Event]] (evento in Spanish), refer to very broad concepts which may not be sufficient for clarifying the use of the verb to a learner.
For this reason, in some cases, a lexical set is used in the definition instead of the corresponding semantic type (strategy b in Section 4.2.1). The words in our lists are chosen as typical members of the relevant lexical set. For example, in pattern 3 the semantic type [[Stuff]] ('materia') is used to describe the complement, but this is not restrictive enough for a learner. In the definition, therefore, we help the learner by including the list olor, gas, sustancia o radiación ('smell, gas, substance, or radiation'): the most frequent options (smell and gas) come first, whereas the less frequent ones are placed at the end. The opposite situation must also be avoided: some semantic types are so specific that they restrict the meaning of the word too much. In this case, more general nouns such as algo ('something') or cosa ('thing') are used in the definition. In sum, when a CPA pattern is used as the basis for an entry in DAELE, a balance between generalisation and specification is required in order to clarify the meaning to the user and to adapt the process to the user's own needs.
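The frequency-based ordering of a lexical set described above can be sketched as follows. This is only an illustration: the function name and the frequency counts are hypothetical (they are not IULA50 data), and the joining rule is a simplification that ignores, for instance, the Spanish change of o to u before words beginning with o-.

```python
def render_lexical_set(freqs, max_items=4, conjunction="o", open_set=False):
    """Order collocates by descending frequency and join them as a Spanish list."""
    # Ties are broken alphabetically so that the output is deterministic.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))[:max_items]
    if open_set:
        # Open sets are closed with 'etc.' rather than with a conjunction.
        return ", ".join(ranked) + ", etc."
    if len(ranked) <= 1:
        return "".join(ranked)
    return ", ".join(ranked[:-1]) + f" {conjunction} " + ranked[-1]


# Hypothetical counts for the complement of pattern 3, NOT taken from IULA50.
stuff_collocates = {"olor": 40, "gas": 25, "sustancia": 12, "radiación": 8, "vapor": 3}
print(render_lexical_set(stuff_collocates))
```

The `max_items` cut-off stands in for the lexicographer's judgement about how many members make the set clear without overloading the definition.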
The DAELE entry built from the patterns illustrated in Table I is shown in Figure 9.

Conclusions and future work
In this paper, we have presented a proposal for systematising definitions for learners' dictionaries. Using CPA to build lexicographical entries is highly time-consuming. In addition, the database and other tools still need to be improved to become more efficient, not only in terms of the time invested but also in the quality of the resulting data. However, CPA provides a fine-grained analysis of language in use, and it can be considered a systematisation and extension of Sinclair's ideas. As pointed out in Section 1, it is a reasonable assumption that dictionaries must be built from a corpus, but corpus analysis must be supported by a system which guarantees the coherence not only of the work of one lexicographer but also (and this seems even more important) of the work of every member of a lexicographical team.

Spanish CPA and DAELE are ongoing projects, and many tasks are still left for future work. Apart from some obvious steps, such as increasing the number and types of verbs to be analysed or testing the same methodology for other languages, automatising the process is one of our main concerns for the near future. If CPA is confirmed as a proper methodology for dictionary making, proposals for making the work easier, faster and more precise need to be developed. In this sense, our work follows two lines: a) the semiautomatic creation of verb patterns, that is, having the process of analysing the corpus and creating the patterns executed partially by automatic procedures (Nazar and Renau, in press; see note 7); b) automatising some parts of the creation of definitions: once patterns such as the ones shown in Figure 3 have been created, it would be relatively easy to implement templates to automatically generate natural definitions from the patterns by translating CPA's annotation, thereby facilitating the definition writing process.

7. An ongoing project started by Renau and Nazar to automatise Spanish CPA can be found at http://www.verbario.com (last access: 27/5/2016).
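The template idea in b) could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Pattern` structure, the mapping from semantic types to Spanish wording and the example pattern are hypothetical, and the generated sentence is a toy output, not a DAELE definition.

```python
from dataclasses import dataclass

# Hypothetical mapping from CPA semantic types to learner-friendly Spanish wording.
TYPE_WORDING = {
    "Human": "alguien",
    "Physical Object": "algo",
    "Stuff": "una materia",
    "Event": "un evento",
}


@dataclass
class Pattern:
    subject_types: list   # semantic types allowed in subject position
    verb_form: str        # inflected verb as it should appear in the definition
    object_wording: str   # a semantic type name or a ready-made lexical set string
    explanation: str      # implicature rewritten as a 'cuando' clause


def wording(semantic_type):
    """Translate a semantic type; unknown labels are passed through unchanged."""
    return TYPE_WORDING.get(semantic_type, semantic_type)


def generate_definition(p):
    """Fill a full-sentence definition template from a pattern."""
    subject = " o ".join(wording(t) for t in p.subject_types)
    sentence = f"{subject} {p.verb_form} {wording(p.object_wording)} cuando {p.explanation}."
    return sentence[0].upper() + sentence[1:]


# Made-up pattern loosely inspired by pattern 3 of desprender.
pattern = Pattern(
    subject_types=["Physical Object"],
    verb_form="desprende",
    object_wording="olor, gas, sustancia o radiación",
    explanation="lo emite",
)
print(generate_definition(pattern))
```

A template of this kind would only produce a first draft; the balance between generalisation and specification discussed in Section 4.2.2 would still have to be checked by the lexicographer.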

Acknowledgments
Many thanks to Paz Battaner and Patrick Hanks for their invaluable constant support.
2. Translations of the three entries are (respectively):
burst v. intr. [...] 5 To manifest <somebody> [an emotion or feeling] suddenly and strongly: The boy burst into tears. / It seemed that she was going to burst with joy when they gave her the prize.
surprise v. tr. 1 <Of a person or thing> to cause [somebody] to feel surprise: The question surprised me.
riddle v. intr. [...] 5 To cause [a lot of injuries] [to another person]: I saw the corpse in the morgue, and it was riddled with bullets.
(subject, predicate, object, complement, adverbial), semantics require using semantic types. Semantic types are intrinsic attributes of a noun; they represent cognitive concepts such as [[Human]], [[Institution]], [[Vehicle]], [[Event]], etc. They can be seen as hypernyms to define more specific words. Thus, in the previous example of the verb to arrive, collocates in the subject position of pattern 1 could be ambulance, guest, train, messenger, visitor, plane or convoy, among others, providing that the associated semantic types are [[Human | Vehicle]]. For the sake of coherence, semantic types are hierarchically organised in a bottom-up shallow ontology of basic concepts built from the corpus (see note 5). Semantic types are complemented by lexical sets and contextual roles. Lexical sets are groups of collocates occupying an argument position, and they are used to complement the semantic type when the latter is too general to characterize the intended meaning. Contextual roles, in turn, are more specific concepts belonging to a semantic type and are assigned by context. For instance, [[Human]

Figure 2 (see page 966) shows the DAELE entry of the verb costar ('to cost') as an example.

Figure 2 - The verb entry costar 'to cost' in DAELE, expanded version.

Figure 3 - The process of corpus analysis, pattern extraction and adaptation to the dictionary.

Figure 4 - Two examples of CPA patterns of the verbs beber 'to drink' and admirar 'to admire', and their corresponding definitions for DAELE. The part of the definition corresponding to the pattern of usage is underlined; the rest of the sentence corresponds to the explanation of the pattern.

Figure 5 - A pattern of the verb cortar 'to cut' and the corresponding definition in DAELE.

Figure 6 - Patterns of the verbs abrir 'to open' and admirar 'to admire' and their corresponding definitions in DAELE.

Figure 7 - A pattern of the verb casar(se) 'to marry' and its corresponding definition for DAELE.

Figure 8 - A pattern of the verb imaginar(se) ('to imagine') and its corresponding definition for DAELE.

Table I - Correspondence between CPA patterns and the meanings shown in DAELE.