The learner corpus path: a worthwhile methodological challenge

Corpus compilation is a challenging research endeavor that many researchers decide to pursue. Few learner corpora, however, can be easily accessed (e.g., the International Corpus of Learner English), and none of them carries a variety of text registers written by English learners at different proficiency levels studying in the Brazilian university context. Therefore, the aim of this paper is to present the compilation of a learner corpus, much needed in our research and teaching context, pointing out the advantages of building this type of corpus for the understanding of learners’ needs as well as for pedagogical decision-making based on sound data. Presenting a detailed rationale of the corpus compilation, this article reveals the various decisions made in order to guarantee that fair comparisons can be made. To exemplify the value of building a carefully designed corpus, results of previous studies are compared. Some of the conclusions reached refer to the need for discipline-specific tasks to propel writing proficiency and for authorship skills to be developed in English for Academic Purposes classes to foster academic success.

When linguists select their research questions and choose how to investigate what they are interested in, many decisions have to be made. Undoubtedly, the methodology has to be appropriate for the study. Using a corpus linguistics methodological perspective may be the best choice if the questions are related to how people use language in different contexts, as Crawford and Csomay highlight: "While understanding variation and contextual differences is a goal shared by researchers of other areas of linguistic research, corpus linguistics describes language variation and use by looking at large amounts of texts that have been produced in similar circumstances" (Crawford & Csomay, 2016, p. 5). This empirical approach allows results to be generalized, as a well-designed corpus will adequately represent a register, which can be understood as "a variety associated with a particular situation of use (including particular communicative purposes)" (Biber & Conrad, 2009, p. 6). The researcher, then, should consider the situational characteristics of a register: the participants, the relations among participants, the channel used, and the production circumstances (Biber & Conrad, 2009). Once clear research questions and the characteristics of the register or registers under investigation have been established, it is time to choose the corpus to be used. Would a readily available corpus be suitable, or would a corpus compilation be necessary?
In our research area, learner language, there are few corpora that can be accessed, for instance, the International Corpus of Learner English (ICLE) 1, the Louvain International Database of Spoken English Interlanguage (LINDSEI) 2, the Louvain Corpus of Native English Essays (LOCNESS), the Michigan Corpus of Upper-Level Student Papers (MICUSP) 3 and the British Academic Written English corpus (BAWE) 4. Each of these corpora was designed with a specific purpose. While ICLE and LINDSEI were compiled to allow access to English learners' interlanguage 5, MICUSP and BAWE focused on high-grade written papers of different genres and LOCNESS on essays written by American and British university students.
1 ICLE, which released its third version in 2020 (Granger et al., 2020), is a corpus of argumentative essays written by learners of English from upper-intermediate to advanced levels and from 25 different language backgrounds. It has over 5.5 million words and is hosted on a web-based interface. https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html.
2 LINDSEI is a corpus of interviews with learners of English who are speakers of 11 different native languages (Gilquin et al., 2010). https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei.html.
3 MICUSP, a written corpus, has about 2.6 million words. Corpus information is available at http://micusp.elicorpora.info/
However, there are several similarities regarding the situational characteristics involved in the compilation of these corpora. The participants are all students at higher education institutions. They are authors who write or speak, either in a context where what is produced is being assessed or not being assessed, including or excluding time constraints. Although the production circumstances may vary, the addressors are students and can be considered novice or apprentice writers. 6 The main difference among these corpora is that participants have different first language backgrounds. After reflecting on these characteristics, a researcher may wonder how suitable such corpora would be for their study. In our case, our research context is a Brazilian university; consequently, some questions would remain unanswered if our studies were limited to these corpora. Although the corpora are all quite large and ICLEv3 has a subcorpus of essays written by Brazilian students (Br-ICLE 7), we ultimately found them insufficient for our needs, particularly for investigating in depth linguistic variation across text genres, academic levels (undergraduate and graduate), disciplines and proficiency levels, in order to understand the users' choices from cross-sectional or longitudinal perspectives. Such aspects cannot be fully covered with Br-ICLE data. Furthermore, making a new corpus available to other researchers has also been one of our goals. A CQPweb framework will soon be available for searches on our corpus 8, with tools such as keyword search, collocate lists with different association measures, and visualization of occurrence dispersion.
4 The BAWE corpus contains 2761 pieces of proficient assessed student writing. It can be accessed through the Oxford Text Archive https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2539.
5 The term interlanguage was coined by Selinker (1972, p. 214): "… the existence of a separate linguistic system based on the observable output which results from a learner's attempted production of a TL norm. This linguistic system we will call 'interlanguage' (IL)."
6 Scott and Tribble (2006, p. 133) prefer to use the terms 'apprentice' and 'expert' writers rather than 'learner' and 'native speaker'. "Expert texts may most easily be identified on the crude basis of their having been published, or their having been disseminated to specific readerships within bureaucratic, commercial, professional or other organizations".
7 Br-ICLE, coordinated by Tony Berber-Sardinha, has 200,000 words.
8 Besides the university learner corpus described in the article, we will also make available other corpora organized by our CNPq research group, Grupo de Estudos de Corpora Especializados e de Aprendizes (GECEA), such as CALIEMT: Corpus de Aprendizes da Língua Inglesa do Ensino Médio Técnico (Xavier et al., 2019; Oliveira et al., 2017), CorAChem (Corpus of Articles in Chemistry) and CorAAL (Corpus of Articles in Applied Linguistics) (Dutra et al., 2020).
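As a rough illustration of what a collocate list with an association measure involves, the sketch below ranks the words that co-occur with a node word by pointwise mutual information (PMI). It is not the CQPweb implementation: the function name, the window size and the toy token list are assumptions made for this example, and PMI is only one of several association measures that such tools usually offer.

```python
import math
from collections import Counter


def collocates_by_pmi(tokens, node, window=2, min_count=1):
    """Rank words found within +/- `window` tokens of `node` by PMI.

    Hypothetical helper for illustration; probabilities are estimated
    as simple relative frequencies over the token list.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                co_freq[tokens[j]] += 1
    scores = {}
    for word, co in co_freq.items():
        if co < min_count:
            continue
        # PMI = log2( P(node, word) / (P(node) * P(word)) )
        scores[word] = math.log2((co * n) / (freq[node] * freq[word]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy token list, invented for the example (not corpus data).
toy_tokens = ["a", "b", "a", "b", "c", "d"]
ranking = collocates_by_pmi(toy_tokens, "a", window=1)
```

PMI tends to favor rare words, which is one reason corpus tools usually let the user set a minimum co-occurrence frequency (here the assumed `min_count` parameter) or switch to another measure.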
To the best of our knowledge, there was no comprehensive learner corpus of Brazilian university learners' English written texts compiled in the classroom context and available for the studies our research group was aiming at. Therefore, in 2013, as Section 3 lays out comprehensively, a Brazilian learner academic English corpus was designed (CorIFA 9). Our efforts were motivated, fundamentally, by our desire to improve learners' use of academic English. Corpus analysis allows the study of a particular group, and compiling the corpus seemed a challenge worth pursuing since, ultimately, it would be the basis for developing appropriate materials and new courses. Accordingly, the aim of this paper is to present the compilation of a learner corpus, much needed in our research and teaching context, pointing out the advantages of building this type of corpus for the understanding of learners' needs as well as for pedagogical decision-making based on sound data. The following sections deal, first, with the literature that is the basis of our work; second, with the methodological paths taken to compile the academic English learner corpus; and third, with studies based on CorIFA.

Theoretical background
Types of English spoken as an additional language 10 far outnumber native-speaker varieties 11, inspiring the study of non-native spoken or written English. This is the focus of 'learner corpus research' 12 (LCR), an umbrella term (Granger et al., 2015) that refers to interlanguage investigations based on corpus linguistics. A corpus is a "collection of pieces of language text in electronic form" (Sinclair, 2005: 19) compiled according to some criteria, and representing a language or language variety. According to Sinclair (2005), corpus compilation requires establishing precise and broadly inclusive criteria for the mode (e.g., written), the type (e.g., a research article) and the domain (e.g., academic) of the texts, the language or language varieties (e.g., learner English), the location of texts (e.g., compiled in Brazil) and text production dates (e.g., from 2015 to 2020). With a focus on the description of language use, corpus linguistics, using a range of linguistic software tools, allows both quantitative and qualitative approaches to learner language. Among its advantages are the capacity to deal with a considerable amount of data and the generalizability of results across similar groups. A well-designed corpus, therefore, is representative of a population. As Biber points out: "Any selection of texts is a sample. Whether or not a sample is 'representative', however, depends first of all on the extent to which it is selected from the range of text types in the target population; an assessment of this representativeness thus depends on a prior full definition of the 'population' that the sample is intended to represent, and the techniques used to select the sample from that population" (Biber, 1993, p. 243). A careful corpus design enables generalizations of results, as statistical tests are often used to treat data.
In this section, we will highlight some learner corpus research, focusing on their design characteristics and how they have coped with representativeness to be able to make comparisons across registers or groups, for instance.
The design of two of the largest learner corpora compiled in the 1990s (Granger, 1998) is worth mentioning: the International Corpus of Learner English (ICLE) and the Longman Learners' Corpus (LLC), especially because of how they have dealt with language, task and learner-related features (Tono, 2003). Language-related features are mode, genre, style and topic. These corpora encompass texts in the written mode with slight differences in the other features: mostly argumentative essays on a previously defined list of topics in ICLE, and essays and exam scripts on a variety of topics in LLC. Task-related characteristics concern (a) data collection: cross-sectional rather than longitudinal, (b) type of elicitation: spontaneous as contrasted to prepared or edited texts, (c) production time: either fixed (timed) or untimed and done as homework, and (d) use of references allowed, for instance, dictionaries, with such information recorded. As for learner-related features, ICLE is a corpus with texts produced by university-level students, while LLC allows for participation of different academic level groups. Both corpora have texts written by learners from a variety of first language backgrounds. While ICLE has high-intermediate to advanced material, LLC allows for the submission of texts at all levels. ICLE was "the first large collection of computerized learner data to be made available for research" while LLC "has been commercially available for research" (Tono, 2003, p. 800). Whereas ICLE has recently released a new version presenting over 5.5 million words (Granger et al., 2020), LLC, with 10 million words (Tono, 2003), does not have such updated information on its website. 13 Several learner corpus studies use Contrastive Interlanguage Analysis (CIA), which can be understood as analysis that "involves the comparison between learner language and the target language" (Granger, 2015, p. 13).
Learner language has been called 'Interlanguage Varieties' (ILV), especially highlighting "the highly variable nature of interlanguage" (Granger, 2015, p. 18). This approach may compare students' oral or written texts with native speakers' texts (ILV CIA study results can enhance practitioners' understanding of their students' needs; nevertheless, teachers may choose to collect their own class corpus as a Do-It-Yourself Corpus (DIY corpus) 16 or create data-driven learning (DDL) (Johns, 1991) activities based on ready-made corpora (e.g., COCA, BNC 17) or on their DIY corpus. It is worth mentioning that DDL (Johns, 1991, 1994) allows students access to corpus data and concordancing software as part of their language-learning process. Using the figure of Sherlock Holmes as a metaphor, Johns (1997) explains that learners are seen as detectives as they are encouraged, for example, to search and identify grammatical rules, vocabulary meaning, collocations and lexico-grammatical patterns, to name a few. Following Johns' (1991, p. 4) DDL format "identify-classify-generalize", learners who participated in Lee's research (2011) had the opportunity to learn and practice prepositions through the analysis of concordance lines. They explored a corpus comprising texts from J. K. Rowling's book Harry Potter and the Philosopher's Stone, concentrating on verb-preposition collocates. DDL activities raised students' awareness of the use of prepositions while helping them to figure out the uses and functions of certain phrasal verbs. DDL contributed to students' language acquisition, proving to be a good way to prepare for exams, as it created a learning context with more enthusiasm and student autonomy (Lee, 2011).
13 http://global.longmandictionaries.com/longman/corpus#aa
14 Sw-ICLE stands for the Swedish subcorpus of ICLE.
15 Ch-ICLE is the Chinese subcorpus of ICLE and Dt-ICLE is the Dutch one.
16 Check this page for detailed instructions on how to prepare your own corpus https://www.lancaster.ac.uk/fss/courses/ling/corpus/blue/l04_top.htm.
17 There are several online corpora that can be sources of real language use (e.g., https://www.english-corpora.org/coca/; https://www.english-corpora.org/bnc/).
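To make the notion of concordance lines more concrete, the following minimal sketch produces keyword-in-context (KWIC) lines of the kind learners inspect in DDL activities. It is only an illustration under stated assumptions: the function name `kwic`, the context width and the sample sentence are invented for this example and do not come from Lee's (2011) materials or from any particular concordancer.

```python
import re


def kwic(text, node, width=30):
    """Return keyword-in-context lines for each whole-word match of `node`.

    Hypothetical helper for illustration; matching is case-insensitive
    and context is measured in characters, not words.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the node words line up vertically.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample sentence (not a quotation from any corpus or book).
sample = "Harry looked at the owl. The owl looked for Harry."
concordance = kwic(sample, "looked", width=10)
for line in concordance:
    print(line)
```

Aligning the node word in a central column is what lets learners scan the words immediately to its right, for example the prepositions that follow a verb, and generalize a pattern from several lines.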
Researchers compiling their own corpus need to adopt strict design criteria. According to Gilquin (2015, p. 16), in the case of learner corpora creation, the rules adopted are "even more crucial, given the highly heterogeneous nature of interlanguage"; such design criteria will be fully addressed in the next section. Moreover, few studies have investigated students' texts produced in their own class writing contexts (Staples & Reppen, 2016), a gap that CorIFA studies serve to fill, as the corpus is compiled from class activities in "English for Academic Purposes." With detailed gathering of learner metadata and careful consideration of task and language variables, as described in the next section, CorIFA allows for thorough studies on classroom-contextualized learner writing.

CorIFA: a learner corpus
According to Reppen (2010, p. 33), building a corpus requires a significant time investment as it involves a set of highly interconnected procedures. From collecting the texts to saving, storing, marking-up and adding metadata, the researcher is faced with a vast number of decisions that need to be considered when compiling a corpus. CorIFA was originally created in 2013 at a Brazilian public university. An overview of its compilation history, challenges and shifts is presented in this section. Above all, the corpus objective has remained the same: to describe Brazilian university students' written English interlanguage, as produced in a pedagogical context.

Data collection was inspired, at first, by the International Corpus of Learner English (ICLE). CorIFA and ICLE are similar regarding task variables such as task medium, genre, topic and task setting. CorIFA compilation started with written tasks: argumentative essays, such as those in ICLE. Students were asked to write essays based on previously chosen topics, such as the internet, feminism, science and technology. Teachers asked the students to write their essays in class or at home, submitting them by email. Essay length, another task variable, was different from ICLE, since the latter required texts to be from 500 to 1000 words, whereas CorIFA allowed, at that time, 200 to 300-word essays. Regarding the learner variables, ICLE and CorIFA participants' age range and learning context are similar; the data for both corpora come from university students who have learned English in a non-English speaking country. The main differences between the two corpora, however, concern first language, academic level, discipline and language proficiency. At this point, we will refer only to first language, as the other characteristics will be fully discussed in the following section. ICLE encompasses subcorpora with English texts from speakers of a variety of languages (e.g., Chinese, Turkish, Portuguese, French) while CorIFA's participants are mainly Portuguese speakers 18. Finally, when collection began, the consent forms were printed and gathered only basic information from the students, for instance, name and enrollment number. As consent forms were modified later on by the research group to collect more metadata, the texts collected in 2013 were discarded.
In 2014, another attempt to collect student texts and compile a learner corpus was made. As the primary goal of having students write texts was pedagogical, many decisions were taken by the subject teachers and, thus, most essays were handwritten. The time and effort demanded to transform the handwritten texts into a digital format led the research group not to include 2014 texts in CorIFA. The group considered that, despite transcriber training, there were risks of misspellings or grammar errors being modified by the person digitizing the documents or by computer spell checkers, leading to texts that would not reflect students' real English level. Since the experience of receiving paper-based texts did not facilitate the process of corpus compilation, from 2015 on, the texts have been collected only in digital format, with students filling in an online form.
In 2015, there was a compilation of texts written in controlled and uncontrolled time settings. First, students submitted texts as part of in-class mock tests, to capture students' skills in writing under time constraints and with a proposed topic. Three were mock tests for B1 and B2 level students. All of them presented the same instructions regarding text production. Students had to write a 300-word essay (minimum) based on a set topic in 30 minutes. Since digital text collection worked well, the compilation process became standardized, and, from 2016 on, students have submitted texts with distinct registers through online forms, according to their proficiency level, as described in Table 1. Systematic corpus compilation has allowed the research group to keep a sound learner corpus, which will be described in the following section.
Before sending their texts, learners are asked to fill in a digital form through Google Forms with their information and to read a consent form for their participation in the research, with which they may choose to agree or disagree. This form records students' information in a way that helps researchers keep better records of participants' social and linguistic backgrounds and of the specificities of the task. The consent letter is provided in the Appendix.
CorIFA is composed of texts written by undergraduate and graduate students from different courses at the Federal University of Minas Gerais. These students are registered in one of the five English for Academic Purposes (IFA) 19 subjects created in 2012 as part of a set of initiatives to expand and enhance the internationalization process of the university. Students register according to proficiency levels, ranging from intermediate to advanced (B1-C1), following the Common European Framework of Reference for Languages 20. As part of each subject's assessment, students from each level are required to write texts in a specific academic register (Table 1). Before being accepted into one of the IFA subjects, students must have their proficiency level checked. The corpus carries an array of academic registers, from statement of purpose to research paper.
Students write their texts as course requirements, following the teachers' instructions in terms of number of words and topics. The registers have been gradually distributed, starting with, for example, statements of purpose or summaries in IFA I and ending with a research paper or literature review in IFA V. After students turn in the first draft, teachers adjust the teaching of that register to their students' needs. Each subject design includes several exercises on each register and the opportunity for text editing. Students then submit a second and/or third draft that is edited and graded. For the corpus compilation, these texts are categorized into unedited and edited versions.
One of the text variables for CorIFA is length in words. IFA teachers may determine text word ranges based on their experience with the students' level and on register characteristics. The average text length in words is presented in Table 2, which also shows the total number of words per register and in the whole corpus. The corpus shows (Table 2) great differences among registers in average length. It consists of six written registers, with average lengths ranging from 225.39 to 1,564.50 words. The one which surpasses all the others in terms of length in words is the research article. This reflects the reality of the linguistic features among registers, especially related to their nature, which includes physical mode, setting, production circumstances, etc. (Biber & Finegan, 1994), in which some registers do require more words than others. Moreover, since writers must include several sections in a research article, which Biber and Finegan (1994, p. 131) call the "standard four-part organization (Introduction, Methods, Results and Discussion -IMRD)", the number of words will certainly surpass that of the other registers in our corpus. The shortest type of text produced by CorIFA participants was the abstract. This register, which could also be understood as part of a research article, has a clear communicative purpose: to sum up the article, presenting its aims, methodology, results, and conclusion. Oftentimes, journals and conferences limit the number of words in an abstract (Swales & Feak, 2009), imposing on the writer the need to be concise.
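The per-register figures discussed above (number of texts, total words and average length) can be obtained with a very small script. The sketch below is one possible way of doing so, not the procedure used for Table 2: the function name, the whitespace-based word count and the toy samples are assumptions made for this example.

```python
from collections import defaultdict


def length_by_register(samples):
    """Summarize text length per register.

    `samples` is an iterable of (register, text) pairs. Returns a dict
    mapping each register to (n_texts, total_words, mean_words).
    Words are counted by splitting on whitespace, a simplifying assumption.
    """
    totals = defaultdict(lambda: [0, 0])
    for register, text in samples:
        totals[register][0] += 1
        totals[register][1] += len(text.split())
    return {reg: (n, words, words / n) for reg, (n, words) in totals.items()}


# Toy samples, invented for the example (not CorIFA texts).
toy_samples = [
    ("abstract", "one two three"),
    ("abstract", "one two three four five"),
    ("essay", "a b"),
]
summary = length_by_register(toy_samples)
```

A different tokenization (for instance, one that separates punctuation or contractions) would give slightly different counts, so the tokenizer should be reported alongside any word totals.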
Another essential aspect carefully planned during corpus compilation was to account for texts per academic level, a learner variable, since participants may either be undergraduate or graduate students (see Figure 1). As the IFA subjects are elective, students at both academic levels can be registered.

Figure 1 − Students per academic level and area
From the first semester of 2016 on, the number of collected samples has been considerably higher than in the previous year, a situation that remains. The year 2015 had the smallest number of samples because of the compilation process, which relied on mock tests, as previously mentioned. Not all students completed the task, many did not give consent to have their texts included in the corpus, and some texts did not reach the minimum word count. Therefore, fewer texts remained to be part of the corpus.
The greatest number of samples in the corpus comes from students enrolled in courses from the following areas: Physical Sciences and Engineering and Biological and Health Sciences. Humanities and Arts, on the other hand, constitutes the discipline area with the smallest number of samples (see Figure 2). This corpus characteristic is very likely due to the total number of students from Humanities and Arts and from Social Sciences and Education enrolled in English for Academic Purposes subjects being considerably lower than the number of those from the Biological and Hard Sciences fields.

Figure 2 − Texts per students' discipline area
Another characteristic related to the corpus design is its potential for longitudinal studies, as its data can help researchers better understand the relationship between students' writing development and proficiency level. There is a paucity of learner corpus studies from a longitudinal perspective (except for Biber et al., 2020; Goutéraux, 2013; Littré, 2015; Meunier & Littré, 2013). Up to 2019, 217 students submitted texts for more than one semester. Among these, 197 submitted them over a period of one year (two semesters), and 20 for more than one year (three semesters or more). Interestingly, CorIFA data may come from students who started taking IFA classes as undergraduate students and continued to register after starting a graduate program.
Compiling a learner corpus in a pedagogical context has been challenging, since, for a couple of years, data collection procedures changed, as described. After establishing consistent compilation parameters, the corpus grew steadily. Its task and learner variable complexities allow for a multitude of investigations, some of which will be shown in the next section.

Studies based on CorIFA
In this section, we survey the studies that employ CorIFA in their research. In the past six years, since the beginning of its compilation, the corpus has been a rich linguistic database for Brazilian researchers. Hitherto, research has mainly centered on learners' language description and on contrasting learners' written interlanguage with data from other corpora using CIA (as pointed out in Section 2). Relying on two main academic genres, argumentative essays and abstracts, the studies focus on understanding learners' use of English at different proficiency levels and also on detecting their underuse or overuse of specific linguistic features. Most studies use a reference corpus composed of well-evaluated non-native speakers' essays or native speakers' texts. The topics encompass linking adverbials, collocations, that-clauses, conjunctions, noun phrases, and passive constructions. For organizational reasons, we first bring an overview of one descriptive study and five investigations that fall under the CIA perspective. This section ends with a CorIFA-based study that highlights a pedagogical intervention, leading to reflections on applications of learner corpus research.
Focusing on the interlanguage itself, Queiroz (2019) explores CorIFA in depth to shed light on novice writers' use of noun phrases. These phrases have been regarded as a common linguistic feature in expert academic texts, mainly research articles (Parkinson & Musgrave, 2014; Gray, 2015; Biber & Gray, 2016). To investigate the grammatical complexity of noun phrases (NPs), the study analyzes general topic essays and specific topic essays 21 from a CorIFA subcorpus and provides a thorough description of pre- and post-modification of the types adjective + noun and noun + prepositional phrase. Two of the study's findings are particularly noteworthy. First, surprisingly, the subcorpus analysis of upper-intermediate level texts revealed a higher use of complex NPs (59.3%) than simple ones (35%). Second, NPs were more frequent in the specific topic essays, which is interpreted as quite positive writing practice, as complex NPs are characteristic of academic registers: "it can be assumed that Brazilian learners due to their proficiency level [B2], the academic context of writing, and the probable contact with specialized texts in English from their own disciplines, are capable of using structurally complex and compressed phrasal structures, often characteristic of professional academic writing" (Queiroz, 2019, p. 112). Above all, this last result shows that discipline-specific tasks can propel writing at the university level that is more suitable to the academic context. This research makes evident the potential of descriptive corpus-based research to contribute to applied linguistics, in this case, EAP. The corpus design allowed a task variable (specific topic versus general topic task) to emerge as the one that affected NP use by Brazilian upper-intermediate level university novice writers.
21 "General topic texts were those in which the EAP instructor presented a topic or question, e.g., Does technology make us more alone? to all students to write an argumentative essay about. Many of these topics were similar to the ones used in English proficiency tests. On the other hand, specific topic texts were those in which students were allowed to choose a topic of their preference to write about. Many wrote essays about their graduate studies, such as one dentistry student who wrote about periodontal disease and premature delivery." Queiroz (2019, p. 57).
Three of the CIA studies carried out based on CorIFA are on learners' use of conjunctions, which have been considered key features in text coherence (Halliday & Hasan, 1976; Chen, 2006; Liu, 2008; Zihan, 2014). Yet, these linguistic features "are not always needed and (...) they have to be used with discrimination" (Altenberg & Tapper, 1998, p. 80), posing difficulties to learners as they "tend[s] to vary from one language and culture to another" (Altenberg & Tapper, 1998, p. 81 (2018) contrasting connectors (but and however), they all detected either substantial quantitative differences (underuse or overuse as compared to LOCNESS or MICUSP), sentence position disparities, as well as discourse function inadequacies on the part of the learners. For instance, the marker so is used three times more in CorIFA than in LOCNESS, assuming sentence-initial functions of initiating a topic or announcing that an idea is going to be presented again (Dutra et al., 2019), which have been attested as oral discourse markers (Carter & McCarthy, 2006). An interpretation shared by the three CorIFA investigations based on learners' overuse of conjunctions in sentence-initial position is that register awareness needs to be better addressed in the Brazilian university context. Corpus-driven linguistic analysis like this one and the study on verb variation and collocations can inform teachers in their practice.
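Statements such as "used three times more" rest on frequencies normalized by corpus size, since the corpora being compared differ in length. The sketch below shows, under stated assumptions, how such a comparison might be computed: it normalizes raw counts per 10,000 words and applies the log-likelihood statistic commonly used in corpus comparison. The function names and all counts are invented for illustration; they are not the figures reported in the studies cited above, nor necessarily the statistic those studies used.

```python
import math


def per_10k(count, corpus_size):
    """Normalized frequency: occurrences per 10,000 words."""
    return count * 10_000 / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    Expected counts assume the item is equally frequent in both corpora.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Invented counts for illustration only (not CorIFA or LOCNESS figures).
learner_count, learner_size = 30, 100_000
reference_count, reference_size = 10, 100_000

ratio = per_10k(learner_count, learner_size) / per_10k(reference_count, reference_size)
g2 = log_likelihood(learner_count, learner_size, reference_count, reference_size)
```

With one degree of freedom, a log-likelihood value above 3.84 corresponds to p < 0.05, so the invented counts here would count as a significant overuse; the ratio of normalized frequencies then indicates its size.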
The studies with our learner corpus described up to this point show the extent to which specific corpus analysis can shed light on the understanding of learners' linguistic needs, considering the register they wrote in, their proficiency level and the type of task. The ultimate goal of developing such studies is to cater to learners' exact difficulties, because more precisely designed activities can be prepared and the course syllabus redesigned. A good example of a combination of interlanguage analysis and classroom activities was developed by Alves and Pinto (2018). First, the study investigated how results and conclusions were reported in abstracts in two apprentice corpora: CorIFA and MICUSP. CorIFA analysis provided real examples of learner language, and the access to MICUSP allowed students to experience a corpus linguistics pedagogical practice: Do-it-Yourself (DIY) corpora (McEnery et al., 2006). Students compiled their own study corpus and were able to raise their awareness of how to improve the final two rounds of abstracts. MICUSP was a suitable corpus for DIY since it is formed by well-evaluated university papers and its online framework allows the user to choose discipline-specific texts.
The agenda for the future is promising and shall contemplate, among other issues, advances in research methodologies, better access to students' interlanguage, longitudinal studies and the description of variation across registers and disciplines. Furthermore, it is of utmost importance that DDL activities (Johns, 1991) are more present in the language classroom, providing information on their advantages and limitations, and thus transferring research data into direct application, as in Almeida et al. (in press).

Conclusion
The overall aim of this paper was to present the rationale behind building an academic learner corpus, making the case that such a process is paramount for revealing traces of learners' written interlanguage. The main principles regarding corpus compilation were presented associated with the design criteria adopted for CorIFA. After that, we outlined the methodological procedures that were followed in the compilation process. The texts included in the corpus and the register associated with the students' proficiency level were also explained indicating a vast range of topics for future studies.
Subsequently, descriptive and CIA research based on data from CorIFA was presented to illustrate the contributions that the corpus has already provided for researchers interested in learner interlanguage. The studies pinpointed in this paper serve to reinforce the claim that compiling and observing a learner corpus can be an invaluable resource for language teachers keen to enhance their understanding of learners' output, enabling them to make more accurate pedagogical decisions for their classes.