The Other’s Voice in the Co-Construction of Self-Reference in the Dialogic Child / A voz do outro na coconstrução da autorreferenciação na criança

ABSTRACT Bakhtin’s deep insights on dialogicality resonate with views of language acquisition as a multimodal, situated, interactive process grounded in everyday experience and reverberating the voices of the care-givers. Drawing on a longitudinal videoethnography of French parent-child interactions in family life over a period of seven years, this study documents how the child’s language development is co-constructed through interactive tellings and retellings of activities and events permeated with multiple perspectives. Our choice of extracts will exemplify how the others’ voices shape children’s unique identity as speaker and co-speaker grounded in the richness of their daily life. Through the experience of assimilating the others’ words, utterances, and every single form of multimodal expression, children appropriate our common treasure, language, but also learn the individual power of accenting their productions with their own


Introduction
In the following excerpt taken from Guillaume's longitudinal data (BRIGAUDIOT; NICOLAS, 1990; MORGENSTERN, 2012), the little boy refers to himself with the second person pronoun: Guillaume has just swallowed a peanut that he quickly seized from the bowl during the adults' apéritif, and he gazes at his mother with a mischievous smile. GUIL: T'as avalé encore! [You've swallowed again!] He uses an angry tone of voice.
By eating a peanut (without chewing it), the child is "misbehaving" and not following the adults' directives. He expresses his awareness of the danger by replicating his mother's rebuking words and tone. A similar use of the second person pronoun is also documented when children between two and three run alone on the pavement, stop just before crossing the road, and say out loud "you don't cross the street." This also reminds us of situations when children between one and two years old say "no" or shake their head as they are about to touch a dangerous object. Those types of productions are extracted from a fixed scenario with authoritative adult speech (BAKHTIN, 1981) that children have memorized in similar situations. They rely on their auditory memory, in which a situation is associated with speech. Guillaume here applies the formula 'tu + predicate' to a specific situation, which could be considered a pronominal reversal since he should have used je (I) to designate himself. It is as if he were snatching the mother's utterance out of her mouth and borrowing her role in the dialogue, as if what mattered most were not who the speaker was but that the utterance be spoken out of an existing script. The child does not create an utterance, but recycles it because it applies to the present situation. He has assimilated the voice of the adult encountered repeatedly in previous circumstances. By taking on the role of his mother, Guillaume samples the taste of authority and power endowed by her words and the personal pronoun she uses to address him. But those words are re-accented (BAKHTIN, 1990).
Bakhtin's deep insights on dialogicality (1981; 1986) resonate with views of language acquisition as a multimodal, situated, interactive process grounded in everyday experience and reverberating the voices of the care-givers.
In this approach, children's entry into language is guided by the language that surrounds them and is also very much triggered by their eagerness to imitate their conversational partners (GOPNIK; MELTZOFF; KUHL, 1999). In line with Vygotsky's theory of learning (1934; 1978), child language development can be viewed as a dynamic social process co-constructed in time between collaborating adults and children. Throughout their infancy, children participate in exchanges which are strongly scaffolded (BRUNER, 1975) by their caregivers. In Western cultures (see Ochs, 1984, among others, for adult-child interactions in other cultures), adults and children attune to each other and eagerly maintain their interaction thanks to the adults' adaptation to the child's motor, cognitive and linguistic development. Adults sometimes speak for children as well as to them as they verbalize every possible visual, vocal, actional, tactile cue enacted by the child to co-create dialogue. Those interactions are thus grounded in situated activities, and adults express what the child is holding, what the child is looking at, what just took place, whether the child looks or sounds hungry, tired, happy, irritated, in order to integrate every gesture or vocalization in rich interactive sequences. Recurrent activities associated with verbal productions form scripts (NELSON, 1981) enabling the child to "experience language" (OCHS, 2012) in their everyday life. Children's first productions are permeated with echoes of the constructions heard in the adults' productions. In order for them to actually learn linguistic constructions (TOMASELLO, 2003), be they sound patterns, gestures, words or multimodal constructions, children must repeat and manipulate the forms, play with them (with others and on their own) to test a wide range of sounds and prosodic patterns, or gestural configurations and movements.
Children progressively internalize the adult's role and appropriate the linguistic tools, social codes and behaviors used in their community in and thanks to dialogue.
Drawing on a longitudinal videoethnography of French parent-child interactions in family life over a period of seven years (MORGENSTERN; PARISSE, 2012), this study documents how the child's language development is co-constructed through interactive tellings and retellings of activities and events permeated with affect and stance. Our choice of extracts will exemplify how the others' voices shape children's unique identity as speaker and co-speaker grounded in the richness of their daily life. Dialogism will be illustrated by our multimodal analyses of sequences in which children seem to reverse roles as they perform the other's actions or voice the other's cue in the scripts that are deployed during playful recurrent mundane activities championed by Bakhtin in his writings. We will try to show how, as their productions resonate with alterity (second or third person instead of first person), children express their own identity through other standpoints, other perspectives, other "voices" in dialogic contexts. We will focus on the moments when the children's utterances, as they refer to their own past and present accomplishments, are "filled with echoes and reverberations of other utterances" (BAKHTIN, 1986, p.91).
Our detailed study will be pervaded by our own dialogic relationship with Bakhtin. The transmission of his approach is in itself an interpretative construction. It is shaped by the diffracting influences resonating in the multiple voices of the authors inspired by Bakhtin, who have influenced our own usage-based, multimodal, dynamic, interactive approach to language development.

Theoretical Framework
Human beings mobilize their representational skills and combine semiotic modalities in order to co-construct meaning, to refer to present and absent entities and events, to express intentions, desires and feelings. As shown by Vygotsky, interaction is a crucial locus for children to develop such cognitive and linguistic skills, which are socially co-constructed between collaborating partners within a cultural context (1934).
The scaffolding role of adults (WOOD; BRUNER; ROSS, 1976) is paramount in the development of children's interactional competences. Scaffolding involves cooperation between adults and children in order to facilitate the children's participation in interactional practices and help them learn to use available semiotic resources so as to co-construct meaning within their cultural and linguistic community. Children's understanding of new entities is often first mediated by their interlocutors' affective display, especially through facial expressions (EKMAN, 1984). Such instances of "social referencing" (KLINNERT et al., 1983) constitute "affective frames" (OCHS; SCHIEFFELIN, 1989) that are fundamental to children's cognitive and linguistic development. Sensory experiences and situated activities also play a decisive role in children's language development as "meaning comes about through praxis in the everyday interactions between the child and significant others" (BUDWIG, 2003, p.108).
All content of Bakhtiniana. Revista de Estudos do Discurso is licensed under a Creative Commons attribution-type CC-BY 4.0
If children's entry into language is guided by the language they receive, it is also very much triggered by their own eagerness to imitate their conversational partners (GOPNIK; MELTZOFF; KUHL, 1999). While children take up and imitate the forms produced by their parents, parents also take up the sounds and movements produced by their children, endowing them with meaning and intentions, thereby shaping them into patterns that are compatible with the adult communicative system. Joint parent-child actions and interactions scaffold children into simultaneously understanding situations and talking about these situations. They learn to understand language and action together, each offering support for the other. In order for them to learn linguistic constructions, language practices and genres, children must repeat and manipulate the forms produced or recast by adults and play with them repeatedly across a variety of situations so as to mobilize them in a productive manner in situated interactions as well as when they are playing alone. During their first years of life, children experience the multiple actions that can be achieved with language: "they learn how to use language as a tool to elicit attention, to establish relationships and identities, to perform social actions, and to express certain stances. All this is part of being a speaker of a language" (OCHS, 1990, p.358). The mundane activities of children's everyday lives allow them not only to experience language but also to progressively language their experience, i.e. produce motivated, conventionalized language forms (sounds, words, tones, gestures) to communicate. They learn to filter experience and shape it, refer to it, index it into language forms: adults' and children's diverse multimodal enactments of language constitute "modes of experiencing the world" (OCHS, 2012, p.149) through sound, visual and embodied forms.
Children first language about perceived objects and events that they participate in. They also progressively language about objects and events that are not perceived and experienced in the here and now but that they have perceived, manipulated, ingested, liked, disliked, been part of, in their past experience. Children thus use vocalizations, words, gestures, syntax, multimodal constructions and discourse to co-construct with interlocutors references to objects and events that are part of their daily lives (MORGENSTERN, 2014) but also to index and reconstruct past events, memories or create imaginary worlds that can only be captured through language thanks to specialized constructions such as verbal forms (see Parisse, De Pontoux & Morgenstern, 2018, on the use of the French imparfait / imperfect past tense). The subtle architecture of sensory perceptions, mental states and processes can be progressively expressed through fine linguistic constructions. However, even if it can index experience, language does not render its multilayered complexity. Language is but an incomplete representation of the world, of individual practices, objects or events in which it is embedded and that it attempts to capture (BOAS, 1965, p.189) through condensations of meanings. If language shapes experience, as proposed by Boas (1911), Sapir (1921, 1927), Whorf (1956), Gumperz and Levinson (1991) or Lucy (1997), language can also create worlds of its own, out of our remembrance of things past, our dreams or projects and the figments of our imagination. Those worlds are inhabited and shared with others thanks to the discursive practices we perform with all the semiotic resources at our disposal. Children can internalize the language to which they are exposed, they can extract form-function pairings and use them with sensitivity to the pragmatic and dialogic context (HALLIDAY, 1967).
Their mastery of language is marked by how freely they combine constructions and produce utterances that are accepted and understood by their interlocutors in context through negotiation of meaning as part of the social practice of interaction (GUMPERZ; LEVINSON, 1996). Children's daily engagement in interactions occurs in a variety of contexts and with an increasingly wider range of interlocutors thanks to whom they learn how to manipulate reference to self and others.

Ecological Data
Capturing language development from a truly usage-based perspective requires studying interaction as it is deployed in multiple ecologies, both in time (the moment-to-moment unfurling of an interaction) and over time (multiple recordings over several years of the same children in their family environment). As early as the 1960s, Sacks encouraged the use of video-recordings (SACKS, 1984) so as to capture, analyze and share sequences that unveil the structure of everyday practices. However, since the presence of the observer could be a source of interference (see Labov's (1972) "paradox of the observer"), researchers must acknowledge, assess and integrate advantages and interferences of an observer's presence in their framework (CARONIA, 2015). Firstly, when everyday life sequences are captured, the repetition of rituals and routines helps counterbalance the effects of the researchers' intrusion. Secondly, when longitudinal data and film are collected every two weeks or every month from very early on, the familiar repetition of a researcher's presence in a child's home makes it possible to build a strong relationship which can be integrated in the analyses (MORGENSTERN, 2009). Filmed families are not only the target of analyses, they are people with whom the researcher interacts, shares emotions and feelings, and this common ground built during the fieldwork strongly contributes to the analytical process and data sensemaking. Thirdly, although the recorded sessions only represent a small portion of the participants' lives, those snippets can help us capture sediments of their past experiences as they are reactivated in their daily activities and exchanges, what we could call their "habitus" as defined by Husserl (1982).
They index multiple dimensions of their broader interactional linguistic practices that can be replayed, transcribed, coded and thoroughly analyzed over and over, with a variety of perspectives thanks to the traces provided by video-recordings and the aligned transcriptions produced with specialized software.
Child language research is one of the first fields in which spontaneous interaction data was systematically collected, initially through diary studies (INGRAM, 1989; MORGENSTERN, 2009), and later by audio and video recordings shared worldwide thanks to the CHILDES project (MACWHINNEY, 2000). This data-centered method has allowed many researchers to confirm that in the course of their development, children make their way through successive transitory multimodal systems with their own internal coherence (COHEN, 1924). This phenomenon can be observed at all levels of linguistic analysis. Children's productions are like evanescent sketches of adult language and can only be transcribed and analyzed in their interactional context by taking into account shared knowledge, actions, manual gestures, facial expressions, body posture, head movements, all types of vocal productions, along with the recognizable words used by children (PARISSE, 2007; MORGENSTERN, 2010).
Research in language acquisition has developed tools, methods, and theoretical approaches to analyze children's situated multimodal productions, as they provide evidence for links between motor and psychological development, cognition, affectivity, and language.

Self-Reference
First-person pronouns are a complex category for children to acquire. When they start referring to themselves as subjects, French-speaking children may use standard forms ("je" [I], "moi je" [contrastive I]) but also non-standard forms ("moi" [me], "tu" [you], "il/elle" [he/she], their name) as well as bare predicates. The analyses of these uses provide us with valuable insights into how children creatively process the language that surrounds them and progressively acquire the tools that enable them to refer to themselves, both as speakers and subjects (MORGENSTERN, 2006; CAET, 2013). First and second person references are expressed through pronouns in adult French, and children need to understand that they refer to the conversational roles of speaker and addressee. French-speaking children use a variety of markers to refer to themselves until around 24-30 months (BRIGAUDIOT; NICOLAS; MORGENSTERN, 1994; MORGENSTERN, 2006; CAET, 2013), such as the null form, filler syllables, the child's name, the strong pronoun "moi" (MORGENSTERN, 2003) and second or third person forms (MORGENSTERN, 2003). Then, the first person pronoun progressively replaces the other markers as children also master tenses, aspects, modalities, and a variety of discourse genres (NELSON, 1989). Referring to self and others requires sufficient knowledge of phonology, morpho-syntax, semantics, pragmatics and discourse, which explains why children need a certain amount of time (which varies according to the child) to use the pronominal system. We have illustrated how their use of different forms when referring to themselves and their interlocutor implies that each form is linked to specific functions and contexts (BUDWIG, 1995; MORGENSTERN, 2006; CAET, 2013).
Some children reverse first and second person pronouns (see Evans; Demuth, 2012, for English, and Morgenstern, 2012, for French). Reversals are rare in typical children but are very striking and a number of authors have studied the phenomenon as it induces "specific questions whose answers may shed light on the mechanism of pronoun acquisition" (CHIAT, 1989, p.383). They are also examined in language disorders (KANNER, 1943) and labelled as echolalia or as indicative of children's difficulties in understanding the reversible nature of pronouns and their pragmatic functions in dialogue.
Various factors can explain reversals including children's lack of semantic knowledge (BELLUGI; KLIMA, 1983), their straightforward imitation of the speech heard (PETERS, 1983), not understanding perspective shifting (LOVELAND, 1984), or the nature of child directed speech (OSHIMA-TAKANE, 1988). In our data of typical children, pronominal reversal is not systematic and always occurs along with standard use, which indicates that they are in a transitional phase. Oshima-Takane (1992) suggests that children can establish the link between pronouns and speech roles if they see two speakers interacting and thus are made aware that the second person pronoun refers to the interlocutor. She thus highlights the importance of vision (see also Cole; Yaremko, 1993). Studies on blind children's acquisition of pronouns show that the use of the pronominal system is mastered late (FRAIBERG; ADELSON, 1973; SAMPAIO, 1991). Loveland (1984) emphasizes the spatial aspects of pronoun acquisition and how overhearing and seeing others in dialogue helps children to understand perspective shifting. Perez-Pereira's (1999) very detailed study however goes against the claim that blind children make many reversal errors. He does support the fact that "failure to observe pronouns in speech addressed to another person" could have an effect but he also analyses the impact of the "large proportion of directives and requests used by mothers" (PEREZ-PEREIRA, 1999, p.677), which hinders some of the blind children in his study in their use of the standard forms. Blind children can compensate for their lack of vision with vocal scaffolding.
They develop the capacity to internally locate speech partners and overhearers or other protagonists referred to, and like sighted children, they can thus distinguish their roles in dialogue as "persons" (with an interpersonal and reversible relationship) or "non-persons", excluded from the interpersonal relationship and from interlocution (BENVENISTE, 1966), thanks to their auditory perception. Sighted children use all the semiotic resources at their disposal to learn those differences. The acquisition of reference to self is anchored in dialogue. In order to analyze children's productions with a dialogic perspective, we must take into account the discursive and situational context, in relation to the language that is provided by the adults, both in terms of forms and functions. We study reference to self in parallel to reference to the interlocutor both in the children's and in the parents' productions. When children learn how to refer to self, they can rely on formal and functional clues derived from the forms parents use to refer to the child as interlocutor and grammatical subject but also to refer to themselves as speaker and grammatical subject.

Data and Method
In this paper, I focus on snippets of the data extracted from the Paris corpus that were collected within the ANR ColaJE project and are accessible online (CHILDES project, MACWHINNEY, 2000; ORTOLANG, MORGENSTERN; PARISSE, 2012).
The funding of the project was used to collect new French data, improve researchers' transcription and coding systems to enable them to study the emergence and development of grammatical patterns used by children between age one and seven, and compare child and adult speech. The programs brought together specialists from various fields of language acquisition in order to study language development in the same longitudinal corpus from a multimodal and interdisciplinary perspective. The analyses aimed to find regularities in acquisition for each child and across the children.
The children have middle-class college-educated parents, and were filmed at home about once a month for an hour in daily life situations (playing, taking a bath, having dinner). The transcriptions were done in CHAT format, thus enabling the use of CLAN software tools for analyzing and searching the data (Mean Length of Utterance; word frequency; number of word types and word tokens; morphological categorization; word and expression search). The transcriptions were aligned with the video-data and could be analyzed with a variety of computerized tools. For the purpose of this study, we used a more reader-friendly format: we provided the French production, its English translation between brackets and gave non-verbal information in italics.
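As an illustration of the kind of measures mentioned above, the following minimal sketch computes a word-based Mean Length of Utterance, word tokens, word types and word frequencies for one speaker. It is not CLAN and does not reproduce its commands: it assumes a simplified CHAT-like layout in which main-tier lines start with `*SPEAKER:` followed by a tab, and it counts words rather than morphemes. The sample transcript and the function names `utterances` and `summarize` are invented for the example.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical transcript in a simplified CHAT-like layout (invented for illustration).
SAMPLE = """\
@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\ttu viens ?
*CHI:\tnon non pas bisou .
%act:\tremoves the kisses with his hand
*CHI:\tbravo !
@End
"""

def utterances(transcript, speaker):
    """Return the tokenized main-tier utterances of one speaker."""
    prefix = f"*{speaker}:"
    result = []
    for line in transcript.splitlines():
        if not line.startswith(prefix):
            continue  # skip headers (@), dependent tiers (%) and other speakers
        text = unicodedata.normalize("NFC", line[len(prefix):]).lower()
        # Keep word-like tokens only; punctuation and utterance terminators are dropped.
        tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
        if tokens:
            result.append(tokens)
    return result

def summarize(transcript, speaker):
    """Word-based summary measures for one speaker (an assumption: words, not morphemes)."""
    utts = utterances(transcript, speaker)
    tokens = [tok for utt in utts for tok in utt]
    freq = Counter(tokens)
    return {
        "utterances": len(utts),
        "tokens": len(tokens),
        "types": len(freq),
        "mlu_words": len(tokens) / len(utts) if utts else 0.0,
        "freq": freq,
    }

if __name__ == "__main__":
    print(summarize(SAMPLE, "CHI"))
```

Counting words keeps the sketch independent of any morphological tier; a morpheme-based measure would require the morphological annotation that the transcription tools provide.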
I selected the extracts presented in the analyses out of extensive previous studies on reference to self and others conducted on the Paris corpus (MORGENSTERN, 2006; CAET, 2013; CAET; MORGENSTERN, 2015) in order to illustrate how children's development of self-reference resonates with Bakhtin's work.

Analyses of References to the Child
Children are described as manipulating language and dialogue in self-talk in which the voices of their care-givers permeate their own speech. According to Vygotsky (1978), the social dialogues children engage in during make-believe play are internalized as self-regulatory inner speech. A Bakhtinian framework can be useful when examining heteroglossic uses of pretend play which is part of children's practice of adult social roles as they explore meaning-making possibilities in fictive worlds in which every word, every multimodal construction, every utterance resonates with other words, gestures, facial expressions, utterances used repetitively around them.
However, I would like to show that this framework can also be valuable outside children's make-believe play, in their everyday life. Throughout their interactions with their parents and as they participate in very mundane activities, children have opportunities to get familiar with, replicate, recycle, and appropriate a variety of perspectives which are specifically marked in the use of personal reference. All their multimodal utterances are the product of the variety of voices (produced with different semiotic resources) that have surrounded them from the very beginning of their lives.
My analyses will begin with an example taken from the Forrester corpus (FORRESTER, 2008) to illustrate the scaffolding role of parents in the development of reference to self and other. The father takes up his daughter's gesture, which could be interpreted as not being intentional and communicative at all, and transforms it into a game that serves as a transition toward meaning. In turn 1, the father explicitly addresses Ella by using the second person pronoun.
As she whimpers and gestures without producing articulate speech, he formulates in turn 2 his own interpretation of her behavior "oh a little bit" from his own perspective. He is taking an active part in the dialogue, compensating for his daughter's lack of expertise in her manipulation of speech. He then illustrates how he treats their interaction as a live communicative experience and responds to all her multimodal productions. He takes up what seems to be a non-intentional non-communicational gesture and transforms it by shaping it into a conventional pointing gesture, through which he can designate alternatively his own head (turn 4) and his daughter's head (turn 5). He has changed it into a social gesture which is part of the string of routinely used pointing gestures of the various members of the family, which Ella will take up and replay herself in the following sessions in the data. Interestingly enough, he uses a third person perspective to refer to his child and to himself, enabling them to share the same external perspective, outside the reversible interpersonal realm of dialogue. This use of third person pronouns and names to refer to themselves and their children is very common in Child Directed Speech in Western cultures. Children replicate this use in specific contexts as the next examples will illustrate.
While he was having his afternoon snack after his day at nursery school, Léonard's parents asked him questions about his day away from them. At 2 years old, he has the ability, thanks to their scaffolding, to tell them that he has made sculptures with clay, to list what he has eaten and who he has played with. A little later, as he is getting ready for his bath, Léonard becomes somewhat aggressive with his mother as she kisses his arms and he plays at removing the kisses with his hand and says "no kiss on my arms." He then starts very abruptly to narrate an event that happened during the day:
3 In the original: Example 3 - Léonard, 2;02. / 1. LEO: L'a dit pan à David. [(He) said pan to David.] Léonard's gaze is turned towards the sky, away from his mother.
The child re-enacts the event as he punctuates his speech with gestures. This enables his parents to have access to a visual rendition of the situation. However, in Léonard's production, a distance between the child in the bath now, and the protagonist described in kindergarten is marked through the use of the third person. Interestingly enough, his gaze has become vague (turn 1) as if the scene was being visualized in his mind. This use of gaze is quite similar to what Cuxac (2000) analyses in narratives in French Sign Language. Signers, when engaged in narratives, do not gaze at their addressee as they do not embody their own self but characters in their narrative. In turn 6, Léonard clarifies the identity of the two protagonists after his mother's question: "David from kindergarten." He uses his own name, Léonard (turn 7): "Léonard, (he) said pan to David." By using his name instead of the pronoun "I" (which he does employ at this age in other contexts) he can share his parents' viewpoint. Léonard is split into two identities, narrator and protagonist, and separates the two roles by the use of the third person pronoun. However, he does mime the scene (turns 7, 11).
The body of Léonard the speaker (including his arm hitting in the air) enables him to show his mother the protagonist Léonard in his narrative. This is similar to what happens in narratives in Sign language; the signer's body becomes the character of the story, which is called "personal transfer." All of Léonard's uses of self-references were analysed (MORGENSTERN, 2012).
He uses "il" (he) each time he tells a narrative in which he portrays himself as naughty, not only being aggressive with David, but ripping a book apart, breaking a toy or jumping too hard on his bed. Thanks to his use of the third person perspective, his verbal production marks a distance between the speaker and the naughty protagonist he is depicting.
At the same age, instead of using the first person pronoun, Anaé also sometimes refers to herself as subject with her name, which gives a third person perspective to her production as she becomes the target discourse object (CAET; MORGENSTERN, 2015).
In the following example, she is hiding and her use of her name "Anaé" imbues her turn with the perspective of the person looking for her, instead of the speaker's viewpoint. The child's nonstandard use of the personal system is not only marked by her production of her name to designate herself, but also by the use of the masculine pronoun "il" (he). This could be an overgeneralization of the unmarked pronoun in the child's system (cf. Greenberg, 2005) as she has two brothers and might hear the pronoun "il" more frequently than "elle" in playful situations. She thus recycles forms her brothers and adults use in similar situations and does not shift perspectives as older children and adults do in similar situations, even though she does use first person pronouns in other situations, when she makes requests or describes everyday activities she is engaged in (CAET, 2013).
The same lack of role reversal can also be noted in narratives of past events in which her productions manifest a third person perspective and she is depicted as a character in the story. In the example below, she reformulates her mother's utterance, as they narrate an excursion by train to the zoo. Anaé did not often take the train and it is staged as a salient event. Another salient event occurs when, just like Léonard, she refers to herself as the author of a mischief (she had torn part of the page on which Babouche, the monkey character was therefore not in great shape). the speaker (herself) with the agent of the actions she refers to, as if she were echoing her mother's voice and perspective to refer to herself (MORGENSTERN, 2006; 2012). She does not integrate speech-roles and shifting references in her own speech and maintains a third person or "neutral" perspective, thus recycling speech she has heard previously in the language that surrounds her.
Anaé's mother also sometimes used the third person perspective to designate her daughter in special circumstances. The following example illustrates how she used the child's name and the third person when Anaé has managed a great feat: she has climbed all the way up to the top of the slide by herself. In example 8, Anaé's mother seems to be addressing a doll that is not endowed with the ability to reverse roles as speaker, rather than a person, as she uses the pronoun "on" (one), the nominal expression "cette petite fille" and the third person pronoun "elle." She is creating a fake perspective which reinforces the negative evaluation marked by the adjective "sale" (dirty) and the demonstrative "cette" (that) in "cette petite fille" as well as the interjection "wah" (wow). The use of the third person enables the mother to take on a third person perspective, outside interlocution. She is thus able to indirectly address her comments to her daughter without explicitly directing them to her. This permeates her production with fake objectivity. There is pretend play in this scene, not through a pretend situation, but through changing the genre used. Instead of placing herself in an interpersonal sphere, the mother creates a distance and places her own speech outside the traditional speech roles conventionally used in dialogue. She highlights her role as an adult who can say "on essuie la bouche" to the dirty little girl because that little girl is presented as an object of shared attention with an external perspective.
7 In the original: "Example 7 - Anaé 1;11. 1. FAT: tu viens ? [are you coming?] Anaé walks towards the slide. 2. FAT: tu montes ? [are you climbing up?] Anaé climbs up the stairs of the slide and sits on the top. 3. MOT: oh la la qu'est-c(e) qu' elle est grande Anaé ! [oh my, how big Anaé is!]"
8 In the original: "Example 8 - Anaé 2;2. 1. MOT: hop on essuie la bouche. [there, one wipes the mouth.] 2. MOT: Oh la la, oh la la ! Elle est sale cette petite fille. wah qu'elle est sale. [Oh my, oh my! She is dirty, that little girl. Wow, how dirty she is.] Anaé protests."
Children need to learn that when they are the speaker, they have to reverse perspectives and use the specialized form, the first person pronoun, instead of the second person pronoun that others use when speaking to them. Anaé rarely designated herself with the second person pronoun. But in example 9, she is echoing her mother's uses. She is trying to perform a somersault.
Even when she refers to herself with the first person pronoun, as in turn 2, she still congratulates herself for her own accomplishment using her own name, in turn 6, "Bravo Anaé!": she has succeeded in putting on her own slippers without her mother's help, and even provides an explanation of how to manage the slipper. She takes on her mother's voice, which provides a distancing between the speaker and the target of her praise. She masters both the actions and the linguistic script related to her accomplishment. In all these contexts, the children use a second or third person pronoun or their name and speak about themselves with the others' voice, taking their interactional role as if they were the addressee (CHIAT, 1986). Children progressively learn which pronouns to use in which contexts. Even when they are recycling someone else's words, they find out from their experience with language in interaction that they need to use je (I) to refer to themselves as speakers.
As they experience and internalize conventional language, children learn to manipulate the various forms experienced in the adults' productions. The forms they use are part of a sequence of transitory systems as they develop the adult conventions. Parents' productions model those of the child and accompany them into the adult system (CLARK, 2003). As they interact and witness interactions, children learn to use the pronominal system with its phonological, cognitive and pragmatic constraints.
At the end of our longitudinal data, all the children of the Paris corpus have become skillful multimodal speakers. As illustrated in the following extract, Madeleine is even able to produce reported speech and take on several perspectives and roles. She uses the first person pronoun to refer to herself but also goes beyond the principle that "I signifies the person who is uttering the present instance of the discourse containing I" (BENVENISTE, 1966, p. 252). She can recreate a reference to past events and past discourse thanks to her own languaging as she incarnates the voice of her mother in reported speech.
Example 11 - Madeleine 6;11 speaking to OBS, the observer
1 MAD: One day Mummy looked at her telephone and went <oh my! The child has stopped gazing at her interlocutor and places her hands in front of her to mime the situation as if she had a telephone. When she says "ah mince!" she brings both her hands in front of her mouth. She then gazes back at Martine as she performs the next utterance.
2 MAD: Because she had prepared everything, we had given out the invitations and she goes <oh my, I have an appointment at exactly the time we planned your birthday. Her gaze turns away from OBS and then gets back to her just as she says "your birthday." Her facial expressions reproduce the expressions she attributes to her mother, with small head movements and exaggerated prosody.
3 OBS: Yes
4 MAD: In fact, she tried to solve the problem and in fact it's one of her colleagues who is going. MAD makes a cyclic gesture at the level of her head with her two hands as she says "régler" (solve).
5 OBS: Oh so she'll be able to be here.
6 MAD: Because she was going on < no but I want to see your friends, see. MAD changes her voice, her gaze turns away from OBS, her facial expressions are the nervous expressions she attributes to her mother.
7 OBS: Good!
8 OBS laughs
9 CHI: I want to be there She continues to use her facial expressions and small nervous beat gestures with her hands.
The first occurrence of reported speech attributed to Madeleine's mother (turn 1) is not introduced by a quotative verb but is marked by non-segmental markers. To indicate her role shift, the child uses pitch and gestures that replicate her memory of her mother's past speech and postures, but with her own accentuation. Her gaze turns away from the observer (turns 1 and 2) and is targeted at her hands "holding" the telephone; her facial expressions are exaggerated and indicate that she has entered the narrative space. Madeleine is playing her mother's role addressing herself, inside the specific space she is recreating. In order to recreate herself as her mother's interlocutor, she then gazes at the observer (turn 2) as she says "your birthday." The conjunction of gaze on the observer and of the second person "your" attributes to the observer the role of little Madeleine, witnessing her mother's consternation as she realized the conflict with the birthday party on the agenda in the telephone. Through Madeleine's reenactment of the event, represented through speech, gaze, gestures and facial expressions, the observer is made to be Madeleine as Madeleine has become her own mother.
Multimodal constructions are automatically recycled by Madeleine as with the "han mince" (oh my) in turn 2 complemented by the hand gesture on the mouth and the facial expression, or the very sophisticated gesture in turn 4 that complements the verb "régler" (solve).
Throughout this sequence, Madeleine manipulates first and second person with great expertise and in great harmony with her use of gaze. She can be herself, Madeleine, the child, narrating the story, referring to her mother with a third person pronoun and gazing at the observer; or she can incarnate her mother, turning her gaze away from the observer onto her telephone and using the first person pronoun; or she can build a new version of herself directly in front of her, with second person "ton" (your) and gaze at the observer, in the reconstructed scene of the event she is creating with all the semiotic resources at her disposal.
The child has internalized the adult's role and has appropriated multimodal tools, social codes and behaviors, which are intertwined in language, in and thanks to dialogue.

Conclusion
I have used a theoretical mix of Bakhtin's concepts intermingled with the other approaches that have influenced my work in language development to analyze some salient moments of children's developmental path towards the full appropriation of the conventional linguistic system to refer to self and others. Through a series of examples, I have tried to illustrate how role play and heteroglossic discourse are found not only in children's make-believe play or self-talk, but also in daily interactions between parents and children in very mundane activities which involve playful attitudes permeated with affect and perspective shifting. Bakhtin was one of the first theorists to champion the richness and complexity of mundane activities in ordinary interactions. In their daily engagement in dialogues with others, children develop the understanding of their social role(s) and learn how to use the conventional linguistic forms that are transmitted to them in the voices of others. Through the experience of assimilating the others' words, utterances, and every single form of multimodal expression, children appropriate our common treasure, language, but also learn the individual power of accenting their productions with their own voice.