From Birdsong to Speech: a Plea for Comparative Approaches

Human language and speech are unique accomplishments. Nevertheless, they share a number of characteristics with other systems of communication, and investigators have thus compared them to birdsong and the vocal signaling of nonhuman primates. Particularly interesting parallels concern the development of singing and speaking. These behaviors rely on auditory perception, subsequent memorization and, finally, the generation of vocal imitations. Several mechanisms help young individuals to deal with the various challenges during the time of signal development. Specific differences aside, astounding parallels can also be found in how a human and a particularly accomplished bird like the Common Nightingale Luscinia megarhynchos treat the experience of many different sound patterns or songs. As a consequence of such exposure, both human infants and young birds eventually acquire large repertoires of verbal or vocal signals. These achievements, however, require access to specific memory mechanisms which are well adapted to the purposes they serve, thereby allowing them to fulfil their species-typical roles. With such aspects as a reference, birdsong is an excellent biological model for memory research and also an appropriate system for the study of evolutionary strategies in a very successful class of organisms.


An Acad Bras Cienc (2004) 76 (2)
E-mail: todt@zedat.fu-berlin.de

INTRODUCTION
Inquiries into the mechanisms of human communication have raised questions about their biological roots and whether there are homologous or at least analogous mechanisms in animals too. Although there is no doubt that a large number of typically human properties, such as the ability to use languages, are unique to humans, at least some of these properties may result from neural mechanisms which evolved during the biological history of man. To be challenged by this idea, one must be inquisitive about the accomplishments of other creatures that seem comparable to human achievements. In this article, I will undertake such an endeavor and contrast some typical features of language development with a selection of features of the ontogenetic development of complex signal systems that have been documented in animals. I will limit myself to vocal signals that are acquired by individual learning.
Humans aside, such signals are known for some mammals, e.g. cetaceans and bats (Boughman 1997, Janik and Slater 1997), and for songbirds and a few other avian taxa, namely parrots (Todt 1975, Pepperberg 1999) and hummingbirds (Baptista and Schuchmann 1990, Jarvis et al. 2000). With the exception of birds, systematic studies on this issue are rare (review in Hultsch and Todt 2004a). Thus, it is expedient to select the learned vocal behavior of birds as a paradigmatic model here and to concentrate on recent findings about birdsong. As an introduction to the ways in which birdsong is specifically acquired, developed and eventually retrieved, I will first give a brief description of song organization.
Like human speech, the singing of birds can be described as a stream of behavior where acoustically filled segments alternate with silent segments (pauses). In birds, the most conspicuous segment that is well known to the human listener is the so-called 'song' ('strophe'). In the typical case, songs have a length of about 3 seconds and are separated by pauses of about the same duration. Birds sing and listen to each other, and such time patterns are obviously an adaptation to modes of vocal interaction. From the perspective of information processing, song duration seems to be selected to provide optimal units of information. In human speech, a similar characteristic is given by the typical duration of sentences or phrases used in our spoken languages (Pöppel 1978, Vollrath et al. 1992).
The phonological organization of both songs and sentences also provides optimal units of communication (Fig. 1). Songs, like sentences, form an intermediate level of a structural hierarchy in which the highest level is given by an episode of singing or speaking. On hierarchically lower levels one can distinguish several structural compounds that compose the intermediate level. In top-down order, the constituents of songs include song sections, trills, motifs, syllables and elements or notes (Hultsch et al. 1999). In the same order, the constituents of sentences are single phrases, words, syllables and morphemes or phonemes (Bierwisch 2000).
In birds, the number of intra-song levels distinguished, as well as their phonological constituents, vary a lot across species. Nevertheless, the basic level is always given by the so-called song elements or notes, respectively. Usually, this level serves as a basis for analysis, in which basic units are compared and classified according to parametric features such as measures of sound frequency and duration. The pool of classified song elements is then used to categorize songs and determine the repertoire of song-types (Todt 1968, Thompson et al. 1994).
Bird species differ in the size of their song-type repertoires. One extreme is given by Zebra Finches Taeniopygia guttata (Clayton 1987) and White-crowned Sparrows Zonotrichia leucophrys (Marler 1970, Baptista and Petrinovich 1984), in which each individual male typically develops only one single type of song. Some repertoires range around three to eight songs, for example, in Song Sparrows Melospiza melodia (Marler and Peters 1987) and Chaffinches Fringilla coelebs (Slater 1989); but they are larger in Common Canaries Serinus canaria (Nottebohm and Nottebohm 1978) and Starlings Sturnus vulgaris (Chaiken et al. 1993) and even greater in Common Nightingales Luscinia megarhynchos (Hultsch 1980). In spite of such species-specific diversity, the composition of vocal repertoires follows some basic rules: in most oscine birds, the element-type repertoire is larger than the song-type repertoire (for special cases see Hailman and Ficken 1996). If we compare this with the relations documented for units and higher-level compounds in human language, a crucial difference appears: a small number of basic units (e.g. phonemes) serves as a pool to compose an almost unlimited amount of verbal patterns, such as words or sentences (Bierwisch 2000).
Comparing birdsong to human speech is an immensely complicated enterprise, requiring a concentration on a subset of issues and aspects. My approach here will deal predominantly with three issues: (1) achievements of learning and development, (2) relationships between pattern structure and inter-individual interaction, and (3) the implications of large vocal repertoires.
Studies on the rules of song learning have been conducted in a great number of bird species (review in Hultsch and Todt 2004a), but most have concentrated on the intra-song level. Systematic investigation of how information is encoded at the inter-song level is currently available only for the Common Nightingale. This species is renowned for its outstanding vocal virtuosity that reflects a large memory capacity and enables adult males to perform about 200 different types of songs in a versatile style of singing (Todt and Hultsch 1998, for details see also Hultsch and Todt 2004b). The Common Nightingale will serve as a major reference in the following comparison of birdsong and speech.

LEARNING AND DEVELOPMENT
Comparing the acquisition of songs in birds and the acquisition of language in humans has a long tradition (Marler 1970, Marler and Peters 1981, Kuhl 1989) and was recently revisited by Doupe and Kuhl (1999). These authors highlight the following parallels as crucial to the development of birdsong and human speech: (1) both behaviors have to be learned to achieve the normal species-typical properties; (2) such learning relies on auditory perception, subsequent memorization and imitation of sound patterns; that is, perception precedes the production of vocal material; (3) acquisition is easiest early in life and during sensitive periods, and apparently is guided by specific predispositions; and (4) vocal expertise is successfully reached only after passing through particular stages of development, wherein vocal practice plays an essential role. To supplement this list with a concrete example: at an age of about three months human infants produce only vowel-like sounds; at about seven months they begin their 'canonical babbling'. Then, at an age of about twelve months, infants usually start to perform and use so-called 'one-word utterances', which some time later develop into 'sentences' composed of two or more words (Weissenborn and Höhle 2000). The progression of these accomplishments can be compared to the succession of ontogenetic stages reported for the development of singing in birds, e.g. their subsong, their plastic song and eventually the use of their crystallized full song (for details see Geberzahn and Hultsch 2004).
There are some further similarities shared by the processes of language acquisition and the song learning of birds. Early in life, for example, both young birds and humans face a similar problem: instead of hearing a single auditory stimulus, they are exposed to many long sequences of vocalizations. Human infants search for cues that help to parse the sequences of words produced by their adult caretakers into segments. This strategy allows them to better identify and store information about particularly frequent segments, e.g. words or combinations of words (Jusczyk et al. 1992). We studied whether a young nightingale that has to cope with a similar task may apply a similar strategy. To test this, nightingales were presented with long song sequences from which we had erased the silent inter-song intervals and other cues that might assist in the recognition of songs as separate sequential units. The results were surprising. Imitations developed by the birds showed that they had no problems in correcting for the lack of such cues, and were clearly able to identify and memorize most of the tutored songs. This outcome suggests that nightingales have some kind of concept of their song, and thus may be better off than human infants when first exposed to an almost continuous array of auditory stimuli (Hultsch et al. 1999). But what about learning accomplishments on a higher level of song organization?
When humans are presented with a serial learning task they usually handle it by a maneuver called 'chunking'. That is, they memorize a string of different items, e.g. words or sentences, by splitting it into 'chunks' of approximately four units in length, which they can then process optimally in short-term memory (Cowan 2001). Interestingly, nightingales are also skilled in extracting and memorizing information about the serial succession of song-types experienced during their training (Todt and Hultsch 1996). There is evidence that the birds achieve this accomplishment by a process like 'chunking' called 'package formation', which also reflects properties of their short-term memory (for details see Hultsch and Todt 2004b).
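The segmentation idea behind 'chunking' can be illustrated with a minimal Python sketch. It is only a formal illustration and not a model of memory or of package formation in nightingales: the fixed chunk size of four is an assumption taken from the 'approximately four units' figure cited above, and the function name chunk_sequence and the song labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_sequence(items: Sequence[T], chunk_size: int = 4) -> List[List[T]]:
    """Split a serial string of items into consecutive chunks.

    The default chunk_size of 4 is an illustrative assumption; real
    chunks (or song 'packages') need not have a fixed length.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(items[i:i + chunk_size])
            for i in range(0, len(items), chunk_size)]


# Hypothetical string of ten tutored song-types, labeled in serial order.
tutored_songs = [f"song_{n:02d}" for n in range(1, 11)]
packages = chunk_sequence(tutored_songs)

# Print each chunk with its position in the series.
for index, package in enumerate(packages, start=1):
    print(index, package)
```

With ten items and a chunk size of four, the string is divided into two chunks of four and a final chunk of two, while the serial order within and between chunks is preserved.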
In addition to such 'chunking' maneuvers, there are further parallels in the learning of humans and birds. For instance, if humans are exposed to a string of different items often enough, they are able to memorize and recall their serial order. Common Nightingales also do well at this task. At least 50 exposures are required for nightingales to imitate the serial succession of tutored songs, and with even more exposures their order gets consolidated both within and between packages (Hultsch and Todt 1992). Although the development of sequential associations among song-types has not yet been studied in other songbirds, we assume that it may play a role in other species. For example, several field studies have shown that a performance of three to five sequentially associated song-types is indeed widespread across birds with song repertoires (Todt 1968, Lemon and Chatfield 1971). This aspect of song acquisition needs to be investigated in other species as well.

PATTERN STRUCTURE AND INTERACTION
The singing of birds and the performance of spoken language share some formal properties which are related to features of vocal patterns in the time domain and to the rules of pattern composition. As mentioned earlier, a 'song' ('strophe') has a duration of a few seconds in most bird species, and a similar time structure seems to hold for the sentences or phrases of human speech (Pöppel 1978, Vollrath et al. 1992). From a structural perspective, songs are composed of several phonological constituents, including particular types of elements or notes, syllables, motifs and phrases. Most sentences are also composed of different phonological constituents, but there is a crucial difference, which concerns the freedom of unit combination. Within songs, the flexibility of unit combination is typically rather constrained, contributing to the species-typical structure of the compound. This contrasts with human communication, where a limited number of prosodic units ('units of articulation') is used to generate an almost unlimited amount of words, phrases and sentences ('units of interpretation', Bierwisch 2000).
However, from the perspective of social interaction, songs and sentences serve remarkably analogous roles. The most common units of vocal interaction in birds are the songs, as revealed in the way songs are used during communication between territorial neighbors (Todt and Naguib 2000, Geberzahn and Hultsch 2004). Similarly, the most common unit of verbal conversation in humans is a sentence, and only rarely a single word (Cutler 1994, Cairns et al. 1997). During an ideal vocal interaction, i.e. when two individuals are mutually contributing to an exchange of signals, the mode of communication is not arbitrary, but follows certain time- and pattern-specific rules. The significance of such rules has been documented for both the verbal and nonverbal dialogues of humans (Burgoon and Saine 1978) and the vocal duels of songbirds (Todt and Hultsch 1996). We have suggested that, during such ideal interactions, the signals should be long enough to convey a given message, but at the same time not so long as to delay a potential reply. Inasmuch as human sentences and bird songs that are used in an interactive context have a length of only a few seconds, such structuring can be viewed as an adaptation to a requirement for optimal vocal communication.

LARGE SONG REPERTOIRES
Humans communicate by an endless variety of words, phrases and sentences (Bierwisch 2000). Thus, any comparison of birdsong and speech should also address the issue of very large vocal repertoires. As already mentioned, the size of song repertoires differs remarkably across oscine species. Two groups of hypotheses have been put forward to explain such diversity (Kroodsma 1982). One predicts that females should prefer to mate with males who sing large repertoires (Searcy and Yasukawa 1996), as shown, for example, for the Sedge Warbler Acrocephalus schoenobaenus (Buchanan and Catchpole 1997) and the Pied Flycatcher Ficedula hypoleuca (Lampe and Saetre 1995). The other emphasizes the role that repertoire size seems to play in male-male encounters, where large repertoires may allow the bird to specifically address several different neighbors by song matching (review in Todt and Naguib 2000). Such achievements have been documented for Eurasian Blackbirds Turdus merula (Todt 1968, Wolffgramm and Todt 1982), Song Sparrows (Nordby et al. 2002) and Common Nightingales (Hultsch 1980).
With regard to birds that develop extremely large repertoires, the listed explanations call for a brief comment. Evolutionary aspects aside (Buchanan and Catchpole 1997), the role of such repertoires in female choice and mating seems a bit puzzling. In Common Nightingales, for instance, it would be ineffective for a female to have to first listen to two songsters for many minutes in order to somehow 'count' their different songs and only then to choose, if one performs about 200 and the other one just 150 different types of songs. Therefore, it is appropriate to look for other cues that could be effective here. A detailed study of courtship and mating may offer some clarification, e.g. on the question of how much time females actually invest in their choice behavior before they are ready to mate with a given male.
In contrast to problems with the female choice hypothesis, very large repertoires have a clear advantage in the domain of territorial contests among males. First, they raise the chance of song-type sharing, which is an essential prerequisite of vocal interactions between neighbors (Todt and Hultsch 1998). There is evidence that the proportion of shared songs is related to the spatial distribution of birds and is particularly high in males who settle close to each other (Hultsch and Todt 1981). Second, the amount and also the quality of interactions both have an impact on the establishment and maintenance of territories by the competitors. In addition, their interactions may be observed and evaluated by a 'third party' (Todt 1981, Naguib and Todt 1997, Todt and Naguib 2000). With this as a reference, a classical but often neglected hypothesis may be of interest. It refers to the song learning of birds and assumes that this accomplishment reflects a strategy which prepares a young male for the contests he might have to face later in life (Todt 1975, 1981). In other words, a young male will benefit from a song memory which contains information about any song pattern a possible rival can sing. It seems that this advantage also plays a role in the vocal imitation learning of parrots (Pepperberg 1999).
The development and use of large song repertoires leaves us with several interesting questions, and many of them are still open. One concerns the evolutionary history of song repertoires. A rather common view states that large song repertoires have evolved from small ones. This view is substantiated by 'economic' arguments saying that repertoires are costly and that such costs increase with repertoire size, because their development requires much time and energy and because their use needs heavy neural investment (Slater 1989, Nowicki et al. 1998). There is, however, evidence to counter these explanations. First, genetic studies indicate that those songbirds which obviously appeared early in oscine evolution may not have had small repertoires (Irwin 1988). Thus, the small repertoire of Zebra Finches could be regarded as a special adaptation. Second, early in life birds may sing more different patterns than after song crystallization (Marler and Peters 1982, Hultsch 1993). Such a reduction in repertoire size, taking place before birds reach the developmental stage of adult singing, could point to a similar reduction during song phylogeny. In any case, it is clear that the question of repertoire evolution is not yet resolved and merits further research efforts in the future.

CONCLUSIONS
A major aim of this paper was to demonstrate the value of taking a comparative approach to the communicative achievements of birds and mammals. I have contrasted some properties of birdsong with some characteristics of human speech and concentrated especially on (1) aspects of learning and development, (2) relationships between pattern structure and social interaction, and (3) the implications of large vocal repertoires. Despite the many diversities, I have outlined some remarkable parallels between birdsong and speech, but to avoid misunderstandings, I wish to emphasize briefly two additional aspects.
First, the parallels in the accomplishments of singing and speaking can indeed be considered as evidence for similar solutions that both birds and humans have found for comparable problems, reflecting in turn basic mechanisms that operate in songbird and human brains. Nevertheless, it should be clear that such similarities do not refer to biological homologies. The evolutionary history of brain development in birds and mammals diverged early on, so that parallels in the development of singing and speaking or in the structure of songs and sentences only mirror analogous accomplishments.
Finally, adopting a comparative approach to singing and speaking should not at all obscure the specific properties of these accomplishments. In other words, although my paper was meant as a plea for comparative approaches, it was also intended to recommend further studies of those forms of vocal signaling that are unique to birds or to mammals.

ACKNOWLEDGMENTS
First of all, I would like to thank Henrike Hultsch for her persistent and effective cooperation in our nightingale research. In addition, I appreciate the skillful help in hand-raising our birds, doing experiments or conducting data analyses provided by many people, namely H. Brumm, J. Cirillo, N. Geberzahn, S. Kipper, R. Mundry and F. Schleuß. I am also particularly grateful to Maria Luisa da Silva and Jacques Vielliard for inviting me to the IBAC conference in Belém. The study was supported by the DFG (Az: To 13/30).