Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

Classificação Automática de Discurso Descritivo Escrito de Adultos Sadios: uma Visão Geral da Aplicação de Técnicas de Processamento de Línguas Naturais e Aprendizado de Máquina à Análise Clínica do Discurso

Abstracts

Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with comparable levels of education. We describe a pioneering application of machine learning methods to Brazilian Portuguese for clinical purposes, highlighting education as an important variable in the Brazilian scenario.

OBJECTIVE:

The aims were to describe how to: (i) develop machine learning classifiers, using features generated by natural language processing tools, that assign descriptions produced by healthy individuals to classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups.

METHODS:

The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features: three new ones, two extracted manually, and four others (description time; type of scene described, simple or complex; presentation order, i.e., which type of picture was described first; and age). This study used the descriptions produced by 144 of the 200 healthy Brazilians of both genders studied by Toledo [18].

RESULTS AND CONCLUSION:

A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
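
As a rough illustration of the two techniques named above, the following sketch shows the RBF kernel function used by such an SVM and the correlation-based merit score that CFS optimizes, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a subset of k features. This is not the authors' pipeline: the function names are hypothetical, and the sketch assumes Pearson correlation and a simple greedy forward search, whereas Weka's CfsSubsetEval relies on its own correlation measures and configurable search methods.

```python
import math
from itertools import combinations


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def cfs_merit(columns, labels, subset):
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    # Mean absolute feature-feature correlation (0.0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Greedy forward search (a simplifying assumption): repeatedly add the
    feature that yields the highest merit; stop when the merit no longer improves."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        merit, idx = max(
            (cfs_merit(columns, labels, selected + [i]), i) for i in sorted(remaining)
        )
        if merit <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = merit
    return selected, best
```

Here `columns` is a list of per-feature value lists and `labels` is a numeric class vector (for example 0/1 for two education groups). The merit rewards features that correlate with the class and penalizes features that are redundant with ones already selected, which is why an automatic method of this kind can stand in for manual feature selection.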

natural language processing; language tests; narratives; adults; educational status; age groups


Um importante aspecto na avaliação de indivíduos com lesão cerebral é a produção de discurso. Acreditamos que estudos que comparam o desempenho de lesados com grupos de controles sadios devem utilizar grupos com escolaridade compatível. Nós apresentamos uma abordagem pioneira ao utilizar métodos de aprendizado de máquina com propósitos clínicos, para o Português do Brasil, destacando a escolaridade como variável de importância no cenário brasileiro.

OBJETIVO:

Nosso objetivo é descrever como: (i) desenvolver classificadores via aprendizado de máquina, usando features criadas por ferramentas de processamento de línguas naturais, para diferenciar descrições produzidas por indivíduos sadios em classes de anos de escolaridade e (ii) identificar automaticamente as features que melhor distinguem esses grupos.

MÉTODOS:

A abordagem proposta neste estudo extrai características linguísticas automaticamente a partir das descrições escritas com a ajuda de duas ferramentas de Processamento de Linguagem Natural: Coh-Metrix-Port e AIC. Ela inclui ainda nove features dedicadas à tarefa (três novas, duas extraídas manualmente, além de tempo de descrição; tipo de cena descrita - simples ou complexa; ordem de apresentação das figuras e idade). Neste estudo, foram utilizadas as descrições de 144 dos indivíduos estudados em Toledo [18], que incluiu 200 brasileiros sadios, de ambos os sexos.

RESULTADOS E CONCLUSÃO:

SVM com kernel RBF é o mais recomendado para a classificação binária dos nossos dados, classificando três das quatro classes iniciais. O método de seleção das features CfsSubsetEval (CFS) é um forte candidato para substituir métodos de seleção manual.

processamento de linguagem natural; narrativas; adultos; escolaridade; grupos etários


Texto completo disponível apenas em PDF.

Full text available only in PDF format.

References

  • 1
    Togher L. Discourse sampling in the 21st century. J Commun Disord 2001;34:131-150.
  • 2
    Andreetta S, Cantagallo A, Marini A. Narrative discourse in anomic aphasia. Neuropsychologia 2012;50:1787-1793.
  • 3
    Wills C, Capilouto GJ, Wright HH. Attention and off-topic speech in the recounts of middle-aged and elderly adults: a pilot investigation. Contemp Issues Commun Sci Disord 2012;39:105-113.
  • 4
    Cannizzaro MS, Coelho CA. Analysis of narrative discourse structure as an ecologically relevant measure of executive function in adults. J Psycholinguist Res 2013;42:527-549.
  • 5
    Cooper P. Discourse Production and Normal Aging: Performance on Oral Picture Description Tasks. J Gerontol 1990;45:210-214.
  • 6
    Ash S, Moore P, Antani S, McCawley G, Work M, Grossman M. Trying to tell a tale: Discourse impairments in progressive aphasia and frontotemporal dementia. Neurology 2006;66:1405-1413.
  • 7
    Smith E, Ivnik RJ. Normative neuropsychology. In: Petersen RD. Mild cognitive impairment. New York: Oxford; 2003:63-88.
  • 8
    Marini A, Boewe A, Caltagirone C, Carlomagno S. Age-related Differences in the Production of Textual Descriptions. J Psycholinguist Res 2005;34:439-463.
  • 9
    Wright HH, Capilouto GJ, Koutsoftas A. Evaluating measures of global coherence ability in stories in adults. Int J Lang Commun Disord 2013;48:249-256.
  • 10
    Le Dorze G, Bédard C. Effects of Age and Education on the lexico-semantic content of connected speech in adults. J Commun Disord 1998;31:53-71.
  • 11
    Mackenzie C. Adult spoken discourse: the influences of age and education. Int J Lang Commun Disord 2000;35:269-285.
  • 12
    Neils J, Baris JM, Carter C, et al. Effects of age, education, and living environment on Boston Naming Test performance. J Speech Hear Res 1995;38:329-223.
  • 13
    Ardila A, Bertolucci PH, Braga LW, et al. Illiteracy: the neuropsychology of cognition without reading. Arch Clin Neuropsychol 2010;25:689-712.
  • 14
    Duong A, Ska B. Production of Narratives: Picture Sequence Facilitates Organization but not Conceptual Processing in Less Educated Subjects. Brain Cogn 2001;46:121-124.
  • 15
    Forbes-McKay KE, Venneri A. Detecting subtle spontaneous language decline in early Alzheimer's disease with a picture description task. Neurol Sci 2005;26:243-254.
  • 16
    Alves DC, Souza LAP. Performance de moradores da grande São Paulo na descrição da Prancha do Roubo dos Biscoitos. Rev Cefac 2005;7:13-20.
  • 17
    Parente MA, Capuano A, Nespoulous J. Ativação de modelos mentais no recontar de historias por idosos. Psicol Reflex Crit [online] 1999;12:157-172.
  • 18
    Toledo CM. Variáveis sociodemográficas na produção do discurso em adultos sadios. Master's thesis. School of Medicine of the University of São Paulo; 2011.
  • 19
    Fraser K, Meltzer JA, Graham NL, et al. Automated classification of primary progressive aphasia subtypes from narrative speech transcripts. Cortex 2014;55:43-60.
  • 20
    Roark B, Mitchell M, Hosom JP, Hollingshead K, Kaye J. Spoken language derived measures for detecting mild cognitive impairment. IEEE Trans Audio Speech Lang Processing 2011;19:2081-2090.
  • 21
    MacWhinney B, Fromm D, Forbes M, Holland A. Aphasia Bank: Methods for Studying Discourse. Aphasiology 2011;25:1286-1307.
  • 22
    Price LH, Hendricks S, Cook C. Incorporating Computer-Aided Language Sample Analysis into Clinical Practice. Lang Speech Hear Serv Sch 2010;41:206-222.
  • 23
    Graesser AC, McNamara DS, Louwerse MM, Cai Z. Coh-Metrix: Analysis of text on cohesion and language. Behav Res Methods Instrum Comput 2004;36:193-202.
  • 24
    Aluísio SM, Specia L, Gasperin C, Scarton CE. Readability Assessment for Text Simplification. In: NAACL 5th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2010), 2010, Los Angeles. Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications. New York: ACL 2010:1:1-9.
  • 25
    Scarton CE, Aluísio SM. Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática 2010;2(1):45-61.
  • 26
    Cunha A, Toledo CM, Scarton CE, Mansur L, Aluísio SM. Classificação Automática de Discurso Descritivo Escrito de Adultos Sadios: Referência para a Avaliação da Linguagem de Lesados Cerebrais. In: Encontro Nacional de Inteligência Artificial e Computacional, ENIAC 2013, 2013, Fortaleza. Anais do X Encontro Nacional de Inteligência Artificial e Computacional. Porto Alegre: SBC; 2013;1:1-12.
  • 27
    Semenza C, Cipolotti L. Neuropsicologia con carta e matita. Padova: Cleup Editrice Padova; 1989.
  • 28
    Biderman MTC. Dicionário Ilustrado de Português. Editora: Atica; 2005;1:344.
  • 29
    Bick E. The Parsing System Palavras: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. PhD thesis, Aarhus University; 2000.
  • 30
    Muniz MC, Laporte E, Nunes MGV. UNITEX-PB, a set of flexible language resources for Brazilian Portuguese. In: Anais do III Workshop em Tecnologia da Informação e da Linguagem Humana 2005;1:1-10.
  • 31
    Balage Filho P, Pardo T, Aluísio SM. An Evaluation of the Brazilian Portuguese LIWC Dictionary, 5 p. To be published in the Proceedings of STIL; 2013.
  • 32
    Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA Data Mining Software: An Update. SIGKDD Explorations 2009;2:10-18.
  • 33
    Maziero EG, Pardo TAS. Automatic Identification of Multi-document Relations. In the (on-line) Proceedings of the PROPOR 2012 PhD and MSc/MA Dissertation Contest, Coimbra, Portugal 2012;17:1-8.
  • 34
    Armstrong E. Aphasic discourse analysis: the story so far. Aphasiology 2000;14:875-892.
  • 35
    Chand V, Baynes K, Bonnici L, Farias ST. Analysis of Idea Density (AID): A Manual. University of California, Davis; 2010. 44 p. Available at: http://mindbrain.ucdavis.edu/labs/Baynes/AIDManual.ChandBaynesBonniciFarias.1.26.10.pdf
  • 36
    Chand V, Baynes K, Bonnici LM, Farias ST. A Rubric for Extracting Idea Density from Oral Language Samples. Curr Protoc Neurosci 2012. doi: 10.1002/0471142301.ns1005s58. Available at: https://doi.org/10.1002/0471142301.ns1005s58

Publication Dates

  • Publication in this collection
    Sept 2014

History

  • Received
    05 Feb 2014
  • Accepted
    20 May 2014

Academia Brasileira de Neurologia, Departamento de Neurologia Cognitiva e Envelhecimento R. Vergueiro, 1353 sl.1404 - Ed. Top Towers Offices, Torre Norte, São Paulo, SP, Brazil, CEP 04101-000, Tel.: +55 11 5084-9463 | +55 11 5083-3876
E-mail: revistadementia@abneuro.org.br | demneuropsy@uol.com.br