
Forensic analysis of auditorily similar voices

ABSTRACT

Purpose:

to verify the contributions of acoustic spectrographic analysis to the forensic identification of speakers with auditorily similar voices, considering the distinctive behavior of the acoustic parameters: formants of the vowel “é” and of connected speech, mean fundamental frequency in Hz, linear prediction curve (LPC) of the vowel “é”, and area under the LPC; and to propose an objective method for using the analyzed parameters.

Methods:

a quantitative, qualitative, and descriptive study, conducted in Pernambuco on 16 pairs of male siblings aged 18-60 years. The subjects recorded videos, from which the audio tracks were extracted, numbered, and sent to three examiners in two groups (older brothers and younger brothers) for perceptual-auditory pairing. The correct pairings, indicated by at least two examiners, were submitted to acoustic analysis. The statistical tests included Wilcoxon, Kruskal-Wallis, and Bonferroni, with p<0.05.

Results:

the analyses of formants and of the mean fundamental frequency were not sufficient to distinguish the similar voices. However, the measurements of the areas generated by the linear prediction curve graphs, an unprecedented approach, showed distinctive statistical significance.

Conclusion:

among the parameters studied, the measurements of the areas under the linear prediction curve objectively proved effective in distinguishing speakers with auditorily similar voices.

Descriptors:
Acoustics; Voice; Speech

RESUMO

Objetivo:

verificar contribuições da análise espectrográfica acústica na identificação forense de falantes em vozes auditivamente semelhantes, considerando o comportamento distintivo dos parâmetros acústicos: formantes da vogal “é”, da fala encadeada, média da frequência fundamental em Hz, curva de predição linear da vogal “é” e área da curva de predição linear; propor um método objetivo da utilização dos parâmetros analisados.

Métodos:

estudo quantitativo, qualitativo e descritivo, realizado em Pernambuco com 16 pares de irmãos do sexo masculino, entre 18-60 anos. Os sujeitos gravaram vídeos de onde extraíram-se os áudios que foram numerados e enviados a três avaliadores, em dois grupos: dos irmãos mais velhos e dos irmãos mais novos, para pareamento perceptivo-auditivo. Os pareamentos corretos, apontados por pelo menos dois avaliadores, foram submetidos à análise acústica. Os testes estatísticos foram Wilcoxon, Kruskal-Wallis, Bonferroni, com p<0,05.

Resultados:

os resultados das análises dos formantes e da média da frequência fundamental não foram suficientes para distinguir as vozes semelhantes. Ineditamente nas medidas das áreas geradas pelos gráficos da curva de predição linear, foi verificada significância estatística distintiva.

Conclusão:

concluiu-se que entre os parâmetros estudados, as medidas das áreas da curva de predição linear apontaram, objetivamente, eficácia na distinção de falantes com vozes auditivamente semelhantes.

Descritores:
Acústica; Voz; Fala

INTRODUCTION

In ancient and contemporary history, there are several reports of people being recognized by their voice, the most famous being the Lindbergh case in 1932. Since voice recognition is a fragile form of evidence, based on a single sense of a single person, the current proposal is to identify speakers using scientifically based protocols.

Studies are constantly evolving, and several methods have been used for the forensic identification of speakers. In Brazil, voice identification methods were introduced for forensic purposes in the 1990s, involving experts from the states, the Federal Police, and the Federal District[1]. The interception of telephone communications for investigation, and as evidence in Brazilian criminal proceedings, is an increasingly used procedure[2].

To assist and support the preparation of forensic evidence, there is Forensic Science: the set of all scientific knowledge and techniques used to elucidate not only crimes but also other legal issues. Among the sciences, those directly involved in the forensic identification of speakers for legal purposes include Forensic Linguistics, Forensic Phonetics, and Forensic Speech Therapy, whose professionals are dedicated to the complex task of identifying speakers through their voice and speech.

Forensic Linguistics is a branch of applied linguistics dedicated to the investigative context, providing elements for analyzing communication in its several aspects[3]. Forensic Phonetics goes beyond the identification of speakers; it permeates many criminalistic mysteries. The main objective of Forensic Speech Therapy is to respond to legal demands related to human communication, acting in several analyses involving forensic comparison of voice, speech, and language; graphotechnics; facial biometrics; transcription, textualization, and analysis of audio, video, and image content; and description of the communicative profile[1].

Recently, on October 22nd, 2020, the Brazilian Federal Council of Speech Therapy recognized the field of Forensic Speech Therapy through resolution n. 584[4].

For the forensic identification of speakers, it is necessary to compare the standard sample with the sample under analysis[5]. It should be explained that the standard sample is the audio recording containing the speech of the suspect, accused, or defendant (of known identity), and the questioned sample is the audio recording containing the speech of the speaker whose identity is to be determined[6].

Three methods are used by specialists in the field of forensic speaker identification: the perceptual-auditory method, the acoustic method, and the automatic method[7].

The perceptual-auditory method highlights the parameters to be analyzed and has a strongly subjective aspect, given its qualitative approach[8].

The acoustic method uses the spectrogram to analyze the waves produced at the moment of vocal emission, allowing quantitative analysis[9]. Evaluation by acoustic parameter must be standardized, since this analysis yields a number[10], which facilitates analysis, comparison, and storage of measurements. The spectrogram generated in this method is a three-dimensional graph that records the acoustic measurement of the sound wave. It contains information related to the sound parameters, i.e., intensity, duration, and frequency (time on the horizontal axis, frequency in Hertz on the vertical axis, and intensity in decibels represented by color)[9].

In simplified terms, the acoustic evaluation quantifies the sound signal, which leads to an objective analysis of the voice. There is also the following distinction: while acoustics measures the sound signal, the perceptual-auditory evaluation offers a description of the vocal signal with only hearing as its basic instrument[11]. A recent study at the University of Pernambuco concluded that the two methods (perceptual-auditory and acoustic) are best used in association: neither is better than the other, and they complement each other[7].

The other method, the automatic one, is performed by software that tries to reduce subjective analyses as much as possible. The software is fed with information such as vocabulary, programmed and pronounced in many different manners. In some European countries, the use of automatic systems is accompanied by the insights of a professional with knowledge of phonetics and even linguistics. For example, at the University of Gothenburg, the software used is ALIZE SpkDet, and the results obtained by the software are combined with traditional acoustic and auditory analysis[12].

Automatic systems are subject to so-called incompatibility conditions: differences between voice samples may also arise from differences in transmission channels, which is a relevant and worrying problem in this type of analysis method[12].

All legal and technological devices support the forensic identification/comparison of speakers, and more studies are being conducted in this field, so that the binary comparison of voices may be used for legal purposes.

The general objective of this study was to verify the contributions of acoustic spectrographic analysis to the forensic identification of speakers with auditorily similar voices, and to propose an objective method for using the analyzed parameters. The specific objectives were to verify the usefulness of the following acoustic parameters for distinguishing auditorily similar voices: formants of the vowel “é”, mean fundamental frequency in Hz, formants F1, F2, and F3 in connected speech, the linear prediction curve (LPC) of the vowel “é”, and the area under the LPC.

METHODS

The study was conducted in the state of Pernambuco and was approved by the Institutional Review Board of the State Hematology and Hemotherapy Foundation, Brazil, under report n. 4.303.659 and CAAE 38306620.3.0000.5195. The independent variables were place of birth, age, sibling, and gender; the dependent variables were the first four formants of the vowel “é” (represented by /ɛ/), the mean fundamental frequency, F1, F2, and F3 in connected speech, the LPC of the vowel /ɛ/, and the area under the LPC curve.

The study was conducted on 32 people: 16 pairs of brothers, two from each family. The following inclusion criteria were adopted: being brothers (due to genetics), being male (due to the proximity of vocal frequency), being aged between 18 and 60 years (since the voice does not undergo significant changes in this age group), and being born in and residing in the state of Pernambuco (due to the accent and especially the pronunciation of the vowel “e”, marked in the region). The exclusion criteria were: being twins (considering the existence of previous studies on twins), having a viral, bacterial, or inflammatory process in the upper airway on the day of collection (which would influence the voice and possibly the distinction of voices within pairs), and not having signed the Informed Consent Form.

The investigator (S.C.W.C.) recruited participants randomly, sending an invitation specifically designed for this purpose through social networks and institutions in the state of Pernambuco. After the participants were defined according to the previously described inclusion and exclusion criteria, data were collected by video, captured with the participant's cell phone using the device's software. The videos followed a recording script, previously explained to the participants: say their name and the date, show an identification document with photograph and date of birth, and talk about the state of Pernambuco for 3 to 5 minutes. Afterwards, the videos were sent to the researcher. For the first methodological stage, listening to the voice samples, the videos were converted into audio in WAV format by the investigator, using the multimedia conversion software Format Factory®. To prepare the material for the listening and pairing stage, two groups were formed: GimV (group of older brothers) and GimN (group of younger brothers). The names of participants in GimV were replaced by consecutive numbers from 1 to 16, and in GimN the names were randomly replaced by numbers from 17 to 32. After this procedure, two groups of voice samples were obtained: GimV with numbers 1 to 16 and GimN with random numbers between 17 and 32.

To compose the samples of auditorily similar voices to be investigated later by the investigator through acoustic spectrographic analysis (the second stage), the voice samples of the GimV and GimN groups were submitted to perceptual-auditory pairing, conducted by three speech therapists certified as voice specialists by the Federal Council of Speech Therapy (CFFa). These speech therapists were asked to listen to the GimV voices, indicate the respective sibling's pair in GimN, and record each pair in a pairing table (Chart 1). Acoustic analysis was performed on the pairs of siblings correctly considered auditorily similar, i.e., belonging to the same family and indicated as pairs by at least two of the three speech therapists. Of the 16 pairs submitted to perceptual-auditory pairing, six were correctly matched and submitted to acoustic analysis. The result of the perceptual-auditory pairing is shown in Chart 1.

Chart 1
Perceptual-auditory analytical pairing performed by speech pathologists specialized in voice by the Federal Council of Speech Pathology

In the second stage, the correctly paired samples were analyzed using acoustic spectrographic analysis, aiming to verify whether and which of the analyzed acoustic parameters would have sufficient statistical power to distinguish people from the same family with auditorily similar voices, and whether and which acoustic parameters were coincident in people born and residing in the State of Pernambuco. The acoustic spectrographic analyses were performed by the investigator (S.C.W.C) using the acoustic analysis software PRAAT®.

In this study, individual acoustic parameters were verified and later compared between the paired brothers, between the pairs, and between the two groups (GimV and GimN). The acoustic parameters analyzed were: the first four formants (F1, F2, F3, F4) of the vowel /ɛ/, extracted after the first minute of speech; the mean fundamental frequency of speech in Hz; F1, F2, and F3 in connected speech, extracted in the first four minutes of speech; and the LPC curve, generated by the PRAAT® software. The area under the LPC curve was also analyzed, from the graphs of the individual LPC curves generated by PRAAT®, to propose an original analysis method in the present study. The calculation of the area generated by the comparative LPC graph of each pair studied was performed by an informatics professional, who wrote an algorithm specifically for this purpose. The LPC curve of each audio file, generated separately in PRAAT®, was submitted to area analysis to obtain measurements of the areas formed below the curves, which could then be analyzed and submitted to intrapair comparison in the statistical analysis.
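The LPC analysis itself was performed in PRAAT®; purely as an illustrative sketch (not the authors' pipeline), the same kind of linear prediction coefficients can be estimated with the autocorrelation (Yule-Walker) method. The sampling rate, the prediction order, and the synthetic signal below are assumptions for the example only.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(signal, order=12):
    """Estimate LPC coefficients via the autocorrelation (Yule-Walker) method."""
    # Window the segment and compute autocorrelation at lags 0..order
    x = signal * np.hamming(len(signal))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Solve the Toeplitz normal equations for the predictor coefficients
    a = solve_toeplitz(r[:order], r[1:order + 1])
    # Conventional form: [1, -a1, -a2, ...]
    return np.concatenate(([1.0], -a))

# Synthetic vowel-like example: a decaying resonance at ~500 Hz, 16 kHz sampling
fs = 16000
t = np.arange(0, 0.03, 1 / fs)
sig = np.exp(-50 * t) * np.sin(2 * np.pi * 500 * t)
coeffs = lpc_coefficients(sig, order=8)
print(len(coeffs))  # order + 1 coefficients
```

The roots of this coefficient polynomial give the resonance poles whose frequencies correspond to the formants, which is why LPC is suited to formant measurement.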

To obtain this area, an algorithm was used to generate graphs and calculate the integral (area under the curve). First, the image was converted from RGB to a monochrome version and the intermediate gray levels were removed, leaving only completely white or completely black pixels.

Then, a loop was run, varying the “y” coordinate, in principle, from the first to the last row of the figure. Since the study dealt with 3,600 x 2,400 resolution figures, this means varying “y” from 0 to 2,399; in each iteration of the “y” loop, another loop was run, this time varying the “x” coordinate, in principle, from the first to the last column of the figure, i.e., varying “x” from 0 to 3,599. This is described as "in principle" because the pixel colors are evaluated during scanning, and initially all pixels are white. When the first black pixel was found, both loops ended, since it was known to be the upper left part of the graph, recalling that the coordinate point (0,0) is on the first row (uppermost) and first column (leftmost). From the point immediately before this pixel, i.e., the coordinates (xblack − 1, yblack), where (xblack, yblack) are the coordinates of that first black pixel found, the “y” coordinate was increased, recording the “y” values where variations from white to black, or vice versa, were found. Since the column being scanned was immediately before the “y” axis of the graph, these variations occur at the markings of the “y” axis scale (0, 20, 40, and 60 dB/Hz, depending on the graph being analyzed). Thus, the T “y” Map table was generated, recording the mean “y” coordinate between each transition from white to black and the following transition from black to white, assuming that the scale value lies exactly at the middle of the marking stroke. This T “y” Map table makes it possible to map the “y” coordinates expressed in pixels in the figure to their respective values in dB/Hz.
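The scan described above can be sketched in a few lines. This is a simplified illustration under assumed conventions (a 2D array with 0 for black and 255 for white), not the professional's original algorithm: one helper finds the first black pixel in row-major order, and another takes the midpoints of black runs along a column, as done for the y-axis tick marks.

```python
import numpy as np

def first_black_pixel(img):
    """Top-to-bottom, left-to-right scan for the first black pixel (value 0)."""
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            if img[y, x] == 0:
                return x, y
    return None

def axis_tick_centers(column):
    """Midpoints of black runs along one pixel column (the axis tick strokes)."""
    centers = []
    y = 0
    while y < len(column):
        if column[y] == 0:                       # white-to-black transition
            start = y
            while y < len(column) and column[y] == 0:
                y += 1
            centers.append((start + y - 1) / 2)  # middle of the tick stroke
        else:
            y += 1
    return centers

# Tiny synthetic figure: white canvas with one black "tick" at rows 4-6, column 2
img = np.full((10, 10), 255)
img[4:7, 2] = 0
print(first_black_pixel(img))        # (2, 4)
print(axis_tick_centers(img[:, 2]))  # [5.0]
```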

Next, an analogous table, T “x” Map, was created, this time varying the “x” coordinate from the point (xblack, ymark_min), where xblack is the “x” coordinate of the first black point found above, and ymark_min is the “y” coordinate of the mark with the lowest dB/Hz value on the “y” axis. During this scan, the “x” coordinate of the first transition from black to white, xini, was recorded, which characterizes the first column of the graph region, as well as that of the last transition from white to black, xend, characterizing the last column of this region. The T “x” Map table, thus created, allowed mapping of “x” coordinates, with xini → 0 dB and xend → 104 dB. Finally, the “y” coordinate was varied from (xini, ymark_min), increasing the “y” value, i.e., moving downwards on the graph until finding a transition from white to black, which occurs at the coordinate ybottom, where the “x” axis is located.
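Both mapping tables amount to a linear interpolation between known tick positions. As a minimal sketch (the tick pixel rows and dB/Hz values below are assumed numbers for illustration), a pixel coordinate is converted to graph units like this:

```python
def pixel_to_units(p, p0, p1, u0, u1):
    """Linearly map pixel coordinate p to graph units, given two known ticks:
    pixel p0 corresponds to unit value u0, and pixel p1 to u1."""
    return u0 + (p - p0) * (u1 - u0) / (p1 - p0)

# Hypothetical example: y-axis ticks at pixel row 2300 (0 dB/Hz) and row 500
# (60 dB/Hz); a curve point at pixel row 1400 then maps to 30 dB/Hz.
print(pixel_to_units(1400, 2300, 500, 0.0, 60.0))  # 30.0
```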

Similarly, the “y” coordinate was varied again, this time decreasing it (i.e., moving upwards), until finding the ytop coordinate, where the upper frame of the graph is located. From there, the dx value was calculated, defined as dx = (xend − xini)/104, since 104 is the final value of the “x” axis in all graphs and the initial value is zero. Then, an integral variable was initialized with the value zero, and a loop was started varying the “x” coordinate, in principle, from xini to xend; at each iteration of this loop, the “y” coordinate was varied, in principle, from ybottom to ytop, that is, moving upwards, passing through white pixels, then through black pixels (the graph line), and stopping one pixel before the transition from black to white, where the graph point is, at coordinate (xi, yf(xi)).

Each time a point (xi, yf(xi)) was found, its coordinates expressed in pixels were converted to coordinates expressed in graph units, using the T “x” Map and T “y” Map tables. The value yf(xi) was added to the integral variable, zeroed at the beginning of the outermost loop, and at the end of the loops this accumulated value was multiplied by the dx value obtained above, providing the final value of the integral, i.e., the area under the curve.
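The accumulation above reduces to a Riemann sum. As a minimal sketch (assuming the per-column curve heights have already been recovered from the pixel scan), the area is obtained as follows; the 0-104 x-axis range follows the paper's description:

```python
import numpy as np

def curve_area(y_values, x_start=0.0, x_end=104.0):
    """Riemann-sum approximation of the area under a sampled curve.

    y_values holds one curve height per graph column, as recovered by the
    pixel scan; x_start/x_end follow the paper's 0-104 x-axis range.
    """
    dx = (x_end - x_start) / len(y_values)  # width represented by one column
    return float(np.sum(y_values) * dx)     # integral ~= sum(y_i) * dx

# Sanity check with a known answer: y = x over [0, 104] has area 104**2/2 = 5408
xs = np.linspace(0.0, 104.0, 10_000, endpoint=False)
print(curve_area(xs))  # close to 5408
```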

For the statistical analysis, the results of the analyzed acoustic parameters were extracted and entered into a digital spreadsheet. Descriptive analyses were performed using measures of central tendency, and inferential analyses used non-parametric comparison tests, since the data did not meet the normality criteria. The Wilcoxon test was used for the paired analysis between siblings; the Kruskal-Wallis test was used to compare the groups of older and younger siblings and to compare the pairs of siblings; and the post hoc Bonferroni test was used for multiple comparisons. The SPSS software, version 21, was used at a significance level of 5% (p<0.05).
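The study used SPSS; purely to illustrate the same test choices, the sketch below applies the Wilcoxon signed-rank test to paired measurements, the Kruskal-Wallis test to independent groups, and a Bonferroni correction. The numbers are hypothetical illustrative values, not the study's data.

```python
import numpy as np
from scipy.stats import wilcoxon, kruskal

# Hypothetical F1 values (Hz) of /ɛ/ for six older/younger sibling pairs
older = np.array([520.0, 545.0, 510.0, 530.0, 560.0, 540.0])
younger = np.array([525.0, 540.0, 515.0, 535.0, 555.0, 545.0])

# Wilcoxon signed-rank test for the paired sibling comparison
w_stat, p_paired = wilcoxon(older, younger)

# Kruskal-Wallis test comparing independent groups (here, the two groups)
h_stat, p_groups = kruskal(older, younger)

# Bonferroni correction: multiply p by the number of comparisons, cap at 1
k = 15  # e.g., all pairwise contrasts among six pairs
p_adjusted = min(p_groups * k, 1.0)

print(p_paired, p_groups, p_adjusted)
```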

RESULTS

Table 1 shows the comparison of measurements of formants of vowel /ɛ/ between the older and younger brothers of each pair.

Table 1
Comparison of each extracted acoustic measure referring to formants of vowel /ɛ/ between older and younger siblings of each pair

The acoustic measurements extracted from vowel /ɛ/ for F1, F2, F3 and F4 did not show statistically significant differences, as shown in the results in Table 1.

Table 2 presents the comparison of formant measures and of the mean fundamental frequency in connected speech between older and younger siblings of each pair.

Table 2
Comparison of each extracted acoustic measure referring to speech formants, mean frequency of speech among older and younger siblings of the same pair

The acoustic measurements presented in this table did not show statistically significant differences.

In Table 3, the possibility of differences in measurements between pairs was considered, since these subjects are not related, but only have a common birthplace. Thus, Table 3 shows the comparison of acoustic measurements between pairs.

Table 3
Comparison of general means of voice acoustic measures between the six pairs of older and younger siblings.

The frequency parameter compared between the six pairs (Table 3) revealed a statistically significant difference, i.e., even knowing that this parameter has a population mean, interpair differences were found.

The Bonferroni's test for multiple comparisons was then performed to observe where these differences occurred, as shown in Chart 2, considering that such differences may contribute to the forensic identification of speakers in general.

Chart 2
Post-hoc test for multiple comparisons between general means of frequency measures of the six pairs of older and younger siblings

In this analysis, no significant differences in frequency were found between specific pairs, i.e., even across all pairs there was no frequency that could single out a pair, or even a voice, as previously observed.

Figure 1 presents six images representing the LPC curves of each pair; the siblings' audios are represented in the graphs by curves of different colors.

The images demonstrate the differences between the audios, since the two resulting curves are distinct, even though in some cases they superimpose or even intertwine.

Figure 1
Linear Prediction Curve of the same pair with different colors for each curve on the same screen

In the present study, the LPC was considered for the vowel /ɛ/, and the results are presented in Figure 1. LPC analysis applied to a speech signal allows obtaining the spectral envelope and the frequencies corresponding to the formants.

Figure 2 presents 12 images with measurements of the area of LPC graphs.

Figure 2
Area measurements of Linear Prediction Curve graphs in different graphs

Table 4 compares the areas of LPC curves and shows that this measure is able to distinguish, as an objective parameter, more than 50% of siblings with auditorily similar voices.

Table 4
Comparison of areas of Linear Prediction Curve measurements of the voice of siblings of each pair.

DISCUSSION

As shown in the results of the comparison of each extracted acoustic measurement referring to the formants of the vowel /ɛ/ between the older and younger brothers of each pair, the measurements were not able to differentiate the brothers, even in the high-frequency formants, which is in line with the findings of the studies described below.

A recent study[13] revealed consistent patterns regarding the comparison of high- and low-frequency formants in pairs of twins and non-genetically related speakers, with high-frequency formants exhibiting greater speaker-discriminatory power than low-frequency formants. It should be mentioned that this study was conducted on pairs of twins (genetically related) and on non-genetically related subjects.

Another study[14] demonstrated that male and female speakers produced vowels with F1 and F2 values relatively close to the targets of native speakers of the state of Paraíba (PB), and the mean values for non-native male speakers were almost identical to the means of native speakers. Formant measurements are the main acoustic correlates associated with the description of vowel segments[15]. In the present findings, the values of the formants of the vowel /ɛ/ were not sufficient to differentiate pairs of siblings with auditorily similar voices. The absence of distinctive vowel characteristics indicates that this parameter should be used with caution in the forensic identification of speakers among siblings. That is, once again in this study, formants, which are classified as highly individual[11], were not able to identify the auditorily similar voices in each pair, demonstrating the limitations of formants for the identification of speakers with auditorily similar voices.

Regarding the fundamental frequency, the acoustic measurements referring to the means in connected speech between siblings of the same pair did not present statistical significance, corroborating a study[16] that analyzed the mean speaking fundamental frequency of twins, and its standard deviation, in a reading task. That study investigated to what extent the similarity observed for the fundamental frequency was genetically influenced, comparing data from monozygotic (MZ) twins with data from dizygotic (DZ) twins. No differences were found between MZ and DZ twins in terms of mean speaking fundamental frequency and its variation (standard deviation), although correlations were observed between measurements in the first group.

Therefore, as observed in the present study, the fundamental frequency, when used between siblings with auditorily similar voices, will probably not be efficient to distinguish such speakers.

The research also analyzed the LPC curve. When the examination to be performed is the identification of speakers, in which it is important to study the resonance poles of the vocal tracts, it is also necessary to study the frequency response curve, which is obtained by LPC[17]. Whenever possible, the examiner should use linear prediction (LPC) analysis, since this strategy is the most adequate for measuring sound formants[11].

The LPC graphs generated from the acoustic analysis of vowel /ɛ/ of the pairs of siblings, in the present study, corroborate the literature, showing different curves between siblings of the same pair (curves were traced with different colors for each sibling of the same pair for easy viewing). However, to allow their use as forensic evidence, it was decided to generate values that could be statistically analyzed to prove whether or not there were significant differences between siblings in pairs. Under this scientific view, the graphs were submitted to measurement of the area of the LPC curve generated from the audio of vowel /ɛ/ of each subject. This resource was used to provide a new method for forensic use based on an objective parameter herein represented by the measurement of area of the LPC curve.

After analyzing the graphs resulting from the measurements of areas of the LPC curves, values were generated, in which the measurements of pairs of siblings are statistically compared.

Comparing the areas under the LPC curves between the siblings of each pair, statistically significant differences were observed in pairs 1-31, 3-21, 9-32, and 14-19; in pairs 6-28 and 10-25, no statistically significant differences were observed. It is relevant to mention that, at study onset, in the perceptual-auditory pairing, pair 6-28 was the only one considered coincident by all three voice-specialist examiners. In general, this resource was able to differentiate the voices of the older and younger brothers in the same pair, except where there was marked auditory similarity.

This demonstrates the importance of analyzing the area of the LPC curve when differentiating auditorily similar voices. The LPC curves visually suggested that they belonged to different subjects; however, since this is scientific research aiming to exclude subjectivity from data interpretation, measurements of the LPC areas were generated, in an unprecedented manner, and submitted to statistical analysis. With these measurements, it was possible to detect the distinction in most pairs, except for those in which the vocal similarity was high. Studies with larger samples are needed to assess the sensitivity of this new method. The resource proved promising for the distinction of voices and should be combined with other acoustic evaluations to complement and strengthen the delineation of cases. This innovative measurement can contribute to greater reliability in future forensic reports by reducing subjectivity and providing reproducibility for the work of forensic experts.

This study reinforces how delicate the forensic identification of speakers is, especially when the voices are auditorily similar. It also indicates that acoustic analysis and its tools should be used in line with the desired forensic analysis: the more similar the compared voices, the more resources should be used.

As this study concludes, it simultaneously raises new hypotheses for research in this field, which has been growing as recorded oral communication is increasingly used as an element of forensic evidence in the most diverse proceedings.

CONCLUSION

This study demonstrated that the formants of vowel “é” and of connected speech, and the mean fundamental frequency in Hz, were not enough to distinguish auditorily similar voices. It also showed that the unprecedented resource of measuring the area of the LPC curve was able to distinguish most of them, thus representing an objective and reproducible parameter for use in forensic evidence.

REFERENCES

  • 1
    Cazumbá LF, Sanches AP, Telles IFC. Introdução à fonoaudiologia forense. In: Rehder MI, Cazumbá L, Cazumbá M, editors. Identificação de falantes: uma introdução à fonoaudiologia forense. Rio de Janeiro: Revinter; 2015. p.7-24.
  • 2
    Azzariti M. Diálogos de uma tortura: discursos de um crime. Rio de Janeiro: Rei dos Livros; 2016.
  • 3
    Azzariti M, Gomes RV, Vasconcellos ZMC. Linguística: aspectos fonéticos. In: Rehder MI, Cazumbá L, Cazumbá M, editors. Identificação de falantes: uma introdução à fonoaudiologia forense. Rio de Janeiro: Revinter; 2015. p.119-37.
  • 4
    Conselho Federal de Fonoaudiologia. Resolução 584, 22 de outubro de 2020. Available at: https://www.fonoaudiologia.org.br/resolucoes/resolucoes_html/CFFa_N_584_20.htm [Accessed 2021 Feb 27].
  • 5
    Vieira RC. Identificação de falante: um estudo perceptivo da qualidade de voz [thesis]. São Paulo (SP): Pontifícia Universidade Católica de São Paulo; 2018.
  • 6
    Gonçalves CS, Petry T. Comparação forense de locutores no âmbito da perícia oficial dos estados. In: Rehder MI, Cazumbá L, Cazumbá M, editors. Identificação de falantes: uma introdução à fonoaudiologia forense. Rio de Janeiro: Revinter; 2015. p.241-64.
  • 7
    Lucena LVO. Relação entre as análises acústica e perceptivo auditiva da voz na identificação forense de falantes: uma revisão sistemática [dissertation]. Recife (PE): Universidade de Pernambuco; 2018.
  • 8
    Cazumbá LF, Rehder MI, Sanches AP. Investigação e análise perceptivo-auditiva. In: Cazumbá L, Cazumbá M, Rehder MI, editors. Identificação de falantes: uma introdução à fonoaudiologia forense. Rio de Janeiro: Revinter; 2015. p.89-101.
  • 9
    Karakoç MM, Varol A. Visual and auditory analysis methods for speaker recognition in digital forensic. In: International Conference on Computer Science and Engineering. Antalya. Proceedings. 2017:1189-1192. https://doi.org/10.1109/UBMK.2017.8093505
  • 10
    Behlau M, Almeida AA, Amorim G, Balata P, Bastos S, Cassol AA et al. Reducing the GAP between science and clinic: lessons from academia and professional practice - part A: perceptual-auditory judgment of vocal quality, acoustic vocal signal analysis and voice self-assessment. CoDAS. 2022;34(5):e20210240. https://doi.org/10.1590/2317-1782/20212021240en PMID:35920467.
  • 11
    Behlau M, Madazio G, Feijó D, Pontes P. Avaliação de Voz. In: Behlau M, editor. O livro do especialista. v. 1. Rio de Janeiro: Revinter; 2001. p.85-245.
  • 12
    Eriksson A. Aural/Acoustic vs. Automatic methods in forensic phonetic case work. In: Neustein A, Patil HA, editors. Forensic speaker recognition: law enforcement and counter- terrorism. New York: Springer-Verlag; 2012. p.41-69.
  • 13
    Cavalcanti JC, Eriksson A, Barbosa PA. Acoustic analysis of vowel formant frequencies in genetically-related and nongenetically related speakers with implications for forensic speaker comparison. Plos One. 2021;16(2):1-31. https://doi.org/10.1371/journal.pone.0246645 PMID: 33600430.
  • 14
    Franks S, Barbosa R. A importância da duração da vogal final da palavra para a identificação de falantes não nativos de português por meio de máquinas de vetores de suporte. RBLA. 2014;14(3):689-714. https://doi.org/10.1590/S1984-63982014000300009
  • 15
    França FP, Almeida AA, Lopes LW. Immediate effect of different exercises in the vocal space of women with and without vocal nodules. CoDAS. 2022;34(5):e20210157. https://doi.org/10.1590/2317-1782/20212021157pt PMID: 35894373.
  • 16
    Debruyne F, Decoster W, Gijsel AV, Vercammen J. Speaking fundamental frequency in monozygotic and dizygotic twins. J Voice. 2002;16(4):466-71. https://doi.org/10.1016/s0892-1997(02)00121-2 PMID: 12512633.
  • 17
    Fernandes JR. Perícias em áudios e imagens forenses. Campinas: Milennium; 2014.
  • Study conducted at the Departamento de Perícias Forenses da Universidade de Pernambuco - UPE, Recife, Pernambuco, Brazil.
  • Financial support: Nothing to declare.

Publication Dates

  • Publication in this collection
    05 June 2023
  • Date of issue
    2023

History

  • Received
    03 Nov 2022
  • Accepted
    31 Mar 2023
ABRAMO Associação Brasileira de Motricidade Orofacial Rua Uruguaiana, 516, Cep 13026-001 Campinas SP Brasil, Tel.: +55 19 3254-0342 - São Paulo - SP - Brazil
E-mail: revistacefac@cefac.br