USE OF COMPUTATIONAL TOOLS AS SUPPORT TO THE CROSS-MAPPING METHOD BETWEEN CLINICAL TERMINOLOGIES

Objective: to reflect on the use of computational tools in the cross-mapping method between clinical terminologies. Method: reflection study. Results: the cross-mapping method consists of obtaining a list of terms through extraction and normalization; connecting the terms of the list to those of the reference base by means of predefined rules; and grouping the terms into categories: exact or partial combination or, in more detail, similar term, more comprehensive term, more restricted term and non-agreeing term. Although performed manually in many studies, the method can be automated with the use of the Unified Medical Language System (UMLS). The list of terms can be obtained automatically by natural language processing algorithms: rules that identify information in texts allow the expert's knowledge to be coupled to the algorithm, and the extraction can also be performed by techniques based on Machine Learning. When it comes to mapping terms using the 7-Axis model of the International Classification for Nursing Practice (ICNP®), the process can also be automated through natural language processing algorithms such as the POS-tagger and the syntactic parser. Conclusion: the cross-mapping


INTRODUCTION
In the development of health terminologies, it is necessary to harmonize concepts to ensure the interoperability of data and to inform researchers in the area about possible updates to be performed. 1 The elaboration, development and harmonization of health terminologies demands great effort from their developers, who are limited in what they can accomplish individually. SNOMED CT is a global clinical terminology covering several specialties, disciplines and requirements. For this reason, it minimizes the use of different terminologies or clinical systems, which allows greater sharing and reuse of structured clinical information. 3 In Brazil, the Ministry of Health, through Ordinance no. 2073 of August 31, 2011, defined its use for the codification of clinical terms and the mapping of national and international terminologies in use in the country, aiming to support semantic interoperability between systems. 5 Specifically in the nursing field, the use of standardized terminology provides a clear method for documenting practice, offers guidance and support for nurses in their clinical reasoning, and names the phenomena of interest to the profession, contributing to the construction of specific knowledge. 6 Thus, the implementation of a terminology in care settings presupposes a prior comparison between the entries in the patient's medical record and the standardized language, which can be made through the cross-mapping methodology. 7
Considering the recommendation of the Ministry of Health regarding the use of SNOMED CT, the cross-mapping between this terminology and those of nursing can broaden the representativeness of nursing phenomena in national databases, with the possibility of comparison with international bases. In addition, the use of this method contributes to the evolution and dissemination of terminologies across different countries and nursing specialties, 8 and its results help professionals reflect on the terms that they use every day and that are not recorded in a uniform way.
2][13][14][15] A study that cross-mapped the terms of ICNP® 1.0 and SNOMED CT identified that 80% of the terms of the former are present in the latter. 11 In this sense, the International Council of Nurses (ICN) currently provides equivalence tables between the ICNP® statements of nursing diagnoses, outcomes and interventions and those of SNOMED CT. [16][17] In the studies mentioned, the cross-mapping was performed manually and, gradually, computational tools were incorporated to support its operation, in order to reduce the time required and human inconsistencies. Despite the incorporation of computational resources for the cross-mapping between terminologies, 22 between clinical texts and terminologies 23 and between elements present in archetypes and terminologies, 24 the potential of the tools is not yet fully used and there is no consensus on the automation of the method, nor on its effectiveness. This justifies the present article, which aims to reflect on the use of computational tools in the cross-mapping method between clinical terminologies.

REFLECTION
In the context of standardized languages, cross-mapping is a method that allows the comparison of a standardized language with the language used daily in health services, or between different classification systems. 25 The method consists of obtaining the list of terms through extraction and normalization; connecting the terms of the list to those of the reference base (structured terminologies) by means of previously defined rules; and grouping the terms into categories. The extracted terms should represent the breadth of nursing practice in a given care space; therefore, research in this domain uses different databases and temporalities. When performed by humans, the process is prone to failures due to the amount of data being processed.
As an example of the different bases and temporalities, for the mapping between ICNP® 1.0, the nursing diagnoses contained in children's records and the nomenclature of nursing diagnoses and interventions of the city of Curitiba, PR, it was necessary to manually retrieve 20% of the medical records of the patients treated in six months; one consultation was considered for each selected medical record, for a total of 80. 7 The complete transcription of the information contained in nursing records, through deep and exhaustive reading, was reported in a study that mapped the nursing diagnoses of patients of an Intensive Care Unit (ICU) to those of the North American Nursing Diagnosis Association International (NANDA I). The database consisted of 256 records of patients who were hospitalized in the ICU over a period of six months. 12 In turn, similar studies used a computational tool called Poronto 21 to extract terms from the nursing evolutions (progress notes) contained in the electronic health record of a university hospital 26 and to identify terms in scientific articles related to nursing practice aimed at children and adolescents in situations of domestic violence. 27 In the first study, a database of 115,760 patient evolutions covering two years was used, and 257,893 terms were extracted from the records. 26 In the second, the database was composed of 40 articles, from which 17,365 terms were extracted. 27
The automatic extraction of terms from texts is a task for Natural Language Processing (NLP) algorithms, which involves resolving simple and compound terms and can be based on statistics, linguistics and/or knowledge. 28 Among the tools available for use, it is possible to mention CoGrOO, Natural Language Toolkit (NLTK), OpenNLP, Stanford CoreNLP and GATE. Therefore, the inclusion of a computational tool allows a large number of texts to be processed in search of information in a shorter time than the manual activity. Studies that use the automatic process, with patient records as an empirical basis, work with larger databases and longer temporalities than those that extract terms manually. In addition, the quantification of terms, that is, the frequency with which a term appears in the analysis corpus, is performed automatically by the tool, demonstrating the relevance of a nursing term or concept in a given care space.
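The extraction and quantification of terms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two records are invented, the regular-expression tokenizer is deliberately simple, and compound terms are limited to two words. It does not reproduce Poronto or any of the tools cited.

```python
import re
from collections import Counter

# Hypothetical nursing notes; the studies cited worked with thousands of records.
RECORDS = [
    "Patient reports abdominal pain. Abdomen distended.",
    "Abdomen soft. Patient denies abdominal pain.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(text, max_len=2):
    """Yield simple (one-word) and compound (two-word) candidates, sentence by sentence."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                yield " ".join(tokens[i:i + n])


def term_frequencies(records):
    """Count how often each candidate term occurs in the whole corpus."""
    counts = Counter()
    for record in records:
        counts.update(candidate_terms(record))
    return counts


frequencies = term_frequencies(RECORDS)
print(frequencies.most_common(5))
```

Because candidates are generated within sentence boundaries, a compound such as "pain abdomen" is never counted across two sentences. The resulting list still contains many candidates that are not nursing terms, which is why the selection remains an expert task.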
The quantification of the terms of large databases is a complex activity if performed manually, due to the number of occurrences of the terms; for example, the nursing evolutions of a university hospital presented more than 50,000 occurrences of the terms "time" and "abdomen" in a total of 115,760 records. 26 This, in part, explains the use of smaller databases by studies that extract terms manually. However, it should be emphasized that the larger the database and the temporality, the greater the possibility of representing the phenomena of nursing practice.
7] In the automatic form, the methods for normalizing and adjusting texts originate from the pre-processing stages used in NLP algorithms. Pre-processing and tokenization, in turn, transform the input text into something that the computational algorithm can understand and manipulate. This may include the removal of stopwords (words with no relevant function in a given context, which are unnecessary for text processing), the conversion of the text from uppercase to lowercase, the normalization of words by means of linguistic reducers or other forms of standardization and, finally, the separation of the sentences and words of the text into individual units, called tokens. 29
NLP algorithms can use morphological, syntactic, semantic and pragmatic analysis. Regarding the morphological analysis of texts, it is possible to mention the POS-tagger, 29 which defines the morphology of words and their grammatical classes; for example, in the phrase "patient reports reduced pain", the algorithm would define "patient" as a singular masculine noun, "reports" as a verb in the present indicative, third person singular, and so on. [31][32][33][34] When performed automatically, morphological normalization can reduce the time spent by the researcher on this step. However, the specialist's knowledge remains extremely important, as shown by the normalization of the terms "right" and "patient rights": the first refers to a location and the second to a focus of nursing attention, yet automatic standardization would normalize both to "right". 26 This suggests that the normalization process needs to be performed semiautomatically, that is, the expert's knowledge is necessary for the semantics of the terms to be preserved.
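A semiautomatic normalization of this kind can be sketched as follows. The stopword list, the naive plural-stripping rule (a stand-in for a real linguistic reducer) and the list of protected terms are all assumptions introduced for illustration; in practice the protected list would be maintained by the expert.

```python
import re

STOPWORDS = {"the", "of", "a", "in", "to", "and"}  # assumed, illustrative list
PROTECTED_TERMS = {"patient rights"}               # terms flagged by the expert


def reduce_word(word):
    """Naive reducer: strip a final plural 's' (stand-in for a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(term):
    """Lowercase, tokenize, drop stopwords and reduce words, unless the term is protected."""
    lowered = " ".join(re.findall(r"[a-z]+", term.lower()))
    if lowered in PROTECTED_TERMS:
        return lowered  # kept as written so that its meaning is preserved
    tokens = [t for t in lowered.split() if t not in STOPWORDS]
    return " ".join(reduce_word(t) for t in tokens)


print(normalize("Rights"))                  # reduced automatically
print(normalize("Patient Rights"))          # protected by the expert
print(normalize("Dressing of the Wounds"))  # stopwords removed, plural reduced
```

The protected list is the point at which the expert's knowledge enters the pipeline: everything else runs automatically, but terms whose meaning would be lost by reduction are left untouched.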
Regarding the syntactic analysis of the texts, it is possible to mention the syntactic parser, 29 which is divided into two categories: the constituency parser (Figure 1), which demarcates the structure of the sentences of a text, and the dependency parser (Figure 2), which establishes the relations of dependence between the words of a text.
Another aspect of the normalization of extracted content that can be performed automatically, in addition to the removal of stopwords and the conversion to lowercase, is the expansion of the abbreviations used by nursing professionals when recording their activities. Semantic analysis consists of discovering the meaning of words or concepts within the text. Among the problems that semantic analysis aims to solve, the resolution of ambiguity 35 and named entity recognition 36 stand out; the latter includes the identification and classification of entities, such as names of people, organizations and locations, in a text.
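The expansion of abbreviations mentioned above can be illustrated with a simple dictionary lookup. The abbreviation table below is hypothetical; a real table would be curated by nursing experts for the care setting under study.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"bp": "blood pressure", "hr": "heart rate", "pt": "patient"}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace whole-word abbreviations (case-insensitive) with their expansions."""
    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


print(expand_abbreviations("Pt stable, BP 120/80, HR 72"))
```

A lookup of this kind cannot decide between competing expansions of the same abbreviation; that is an instance of the ambiguity problem that semantic analysis, or the expert, has to resolve.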
The rules for the cross-mapping can be determined according to the study design, based on the characteristics of the data structure of the information system and the terminology to be used. 37 The number of rules, while on the one hand guaranteeing the accuracy of the mapping, on the other demands from researchers an effort and knowledge that go beyond their specialty: they need a theoretical and practical grounding in the classification system and in semantic and transcultural equivalences.
Studies that manually performed the cross-mapping between the terms and nursing diagnoses contained in patient records and NANDA I 12,38 included rules such as: guarantee the meaning of the terms, checking the context and the meaning and not only the words; compare the terms with the diagnosis statements and the focus of attention; compare the terms with the defining characteristics and related risk factors; identify and describe the possible nursing diagnosis concepts; and map the nursing diagnoses to NANDA I domains and classes.
Similarly, studies that have performed the manual cross-mapping between nursing interventions and the Nursing Interventions Classification (NIC) 13 included rules such as: use the verbs of the interventions to perform the mapping for the NIC; map the intervention from the NIC intervention title to the activity; maintain the consistency between the mapped intervention and the definition of intervention in the classification; use the title of the more specific NIC intervention; and map interventions that had two or more verbs to two or more NIC interventions corresponding to them.
The use of rules for the automatic identification of information in text (rule-based information extraction) is a methodology widely used in computational tools, since it allows the expert's knowledge to be incorporated into the algorithm. This approach has some known limitations 30 and can be enhanced if used in conjunction with statistically based techniques such as Machine Learning (ML).
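As a minimal sketch of rule-based information extraction, the fragment below encodes two rules as regular expressions. Both rules and their labels are hypothetical examples of what a domain expert might write; they are not taken from any of the studies cited.

```python
import re

# Each rule pairs a label with a pattern written by a (hypothetical) domain expert.
RULES = [
    ("pain_report", re.compile(r"\b(?:reports|complains of)\s+(?P<term>[a-z ]*pain)\b")),
    ("vital_sign", re.compile(r"\b(?P<term>blood pressure|heart rate)\b")),
]


def extract(text):
    """Apply every rule to the lowercased text and collect (label, term) pairs."""
    lowered = text.lower()
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(lowered):
            found.append((label, match.group("term").strip()))
    return found


print(extract("Patient reports acute abdominal pain. Heart rate stable."))
```

The limitation is visible in the sketch itself: a pain complaint phrased in a way that no rule anticipates is simply missed, which is one reason for combining rules with statistical techniques.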
In the case of ML algorithms with supervised learning, the expert's knowledge is passed to the algorithm by means of the annotation of data in the texts, whether morphological, syntactic or semantic. This process is very time-consuming and costly, and it requires specialists engaged in the task, well-defined annotation guidelines, and computational tools that accelerate and support the process. 39 When it comes to unsupervised learning algorithms, the expert's knowledge is not necessary because the algorithm itself can group data by similarity and extract the necessary information. In addition, the use of statistical methods can extend the scope of the algorithm, which is then not limited to the knowledge of the expert and the generation of rules.
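The idea of grouping data by similarity without annotation can be illustrated with a very small sketch. It uses plain string similarity from Python's standard `difflib` module and a greedy grouping strategy, which is a deliberate simplification and not a learned model; the terms and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio in [0, 1] expressing how similar two strings are."""
    return SequenceMatcher(None, a, b).ratio()


def group_by_similarity(terms, threshold=0.8):
    """Greedy grouping: a term joins the first group whose first member is similar enough."""
    groups = []
    for term in terms:
        for group in groups:
            if similarity(term, group[0]) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])  # no similar group found, so start a new one
    return groups


TERMS = ["abdominal pain", "abdominal pains", "headache", "head ache"]
print(group_by_similarity(TERMS))
```

No expert annotation is involved, but the groups reflect only spelling: terms that are synonymous yet written differently would not be grouped by this measure.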
The establishment of categories for the arrangement of terms, the last phase of the mapping, should follow criteria that make subsequent comparisons or the reuse of results possible. In general, when the term found corresponds exactly to the term of the classification system, it is categorized as an exact combination and, when it presents similar, synonymous or related concepts, as a partial combination. 13 In the nursing domain, it is common to use the criteria established by Leal, 40 which indicate more detailed categories for the mapping, among them: similar term, when the spelling of the terms does not agree but the meaning is identical; broader term, when the identified term has a broader meaning than that of the terminology; more restricted term, when the identified term has a more limited meaning than that of the terminology; and non-agreeing term, when there is no agreement between the identified term and that of the terminology.
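The first level of this categorization (exact or partial combination) can be approximated automatically, as sketched below. The reference list is a hypothetical stand-in for a terminology, and word overlap is used as a rough proxy for "related concepts". That proxy is an assumption: it only proposes candidates, and the finer categories (similar, broader, more restricted) depend on meaning and remain an expert judgment.

```python
# Hypothetical reference terms standing in for a structured terminology.
REFERENCE_TERMS = ["Pain", "Acute Pain", "Anxiety"]


def token_set(term):
    """Return the set of lowercased words in a term."""
    return set(term.lower().split())


def categorize(term, terminology=REFERENCE_TERMS):
    """Return the category ('exact', 'partial' or 'no match') and the matched reference terms."""
    by_lowercase = {ref.lower(): ref for ref in terminology}
    if term.lower() in by_lowercase:
        return "exact", [by_lowercase[term.lower()]]
    candidates = [ref for ref in terminology if token_set(term) & token_set(ref)]
    if candidates:
        return "partial", candidates  # candidates still need expert review
    return "no match", []


for extracted in ["acute pain", "abdominal pain", "fever"]:
    print(extracted, categorize(extracted))
```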
Regarding the cross-mapping between nursing terminologies, the use of the Unified Medical Language System (UMLS) can anchor an automatic mapping, since it comprises a knowledge source that integrates hundreds of health-related terminologies or classifications in a unified platform. 1 In addition, there is already an initiative for the translation of the UMLS into Brazilian Portuguese. 41 The UMLS uses several processes to integrate terminologies, such as lexical tools for the normalization of concepts and the preservation of the meanings and relations of the source vocabularies. 42 However, the automated process may have limitations. The automatic mapping by the UMLS between the Logical Observation Identifiers Names and Codes (LOINC), a terminology for laboratory tests and clinical observations, and SNOMED CT proved to be unsatisfactory, although the two terminologies both cover the domain of laboratory procedures and use similar knowledge representation formalisms. The study found that additional techniques are required to improve the performance of the automatic mapping process. 43 Inaccurate correspondences were also observed in the mapping between nursing terminologies, indicating a series of complexities to be addressed in the UMLS and requiring collaboration between specialists to solve problems in semantic mappings. 1 This was examined in a cross-mapping between ICNP® and the Clinical Care Classification (CCC), and between ICNP® and NANDA I, in which there were 97% exact matches when the mapping was performed by specialists; when processed by the UMLS, the comparison analysis presented an overall precision of 33.6% in the semantic mapping. 1
On the other hand, when it comes to mapping terms using the ICNP® 7-Axis model, the process can be automated through NLP algorithms such as the POS-tagger and the syntactic parser. 29 This is justified by the fact that, in the 7-Axis model, the terms belonging to the focus axis consist, for the most part, of nouns; the terms of the judgment axis correspond to adjectives; and the terms of the action axis refer to verbs in the infinitive, which provides a discipline for the semantics of the terms.
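The correspondence between grammatical classes and axes can be expressed as a simple lookup, sketched below. The tagged words are written by hand and stand in for the output of a POS-tagger; the tag names follow the Universal Dependencies style (`NOUN`, `ADJ`, `VERB`, verb form `Inf`), which is an assumption of this example and not a convention taken from the studies cited.

```python
# Axis suggested for each grammatical class, following the correspondence described above.
AXIS_BY_POS = {"NOUN": "Focus", "ADJ": "Judgment"}


def suggest_axis(pos, verb_form=None):
    """Suggest a 7-Axis model axis from a POS tag; return None when no rule applies."""
    if pos == "VERB" and verb_form == "Inf":
        return "Action"
    return AXIS_BY_POS.get(pos)


# Hand-written (word, POS tag, verb form) triples standing in for tagger output.
TAGGED_TERMS = [
    ("pain", "NOUN", None),
    ("acute", "ADJ", None),
    ("assess", "VERB", "Inf"),
    ("reports", "VERB", "Fin"),
]

for word, pos, verb_form in TAGGED_TERMS:
    print(word, suggest_axis(pos, verb_form))
```

Since the correspondence holds only "for the most part", the output is best treated as a suggestion to be validated by the expert, and the remaining axes of the model are not covered by these three rules.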
Considering that the analysis of the context of terms extracted from empirical bases is extremely important in terminological work, 44 and that it is often necessary to consider excerpts from nursing records to identify the context of nursing terms, 38 the dependency parser is a tool that can support the cross-mapping methodology, since the dependencies between words help to clarify the context in which the terms are inserted.
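To show how dependencies expose the context of a term, the sketch below uses a hand-annotated parse of the phrase "patient reports reduced pain". The triples and relation labels (Universal Dependencies style) are assumptions written for this example and stand in for the output of a dependency parser.

```python
# Hand-annotated (head, relation, dependent) triples for "patient reports reduced pain".
DEPENDENCIES = [
    ("reports", "nsubj", "patient"),
    ("reports", "obj", "pain"),
    ("pain", "amod", "reduced"),
]


def context_of(term, dependencies=DEPENDENCIES):
    """Collect the words directly linked to `term`, with the relation and the role of the other word."""
    context = []
    for head, relation, dependent in dependencies:
        if head == term:
            context.append((relation, "dependent", dependent))
        elif dependent == term:
            context.append((relation, "head", head))
    return context


print(context_of("pain"))
```

For the term "pain", the result shows that it is the object of "reports" and is modified by "reduced", which is the kind of contextual information an expert would otherwise recover by rereading the record.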
Although the POS-tagger, the syntactic parser 29 and other NLP techniques are identified as facilitators of the cross-mapping method, with tools widely available for use, no studies that have used them for this purpose were identified, which is a limitation of the reflection proposed in this article.

CONCLUSION
The operation of the cross-mapping methodology can be hampered by the amount of data in the empirical bases and by human limitations in the comparison process. In this sense, computational tools are resources that save time and minimize manual inspection errors, but they support the expert rather than replace the expert's judgment.
Researchers dedicated to the development of nursing terminologies need to know the computational tools able to support the cross-mapping process, so that they can evaluate them and use them to their full potential.
In addition, the steps of obtaining and normalizing terms are the ones that most exploit the potential of computational resources, and the cross-mapping method can be strengthened by the use of NLP algorithms. However, even in cases of automatic mapping, the validation of the results by specialists should not be disregarded, especially regarding cross-cultural equivalence.

Figure 1 - Constituency parser
Figure 2 - Relation of dependence between words by the dependency parser