Taxonomic indexes for differentiating malignancy of lung nodules on CT images

Introduction: Lung cancer remains the leading cause of cancer mortality worldwide, with one of the lowest survival rates after diagnosis. Early detection therefore greatly increases the chances of patient survival. Methods: This study proposes a method for classifying lung nodules as benign or malignant based on image processing and pattern recognition techniques. Taxonomic indexes and phylogenetic trees were used as texture descriptors, and a Support Vector Machine was used for classification. Results: The proposed method shows promising results for the diagnosis of benign and malignant lung nodules, achieving an accuracy of 88.44%, sensitivity of 84.22%, specificity of 90.06% and area under the ROC curve of 0.8714. Conclusion: The results demonstrate the promising performance of texture extraction techniques by means of taxonomic indexes combined with phylogenetic trees. The proposed method achieves results comparable to those previously published.


Introduction
Lung cancer is the most frequent of all malignant tumors, and its worldwide incidence increases by 2% per year. In 90% of cases, lung cancer is associated with the consumption of tobacco products. In Brazil, estimates of lung cancer cases in the year 2014 were 27,330, with 16,400 men and 10,930 women (Instituto..., 2015).
One of the best opportunities to diagnose lung cancer is when an asymptomatic patient, normally a smoker, undergoes a computerized tomography (CT) exam (Srichai, 2007). The detection of such nodules using CT is not a simple task, because they can have contrasts similar to other structures, low density, and small size in an area of complex anatomy (connected to blood vessels or on the borders of the lung), among other issues (Carvalho et al., 2014).
A variety of computer-aided detection and diagnosis techniques have been proposed for the detection and characterization of tumors (Carvalho et al., 2016; Gupta and Tiwari, 2014; Hua et al., 2015). Such techniques can be divided into two main categories: computer-aided detection (CADe) and computer-aided diagnosis (CADx). CADx systems would allow a reduction in the number of unnecessary biopsies in patients with benign tumors, sparing patients physical and mental distress. Thus, CADx acts as a second opinion, helping experts to achieve accurate and efficient diagnosis of cancer in the earlier stages of the disease (Parveen and Kavitha, 2014).
Various initiatives are frequently developed with the goal of increasing the accuracy of lung cancer diagnosis using CADx systems. Nascimento et al. (2012) proposed a methodology based on texture features using Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM), with an accuracy rate of 92.78%. Orozco et al. (2013) proposed a methodology based on texture features using Correlation-based Feature Subset Selection, k-Nearest Neighbor (KNN) and SVM, with an accuracy rate of 82.66%. Krewer et al. (2013) proposed a methodology based on a combination of texture and shape features using Correlation-based Feature Subset Selection and KNN, with an accuracy rate of 90.91%. Dandil et al. (2014) proposed a methodology based on texture features using Principal Component Analysis (PCA) and Artificial Neural Network (ANN), with an accuracy rate of 90.63%. Parveen and Kavitha (2014) proposed a methodology based on texture features using SVM, with a sensitivity rate of 91.38% and specificity rate of 89.56%. Kuruvilla and Gunavathi (2014) proposed a method based on texture features using ANN, with an accuracy rate of 93.30%. Gupta and Tiwari (2014) proposed a methodology based on shape features using ANN, with an accuracy rate of 90.00%. Hua et al. (2015) proposed a method based on deep learning techniques, Deep Belief Network (DBN) and Convolutional Neural Network (CNN), achieving a sensitivity rate of 73.30% and specificity rate of 78.70%. Kumar et al. (2015) used Stacked Autoencoders (SAE), achieving an accuracy of 75.01%.
In most CADx methods, the feature extraction stage is based on shape and on texture. In the present study, we used taxonomic indexes, which were originally used in ecology, as texture descriptors. Only texture features were used, in order to analyze the intratumor heterogeneity. According to Gerlinger et al. (2012), intratumor heterogeneity may foster tumor evolution and adaptation, and may therefore assist in lung cancer diagnosis. Another reason to use only texture features is that the markings made by experts are always larger than the real area of the nodule, making shape-based analysis more difficult.
The taxonomic diversity index (∆) and taxonomic distinction index (∆*), which were originally applied in ecology, are used to describe the texture of benign and malignant nodules. The first considers the abundance of the species and the taxonomic relationship between them, while the second represents the mean taxonomic distance between two individuals of different species. These indexes are based on phylogenetic distance, considering the architecture of a rooted tree in the form of an inclined cladogram. These indexes were chosen as texture descriptors because of the promising results published by Carvalho et al. (2016) for classification of lung regions extracted from CT images as nodule and non-nodule. As an improvement of the methodology published by Carvalho et al. (2016), we propose a method using the same indexes applied to nodules and regions generated by internal and external masks to differentiate malignancy of lung nodules on CT images.

Methods
This section describes the steps used in the proposed methodology for the classification of lung nodules in CT images. The methodology is divided into four steps, as described in Figure 1. In summary, the first step covers the materials used, namely the CT exam images from the LIDC-IDRI database, and the segmentation of the nodules. In the second step, feature extraction is conducted using the taxonomic indexes. After this step, the classification is performed using the SVM. Finally, the results are evaluated.
The methods were implemented in C++ with the ITK software, running on a machine with an Intel Core i7 processor at 3.07 GHz, 4 GB of RAM and the Windows 7 operating system.

Database
The images used in this work were acquired from the LIDC-IDRI (Armato et al., 2011) database, which is available online as a result of an association between the Lung Image Database Consortium and the Image Database Resource Initiative, and includes 1,018 CT exams. However, two factors made some of them (185 exams) inappropriate for this methodology. The first factor is related to exams that do not present nodules equal to or larger than 3 mm. The second factor is the divergence of information found in the marking file of an exam versus the information present in the DICOM header of the same exam, which invalidates the marking (Carvalho et al., 2016). Therefore, 833 exams were used. In the LIDC-IDRI database, all the images are in the DICOM format and have dimensions of 512 × 512 with 16 bits per voxel. The database supplies an XML file with contour information for the slices, and several features including calcification, texture and malignancy with values ranging from 1 to 5 for lung nodules larger than 3 mm. This paper considers only the malignancy feature, which is used to separate the nodules into malignant and benign. The process of annotating the nodules of the LIDC-IDRI database was performed by four experts in two stages. In the first stage, each expert analyzed the exams individually. In the second stage, the results of the four analyses of the first stage were presented together to the four experts. During this stage, each of the experts re-analyzed the exams and again made their annotations independently (Nascimento et al., 2012).
With respect to nodule segmentation, information was obtained from an XML file containing the coordinates of the nodules according to the analysis criterion of each expert. No consensus is imposed: all nodules indicated in the experts' review are taken into account and recorded. Thus, it is possible to have a different diagnosis for the same nodule. This paper considers only one instance per nodule, with the objective of minimizing the impact of subjectivity in exams. The classification as malignant or benign is obtained with the computation presented in Jabon et al. (2009), which summarizes the features of each nodule as determined by the four experts by computing the mode or the median into one single value. According to the result of this summary, this paper considers that malignant nodules are those cases that present malignancy semantic values of moderately suspicious or highly suspicious, and benign nodules are those cases that present characteristics that are highly or moderately indicative of a benign tumor. Regarding contour, the marking with the largest bounds among the four made during the annotation process was used. In total, 1,405 nodules (of which 1,011 were benign and 394 malignant) were obtained. Figure 2 shows an example of an expert's marking in a CT image.
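The summary of the experts' ratings described above can be illustrated with a minimal sketch. It assumes the median is used (the text allows the mode or the median), that ratings 4 and 5 correspond to "moderately suspicious" and "highly suspicious", and that a summary value of exactly 3 is treated as indeterminate; the function name is hypothetical and this is not the authors' implementation.

```python
from statistics import median


def summarize_malignancy(ratings):
    """Collapse the experts' malignancy ratings (1..5) into a single label.

    Assumption: the median is the summary statistic; values above 3 are
    taken as malignant, values below 3 as benign, and exactly 3 as
    indeterminate.
    """
    m = median(ratings)
    if m > 3:
        return "malignant"
    if m < 3:
        return "benign"
    return "indeterminate"


print(summarize_malignancy([4, 5, 4, 3]))  # median is 4
print(summarize_malignancy([1, 2, 2, 3]))  # median is 2
```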

Features extraction
The segmented nodule data were submitted to the feature extraction stage, based only on texture. First, each image was quantized to two levels: 8 and 12 bits. A uniform quantization process (Gonzalez and Woods, 2007) was used to combine individuals (voxels) into a smaller number of "species" (Hounsfield units, HU), enabling analysis of the image at different gray scale levels, in addition to the original image (16 bits). We chose the 8-, 12- and 16-bit levels because tests showed that they produced the best results. The taxonomic diversity and taxonomic distinction indexes are used to describe the texture of objects. These indexes are based on phylogenetic distance (measured as a number of edges), which depends on the tree architecture. The other requirements for the generation of the tree are the species (HU) and individuals (voxels), acquired through the bounding area approaches, namely the internal and external masks. The objective of dividing the region of interest into masks is to perform a local analysis, which is useful because these areas of the mask may supply information that can distinguish nodules as benign or malignant, such as calcification, irregular margins and spiculated borders (Tan et al., 2003).
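As a hedged illustration of uniform quantization, the sketch below maps a 16-bit value to 8 or 12 bits by grouping equal-width bins. It assumes the intensities have already been shifted to be non-negative; the function name and the bit-shift formulation are choices made for this example, not the paper's C++ code.

```python
def quantize(value, in_bits=16, out_bits=8):
    """Uniformly quantize a non-negative intensity from in_bits to out_bits.

    Every block of 2 ** (in_bits - out_bits) consecutive input levels is
    merged into a single output level ("species").
    """
    if not 0 <= value < (1 << in_bits):
        raise ValueError("value outside the input range")
    return value >> (in_bits - out_bits)


voxels = [0, 255, 256, 40000, 65535]
print([quantize(v, out_bits=8) for v in voxels])
print([quantize(v, out_bits=12) for v in voxels])
```

With 8 output bits, 256 consecutive input levels share one species; with 12 bits, 16 levels do.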

Approaches with internal and external masks
The objective of this stage is to find diversity patterns in the areas close to the border of the regions and in the inner areas (Oliveira et al., 2015). These regions were generated through masks as binary images. The first internal mask was created by binarizing the quantized volume of interest (VOI), and the second internal mask was obtained by reducing the scale of the first one while maintaining the center of mass. Each subsequent mask was obtained from the previous one in the same way, down to the innermost mask. We defined a value of 20% for the reduction of scale, because tests showed that the best results were achieved using five image masks with this scaling proportion.
The external masks are determined by the difference between the internal masks, where the first external mask is determined by the difference between the first and the second internal mask, and so on.
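A minimal two-dimensional sketch of this mask construction follows. It assumes that the 20% reduction is applied relative to the first mask (scale factors 1.0, 0.8, 0.6, 0.4 and 0.2) and uses nearest-neighbour sampling; the paper works on 3D VOIs, and all function names here are hypothetical.

```python
def center_of_mass(mask):
    """Mean (row, column) position of the non-zero pixels of a binary mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n


def scale_mask(mask, factor):
    """Shrink a binary mask about its center of mass by the given factor."""
    cr, cc = center_of_mass(mask)
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Source pixel that lands on (r, c) after scaling by `factor`.
            sr = round(cr + (r - cr) / factor)
            sc = round(cc + (c - cc) / factor)
            if 0 <= sr < rows and 0 <= sc < cols and mask[sr][sc]:
                out[r][c] = 1
    return out


def internal_masks(first, count=5, step=0.2):
    """First mask plus (count - 1) reduced copies (assumed scales 0.8, 0.6, ...)."""
    return [first if k == 0 else scale_mask(first, 1 - k * step)
            for k in range(count)]


def external_masks(internals):
    """Ring between each internal mask and the next, smaller one."""
    return [[[a & (1 - b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(outer, inner)]
            for outer, inner in zip(internals, internals[1:])]


def area(mask):
    return sum(map(sum, mask))


square = [[1] * 11 for _ in range(11)]
inner = internal_masks(square)
outer = external_masks(inner)
print([area(m) for m in inner], [area(m) for m in outer])
```

Five internal masks yield four external masks, which matches the nine regions per image used in the feature count.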

Phylogenetic tree
Diversity is a term often used in ecology and describes the variety of species present in a community or area. A community is defined as a set of species that occur in a certain location and at a certain time (Magurran, 2004). Phylogeny is a branch of biology responsible for the study of evolutionary relationships among species, verifying the relationship between them to determine possible common ancestors. A phylogenetic tree, or simply phylogeny, is a tree in which the leaves represent the organisms and the internal nodes represent supposed ancestors. The edges of the tree denote evolutionary relationships (Pienkowski et al., 1998).
The distance between two randomly chosen individuals in the phylogeny of a community is measured by means of the taxonomic diversity (∆) and taxonomic distinction (∆*) indexes (Pienkowski et al., 1998). These indexes consider three essential factors: the number of species, the number of individuals and the connectivity structure of the species (number of edges). In the present work, these two indexes are used to discriminate between benign and malignant nodules.
The taxonomic diversity index (∆) considers the abundance of the species and the taxonomic relationship among them. Thus, its value expresses the mean taxonomic distance between the individuals (Pienkowski et al., 1998).
$$\Delta = \frac{\sum\sum_{i<j} \omega_{ij}\, x_i\, x_j}{n(n-1)/2} \qquad (1)$$
where $x_i$ ($i = 1, \ldots, S$) is the number of individuals of the $i$-th species, $x_j$ ($j = 1, \ldots, S$) is the number of individuals of the $j$-th species, $S$ represents the total number of species, $n$ is the total number of individuals and $\omega_{ij}$ is the distance from species $i$ to species $j$ in the taxonomic tree.
The taxonomic distinction index (∆*) represents the mean taxonomic distance between two individuals that belong to different species (Pienkowski et al., 1998).
$$\Delta^{*} = \frac{\sum\sum_{i<j} \omega_{ij}\, x_i\, x_j}{\sum\sum_{i<j} x_i\, x_j} \qquad (2)$$
where $x_i$ ($i = 1, \ldots, S$) is the number of individuals of the $i$-th species, $x_j$ ($j = 1, \ldots, S$) is the number of individuals of the $j$-th species, $S$ represents the total number of species and $\omega_{ij}$ is the distance from species $i$ to species $j$ in the taxonomic tree.
Various approaches reported in the literature represent the species through trees, such as the architecture called a "rooted tree" in the shape of an inclined cladogram (Moura and Viana, 2011). The inclined cladogram is a graphical representation used to describe the phylogenetic relationship between ancestor species; these trees allow the extraction of indexes that connect diversity, richness and parenthood between species (Oliveira et al., 2015). In the present study, this architecture was adapted to find a stricter discrimination between the benign and malignant classes, since, according to Magurran (2004), a community in which the species are distributed in many types must present a higher diversity than a community in which most species belong to the same category. The architectures of the trees used in this paper are presented in the following sections.
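The two indexes can be computed from a histogram of species abundances and a distance function, as in the minimal sketch below. It follows the standard definitions of taxonomic diversity and distinction; the function name, the 0-based species indexing and the toy distance rule in the example are assumptions made for illustration.

```python
def taxonomic_indexes(abundance, distance):
    """Return (diversity, distinction) for a list of species abundances.

    abundance[i] is the number of individuals (voxels) of species i and
    distance(i, j) returns the number of edges between species i and j
    (with i < j) in the chosen tree.
    """
    n = sum(abundance)
    weighted = 0.0  # sum over i < j of omega_ij * x_i * x_j
    pairs = 0       # sum over i < j of x_i * x_j
    for i, xi in enumerate(abundance):
        if xi == 0:
            continue
        for j in range(i + 1, len(abundance)):
            xj = abundance[j]
            if xj:
                weighted += distance(i, j) * xi * xj
                pairs += xi * xj
    diversity = weighted / (n * (n - 1) / 2) if n > 1 else 0.0
    distinction = weighted / pairs if pairs else 0.0
    return diversity, distinction


# Toy example: two occupied species, three edges apart.
delta, delta_star = taxonomic_indexes([2, 0, 3], lambda i, j: (j - i) + 1)
print(delta, delta_star)
```

The double loop is quadratic in the number of species, so a practical implementation for 16-bit images would likely iterate only over occupied species.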
The phylogenetic tree, combined with the taxonomic diversity and distinction indexes, is used in biological studies to compare behavior patterns of species in different areas. To implement this methodology, the first step is to derive a correspondence between the terms used in biology and those used in tumor diagnosis. Table 1 shows this correspondence.

Tree 1: Rooted tree shaped as an inclined cladogram
With the candidate region extracted (internal and external mask), the trees are created. Figure 3 shows a tree in which the species are HU values, which can vary between -32768 and +32768. A simple shift was applied to make every value positive, with the goal of making the index calculations simpler: [-32768, +32768] → [0, 65536].
The relationship between species is considered from left to right, as indicated by the red arrow in Figure 3. The relation between species $i$ and $j$ has $\omega_{ij} = (j - i) + 1$ edges for $i = 0$, and $\omega_{ij} = (j - i) + 2$ edges for $i > 0$ (Carvalho et al., 2016).
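The value shift and the Tree 1 edge count can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and the requirement that i < j is an assumption that follows from the left-to-right ordering.

```python
def shift_hu(value, offset=32768):
    """Shift a signed intensity so that the smallest value maps to 0."""
    return value + offset


def tree1_distance(i, j):
    """Number of edges between species i and j (i < j) in Tree 1."""
    if not 0 <= i < j:
        raise ValueError("expected 0 <= i < j")
    return (j - i) + 1 if i == 0 else (j - i) + 2


print(shift_hu(-32768), shift_hu(32768))
print(tree1_distance(0, 3), tree1_distance(2, 5))
```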

Tree 2: Rooted tree as an inclined cladogram excluding species with no individuals
Following the same logic for the calculation of the indexes as in the previous tree, another architecture was developed to remove species with no individuals, resulting in the reorganization of the edges for the remaining species. The species distances ($\omega_{ij}$) are computed according to this modified structure.
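One way to picture this reorganization is to drop the empty species and re-index the remaining ones before counting edges. The sketch below assumes that the Tree 1 edge rule is then applied to the new positions, which the text does not state explicitly; the function names are hypothetical.

```python
def compact_species(abundance):
    """Drop species with no individuals; return (labels, counts)."""
    kept = [(s, x) for s, x in enumerate(abundance) if x > 0]
    return [s for s, _ in kept], [x for _, x in kept]


def tree2_distance_table(abundance):
    """Edge counts between the occupied species, keyed by original labels.

    Assumption: the Tree 1 rule is applied to positions in the compacted
    list rather than to the original species values.
    """
    labels, _ = compact_species(abundance)
    table = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            edges = (j - i) + 1 if i == 0 else (j - i) + 2
            table[(labels[i], labels[j])] = edges
    return table


print(tree2_distance_table([4, 0, 0, 7, 0, 2]))
```

In this example species 0 and 5 are three edges apart, whereas the Tree 1 rule on the original values would give six.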

Tree 3: Rooted tree as an inclined cladogram modifying the edges
The third proposed tree has the same combination process between species as Tree 1, with the only difference being the addition of a weighting for more distant species pairs in the computation of the number of edges. The distance is computed as $\omega_{ij} = 2(j - i)$ edges for $i = 0$, and $\omega_{ij} = 2(j - i) + 1$ edges for $i > 0$ (Carvalho et al., 2016).
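To show the effect of this weighting, the short sketch below compares the Tree 1 and Tree 3 edge counts for a few species pairs. The function names are hypothetical and the pairs are arbitrary.

```python
def tree1_distance(i, j):
    """Tree 1 edge count for species i < j."""
    return (j - i) + 1 if i == 0 else (j - i) + 2


def tree3_distance(i, j):
    """Tree 3 edge count for species i < j (distance term doubled)."""
    return 2 * (j - i) if i == 0 else 2 * (j - i) + 1


for i, j in [(0, 1), (0, 10), (5, 6), (5, 15)]:
    print((i, j), tree1_distance(i, j), tree3_distance(i, j))
```

Adjacent species receive the same edge count under both rules, while distant pairs are separated by roughly twice as many edges in Tree 3.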
After this step, 54 features were extracted ((1 original image + 2 quantizations) × (5 internal masks + 4 external masks) × 2 indexes) for each tree architecture described above. Figure 4 [...]
[...] regularization properties that influence the generalization of the model to new data. This is the main reason for applying this classifier in the present study. The accuracy of an SVM model with a radial basis function (RBF) kernel is largely dependent on the selection of two parameters: C, which controls the tradeoff between margin maximization and error minimization, and γ, which defines how far the influence of a single training example reaches. A small C makes the cost of misclassification low, while a large C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors. A small γ corresponds to a Gaussian with a large variance, so that each support vector has widespread influence, while a large γ corresponds to a Gaussian with a small variance, implying that the support vector does not have widespread influence (Duda and Hart, 1973). The LibSVM software (Chang and Lin, 2011) was used to estimate these two parameters. All of the values in the sample were normalized between -1 and 1 to improve the performance of the SVM. This makes a shorter processing time possible without mischaracterizing the original value of the feature (Duda and Hart, 1973).
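The normalization to the interval [-1, 1] can be sketched as a per-feature min-max rescaling. This is an assumption about how the scaling was done (the paper does not give the formula); the function name is hypothetical, and mapping constant features to the midpoint is a choice made for this example.

```python
def scale_columns(rows, lo=-1.0, hi=1.0):
    """Rescale each feature (column) linearly to the interval [lo, hi]."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    mid = (lo + hi) / 2
    scaled = []
    for row in rows:
        scaled.append([
            # Constant features carry no information; place them at the midpoint.
            mid if mx == mn else lo + (hi - lo) * (v - mn) / (mx - mn)
            for v, mn, mx in zip(row, mins, maxs)
        ])
    return scaled


features = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
print(scale_columns(features))
```

In a train/test setting, the minimum and maximum of each feature would normally be taken from the training set and reused unchanged on the test set.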

Database separation
In order to evaluate the methodology, the database was divided into two groups, training and test, with the following proportions: 20% and 80%, 40% and 60%, 60% and 40%, and 80% and 20%. For each group, the individuals were chosen randomly and proportionally for training and testing. The purpose of these groups is to show that the methodology performs well in both the best (80% and 20%) and the worst (20% and 80%) training and testing cases.
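A minimal sketch of such a split is given below. It assumes that "proportionally" means the benign and malignant classes keep the same ratio in both groups (a stratified split); the function name and the fixed seed are choices made for this example.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_indices, test_indices), sampled separately per class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


labels = ["benign"] * 1011 + ["malignant"] * 394
for fraction in (0.2, 0.4, 0.6, 0.8):
    train, test = stratified_split(labels, fraction)
    print(fraction, len(train), len(test))
```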

Validation
After the conclusion of the classification stage, it is necessary to validate and discuss the results. The method uses metrics commonly used in CADe / CADx systems that are widely accepted for performance analysis of image processing-based systems. These metrics include sensitivity, specificity, accuracy and area under the ROC curve (AUC) (Duda and Hart, 1973). In addition to these metrics, the standard deviation was used to analyze the amount of variation across the four proportions of training and test.
Equations 3, 4, 5 represent the formulas used to calculate the sensitivity, specificity and accuracy, respectively.

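$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5)$$
where TP is true positive, FN is false negative, TN is true negative, and FP is false positive.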
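These three metrics follow directly from the confusion counts. The sketch below uses hypothetical counts chosen only to illustrate the arithmetic; they are not results from this study.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 malignant and 200 benign test nodules.
metrics = diagnostic_metrics(tp=80, tn=180, fp=20, fn=20)
print(metrics)
```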

Results
This section presents the results obtained with the proposed methodology for the diagnosis of lung nodules in CT exams, using a set of 1,405 nodules (of which 1,011 were benign and 394 malignant). Due to the unbalanced data, different penalty parameters were used for each class in the SVM formulation: 1.0 for the benign class and 3.0 for the malignant class. The SVM performs five classifications for each proportion of training and test described in Subsection Database separation, which are evaluated by means of sensitivity, specificity, accuracy and AUC. The results of each tree and of all trees together are provided next.
Table 2 shows the results for all experiments, including the means of accuracy, sensitivity, specificity and AUC for the five tests performed on each proportion, followed by the respective standard deviations, for each tree. The final experiment was based on a combination of all the trees presented.
For the experiments with Tree 1, we obtained the best mean accuracy for the 60/40 proportion, with a standard deviation of less than one, indicating that the values vary little among proportions. However, the standard deviation of the mean sensitivity is high compared with the other experiments, indicating a high variation in the classification of malignant nodules. Tree 2 presents its best mean accuracy for the 80/20 proportion. All metrics obtained low standard deviations, indicating that the results are robust in all proportions. For Tree 3, the best mean accuracy was found for the 40/60 proportion, although the best mean AUC was found for the 60/40 proportion. The combination of all the trees presented its best mean accuracy for the 80/20 proportion; nevertheless, the best mean sensitivity of all experiments, that is, the test's ability to correctly detect malignant nodules, was found for the 40/60 proportion. The worst results of all experiments were obtained for the 20/80 proportion, due to the small number of nodules used in training.
The ideal CADx system has a good balance among the three metrics used for evaluation (accuracy, sensitivity and specificity), since a good methodology must be capable of successfully classifying both malignant and benign cases. Based on this criterion, the best result of the proposed methodology was obtained with the experiments of Tree 2, for the 80/20 proportion. This can be attributed to the elimination of species with no individuals. In this way, in a community in which the species actually have individuals and are organized according to them, the diversity among the species becomes higher (Magurran, 2004). Table 3 presents the SVM parameters for the best results of each tree, i.e., parameters C and γ of the five tests comprising each experiment performed.

Comparison with related works
Table 4 shows a comparison between the results found in this paper and some of the related works. It is important to emphasize that, to perform a reliable comparison with these previous works, it would be necessary to use the same image database, the same training and test exams, and the same settings for the classifiers, among other parameters. Among the studies compared, only Krewer et al. (2013) used the LIDC-IDRI database. The methodology proposed by Krewer et al. (2013) shows values superior to those presented here, in all experiments, for sensitivity, specificity and accuracy. However, our methodology used 1,405 nodule samples, whereas the methodology of Krewer et al. used only 33, and it still obtained results close to theirs.
Analysis of published studies revealed that the proposed methodology achieves results comparable to the most reliable previously reported studies, as shown in Table 4. Although some values are lower for some metrics, the experiments performed for the classification of lung nodules as benign or malignant appear promising. This encourages further study, including use in conjunction with other existing methodologies.
In our research group, Carvalho et al. (2014) proposed a method for automatic detection of lung nodules, using quality threshold clustering, genetic algorithm and diversity indexes such as Simpson's and Shannon's indexes. In order to reduce the number of false positives, Carvalho et al. (2016) proposed a method for classification of lung regions extracted from CT images as nodule and non-nodule using different diversity indexes such as taxonomic diversity and taxonomic distinction, to improve the performance of the CADe system. As a next step, toward incorporation into CADx systems, we proposed the use of the taxonomic diversity (∆) and taxonomic distinction (∆*) indexes to classify lung nodules as benign or malignant, acting as a second opinion for the experts in the final diagnosis.
The proposed method was evaluated over 1,405 nodules (of which 1,011 were benign and 394 malignant) from the LIDC-IDRI database, which were divided into the following training and testing proportions: 20/80, 40/60, 60/40, 80/20. The experimental results allowed the formulation of the following conclusions:
1. The use of taxonomic indexes ∆ and ∆* combined with phylogenetic trees led to good results in terms of classification of lung nodules as benign or malignant.
2. The use of uniform quantization to represent the image at different gray scale levels (8 and 12 bits, besides the original image) produced better results than using only the original image (16 bits).
3. The use of regions extracted based on internal and external masks produced good results when they were combined.
4. Tree 2 achieved the best result of the proposed methodology, with a mean accuracy of 88.44%, mean sensitivity of 84.22%, mean specificity of 90.06% and mean AUC of 0.8714.
5. Finally, it is important to highlight that the LIDC-IDRI database is extremely complex and diverse, containing countless different cases of lung nodules. This database has exams that were acquired by various tomography methods, leading to difficulty in the detection, classification or even diagnosis through CADe / CADx systems.
All of the above-mentioned attributes add value to this methodology. The texture analysis through the taxonomic indexes of diversity (∆) and distinction (∆*) combined with phylogenetic trees responded well in the experiments. Additionally, the complexity of the LIDC-IDRI database allows a more precise conclusion on the results.
Finally, the results demonstrate the promising performance of the texture extraction techniques based on the indexes presented. Another important result was the creation of the phylogenetic tree; the usage of this tree performed well in the separation of benign and malignant nodules. Although the database used is highly robust and ensures great diversity of nodules to be analyzed, more tests on other databases are necessary to improve the methodology, making it more robust and generic. The methodology presented in this work could be integrated into a CADx system applied to the diagnosis of lung nodules, making the analysis of exams by experts more efficient and less exhausting.

Figure 1. Main blocks of the proposed method.

Figure 2. Example of a marking on a CT slice.

Figure 3. Rooted tree in the shape of an inclined cladogram.

Table 1. Correspondence between biological terms and adapted terms used in this paper.

Table 2. Overall results of the experiments.

Table 3. SVM parameters for the best results of each experiment.

Table 4. Comparison with other publications with respect to the classification of lung nodules as benign or malignant.