Breast cancer diagnosis based on mammary thermography and extreme learning machines

Introduction: Breast cancer is the most common cancer in women and one of the major causes of cancer death among women around the world. Early detection and treatment are the main path to a cure. The use of mammary thermography in mastology is increasing as a complementary imaging technique for the early detection of lesions, and its use as a screening exam to identify breast disorders has been investigated. The aim of this study is to investigate the behavior of different classification methods when grouping thermographic images into specific types of lesions. Methods: To evaluate our proposal, we built classifiers based on artificial neural networks, decision trees and Bayesian classifiers, using Zernike and Haralick moments as attributes. The image database is composed of thermographic images acquired at the University Hospital of the Federal University of Pernambuco. These images are clinically classified into the classes cyst, malignant and benign. Results: Extreme Learning Machines (ELM) and Multilayer Perceptron networks (MLP) proved to be quite efficient classifiers for breast lesions in thermographic images. Using 75% of the database for training, the maximum accuracy obtained was 73.38%, with a Kappa index of 0.6007. This result corresponded to a sensitivity of 78% and a specificity of 88%. The overall efficiency of the system was 83%. Conclusion: ELM proved to be a promising classifier for differentiating breast lesions in thermographic images, due to its low computational cost and robustness.


Introduction
For decades, breast cancer has been the most common type of cancer among women. In Brazil, breast cancer mortality rates remain high, as the disease is still diagnosed in advanced stages. Even though mammography, ultrasonography, magnetic resonance and clinical breast examination (ECM) are the most widely used and indicated methods in mastology, there are still many problems associated with them. Sometimes they are not enough to identify breast lesions in women with dense and surgically altered breasts or in women under the age of 40 years. In addition, some of these exams are extremely uncomfortable for the patient, and there is concern about the risk associated with the use of ionizing radiation (American..., 2015;Instituto..., 2015).
In the search for imaging techniques complementary to those mentioned above, thermography started being used in mastology in 1982, but at the time specialists discredited the method and it was therefore not recommended for the diagnosis of breast diseases. With the technological improvement of thermographic cameras, many image processing and image analysis tools could be developed to facilitate the detection of changes in breast images, so thermography became more popular and continued to be explored as a complementary screening test in mastology (Milosevic et al., 2015;Walker and Kaczor, 2012).
Thermography provides a temperature map of the surface. When applied to medicine, the distribution of temperature gives physiological information, in that highly metabolic tissues appear in the images as warmer spots, so lesions such as cancers and places where angiogenesis is happening may be seen in thermograms. Regarding the identification of lesions in the breast, the lack of depth has not been considered a limitation of this technique, since these accelerated metabolic activities tend to increase the surface temperature of the breast (Etehadtavakol and Ng, 2013).
According to Etehadtavakol and Ng (2013), breast thermography has been shown to be efficient during the early stages of tumor growth, since physiological changes usually precede anatomical changes. Moreover, it is a completely non-contact method, with no form of radiation or compression, and may be used for women of all ages, including pregnant and breastfeeding women. This technology also works better for women with dense/fibrocystic breasts than the other screening methods widely used nowadays.
A limitation of this method is that it is easily influenced by changes in the environment, so aspects such as room temperature and humidity have to be strictly controlled to guarantee exam validity.
In view of the above, several studies have been carried out on the application of thermographic images in mastology. Resmini et al. (2012) performed several feature extractions and analyzed these features using Support Vector Machines (SVM), k-Nearest Neighbors (KNN) and Naïve Bayes classifiers to detect the existence of lesions in thermographic images of the breast. The authors reached an accuracy of approximately 90% and an area under the ROC curve close to 0.9. Aguiar et al. (2013) extracted several features and used a multilayer perceptron classifier to detect breast lesions in thermographic images, correctly classifying 75% of the regions. Belfort et al. (2015) performed feature extraction using the Artificial Crawlers model; with an SVM classifier, the process presented 78% accuracy, 50% sensitivity and 84% specificity. Another work, by Acharya et al. (2012), describes the extraction of sixteen features but uses only four, which the authors defined as clinically significant in comparison with the others. The results obtained were 88.10% accuracy, 85.71% sensitivity and 90.48% specificity.
The aim of this work is to investigate the performance of different classification methods when assigning thermographic images to one of three groups (cyst, benign lesion and malignant lesion), using Haralick and Zernike descriptors for attribute extraction. Classifiers based on artificial neural networks, decision trees and Bayesian classifiers were used to perform the classification. To assess classification, rates of correctly classified instances and kappa indexes were compared.

Related works
Acharya et al. (2012) evaluated the feasibility of using thermal imaging as a potential tool for detecting breast cancer. Field data were collected from the Department of Diagnostic Radiology, Singapore General Hospital, using non-contact thermography. Infrared thermograms were acquired using a NEC-Avio Thermo TVS2000 MkIIST System, 3.0-5.4 μm short wavelength (30 frames/sec), Stirling cooler, InSb detector with (256×200) elements (Japan), which has a measuring accuracy of ±0.4% (full scale) and a temperature resolution of 0.1 °C at 30 °C black body, with the instrument placed 1 m away from the chest with a lens (FOV 15°×10°, IFOV 2.2 mrad) attached. Ninety patients were chosen at random to undergo the thermography examination. The examination was done in a temperature-controlled room with a temperature range of 20-22 °C (within ±0.1 °C). The humidity of the examination room was maintained at 60±5%. The patients were required to rest for at least 15 min to stabilize and reduce the basal metabolic rate, which results in minimal surface temperature changes and, therefore, satisfactory thermograms. The patients were also asked to wear a loose gown that does not restrict airflow. Furthermore, it was ensured that the patients were within the recommended period of the 5th to 12th and 21st day after the onset of the menstrual cycle, since during these periods the vascularization is at basal level, with the least engorgement of blood vessels.
Acharya et al. (2012) used a total of 50 thermograms, of which 25 were from cancer patients (age: 51±8 years) and 25 were from normal subjects (age: 46±10 years). In the malignant class, 15 patients had stage III cancer and the rest had stage II cancer. Of the lumps, 50% were found in the upper-outer quadrant, 35% in the area behind the nipple, and 15% in the upper-inner quadrant. The authors analyzed the cancerous breast in each of the 25 malignant cases and one normal breast in each of the 25 normal cases.
Acharya et al. (2012) demonstrated the utility of breast surface temperature as an indicator of malignancy. Since a thermogram presents a visual representation of 'hot spots' of the breast, its interpretation may be subjective. Therefore, Acharya et al. (2012) extracted texture features from the thermograms to feed into classifiers for automatic classification. This makes the interpretation more objective and automatic, and inter-observer variability of diagnostic prediction is therefore greatly reduced. Acharya et al. (2012) extracted 16 texture features: homogeneity, energy, entropy, moment1, moment2, moment3, moment4, entropy, angular second moment, contrast, mean, short runs emphasis, long runs emphasis, run percentage, gray level non-uniformity, and run length non-uniformity. However, only four features (moment1, moment3, run percentage, and gray level non-uniformity) were selected, as they were clinically significant.
By using the SVM classifier and the texture features, Acharya et al. (2012) obtained a classification accuracy of 88.10% in differentiating normal and malignant breasts. The sensitivity and specificity were also considerably high (85.71% and 90.48%, respectively).
Hankare et al. (2016) present a color analysis in which classification is based on segmentation. The distinguishing features used to detect abnormalities are based on variations in the shape of the hottest regions in the image, and the findings are confirmed by comparison with professional diagnoses. The authors claim their results demonstrate the suitability of infrared thermography as a diagnostic tool in breast cancer detection. Hankare et al. (2016) employ an image segmentation approach using the K-means clustering technique based on color features from the images. Segmentation of the hot region is carried out in two steps. In the first step, the pixels are clustered based on their color and spatial features. They claim the advantages of their proposed method are: 1) it can segment the cancer regions from the image accurately; 2) it is useful for classifying the cancer images for accurate detection; 3) it allows early-stage detection of cancer from images. However, Hankare et al. (2016) present only qualitative results, based on color distribution. Since pseudo-color maps are not unique, this approach cannot be generalized.
Araújo et al. (2014) evaluated the feasibility of using interval data in the symbolic data analysis (SDA) framework to model breast abnormalities (malignant, benign and cyst) in order to detect breast cancer. SDA allows a more realistic description of the input units by taking into consideration their internal variation. In this direction, a three-stage feature extraction approach is proposed. In the first stage, four interval variables are obtained from the minimum and maximum temperature values of the morphological and thermal matrices. In the second stage, operators based on dissimilarities for intervals are considered and continuous features are obtained. In the last stage, these continuous features are transformed by Fisher's criterion, giving the input data for the classification process. This three-stage approach is applied to a Brazilian breast thermography database and compared with a statistical feature extraction approach and a texture feature extraction approach widely used in thermal imaging studies. Different classifiers are considered to detect breast cancer, achieving a misclassification rate of 16%, a sensitivity of 85.7% and a specificity of 86.5% for the malignant class.
The thermograms used by Araújo et al. (2014) were acquired with a FLIR S45 infrared (IR) camera. The analysis was performed using a data set obtained from a patient group (size n = 50) of the Hospital of the Federal University of Pernambuco (UFPE), Recife, Brazil. This data set consists of patients older than 35 years with a suspected mass, whose diagnoses were confirmed by clinical examination followed by ultrasound, mammographic and biopsy exams. A standardized protocol was used for the infrared image acquisition, and an apparatus was designed and constructed for this purpose; the protocol is described in Bezerra et al. (2013). The apparatus consists of two rails used for the displacement of a small carriage that supports the tripod to which the infrared camera is attached. A support for the patient's arms, made of steel, aluminum and wood, was fitted to a swivel chair. This support has a movable horizontal bar designed to move up and down. The bar is used to position the patient's hands and allows four different positions, so as to comfortably accommodate patients of different heights (Bezerra et al., 2013).
Thermographic imaging should be performed in a temperature-controlled room to avoid or minimize thermal interference from external sources. To achieve better thermal conditions, the patients were subjected to an acclimatization period of at least 10 min, so that their bodies could reach thermal equilibrium with the room. Considerations regarding the environmental conditions as well as the patients are described in Bezerra et al. (2013). The infrared images used in that work were obtained from the frontal planes of each patient.
Belfort et al. (2015) used the same thermogram database employed by Araújo et al. (2014), but limited to 34 images, of which 15 were of mammary lesions (benign or malignant) and 19 were of healthy patients. Colored JPEG images were converted to grey levels and, afterwards, the regions of interest (ROIs) were manually extracted from the left and right mammary regions. These two ROIs were then registered using b-splines (Klein et al., 2007) and used to generate a dissimilarity map. From this dissimilarity image, Belfort et al. (2015) used the Artificial Crawlers model for feature extraction (Gonçalves et al., 2014). The generated feature vectors were then classified using linear Support Vector Machines, giving an accuracy of 78%, a sensitivity of 50% and a specificity of 84%.
Our proposal is based on the investigation of texture and shape descriptors to represent mammary thermograms. We used the same database studied by Araújo et al. (2014). Since we are interested in lesion classification, like Araújo et al. (2014), we also considered the classes malignant, benign and cyst, whereas Acharya et al. (2012), Belfort et al. (2015) and Hankare et al. (2016) are interested in lesion detection. However, unlike Araújo et al. (2014), our feature extraction combines texture and shape features, using Haralick moments and Zernike features, respectively, extracted from grey-level temperature matrices generated from pseudocolor JPEG images. We also tested more sophisticated classifiers, such as multi-layer perceptrons, random forests and support vector machines. Our proposal reached up to 76.01% accuracy, with a sensitivity of around 78% and a specificity of 88% for malignant lesions, without manual intervention, against the results of Araújo et al. (2014), who reported 84% accuracy, 85.7% sensitivity and 86.5% specificity for the malignant class.

Methods
The images that feed the system are thermographic images acquired at Hospital das Clínicas, Federal University of Pernambuco, from which the cyst, malignant and benign classes were selected (Araújo et al., 2014;Bezerra et al., 2013). In the pre-processing step, the RGB images in the JET palette were converted to temperature gray levels, and a post-processing step was performed to balance the classes. Zernike and Haralick moments are used to extract attributes based on geometry and texture. The next stage performs the training and subsequent classification with several classifiers based on artificial neural networks, decision trees and Bayesian classifiers. Finally, the performance of the system is evaluated through accuracy and the Kappa index. Figure 1 is a flowchart of the proposed system.

Images acquisition
The thermographic images used in this study were acquired at Hospital das Clínicas da Universidade Federal de Pernambuco (University Hospital of the Federal University of Pernambuco, HC-UFPE, Brazil) using a FLIR S45 infrared camera.
In order to avoid significant changes in the patients' positions during the acquisition process, a mechanical device was built. This device is shown in Figure 2 and is further described in Oliveira (2012).
The car is connected to the rails in order to move the camera closer to or further away from the patient; furthermore, the arm support is connected to the chair through two (2) horizontal bars, so they rotate together to change the position of the patient.
Eight (8) JPG images were obtained for each patient, each acquired from a different position, as follows: T1 (frontal with hands on waist), T2 (frontal with hands raised, holding the bar located above the head (Figure 2)), MD (right breast only), ME (left breast only), LIMD (internal lateral of the right breast), LIME (internal lateral of the left breast), LEMD (external lateral of the right breast) and LEME (external lateral of the left breast). Figure 3 illustrates examples of images in each of the positions.
The image acquisition protocol was first described in Oliveira (2012) and is illustrated in Figure 4, below.

Creation of the thermographic breast image database
In this work, the images in all positions were used (T1, T2, MD, ME, LIMD, LIME, LEMD and LEME). These images were divided into malignant, benign, cyst and normal classes, according to specialists' diagnoses, which were given using consolidated methods for each case. The malignant class comprises all cases of breast cancer proven by biopsy. The benign class refers to cases of benign tumors, also proven by biopsy. The cyst class includes cases with this diagnosis proven by fine needle aspiration (PAAF) or ultrasonography (Silva, 2015). The final database contains 1052 images.
Considering that the purpose of this approach is to classify an existing lesion, the normal class (227 images) was removed from the database for this study. Therefore, only three classes were used: malignant, benign and cyst. For this study, images were taken from 100 female patients: 219 cyst images, 371 images with benign lesions and 235 images containing malignant lesions were used.
Figure 2. Mechanical device used for image acquisition: (1) the rails used to move the camera support car, of which there are two (2), placed on the floor; (2) plate-shaped car that supports the camera's tripod; (3) swivel chair on which the patient is seated; (4) arm support, which consists of a horizontal bar that moves vertically so the patient can raise her hands during the exam.
Figure 3. Example of image positions: T1 and T2 are associated with frontal acquisition with arms curved down and up, respectively; MD and ME correspond to frontal acquisition from center to right and from center to left, respectively; LEMD and LEME correspond to right and left medio-lateral acquisition, in this order; LIMD and LIME are almost the same as LEMD and LEME, respectively, but closer.
Res. Biomed. Eng. 2018 March;34(1): 45-53

Preprocessing
The thermal images use pseudo-coloring; in this case, they were acquired with the JET color palette. It was therefore necessary to convert the RGB images in the JET palette to grayscale, so that the gray levels represent temperature.
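The source does not give the exact mapping used for this conversion. As a minimal sketch, assuming the camera palette is close to the common piecewise-linear JET colormap and that position along the palette grows linearly with temperature, each RGB pixel can be assigned the gray level of its nearest palette colour. The names `jet_rgb`, `JET_TABLE`, `jet_to_gray` and `convert_image` are illustrative, not taken from the original work.

```python
def jet_rgb(t):
    """Approximate JET colour (8-bit RGB) for a scalar t in [0, 1]."""
    def clamp(v):
        return max(0.0, min(1.0, v))
    r = clamp(1.5 - abs(4.0 * t - 3.0))
    g = clamp(1.5 - abs(4.0 * t - 2.0))
    b = clamp(1.5 - abs(4.0 * t - 1.0))
    return (round(255 * r), round(255 * g), round(255 * b))

# Lookup table: gray level i (0 = coldest, 255 = hottest) -> assumed JET colour.
JET_TABLE = [jet_rgb(i / 255.0) for i in range(256)]

def jet_to_gray(pixel):
    """Gray level whose JET colour is nearest (squared RGB distance) to pixel."""
    return min(
        range(256),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(JET_TABLE[i], pixel)),
    )

def convert_image(rgb_rows):
    """Convert a pseudo-colour image (rows of RGB tuples) to gray levels."""
    return [[jet_to_gray(p) for p in row] for row in rgb_rows]
```

A plain luminance conversion would not work here, because luminance is not monotonic along the JET palette; the nearest-colour lookup keeps colder regions darker and warmer regions brighter. JPEG compression noise is tolerated only approximately by this nearest-neighbour search.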

Attributes extraction
The choice of feature extraction method is one of the most important factors for the performance of computational systems that support diagnosis (Cheng et al., 2006). The attributes were based on either geometry or texture: we used Zernike moments to extract geometry features and Haralick moments to extract texture features. Zernike moments are projections of the image function onto orthogonal basis functions and are invariant only to rotation (Shanthi and Bhaskaran, 2013). Haralick moments are values calculated from the co-occurrence matrix of the image, which quantify some characteristics of the variation of the gray levels of the image (Cheng et al., 2006;Shanthi and Bhaskaran, 2013).
Santana MA, Pereira JMS, Silva FL, Lima NM, Sousa FN, Arruda GMS, Lima RCF, Silva WWA, Santos WP
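To make the texture attributes concrete, the sketch below builds a gray-level co-occurrence matrix for a single pixel offset and derives a few Haralick-style descriptors from it. It is an illustration only: the offset, the number of gray levels and the subset of descriptors are assumptions, the function names `glcm` and `texture_features` are hypothetical, and the source does not state which Haralick descriptors or implementation were used. Zernike moments are not covered here.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """A few Haralick-style descriptors computed from a normalised GLCM p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "angular_second_moment": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
```

For a perfectly uniform region the contrast and entropy are zero and the angular second moment is one; the more the gray levels vary between neighbouring pixels, the further the descriptors move from these values.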

Post-processing
After the extraction of the attributes, we performed class balancing, because the thermal image database contains different numbers of images for the different classes. For this purpose, a linear balancing technique was used.
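The source does not define the linear balancing technique. One possible reading, shown here purely as an assumption, is to enlarge the smaller classes with synthetic instances obtained by linear interpolation between two randomly chosen feature vectors of the same class, until every class matches the largest one. The function name `balance_by_interpolation` and its parameters are hypothetical.

```python
import random

def balance_by_interpolation(features, labels, seed=0):
    """Oversample minority classes by linear interpolation (assumed method)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(list(x))
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        synthetic = []
        while len(rows) + len(synthetic) < target:
            a, b = rng.choice(rows), rng.choice(rows)
            t = rng.random()  # position on the segment between a and b
            synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        out_x.extend(rows + synthetic)
        out_y.extend([y] * (len(rows) + len(synthetic)))
    return out_x, out_y
```

With the class sizes reported in this study (219 cyst, 371 benign and 235 malignant images), such a procedure would bring each class to 371 instances; whether the original balancing produced that count is not stated in the source.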

Classification
After the extraction of attributes and class balancing, the attributes are used as input for the classifiers, which are trained and then used to classify breast lesions (malignant, benign and cyst). In this article we present a comparison between eight classifiers in order to verify their capacity to classify breast lesions in thermographic images. The classifiers used were Bayes Network, Naive Bayes, Support Vector Machines (SVM), J48 decision tree, Multi-Layer Perceptron (MLP), Random Forest, Random Tree, and Extreme Learning Machines (ELM) (Breiman, 2001;Cheng and Greiner, 2001;Geurts et al., 2006;Haykin, 1999;Librelotto, 2014).
During the tests, we performed training using a percentage split approach, in which part of the database is used for training while the rest is used only for testing, to verify the quality of the training step. For all the classifiers mentioned above, tests were performed using both percentage split and k-fold cross-validation (Jung and Hu, 2015). For the first tests, the database was randomly divided so that 75% was used for training and 25% for testing. Next, cross-validation with k = 10 folds was used; in this method the dataset is randomly divided into k subsets, and each subset is used in turn for testing while the remaining ones are used for training. In the end, the whole database is used for both steps of classification.
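The two evaluation protocols can be sketched in terms of instance indices. This is a simplified illustration, not Weka's implementation (which, for example, may stratify the folds by class); the function names `percentage_split` and `k_fold_indices` and the fixed seed are assumptions.

```python
import random

def percentage_split(n, train_fraction=0.75, seed=0):
    """Shuffle n instance indices and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

For example, with the 825 unbalanced images of the three classes (219 + 371 + 235), `percentage_split(825)` gives 619 training and 206 test indices, while `k_fold_indices(825)` yields ten train/test pairs whose test parts together cover every index once. The number of instances actually used after class balancing may differ.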
The classification stage was performed using the free software Weka (Waikato Environment for Knowledge Analysis), version 3.8, developed at the University of Waikato, New Zealand. We used the configurations of the Bayes Net, Naive Bayes, SVM, J48, MLP, Random Forest and Random Tree classifiers as available in the Weka 3.8 library; Table 1 shows the parameters we chose to change for each classifier.
For the ELM classifier we performed tests using the following configurations: 100, 200, 300, 400 and 500 neurons in the hidden layer, with a linear kernel and with polynomial kernels of degree 2, 3, 4 and 5.
Twenty tests were performed for each configuration of each classifier.

Performance evaluation
Finally, system performance is evaluated through the average accuracy and average Kappa index for each configuration. Accuracy is the percentage of instances that are correctly classified. The Kappa index is a statistical measure of the level of agreement or reproducibility between two sets of data (Landis and Koch, 1977); it can vary between -1 and 1. We used Cohen's Kappa index. The interpretation of the Kappa index is shown in Table 2.
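As an illustration of these two measures, the sketch below computes accuracy and Cohen's Kappa from a confusion matrix, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `accuracy_and_kappa` and the matrix layout are assumptions made for this example.

```python
def accuracy_and_kappa(confusion):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of true and predicted class proportions.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

When the true classes are perfectly balanced across three classes, the chance agreement is 1/3 regardless of how the predictions are distributed, so kappa reduces to (accuracy - 1/3) / (2/3). Under that assumption, an accuracy of 73.38% gives a Kappa of about 0.6007, which agrees with the value reported in this study.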

Extreme Learning Machine (ELM)
ELM is a training approach for single-hidden-layer feedforward neural networks. It accelerates learning by randomly generating the input weights and the hidden layer biases, so that only the output weights need to be computed (Huang et al., 2006).
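A minimal sketch of this idea is given below, under several assumptions that are not taken from the source: a sigmoid activation, 20 hidden neurons, and output weights obtained from ridge-regularised normal equations solved by Gauss-Jordan elimination, used here as a dependency-free stand-in for the Moore-Penrose pseudoinverse of Huang et al. (2006). It does not reproduce the kernel configurations tested in this study, and all function names are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def hidden(x, weights, biases):
    """Sigmoid outputs of the random hidden layer for one input vector."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
            for ws, b in zip(weights, biases)]

def train_elm(xs, ys, n_classes, n_hidden=20, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    d = len(xs[0])
    # Input weights and biases are drawn at random and never trained.
    weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden(x, weights, biases) for x in xs]
    t = [[1.0 if y == c else 0.0 for c in range(n_classes)] for y in ys]
    # Output weights: solve (H^T H + ridge * I) beta = H^T T.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [[sum(row[i] * tr[c] for row, tr in zip(h, t))
            for c in range(n_classes)] for i in range(n_hidden)]
    return weights, biases, solve(hth, htt)

def predict_elm(model, x):
    weights, biases, beta = model
    h = hidden(x, weights, biases)
    scores = [sum(hi * beta[i][c] for i, hi in enumerate(h))
              for c in range(len(beta[0]))]
    return scores.index(max(scores))
```

Because the only fitted parameters come from a single linear solve, training involves no iterative weight updates, which is the source of the low computational cost attributed to ELM.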

Results
We first obtained results using a percentage split of 75%. Classifier performance was assessed through the accuracy and Kappa index values given in the tables. For all configurations, we performed tests using the Haralick extractor only (Table 3), the Zernike extractor only (Table 4) and both extractors at the same time (Table 5). Tables 6 to 8 show the accuracy and Kappa index obtained with the 10-fold cross-validation method using only the Haralick extractor, only the Zernike extractor and both extractors combined, respectively.
Table notes: ELM: number of neurons in the hidden layer: 100, 200, 300, 400 and 500; polynomial kernel: degrees 1, 2, 3, 4 and 5. * 'a' = (attribs + classes) / 2 = 85 for the hidden layers setting.
Based on these tables, when only Haralick attributes were used, the best results were obtained with ELM as the classifier: 65.95% accuracy and a Kappa index of 0.4892 for the percentage split tests, and 71.22% accuracy and a Kappa index of 0.6676 for the cross-validation tests. On the other hand, the MLP classifier proved more efficient when only Zernike attributes were used and when Haralick and Zernike attributes were combined.
The best result was obtained when the Haralick and Zernike attribute extractors were combined. In this situation, 73.38% of the instances were correctly classified using MLP as the classifier, with a Kappa index of 0.6007, when the percentage split approach was used; with 10-fold cross-validation, we obtained the maximum value of 76.01% correctly classified instances and a Kappa index of 0.6402, also using MLP.
In terms of diagnostic performance, this result corresponds to a sensitivity of around 78% and a specificity of 88% in the identification of malignant lesions through thermographic images. Overall, these values indicate that the system had an efficiency of 83%, that is, 0.83 on a scale whose maximum value is 1 (one), which implies satisfactory performance.
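The source does not define how the efficiency was computed. The sketch below shows the standard definitions of sensitivity and specificity and notes that the reported efficiency of 83% coincides with the arithmetic mean of the two rates; treating efficiency as that mean is an assumption, and the counts used in the usage example are hypothetical values chosen only to reproduce the reported rates.

```python
def sensitivity(tp, fn):
    """Fraction of malignant cases correctly identified as malignant."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-malignant cases correctly identified as non-malignant."""
    return tn / (tn + fp)

def mean_of_rates(sens, spec):
    """Arithmetic mean of sensitivity and specificity (assumed 'efficiency')."""
    return (sens + spec) / 2.0

# Hypothetical counts: 78 of 100 malignant and 176 of 200 non-malignant correct.
sens = sensitivity(tp=78, fn=22)
spec = specificity(tn=176, fp=24)
efficiency = mean_of_rates(sens, spec)
```

With these illustrative counts, `sens` is 0.78, `spec` is 0.88 and `efficiency` is 0.83, matching the values reported in the text.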

Discussion
The results presented in Tables 3 to 8 showed that the best values of accuracy and Kappa index were obtained by classifiers based on artificial neural networks. The classifiers used were selected because they are able to achieve good results depending on the nature of the data. The naive Bayes classifier achieves good results when attributes are statistically independent, in which case decision boundaries can be modeled through products of one-dimensional Gaussian distributions. Thus, evaluating the performance of the naive Bayes classifier also implies indirectly evaluating the degree of independence of the attributes. Bayesian networks are important for investigating how decision boundaries can be modeled by fairly complex rules. Connectionist learning machines, such as artificial neural networks and support vector machines, return good results when the classification problem is easily generalizable. Decision trees, in turn, model the situation in which data are difficult to generalize, requiring more ad hoc classifiers composed of many complex rules. Random Forest classifiers are in an intermediate position and can be used both when the data are easier to generalize (many trees) and when they are more specific (few trees), since they are based on sets of decision trees.
The generalization capacity of the classifiers is best measured using cross-validation, since the random division of the data set into training and test sets makes it possible to evaluate the generalization capacity without subjecting the classifier to overfitting. Table 6, with the results obtained using only texture attributes (Haralick), shows that Bayesian and decision tree-based classifiers had similar accuracy scores, around 50%, while support vector machines and neural networks (MLP and ELM) reached 60% and 71%, respectively. For the Kappa index, the performance difference is even more evident, with a clear advantage for ELM networks.
This shows that, from the clinical point of view, although the texture attributes alone are still not enough to diagnose breast lesions, they contribute substantially to the results.
When only the Zernike attributes are analyzed, the performance of the classifiers was proportionally similar, but with the accuracy of the MLP greater than that of the ELM. The situation is reversed for the Kappa index, which is higher for ELM than for MLP. When we combine texture and shape attributes, joining Haralick and Zernike moments, the situation repeats, with a small advantage for MLP over ELM in accuracy. However, the advantage of ELM over MLP in the Kappa index is considerable. Considering that ELM has the advantage of rapid training, the results point to neural networks with random weights as important tools for the construction of intelligent systems to support the diagnosis of breast lesions.

Conclusion
This article presented a proposal for a method to classify breast lesions, using features extracted from the texture and geometry of lesions in thermal images and comparing several classifiers. The use of Zernike moments alone proved to be very promising in this application, and the least satisfactory results occurred when only Haralick attributes were used. However, the best results were obtained by combining Haralick and Zernike moments, which indicates that both texture and geometry information are relevant for differentiating breast lesions in thermographic images. In general, ELM and MLP proved to be quite efficient classifiers for breast lesions in thermographic images.
Using 75% of the database for training, the maximum accuracy obtained was 73.38%, with a Kappa index of 0.6007. These results increased to 76.01% accuracy and a Kappa index of 0.6402 when the 10-fold cross-validation method was used to perform the tests. The overall efficiency of the system was 83%.
Furthermore, this study obtained significant and promising findings using ELM as the classifier, which has a much lower computational cost; its use may decrease the time needed to perform the classification without losing classification quality. Future studies may improve the results obtained by testing other configurations for the classifiers, especially the extreme learning machine, which may become more efficient for the classification of breast lesions in thermographic images than the most commonly used classifiers.