INTRODUCTION

The worldwide purchase of papaya (*Carica papaya* L.) has grown significantly over the last few years. In 2014, Brazil produced approximately 1.6 million tonnes of fruit from a harvested area of approximately 32 thousand hectares (^{IBGE, 2014}).

After harvest, the quality of papaya can decline considerably owing to adverse environmental and physical conditions during transportation, storage, distribution and sale. Moreover, extreme or fluctuating temperatures combined with mechanical damage result in low-quality fruit and directly affect its color (^{FABI et al., 2010}).

Peel color is considered the first attribute that customers assess, and it is associated with a specific taste or use, which may determine the acceptance or rejection of the fruit in the market (^{PICHA, 2006}). Furthermore, it can be used to determine the maturation stages associated with the harvesting and consumption features of several fruits (^{FAGUNDES; YAMANISHI, 2001}).

In order to standardize the measurement of color hue, simple analysis techniques such as digital image analysis have been developed. These overcome the deficiencies of the methodologies most commonly used in postharvest studies, namely those based on visual scales or a colorimeter (^{MENDOZA; AGUILERA, 2004}; ^{DARRIGUES et al., 2008}). Moreover, they offer a way to measure other physical attributes of the fruit, such as its shape (^{RODRÍGUEZ et al., 2010}).

In a study involving the assessment of the color of pork, ^{O’Sullivan et al. (2002)} concluded that the analysis of digital images can lead to more representative measurements of the color of the meat when compared to a colorimeter, because the images furnish information drawn from the entire surface of the meat.

In 2004, Mendoza and Aguilera proposed the application of digital image analysis to the classification of bananas, since this fruit presents significant natural variability among samples from the same batch. Such variability was also observed by ^{Oliveira et al. (2002)} in papaya cv. Golden.

This indicates that there is great potential for using digital images to differentiate the maturation stages of several fruits: images of the entire peel region of a fruit furnish information about this natural variability, which may therefore be taken into account in a statistical model.

It is also important to emphasize the work done by ^{Segnini et al. (1999)}, who developed a procedure for quantifying color in potatoes using an analysis of images obtained by video; by ^{Yam and Papadakis (2003)}, whose study showed the advantages, disadvantages and applications of the different artificial vision systems; and by ^{Darrigues et al. (2008)}, who compared the measurements obtained by different scanners with those of a colorimeter in assessing color patterns, when applying image analysis methodology to the pulp of several fruits.

When using artificial vision systems, such as the scanner, each pixel of the image is represented by combinations of the components R (red), G (green) and B (blue), and each one of them belongs to the [0,1] interval or to the {0,1,2,…,255} set.

To verify the relationship between the R, G and B color components obtained through a colorimeter and a scanner, ^{Segnini et al. (1999)} carried out an experiment in which the same color pattern was assessed by both devices and concluded that the relationship is practically linear. The coefficients of determination, R², were always greater than 0.92.

However, the use of the RGB color space can lead to problems when the intention is to compare object colors as if the comparison had been done by human vision. For example, let $(R_1, G_1, B_1)$ and $(R_2, G_2, B_2)$ be two colors belonging to the RGB color space. The Euclidean distance between them is given by $\sqrt{(R_2-R_1)^2 + (G_2-G_1)^2 + (B_2-B_1)^2}$. Now suppose that two colors are separated by a fixed distance d. Depending on the location of the colors in the RGB space, the human eye may or may not be capable of distinguishing them.

To correct this problem, the devices used to measure color usually use color systems created by the Commission Internationale de L’Eclairage (CIE) in 1976 (^{COMMISSION INTERNATIONALE DE L’ECLAIRAGE, 1978}). Such systems were created from criteria based on the perception of the human eye. In the CIEL*a*b* color space, the L* component indicates the color luminosity, varying from 0 (black) to 100 (white), and the a* and b* components belong to the [-60, 60] or [-100, 100] interval. In general, -a* indicates the color green and +a* the color red, +b* the color yellow and -b* the color blue. In the CIEL*C*h* system, the color components are given by L*, C* and h*, with L* being the luminosity (L* ∈ [0, 100]), C* the intensity (C* ∈ [0, 60] or C* ∈ [0, 100]) and h* the hue (h* ∈ [0°, 360°]).

On the other hand, ^{Darrigues et al. (2008)} state that one of the problems of digital image analysis is the need to calibrate the equipment (scanner, video cameras, etc.). However, this is usually a procedure that is easy to carry out.

^{Mendoza and Aguilera (2004)}, for example, suggested the use of banana samples at different stages of maturation, in such a way that each banana was assessed by a colorimeter and by a camera used to obtain the images. Next, through linear regression analysis, the authors obtained the calibration curves for the L*, a* and b* components. However, the calibration method presented a low coefficient of determination for the b* color component (R² = 0.609).

To solve this problem, ^{Darrigues et al. (2008)} suggested the use of 28 color patterns from Munsell’s chart (^{MUNSELL COLOR COMPANY INC, 1952}) to measure the color of tomato. From these color patterns, they obtained determination coefficients greater than 0.92 for the L*, a* and b* components.

However, in postharvest studies, hue is the color component most cited in the literature when assessing fruit color (^{HUANG et al., 2006}; ^{SMILANICK et al., 1995}; ^{PULUPOL et al., 1996}; ^{PÉK and HELYES, 2010}; ^{SILVA et al., 2013}), and the analysis of this component is usually carried out as if it were a measurement belonging to the real line. According to ^{Zar (2010)}, this type of analysis can lead to wrong conclusions, since the hue is an angular measurement, described in radians or degrees, and consequently belongs to a circular scale. Thus, analyses of the color hue must be carried out following the specific methodologies applied to circular data, such as those proposed by ^{Mardia (1972)}, ^{Batschelet (1981)} and ^{Fisher (1996)}.

For example, consider that two red color points were sampled on the same fruit, and that the hues that describe these points are 10° and 350°.

Then, the arithmetic mean is 180°, corresponding to the green hue. Using the circular mean, however, yields $X = \frac{1}{2}(\sin 10^\circ + \sin 350^\circ) = 0$ and $Y = \frac{1}{2}(\cos 10^\circ + \cos 350^\circ) = \cos 10^\circ \approx 0.985$ and, consequently, $\bar{h} = \frac{180}{\pi}\arctan(X/Y) = 0^\circ$, located in the trigonometric circle (or color circle) exactly between the 10° and 350° values.

Therefore, the circular mean should be used, since the arithmetic mean can give incorrect results for the mean hue.
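The contrast between the two means can be checked numerically. The following Python sketch is only an illustration under the convention used above (X is the mean sine and Y the mean cosine); the function names are hypothetical.

```python
import math

def arithmetic_mean(angles_deg):
    """Ordinary mean, which ignores the circular nature of the hue."""
    return sum(angles_deg) / len(angles_deg)

def circular_mean(angles_deg):
    """Circular mean in degrees, reported in [0, 360)."""
    n = len(angles_deg)
    x = sum(math.sin(math.radians(h)) for h in angles_deg) / n  # X: mean sine
    y = sum(math.cos(math.radians(h)) for h in angles_deg) / n  # Y: mean cosine
    # atan2 places the angle in the correct quadrant, unlike arctan(X / Y)
    return math.degrees(math.atan2(x, y)) % 360.0

hues = [10.0, 350.0]           # the two red points of the example
print(arithmetic_mean(hues))   # falls on the opposite side of the color circle
print(circular_mean(hues))     # numerically at 0 degrees (up to rounding, possibly just below 360)
```

Because of floating-point rounding, a result that is mathematically 0° may be printed as a value immediately below 360°, which is the same direction on the circle.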

In our study, we first presented the calibration curves for the determination of the peel or pulp color of any fruit based on the color patterns of Munsell’s color charts for plant tissue (^{MUNSELL COLOR COMPANY INC, 1952}). Next, we presented descriptive statistics techniques and specific statistical inference procedures for circular data, such as those developed by ^{Batschelet (1981)}, ^{Fisher (1996)} and ^{Jammalamadaka and Sengupta (2001)}. We then applied such techniques to the peel hue data of papaya (*Carica papaya* L.) cv. Sunrise Solo. Finally, we compared the instrumental methodology (colorimeter) to the digital image analysis methodology (flatbed scanner).

MATERIAL AND METHODS

The experiments were carried out in 2010 at the Department of Exact Sciences and Vegetal Production of ESALQ/USP, Piracicaba, São Paulo.

An HP Scanjet G2410 (HP, Palo Alto, CA) flatbed scanner with a resolution of 200 pixels per inch (PPI) and a Minolta CR-300 (^{KONICA MINOLTA, 1991}) colorimeter were used.

Experiment 1: 297 color patterns from Munsell’s color chart for plant tissues were used (Figure 1), and each pattern was measured with a colorimeter and a scanner. It is important to mention that this chart was designed to reproduce the exact colors of plant tissue, which is essential to the calibration procedure.

Two points were observed at the central region of each color pattern using the colorimeter.

Thus, for each pattern, the arithmetic means of the L*, a*, b* and C* components were calculated from the respective observed values. Next, the mean value of the h* component was calculated using the circular mean of the observations.

Then, each color pattern was digitized by the scanner and saved as a TIFF image file (.tif), which, despite having a larger file size than the jpg format, preserves the original quality of the image and is recommended when working with dark objects (^{RODRÍGUEZ et al., 2010}).

Experiment 2: One papaya fruit of cv. Sunrise Solo was stored in a cooling chamber at 18 °C and 80% ± 5% relative humidity. The fruit’s peel was assessed daily over a period of 19 days using the scanner at a resolution of 200 PPI and the colorimeter. Because the interest of this experiment lies in the comparison between the scanner and colorimeter methodologies when assessing the mean hue of the fruit peel, the use of only one papaya fruit is sufficient, given that both devices were used to assess the same fruit.

Throughout the assessments made with the colorimeter, four points in the equatorial region (Figure 4) were sampled, as recommended by ^{Oliveira et al. (2002)}.

When assessing the fruit using the scanner, both faces of the fruit were digitized, one being named face A and the other, face B (Figure 2). Each image was saved as a TIFF file. In general, the image of the whole peel region provided around 200 to 250 thousand pixels.

In both experiments, to avoid external light interference when the object was digitized, a cardboard box coated internally and externally with black cloth was built, as recommended by ^{Darrigues et al. (2008)}.

The box also minimized shadowing effects and provided a dark background for the image. We were careful to ensure that the luminosity, intensity and hue of the background did not coincide with those of the object of interest. In practice, black cloth was used because the closer the background luminosity is to zero, the easier it is to separate the object from the background.

The selection of the object of interest can be made using the autoThreshold function from the rtiff package version 1.4.1 (^{KORT, 2012}) of the R computational environment (^{R CORE TEAM, 2016}).

Hence, for each pattern, K pixels from the digital image were considered, and the Rk, Gk and Bk values of each pixel were read (k = 1, 2, 3, ..., K). After that, the conversion from the RGB color space to CIEL*a*b* and from CIEL*a*b* to CIEL*C*h* had to be carried out. It is important to note that the method of conversion from RGB values to CIEL*a*b* depends on the type of illumination, the angle of observation and the type of RGB color space. In this case, we used D65 illumination with an observation angle equal to 10° and the standard RGB color space (sRGB). According to ^{Darrigues et al. (2008)} and ^{Zar (2010)}, this conversion must be performed in three steps:

i. Conversion from values of the RGB system to the XYZ system.

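Let $(R_k, G_k, B_k)$ be the color of the k-th pixel of the image, k = 1, 2, ..., K, with each component rescaled to the [0, 1] interval. We then have

$$(X_k, Y_k, Z_k) = 100\,\big(f(R_k),\, f(G_k),\, f(B_k)\big)\,M^t,$$

where, for the sRGB color space with D65 illumination, $M^t$ is given by:

$$M^t = \begin{pmatrix} 0.4124 & 0.2126 & 0.0193 \\ 0.3576 & 0.7152 & 0.1192 \\ 0.1805 & 0.0722 & 0.9505 \end{pmatrix}$$

$f(\cdot)$ is a function given by:

$$f(u) = \left(\frac{u + 0.055}{1.055}\right)^{2.4}, \quad \text{if } u > 0.04045,$$

or

$$f(u) = \frac{u}{12.92}, \quad \text{otherwise.}$$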

ii. Conversion from values of the XYZ system to CIEL*a*b*.

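In this case,

$$L_k^* = 116\, g\!\left(\frac{Y_k}{Y_n}\right) - 16$$

and

$$a_k^* = 500\left[g\!\left(\frac{X_k}{X_n}\right) - g\!\left(\frac{Y_k}{Y_n}\right)\right], \qquad b_k^* = 200\left[g\!\left(\frac{Y_k}{Y_n}\right) - g\!\left(\frac{Z_k}{Z_n}\right)\right],$$

with

$$g(t) = \begin{cases} t^{1/3}, & t > 0.008856 \\ 7.787\,t + 16/116, & \text{otherwise,} \end{cases}$$

where $X_n$, $Y_n$ and $Z_n$ are values that depend on the type of illumination and the observation angle. Hence, for the D65 illumination and a 10° angle, used in this work, $X_n = 95.047$, $Y_n = 100.000$ and $Z_n = 108.883$. Note that the values of a* and b* belong to the interval from -100 to 100; to transform them to the interval from -60 to 60, we must multiply these components by 60/100.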

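iii. Conversion from the values of the CIEL*a*b* system to values of the CIEL*C*h* system:

$$C_k^* = \sqrt{(a_k^*)^2 + (b_k^*)^2}$$

and

$$h_k^* = \frac{180}{\pi}\arctan\left(\frac{b_k^*}{a_k^*}\right) \qquad (1)$$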

where $h_k^*$ is expressed in degrees and k = 1, 2, ..., K. Once the $(L_k^*, a_k^*, b_k^*)$ or $(L_k^*, C_k^*, h_k^*)$ colors are obtained for each of the K pixels, we must calculate the average value of each component. For components L*, a*, b* and C*, ^{Zar (2010)} suggests the use of the arithmetic mean and, for component h*, the circular mean ($\bar{h}$), which can be obtained by replacing $b_k^*$ by $X = \frac{1}{K}\sum_{k=1}^{K}\sin(h_k^*)$ and $a_k^*$ by $Y = \frac{1}{K}\sum_{k=1}^{K}\cos(h_k^*)$ in expression (1).
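The three conversion steps can be sketched in Python as follows. This is a minimal illustration, not the authors' R implementation: it assumes the standard sRGB matrix and transfer function together with the reference white quoted in the text, it does not apply the optional 60/100 rescaling of a* and b*, and all function names are hypothetical.

```python
import math

# Assumed standard sRGB (D65) matrix; rows give X, Y and Z
M = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
WHITE = (95.047, 100.000, 108.883)  # Xn, Yn, Zn as quoted in the text

def linearize(u):
    """Inverse sRGB transfer function for a channel value u in [0, 1]."""
    return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4

def rgb_to_xyz(r, g, b):
    """Step i: 8-bit RGB values (0..255) to XYZ scaled so that Y of white is 100."""
    lin = [linearize(c / 255.0) for c in (r, g, b)]
    return tuple(100.0 * sum(m * c for m, c in zip(row, lin)) for row in M)

def _g(t):
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

def xyz_to_lab(x, y, z):
    """Step ii: XYZ to CIEL*a*b*."""
    gx, gy, gz = (_g(v / n) for v, n in zip((x, y, z), WHITE))
    return 116.0 * gy - 16.0, 500.0 * (gx - gy), 200.0 * (gy - gz)

def lab_to_lch(lightness, a, b):
    """Step iii: CIEL*a*b* to CIEL*C*h*, hue in degrees in [0, 360)."""
    chroma = math.hypot(a, b)
    # atan2 resolves the quadrant from the signs of a* and b*
    hue = math.degrees(math.atan2(b, a)) % 360.0
    return lightness, chroma, hue

def rgb_to_lch(r, g, b):
    return lab_to_lch(*xyz_to_lab(*rgb_to_xyz(r, g, b)))

print(rgb_to_lch(255, 255, 0))  # a yellow pixel, for illustration
```

Applying `rgb_to_lch` to each of the K pixels and then averaging L* and C* arithmetically and h* circularly reproduces the per-image summary described above.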

For a given color component, the mean values obtained with the scanner and with the colorimeter for the i-th pattern of the color chart (i = 1, 2, ..., M) were then considered and arranged as in Table 1.

Once the M mean values had been obtained by the scanner and the colorimeter for each color component, we proceeded to the final step of calibration, in which the calibration functions were obtained through linear regression.
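The calibration step can be illustrated with a short least-squares sketch in Python. It assumes, for illustration only, that the colorimeter means are regressed on the scanner means (the text does not state which variable is the response); the function name `fit_line` and the five data pairs are hypothetical and are not values from the experiment.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r2 = (sxy * sxy) / (sxx * syy)  # coefficient of determination
    return b0, b1, r2

# Hypothetical mean L* values of five color patterns (illustration only)
scanner_L = [20.0, 35.0, 50.0, 65.0, 80.0]
colorimeter_L = [22.1, 36.8, 52.3, 66.9, 82.0]

b0, b1, r2 = fit_line(scanner_L, colorimeter_L)
calibrated = [b0 + b1 * v for v in scanner_L]  # scanner readings on the colorimeter scale
print(b0, b1, r2)
```

One such function would be fitted per color component, and the resulting line applied to new scanner readings.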

Let $h_1, h_2, \ldots, h_N$ be a set of N < K observations of the random variable h*. We now present how to summarize such data using frequency distributions and a sector histogram.

The sector histogram, or rose diagram, is the graphical representation in which the j-th frequency, j = 1, 2, ..., T, is given by the angular sector from $\alpha(j-1)$ to $j\alpha$ with radius $\sqrt{f_j}$, where $\alpha = 360^\circ/T$ is the amplitude of the T intervals, as shown in Table 2.

Note that in this case the area of this sector is given by $\alpha f_j/2$ (with $\alpha$ in radians), i.e., it is proportional to $f_j$. These plots can be constructed with functions from the packages “CircStats” and “circular” (^{JAMMALAMADAKA; SENGUPTA, 2001}), as implemented in the R software (^{R CORE TEAM, 2016}).
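The construction of the sectors can be sketched as follows. This Python fragment only computes the quantities that define a rose diagram and does not draw it; it assumes that $f_j$ is a relative frequency, and the function name is hypothetical.

```python
import math

def rose_sectors(hues_deg, n_sectors):
    """Return (start_deg, end_deg, rel_freq, radius) for each of the T sectors."""
    width = 360.0 / n_sectors              # amplitude alpha of each interval
    counts = [0] * n_sectors
    for h in hues_deg:
        j = int((h % 360.0) // width)
        counts[min(j, n_sectors - 1)] += 1  # guard against rounding at 360
    n = len(hues_deg)
    sectors = []
    for j, c in enumerate(counts):
        f = c / n
        # radius sqrt(f) makes the sector area proportional to the frequency
        sectors.append((j * width, (j + 1) * width, f, math.sqrt(f)))
    return sectors

for sector in rose_sectors([10.0, 20.0, 100.0, 350.0], 4):
    print(sector)
```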

With regard to location measurements, the circular mean, $\bar{h}$, has been described previously. As a measurement of angular concentration, we may consider the length (r) of the vector OP, in which O has coordinates (0, 0) and P has coordinates (X, Y).

This measurement is obtained through the expression $r = \sqrt{X^2 + Y^2}$, whose value belongs to the [0, 1] interval: it equals 1 when all the circular data are equal and 0 when the data are distributed with minimum concentration. It is important to note that both $\bar{h}$ and r are frequently presented graphically by the already-mentioned OP vector.

Since r is a measurement of angular concentration, we may consider, according to ^{Zar (2010)}, 1 - r as a measurement of dispersion for circular data, in which 1 - r = 0 indicates zero dispersion and 1 - r = 1 indicates maximum dispersion.

This measurement was defined by Mardia (1972) as the circular variance $S^2$, given by $S^2 = 1 - r$.

^{Batschelet (1981)}, on the other hand, defined the angular variance as the dispersion measurement $s^2 = 2(1 - r)$, with values belonging to the [0, 2] interval.

Note that the dispersion measurements presented so far have a defined maximum limit.

Hence, to obtain a statistic with values belonging to the $[0, \infty)$ interval, as in the linear case, ^{Mardia (1972)} proposed the variance measurement $s_0^2 = -2\ln r$, for $r \in (0, 1]$.

However, in cases where the dispersion is the largest possible, $s^2 = 2$ or $s_0^2 \to \infty$, the distribution of the observed points is not necessarily uniform across the circumference. It is important to observe that the measures S, s and $s_0$ are expressed in radians; to express them in degrees, one only needs to multiply them by $180/\pi$.
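The concentration and dispersion measures described above can be computed with a few lines of Python. The sketch below is illustrative only; the function and dictionary key names are hypothetical.

```python
import math

def mean_resultant_length(hues_deg):
    """Length r of the mean vector OP, clipped to 1 against rounding error."""
    n = len(hues_deg)
    x = sum(math.sin(math.radians(h)) for h in hues_deg) / n
    y = sum(math.cos(math.radians(h)) for h in hues_deg) / n
    return min(1.0, math.hypot(x, y))

def dispersion_measures(hues_deg):
    r = mean_resultant_length(hues_deg)
    # -2 ln r is unbounded as r approaches 0
    s0_sq = max(0.0, -2.0 * math.log(r)) if r > 0.0 else math.inf
    return {
        "r": r,
        "circular_variance": 1.0 - r,           # S^2 = 1 - r, in [0, 1]
        "angular_variance": 2.0 * (1.0 - r),    # s^2 = 2(1 - r), in [0, 2]
        "s0_squared": s0_sq,                    # s0^2 = -2 ln r
        # square roots converted from radians to degrees
        "angular_deviation_deg": math.degrees(math.sqrt(2.0 * (1.0 - r))),
        "circular_sd_deg": math.degrees(math.sqrt(s0_sq)),
    }

print(dispersion_measures([80.0, 100.0]))
```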

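Let $\mu$ be the population mean angle, $(1-\alpha)100\%$ the confidence level and $\bar{h}$ the sample circular mean.

Then the confidence interval for $\mu$ is given by $IC(\mu)_{(1-\alpha)100\%} = \bar{h} \pm d$, where d, according to ^{Zar (2010)}, and for small samples, as is usual in assessments of hue with the colorimeter, is given by

$$d = \arccos\left[\frac{\sqrt{\dfrac{2N\left(2R^2 - N\chi^2_{\alpha,1}\right)}{4N - \chi^2_{\alpha,1}}}}{R}\right]$$

if $r \le 0.9$, or

$$d = \arccos\left[\frac{\sqrt{N^2 - \left(N^2 - R^2\right)e^{\chi^2_{\alpha,1}/N}}}{R}\right]$$

if $r > 0.9$, in which N is the sample size, $R = Nr$ and $\chi^2_{\alpha,1}$ is the quantile of order $(1-\alpha)100\%$ of the chi-squared distribution with one degree of freedom.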
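The half-width d of the interval can be sketched in Python as follows. The code assumes the two-case expression given by Zar (2010), with R = N r and the case split at r = 0.9, and uses the tabulated 0.95 quantile of the chi-squared distribution with one degree of freedom (3.841); the function name is hypothetical.

```python
import math

CHI2_95_1DF = 3.841  # 0.95 quantile, chi-squared with 1 degree of freedom

def mean_angle_ci_halfwidth(r, n, chi2=CHI2_95_1DF):
    """Half-width d, in degrees, of the confidence interval for the mean angle."""
    big_r = n * r  # R = N r
    if r <= 0.9:
        inner = 2.0 * n * (2.0 * big_r ** 2 - n * chi2) / (4.0 * n - chi2)
    else:
        inner = n ** 2 - (n ** 2 - big_r ** 2) * math.exp(chi2 / n)
    if inner < 0.0:
        # the data are too dispersed for the interval to be defined
        raise ValueError("interval undefined for this combination of r and n")
    ratio = min(1.0, math.sqrt(inner) / big_r)  # clip against rounding error
    return math.degrees(math.acos(ratio))

# Four colorimeter readings with high concentration (hypothetical r)
print(mean_angle_ci_halfwidth(0.99, 4))
```

The interval is then the sample circular mean plus or minus d, taken modulo 360°.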

RESULTS AND DISCUSSION

From the first experiment, the calibration functions for each color component (L*, a*, b*, C* and h*) were obtained, and the respective coefficients of determination were used as a measure of model goodness-of-fit. The coefficients of determination were greater than 0.9 for all color components (Figure 3 and Figure 4), i.e., the scanner can generally be used as a device to assess these components, provided that the range of pulp or peel colors can be represented by Munsell’s color chart for plant tissues. This result agrees with those presented by ^{Darrigues et al. (2008)}, since the proportion of variation of the response variable explained by the regression was considered high (R² > 0.9).

In Figure 5, an arrow extending from the center of the circumference was added to the rose diagrams to indicate the length and direction of the mean vector, together with generic points (in red) of the analyzed fruit on days 1, 10 and 19, for the scanner and the colorimeter. From this figure and Table 3 it can also be seen that the initial mean hues of the fruit assessed by the colorimeter and the scanner were similar, with h = 112.96° (green) for the scanner and h = 115.41° (green) for the colorimeter.

On the other hand, the final mean hue apparently differed between the methodologies, ranging from a yellow fruit color given by the scanner (h = 91.37°) to an orange fruit color given by the colorimeter (h = 75.52°).

This difference may be due to the fact that the colorimeter is a point-measurement device, to the region sampled, and to the number of observed points. It is important to note that the fruit matured gradually over time regardless of the equipment, as shown by the circular means (Table 3).

Furthermore, there is a high angular concentration of the points (r ≈ 1), i.e., the dispersion of the points on the circumference is small, implying that the hue variability is small for both methodologies. This is because the papaya naturally presents hues within a restricted range, generally between 45° and 150°, and also because the hue changes gradually over time.

Additionally, it is important to note that in image analysis it is common to find a small percentage of pixels with hues outside of the green to yellow-orange interval, going clockwise. Such points may occur due to small stains in the fruits, which can be caused by disease or mechanical damage.

Although there may be points away from the region of major data concentration, they represent only 0.8% of the pixels. Thus, they have practically no influence on the mean hue calculated by the circular mean.

However, if the mean hue were calculated by the arithmetic mean, the result would be wrong because there are points with hue close to 0°. Moreover, the analysis of digitized images makes it possible to obtain a mean hue closer to the true mean than the mean hue resulting from the colorimeter methodology. This is related to sample size, as well as to the position of each point observed in the fruit peel region.

Another important physiological feature observed in this experiment was that the fruit ripened through yellow bands in the peel running from the stylar region towards the peduncular insertion, as reported by ^{Oliveira et al. (2002)}. This may apparently lead to different variances between the methodologies, mainly at the beginning of the experiment, because, initially, the papaya color is more uniform in the equatorial region of the fruit.

In the stylar region of the fruit, the color varies more, owing to the formation of yellow bands in the peel.

Thus, when analyzing the angular deviation, circular deviation and angular standard deviation presented in Table 3, it was clear that they differ between the methodologies at the beginning of the experiment, and that they are generally greater for the scanner than for the colorimeter. This shows that assessing the fruit in the equatorial region only does not capture the natural variability of the hue in the fruit peel. Therefore, the analysis of digital images emerges as an important tool when the fruit presents non-uniform color during the ripening process.

This can be verified by looking at the confidence intervals for the colorimeter data (Figure 6 and Table 3). Note, for example, that there are moments when the fruit is assessed as green by the colorimeter when, according to the scanner, it is actually at a more advanced ripening stage (green with yellow stains in the stylar region). In another case, it is assessed as ripe by the colorimeter (yellow-orange, h* = 75°) when, according to the scanner, it is actually at a less advanced ripening stage (yellow with few orange stains, h* = 85°). Hence, it is important to build confidence intervals for the mean hue at the j-th time unit to assess whether the measurement error between the methodologies is large.

However, the construction of these intervals was only necessary for the colorimeter methodology, since the scanner data provided information on the entire region of the fruit peel. In general, each digital image generated around 200 to 250 thousand observations and, thus, we can say that as N → ∞, the variability of the mean hue tends to zero. In other words, the use of images of both faces of the fruit provides information on the population mean hue.

For each methodology, different regions of the fruit are used to observe the points and this must be considered, because then, in theory, the methodologies cannot be compared. However, in practice, it is common to use the points observed in the equatorial region of the peel to describe the mean hue of the entire surface, implying a probable measurement error.

Thus, lower and upper limits of the 95% confidence interval for the mean hue resulting from the colorimeter methodology were constructed. When these intervals were compared to the mean peel hue of the fruit resulting from the scanner methodology, we noted that the two methodologies of color analysis presented different results on days 1 to 5 and days 14 to 18, confirming the initial hypothesis that the methodologies can produce different results (Figure 6).
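The comparison between the scanner mean and the colorimeter interval can be expressed as a simple check on the circle. The Python sketch below is an illustration with hypothetical function names and hypothetical input values; it is not the authors' procedure and does not reproduce the values of Table 3.

```python
def circular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in [0, 180] degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def within_interval(scanner_mean, colorimeter_mean, d):
    """True when the scanner mean hue lies inside colorimeter_mean +/- d."""
    return circular_difference(scanner_mean, colorimeter_mean) <= d

# Hypothetical day: colorimeter mean 75 degrees with d = 10, scanner mean 91
print(within_interval(91.0, 75.0, 10.0))
```

Measuring the difference along the circle keeps the check valid when the interval straddles 0°, for example for means of 355° and 5°.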

Therefore, the assessment of the mean peel hue using digitized image analysis techniques is an alternative for standardizing the process of fruit color determination, as suggested by ^{Chen et al. (2002)}.

Moreover, the use of statistical techniques for the analysis of circular data allows for the assessment of data concentration, as well as of the circular mean and dispersion, considering the natural periodicity of the circumference.

As regards future work, we intend to use regression models for longitudinal circular data.

CONCLUSIONS

The scanner can be used as a device to determine the components L*, a*, b*, C* and h* of a fruit peel or pulp color, provided that the color lies within the intervals covered by Munsell’s color chart.

The instrumental method, using the colorimeter, and the method of digital image analysis, using a scanner, lead to different peel color measurements at the beginning and at the end of the ripening process of papaya.

Therefore, based on the sample size, we conclude that the use of digital images, from devices such as a scanner, is important for assessing the peel color of fruits that present non-uniform color, because these images make it possible to work with large samples (N > 200,000), which brings the mean hue closer to the true hue.