INTRODUCTION

Size is one of the factors that determine raisin prices, and some raisin packages carry labels such as “No more than 80 raisins per 30 grams”. Despite the popularity of raisins and the progress of computer vision systems, studies on raisin segmentation are very hard to find because raisins are difficult to separate. Raisins are soft and sticky, and they easily bond with each other, which can distort their shape. It is therefore very hard to extract a single raisin from a heap and estimate its size. A method that uses computer and information technologies to recognize raisins is necessary.

Many studies in food technology have applied artificial intelligence and machine vision techniques (^{Wu & Sun, 2013a} & ^{Wu & Sun, 2013b}). This research has allowed for the development of machines such as the food quality grading robot. The main challenge for these intelligent robots is to recognize the objects. Several papers have tested recognition techniques based on thresholds (^{Wei et al., 2014}, ^{Cubero et al., 2014}), on the k-means algorithm (^{Wang et al., 2017}) and on edge detection (^{Lv et al., 2016}). However, under unstable illumination these techniques cannot reliably distinguish the fruit from the background by color. To solve this problem, some studies have suggested using a multispectral or RGB image to analyze the shape of the object on a plane and in 3D space (^{Barnea et al., 2016} & ^{Nguyen et al., 2016}). However, this analysis is inefficient when the objects have different shapes or when some fruit is infected with a disease, both of which make it difficult to identify good and bad fruit on the same tree. In this case, it is imperative to give the system the ability to distinguish one fruit from another, and the research has provided several methods for the classification of fruits.

For this reason, many papers have presented quality and quantity systems (^{Dorj et al., 2017}, ^{Karimi et al., 2017}, ^{Arendse et al., 2016}, ^{Goel & Sehgal, 2015}, ^{Pham & Lee, 2015}, ^{Rodríguez et al., 2017}). All of these techniques have their merits but often lack precision because the chosen characteristics can change from one object to another. This can lead to misclassification and, furthermore, a poor segmentation process can result in false predictions. This is why good segmentation is of critical importance for any kind of image analysis.

Segmentation has always been a vital process, and it is the most important element of computer vision. There are many algorithms for segmentation such as the threshold algorithm (^{De et al., 2016}, ^{Goh et al., 2017}, ^{Pal, 1993}), the edge detection (boundary) method (^{Tabb & Ahuja, 1997} and ^{Sumengen & Manjunath, 2005}) and clustering (^{Jain et al., 1999}, ^{Kanungo et al., 2002}, ^{Zhang et al., 2015}, ^{Bayá et al., 2017}). Among these methods, the watershed segmentation method is a powerful morphological tool for image segmentation (^{Gonzalez & Woods, 2006}), and it is considered to be a region-based approach (^{Gauch, 1999} and ^{Suphalakshmi et al., 2010}). In some research, the watershed segmentation method has been used to segment characters and irises (^{Frucci et al., 2016} & ^{Kavitha et al., 2017}).

However, most of these techniques cannot delimit objects that differ in size or that often stick together. Watershed segmentation in particular can oversegment, and it struggles when the background is complex and no regular structures are available.

For these reasons, this paper focuses on the segmentation and extraction of raisins. Unlike fresh fruit on a tree, the grapes have already been harvested and dried. Therefore, they do not have the same shape and size and they are often stuck together, which makes it difficult to separate the individual dried fruit. The main contributions of this paper are twofold:

MATERIAL AND METHODS

Sample preparation and image acquisition

The Xinjiang Uygur Autonomous Region abounds in grapes and is thus the most famous raisin-growing region in China. Raisins of the ‘Mei-gui-zi’ breed, sold under the Shamowang brand (Xiyuguoping Co. Ltd.; Urumchi, Xinjiang Uygur Autonomous Region), were purchased and used in the experiment. To make the proposed method more practical for future applications, we chose the Canon EOS 70D, which is a user-friendly and widely available camera. The samples were put in glass culture dishes that were 12 centimeters in diameter. We prepared 10 dishes of samples and each dish contained 35 to 40 raisins. The dishes were put on white paper to simplify background elimination. The raisins were kept in a single layer: adhesion was allowed, but overlap was not. In natural light, the camera was mounted on a fixed shelf and acquired images from above. Each image contains one dish. The distance between the lens and the samples was 17 centimeters, and the lens axis was kept as perpendicular to the dish as possible. The chessboard pattern Ti-times CG-076-T was used for the camera calibration. We took 12 images of the Ti-times CG-076-T pattern and calculated the intrinsic parameters. There are 4 intrinsic parameters for calibration: the focal lengths *f*_{x} = 4484.5 and *f*_{y} = 4497, and the optical centers *c*_{x} = 2626.2 and *c*_{y} = 1891.4. The images were calibrated using the “undistortImage” function in MATLAB. Both the vertical resolution and horizontal resolution of each image are 72 dpi. Each image has spatial dimensions of 4288 by 2848 pixels. One of the calibrated images is shown in Figure 1. All of the image processing and algorithms were implemented in MATLAB R2016b.

Preprocessing

For this step, an image preprocessing method is applied to the original image for background elimination and noise reduction. As seen in Figure 2, the median filter is used to remove noise, and the background is eliminated by using the threshold method.
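The two preprocessing operations can be illustrated with a minimal sketch. It is written in plain Python rather than the MATLAB used in the paper, and the 3 x 3 window, the border handling, the grey-level threshold of 128 and the toy image are all assumptions chosen for illustration, not values reported here.

```python
from statistics import median

def median_filter(img, k=3):
    """k x k median filter on a 2-D list of grey levels; borders are replicated."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out

def remove_background(img, threshold):
    """Binary mask: 1 where a pixel is darker than the threshold, else 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]

# Toy 7 x 7 image (assumed data): bright paper (200), a dark 3 x 3 blob (50)
# and one isolated dark noise pixel on the top border.
image = [[200] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 50
image[0][3] = 0

mask = remove_background(median_filter(image), threshold=128)
```

In this toy case the isolated speck disappears from the mask while the blob survives, although the median filter also rounds off the four corner pixels of the blob.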

Algorithm for predicting the number of raisins

We predict the number of raisins for each connected region in order to decide whether the connected region should be divided. If the predicted number of raisins for a connected region is only one, then there is no need for any further partitioning. Otherwise, the proposed algorithm that recognizes the dividing line will be applied to divide the connected raisins in the region.

To predict the number of raisins, the support vector machine (SVM), random forest (RF) and deep neural network (DNN) algorithms are applied because of their ability to learn the essential characteristics of a dataset from a few samples. Each connected region in the raisin images is extracted and converted to a binary image, and seven features of the region are used as inputs: the roundness, area, X-axis value of the centroid, Y-axis value of the centroid, major axis length, minor axis length and perimeter. The output is the number of raisins.
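As a rough illustration of the seven inputs, the sketch below computes them for one binary region in plain Python. The paper does not give the feature definitions, so several choices here are assumptions: roundness is taken as 4πA/P², the perimeter is estimated by counting pixel sides that face the background, and the axis lengths are those of the ellipse with the same second central moments as the region.

```python
import math

def region_features(mask):
    """Seven shape features of one connected region given as a 2-D list of 0/1."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pts)
    cx = sum(x for x, _ in pts) / area
    cy = sum(y for _, y in pts) / area

    h, w = len(mask), len(mask[0])

    def filled(x, y):
        return 0 <= x < w and 0 <= y < h and mask[y][x] == 1

    # Perimeter estimate (assumption): number of pixel sides facing background.
    perimeter = sum(
        not filled(x + dx, y + dy)
        for x, y in pts
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )

    # Second central moments; 1/12 accounts for the extent of a unit pixel.
    mxx = sum((x - cx) ** 2 for x, _ in pts) / area + 1 / 12
    myy = sum((y - cy) ** 2 for _, y in pts) / area + 1 / 12
    mxy = sum((x - cx) * (y - cy) for x, y in pts) / area
    common = math.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    major = 2 * math.sqrt(2) * math.sqrt(mxx + myy + common)
    minor = 2 * math.sqrt(2) * math.sqrt(mxx + myy - common)

    roundness = 4 * math.pi * area / perimeter ** 2  # assumed definition
    return [roundness, area, cx, cy, major, minor, perimeter]

# Toy region (assumed data): a 4 x 2 rectangle of foreground pixels.
features = region_features([[1, 1, 1, 1], [1, 1, 1, 1]])
```

The returned list follows the order in which the features are named in the text, so it could be passed directly to any of the three prediction models.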

The accuracy of the SVM, RF, and DNN will then be compared and the best model will be chosen for further processing.

Deep neural network (DNN)

In the deep neural network, rectified linear units (ReLU) are used as the activation function. ReLU approximates biological neural activation more closely than the traditional sigmoid function and has three advantages over it: unilateral inhibition, a relatively wide excitation boundary and sparse activation. The output of the sigmoid function is not sparse, so pretraining is required to obtain sparse data. The outputs of a network trained with ReLU are sparse; therefore, there is no real need for pretraining if ReLU is applied. The ReLU equation is as follows:

$$f(x) = \max(0, x)$$

which means the values are equal to 0 if they are less than 0 and remain unchanged if they are greater than 0. Based on these merits, ReLU is used for every neuron in the first four layers. Linear classification is used for the neurons in the fifth layer. The mean squared error is used to estimate the errors between the predicted and expected values. To optimize the model, stochastic gradient descent (SGD) is chosen to minimize the errors because it avoids the high cost of applying backpropagation to the full training dataset and can still result in fast convergence in the deep neural network. Because computing the gradient over the whole dataset is slow and offers no easy way to incorporate new data, SGD sees only a single or a few training examples at each step and follows the negative gradient. SGD updates the parameters *θ* of the objective *J*(*θ*) using the following equation:

$$\theta = \theta - \alpha \nabla_{\theta} J\left(\theta; x^{(i)}, y^{(i)}\right)$$

where the gradient with respect to the parameter *θ* is computed based on only a single or a few training examples, α is the learning rate and (*x*^{(i)}, *y*^{(i)}) is a pair of data points from the training set. Adaptive moment estimation is used to carry out the gradient descent updates. The topological structure of the DNN in the paper is 7-2-2-20-2-1 with 7 inputs and 1 output.
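The ReLU activation (zero for negative inputs, identity otherwise) and the SGD rule (subtract the learning rate times the gradient computed on one example) can be sketched as follows. This is a toy illustration only: it fits a two-parameter linear model on made-up data instead of the 7-2-2-20-2-1 network, it uses plain SGD rather than adaptive moment estimation, and the learning rate of 0.1 is an assumption.

```python
def relu(x):
    """Rectified linear unit: 0 for negative inputs, identity otherwise."""
    return max(0.0, x)

def sgd_step(theta, grad, alpha):
    """One SGD update: theta <- theta - alpha * grad, element-wise."""
    return [t - alpha * g for t, g in zip(theta, grad)]

def squared_error_grad(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta for one example."""
    err = sum(t * xi for t, xi in zip(theta, x)) - y
    return [err * xi for xi in x]

# Toy data (assumed) generated by y = 2 * x1 + 1; the second input is a bias term.
data = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0)]
theta = [0.0, 0.0]
for epoch in range(500):
    for x, y in data:  # a single training example per update
        theta = sgd_step(theta, squared_error_grad(theta, x, y), alpha=0.1)
```

After training, `theta` is close to `[2, 1]`, the slope and intercept that generated the toy data.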

Support vector machine (SVM)

The proper choice of the kernel function and the penalty parameter C is critical for the performance of the SVM. A Gaussian radial basis function (RBF) is used as the kernel function in the paper because it handles linearly inseparable data well. The center of each RBF corresponds to a support vector. The kernel function is defined as:

$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right)$$

where *γ* is the parameter that sets the “spread” of the kernel, and *x* and *x*′ are two feature vectors. If *γ* > 0 and two vectors are close together, then ||*x* - *x*′|| will be small and -*γ*||*x* - *x*′||^{2} will be larger (closer to zero). This means that vectors that are closer together have larger kernel values than those that are farther apart. Thus, the kernel function has the form of a bell-shaped curve with a width that is set by *γ*: the larger *γ* is, the narrower the bell.
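The behaviour described above can be checked numerically with a short sketch, assuming the standard Gaussian RBF form exp(-γ||x - x′||²). The sample vectors and the values of γ are arbitrary illustrations.

```python
import math

def rbf_kernel(x, x_prime, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - x'||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

origin = [0.0, 0.0]
near = rbf_kernel(origin, [0.1, 0.0], gamma=1.0)    # close pair of vectors
far = rbf_kernel(origin, [2.0, 0.0], gamma=1.0)     # distant pair, same gamma
narrow = rbf_kernel(origin, [2.0, 0.0], gamma=5.0)  # distant pair, larger gamma
```

Here `near` exceeds `far`, and `narrow` is smaller than `far`, which matches the statement that a larger *γ* gives a narrower bell.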

The penalty parameter C balances the complexity of the decision function against the error frequency (^{Cortes & Vapnik, 1995}). If C is too large, too many support vectors will be stored and overfitting will result. If C is too small, underfitting will occur. In this paper, the value of C is 1.0. In addition, the ε-insensitive function is used as the loss function, with ε = 0.1.

Random forest

A random forest constructs a multitude of decision trees during training and outputs the mean prediction of the individual trees. It can obtain a better result than a single decision tree. In this paper, the forest contains 300 trees, and 3 of the 7 features are randomly chosen for training each tree.

Algorithm for segmenting points and recognizing lines

Erosion and dilation are applied for the initial segmentation of the images. The effects of erosion and dilation are shown in Figure 3. After erosion, two raisins can be separated, as in Figure 3c. Dilation is used to restore each of the separated parts, as in Figure 3d and Figure 3e. The intersections of the two edges can be connected to form a segmentation line, as in Figure 3f. Although the two edges cannot completely coincide with the real edge, the intersections of the edges (the endpoints of the segmentation line) are quite accurate (Figure 3g). The endpoints of the segmentation line are the targets of this research.

In this paper, many raisins in one image are massed together and form a single connected region. After erosion, some of the smaller regions may contain only one raisin and others may contain more than one raisin. Each region is extracted and dilation is applied for restoration. The structural elements for erosion and dilation should be the same and should be chosen carefully so that the original shape of the small pile of raisins changes as little as possible. The prediction model for detecting the number of raisins is applied to each region to determine the number of raisins within it. The regions with more than one raisin are processed further to segment each raisin inside the region. For the regions where the number of raisins is greater than four, erosion and dilation are applied one more time to make sure that the number of raisins in each region is no more than four.
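The separating effect of erosion followed by dilation can be seen in a small sketch. It uses a 3 x 3 square structuring element on a toy binary image, both of which are assumptions for illustration, and it dilates the whole eroded image at once rather than each extracted region in turn.

```python
def _is_fg(mask, y, x):
    """True if (y, x) lies inside the image and is a foreground pixel."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x] == 1

def erode(mask, offsets):
    """Keep a pixel only if every neighbour in the structuring element is foreground."""
    return [[1 if all(_is_fg(mask, y + dy, x + dx) for dy, dx in offsets) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def dilate(mask, offsets):
    """Set a pixel if any neighbour in the structuring element is foreground."""
    return [[1 if any(_is_fg(mask, y + dy, x + dx) for dy, dx in offsets) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def count_regions(mask):
    """Count 4-connected foreground regions with an iterative flood fill."""
    seen, count = set(), 0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            if mask[y][x] == 1 and (y, x) not in seen:
                count += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if _is_fg(mask, ny, nx) and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count

SQUARE = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # assumed 3 x 3 element

# Two toy blobs joined by a one-pixel bridge in the middle row.
touching = [
    "00000000000",
    "01111011110",
    "01111011110",
    "01111111110",
    "01111011110",
    "01111011110",
    "00000000000",
]
mask = [[int(c) for c in row] for row in touching]
eroded = erode(mask, SQUARE)
restored = dilate(eroded, SQUARE)
```

The original mask forms one connected region, whereas `eroded` and `restored` each contain two, and the restored blobs regain their original extent without the bridge.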

For the regions with more than one raisin and fewer than five raisins, horizontal and vertical Sobel operators are applied to extract the edges. For each edge, the horizontal line through the centroid of the region is used to divide the edge into two open curves: the upper curve and the lower curve. We place the upper curve and the lower curve in two separate polar coordinate systems and resample the discrete points on each curve. A suitable sampling interval is necessary because sampling intervals that are either too large or too small could change the original shape.
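A minimal sketch of the split into upper and lower curves and the conversion to polar coordinates is given below. It assumes a y-up axis convention (image rows usually grow downward, which would swap the two halves), treats resampling as keeping every n-th point, and uses a synthetic circular edge; none of these details are specified in the text.

```python
import math

def to_polar_halves(edge_points, centroid):
    """Split edge points at the horizontal line through the centroid and return
    each half as (polar axis, polar angle) pairs sorted by angle."""
    cx, cy = centroid
    upper, lower = [], []
    for x, y in edge_points:
        polar_axis = math.hypot(x - cx, y - cy)
        polar_angle = math.atan2(y - cy, x - cx)
        (upper if y >= cy else lower).append((polar_axis, polar_angle))
    upper.sort(key=lambda p: p[1])
    lower.sort(key=lambda p: p[1])
    return upper, lower

def resample(curve, interval):
    """Keep every `interval`-th point of a curve (assumed resampling scheme)."""
    return curve[::interval]

# Synthetic edge (assumed data): 36 points on a circle of radius 5 around (10, 10).
edge = [(10 + 5 * math.cos(math.radians(a)), 10 + 5 * math.sin(math.radians(a)))
        for a in range(0, 360, 10)]
upper, lower = to_polar_halves(edge, (10.0, 10.0))
coarse_upper = resample(upper, 3)
```

For the circular edge every polar axis equals the radius; on a real raisin cluster the polar axis would dip where two raisins meet, which is what the segmentation conditions look for.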

First, the segmentation points are determined, and then the segmentation lines can be detected. To recognize the dividing points, three features (polar axis, polar angle and angular velocity) of each discrete point on a curve are calculated. The discrete points that satisfy at least one of the following conditions can be regarded as segmentation points:

(a) The polar axis of the current point is smaller than that of both the left and right adjacent points, and the difference is greater than a threshold;

(b) The polar angle of the current point is smaller than that of both the left and right adjacent points or greater than that of both, and the difference is greater than a threshold; and

(c) The angular velocity of the point is greater than a threshold.
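The three conditions can be written as a small predicate, sketched below in Python. Two points are assumptions: the angular velocity is taken as a precomputed value per point because its formula is not given in the text, and "the difference is greater than a threshold" is read as the difference to both neighbours exceeding the threshold. The threshold values in the example are the ones reported in the Results.

```python
def is_segmentation_point(prev, cur, nxt, t_axis, t_angle, t_velocity):
    """Return True if `cur` meets at least one of conditions (a), (b) or (c).

    Each point is a tuple (polar_axis, polar_angle, angular_velocity)."""
    r0, a0, _ = prev
    r1, a1, w1 = cur
    r2, a2, _ = nxt
    # (a) local minimum of the polar axis, deeper than the threshold on both sides
    cond_a = min(r0 - r1, r2 - r1) > t_axis
    # (b) local minimum or maximum of the polar angle, beyond the threshold on both sides
    cond_b = min(a0 - a1, a2 - a1) > t_angle or min(a1 - a0, a1 - a2) > t_angle
    # (c) angular velocity above its threshold
    cond_c = w1 > t_velocity
    return cond_a or cond_b or cond_c

def find_segmentation_points(curve, t_axis, t_angle, t_velocity):
    """Indices of interior points of a curve that qualify as segmentation points."""
    return [i for i in range(1, len(curve) - 1)
            if is_segmentation_point(curve[i - 1], curve[i], curve[i + 1],
                                     t_axis, t_angle, t_velocity)]

# Toy curve (assumed data): the polar axis dips sharply at index 2.
curve = [(50, 0.0, 100), (49, 0.1, 100), (40, 0.2, 100), (49, 0.3, 100), (50, 0.4, 100)]
points = find_segmentation_points(curve, t_axis=3, t_angle=0.0034, t_velocity=1723)
```

Only the point at the dip is flagged in this toy curve, through condition (a).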

The thresholds of the three conditions are critical parameters. The problem is whether the thresholds are suitable enough to achieve an ideal result. For conditions a and b, if the thresholds are too small, not enough discrete points will be chosen; and if the thresholds are too great, too many discrete points will be chosen. For condition c, the opposite is true. The solution is to prepare a series of candidate values for the polar axis, polar angle and angular velocity thresholds. After the initial tests, the ranges of the polar axis, polar angle and angular velocity values are 0 to 4, 0.003 to 0.004 and 1200 to 2000, respectively. The optimal values are identified after each combination of the three values has been tested. If there are five discrete points between two segmentation points, then the midpoint of the two replaces them as a segmentation point.

For the selected segmentation points, the normal lines are needed to detect the segmentation lines. After many observations, we found that if the angle between two normal lines is less than 20 degrees, the two points of the normal lines are on the same dividing line. Thus, if the angle between two normal lines is less than 20 degrees, the corresponding segmentation points are referred to as a pair of matched points. If a segmentation point has no other segmentation point for which the angle between the two normal lines is less than 20 degrees, then the point cannot be matched and is referred to as an unmatched point. The connection between a pair of matched points is a segmentation line, which is called a point-point line.
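The matching rule based on the 20-degree criterion can be sketched as follows. The text does not say how candidate pairs are enumerated, so the greedy first-fit pairing used here is an assumption, as is measuring the angle between undirected lines (0 to 90 degrees). The four normals are made-up values.

```python
import math

def angle_between_lines(n1, n2):
    """Angle in degrees (0 to 90) between two undirected lines given by direction vectors."""
    dot = n1[0] * n2[0] + n1[1] * n2[1]
    norm = math.hypot(*n1) * math.hypot(*n2)
    cos_angle = min(1.0, abs(dot) / norm)
    return math.degrees(math.acos(cos_angle))

def match_points(normals, max_angle=20.0):
    """Greedily pair points whose normal lines differ by less than max_angle degrees.

    Returns (pairs, unmatched), both expressed as indices into `normals`."""
    pairs, unmatched = [], []
    remaining = list(range(len(normals)))
    while remaining:
        i = remaining.pop(0)
        partner = next((j for j in remaining
                        if angle_between_lines(normals[i], normals[j]) < max_angle), None)
        if partner is None:
            unmatched.append(i)
        else:
            remaining.remove(partner)
            pairs.append((i, partner))
    return pairs, unmatched

# Hypothetical normals for four segmentation points A, B, C and D.
normals = [(1.0, 1.0), (-1.0, 1.0), (0.0, 1.0), (0.1, -1.0)]
pairs, unmatched = match_points(normals)
```

With these values C and D form a matched pair, whose connection would be a point-point line, while A and B remain unmatched.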

If there are exactly two unmatched points along the same closed edge, the intersection of their two normal lines is a segmentation point inside the closed edge. This point is called the internal point, and the two connections between the internal point and the unmatched points are segmentation lines, which are called internal lines. The connection between the internal point and the midpoint of the point-point line is another segmentation line, which is known as the internal-midline.

If the number of unmatched points in a closed curve is greater than two, the connections between the unmatched points and the centroid of the unmatched points are segmentation lines, which are called unmatched-centroid lines. The procedure of the segmentation line recognition algorithm and the diagram for the algorithm proposed in the paper are shown in Figure 4.
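The two rules for unmatched points (intersection of the normal lines for exactly two points, centroid for more than two) can be sketched as below. The function names, the handling of parallel normals and the sample coordinates are assumptions made for illustration.

```python
def line_intersection(p1, d1, p2, d2):
    """Intersection of the lines p1 + s*d1 and p2 + t*d2, or None if they are parallel."""
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        return None
    s = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / det
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])

def centroid(points):
    """Arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def internal_lines(points, normals):
    """Return (internal point, segments) for the unmatched points of one closed edge."""
    if len(points) == 2:
        internal = line_intersection(points[0], normals[0], points[1], normals[1])
    elif len(points) > 2:
        internal = centroid(points)
    else:
        internal = None
    if internal is None:
        return None, []
    return internal, [(p, internal) for p in points]

# Two unmatched points: the internal point is where their normal lines cross.
point_e, lines_two = internal_lines([(0.0, 0.0), (4.0, 0.0)], [(1.0, 1.0), (-1.0, 1.0)])

# Three unmatched points: the internal point is their centroid (normals unused).
point_f, lines_three = internal_lines([(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)], [None] * 3)
```

In the first case the returned segments correspond to internal lines, and in the second case to unmatched-centroid lines.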

**Step 1: Separate the image into unconnected regions with no more than four raisins:**

- Apply erosion to separate the image;
- Apply dilation for restoration;
- Apply the prediction model to predict the number of raisins.

**Step 2: Segment each region using segmentation point and line detection:**

- Extract edges, resample and separate into upper and lower curves;
- Calculate the polar axis, polar angle and angular velocity for each point;
- Recognize segmentation points, which fall into three categories:
  - Matched point
  - Unmatched point
  - Internal point
- Recognize segmentation lines, which fall into four categories:
  - Point-point line
  - Internal line
  - Internal-midline
  - Unmatched-centroid line

RESULTS AND DISCUSSION

Taking the image in Figure 1 as an example, the raisins form a single connected region. Erosion and dilation will be used to separate this single connected region into several independent smaller connected regions. The structural element for erosion in the paper is a disk with a radius of 70 pixels. In Figure 5, the result of erosion is shown. The image has been separated into 19 connected regions of which 8 contain more than one raisin and the others contain only one raisin. The regions have been numbered from 1 to 19 in Figure 5.

We extract each of the 19 regions and apply dilation to each region in sequence. The same structural element that was used for erosion (a disk with a radius of 70 pixels) is used for dilation so that the original shape of the raisins changes as little as possible. Each of the separated regions is shown in Figure 6. The edge of each region is extracted and all the edges are placed together according to their original positions, with the result shown in Figure 7. The edges of two adjacent regions intersect at two points, and the line between these two intersections is the dividing line. The results are shown in Figure 8.

The algorithm for predicting the number of raisins is then applied, using the SVM, RF and DNN. The test dataset consists of the 19 connected regions that were initially separated by erosion in Figure 8, and the training dataset consists of the regions from the other 9 images, which are similar to Figure 8; the training set contains 204 regions in total. The inputs of the model are the roundness, area, X-axis value of the centroid, Y-axis value of the centroid, major axis length, minor axis length and perimeter of each connected region, and the output is the number of raisins. If there are more than 4 raisins in a region, the region will be eroded and dilated into smaller regions. Thus, if there are fewer than 5 raisins, the output for the region is the number of raisins; otherwise, the output is 5. Consequently, the regions are classified into 5 categories. The prediction accuracy and confusion matrix for the training set and testing set are used to estimate the performance of the models. The prediction accuracies of the SVM, RF and DNN are shown in Table 1. The confusion matrices of the SVM, RF and DNN for the training set are shown in Table 2; the confusion matrices of the three models for the testing set are identical and are shown in Table 3. For the training set, the DNN achieves the best accuracy, whereas for the testing set the three models obtain the same accuracy of 0.947368. Thus, the DNN is chosen as the prediction method.

SVM, training set (accuracy = 94.1176%)

| Real class | Counts in predicted-class order, empty cells omitted (correct predictions in bold) | Total |
|---|---|---|
| 1 | **101**, 1 | 102 |
| 2 | 1, **34**, 5 | 40 |
| 3 | **32**, 1 | 33 |
| 4 | 1, **15**, 1 | 17 |
| 5 | 2, **10** | 12 |

DNN, training set (accuracy = 100%)

| Real \ Predicted | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| 1 | 102 | | | | | 102 |
| 2 | | 40 | | | | 40 |
| 3 | | | 33 | | | 33 |
| 4 | | | | 17 | | 17 |
| 5 | | | | | 12 | 12 |

RF, training set (accuracy = 94.6078%)

| Real class | Counts in predicted-class order, empty cells omitted (correct predictions in bold) | Total |
|---|---|---|
| 1 | **101**, 1 | 102 |
| 2 | 1, **38**, 1 | 40 |
| 3 | 2, **31** | 33 |
| 4 | 2, **12**, 3 | 17 |
| 5 | 1, **11** | 12 |
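The reported accuracies can be cross-checked against the confusion matrices with a few lines of arithmetic. The sketch assumes that the largest entry in each row is the count of correct predictions, which reproduces the stated training accuracies; reading the test accuracy of 0.947368 as 18 correct regions out of 19 is likewise an inference, not a figure given in the text.

```python
# Row totals of the training confusion matrices (204 regions in total).
row_totals = [102, 40, 33, 17, 12]

# Assumed diagonal (correct) counts, taken as the largest entry in each row.
correct_counts = {
    "SVM": [101, 34, 32, 15, 10],
    "RF": [101, 38, 31, 12, 11],
    "DNN": [102, 40, 33, 17, 12],
}

total = sum(row_totals)
train_accuracy = {name: sum(counts) / total for name, counts in correct_counts.items()}

# Inferred reading of the shared test accuracy: 18 of 19 regions correct.
test_accuracy = 18 / 19
```

Rounded to six decimals, the values are 0.941176 for the SVM, 0.946078 for the RF, 1.0 for the DNN and 0.947368 for the test set.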

The DNN is applied to all the connected regions and the regions with more than a single raisin are No. 5, No. 7, No. 10, No. 11, No. 13, No. 14, No. 15 and No. 18. These regions will be further processed to segment each raisin. Region No. 18 contains more than 4 raisins; therefore, erosion and dilation are applied to it. Then, the region is separated into 3 regions and no more than 4 raisins will be contained in each region. The results from region No. 18 are shown in Figure 9.

When an edge contains more than a single raisin and fewer than five, it is divided into 2 open curves by the horizontal line passing through the centroid of the edge. The upper and lower curves are each placed in a polar coordinate system whose origin is located at the centroid. The discrete points on the two curves are resampled with a sampling interval of 50. The segmentation points on the edges are detected first, and then the segmentation lines and the segmentation points inside the shape (internal points) are detected. The polar axis, polar angle and angular velocity of each discrete point on an edge are used as conditions to detect the segmentation points. The thresholds of the polar axis, polar angle and angular velocity are critical parameters. After repeated tests, the best combination of thresholds for the polar axis, polar angle and angular velocity was found to be 3, 0.0034 and 1723, respectively.

Take region No. 13 as an example. Morphological analysis based on the edge parameters, including the polar axis, polar angle and angular velocity, is applied to search for suitable break points, which are useful for identifying the dividing lines. The conditions that are used to choose the segmentation points on the edges are described in the Material and Methods section. The discrete points that satisfy at least one of the conditions can be regarded as segmentation points. Thus, four segmentation points, which are shown in Figure 10, are detected on the edges. The normal lines of the four points are shown in Figure 11(a). The intersection angle between the normal lines of segmentation points C and D is smaller than 20 degrees. Thus, points C and D are a pair of matched points and the connection between them is a point-point line. The point-point line is shown in Figure 11(b) as a yellow line. There are two unmatched points, points A and B; therefore, the intersection of the normal lines of the two points is an internal point, which is labeled as E in Figure 11(a). The connections between A and E and between B and E are internal lines, which are shown in Figure 11(b) as green lines. The connection between E and the midpoint of the point-point line is an internal-midline, which is shown as a purple line in Figure 11(b). The segmentation result for region No. 13 is shown in Figure 11(c).

For the case of more than two unmatched points, we take region No. 11 as an example. As shown in Figure 12, points A and B are a pair of matched points and there are 3 unmatched points: C, D and E. Thus, the centroid of the 3 unmatched points is an internal point, which is labeled as F in Figure 12, and the connections between points C and F, D and F, and E and F are unmatched-centroid lines. Another case of more than 2 unmatched points can be found in region No. 14. There is no matched point in region No. 14; thus, the centroid of the three points is an internal point. The result for region No. 14 is shown in Figure 13.

Each of regions Nos. 5, 7, 10 and 15 contains two raisins. In each of them, only two segmentation points are detected and the intersection angle of their two normal lines is smaller than 20 degrees; thus, the connection of the pair of matched points is a point-point line. The results for regions Nos. 5, 7, 10 and 15 are given in Figure 14.

The total segmentation result is shown in Figure 15. The red lines are detected by erosion and dilation and the segmentation line recognition (SLR) algorithm detects the green lines. There are 35 raisins in the image, and the algorithm that is proposed by the paper has the ability to separate each raisin. However, there are some imprecise segmentations such as raisins 1, 2, 3 and 4. There may be two reasons.

The thin green curves are the edges of the connected raisins. The existing shadow decreases the edge detection accuracy, and the SLR algorithm is based on the edges.

Some raisins overlap slightly, so the boundary between the two is not a straight line, whereas the algorithm in the paper provides only straight lines.

The algorithm for predicting the number of raisins that is proposed in this paper can detect the number of adjacent raisins, even if their shapes have changed. Three prediction models, the deep neural network (DNN), support vector machine (SVM) and random forest (RF), have been compared. Seven features that describe the shapes of raisins are used as the input set of the prediction model, and the output is the number of raisins. The accuracies for the training set are 0.941176 for the SVM, 0.946078 for the RF and 1 for the DNN, while the accuracies of the three models for the test set are the same at 0.947368. For the prediction model, higher accuracy equates to better performance. Thus, based on these values, the DNN is chosen as the prediction algorithm in the paper because it is the best one for the training set. The SLR algorithm has been used to detect the segmentation lines between raisins, and is mainly based on the shape of a raisin. The algorithm has two steps. The first step separates the raisins into several regions, each of which contains no more than four raisins, and the second step separates each individual raisin. The experiment shows that the algorithm is capable of successfully separating adjacent raisins. The advantage of the SLR algorithm is that it can separate adjacent soft objects even if their shapes have changed as a result of adhesion. The drawback of this algorithm, however, is that all the segmentation lines are straight lines, even though the actual boundaries may be curved lines or polylines.

CONCLUSIONS

This research proposed a novel segmentation algorithm based on deep learning and morphological analysis. First, morphological operations (dilation and erosion) are applied to the original image for enhancement and background elimination. Next, a new algorithm that recognizes the separation line between two connected raisins is used, which is called the segmentation line recognition (SLR) algorithm. Furthermore, a model has been built to predict the number of raisins in a certain field range. Three models have been compared: the RF, the SVM and the DNN. The results showed that the three models reach the same accuracy on the test set, while the DNN obtains the highest accuracy on the training set. For this reason, the DNN was chosen as the prediction model for the algorithm. Finally, the segmentation algorithm proved useful for segmenting objects that are stuck together, whether or not the shape of these objects changes.