Open-access Deep Learning Structure for Real-time Crop Monitoring Based on Neural Architecture Search and UAV

Abstract

Real-time monitoring of crop growth has become indispensable in modern agriculture, facilitating prompt detection of crop stress, diseases, and nutrient deficiencies by farmers. This study investigates the feasibility of leveraging unmanned aerial vehicles (UAVs) and deep learning algorithms for the real-time monitoring of Vicia faba L. crop growth stages, aimed at informing decisions related to irrigation, fertilization, and pest management. The study introduces a cutting-edge deep learning model, based on neural architecture search (NAS), tailored for accurate real-time monitoring of diverse growth stages. This model is benchmarked against seven other rigorously trained models using a diverse dataset of 2530 UAV-captured images encompassing varied and complex lighting and background conditions. We meticulously fine-tuned the training parameters and closely examined and compared the performance of each model. Notably, the NAS-based model delivered outstanding results, achieving a precision rate of 95.80%, a recall rate of 98.80%, and a mAP@0.5:0.95 value of 71.30%. It strikes an optimal balance between precision, speed, and model size compared to alternative neural network models: the mean average precision (mAP) stands at 95.50%, and it maintains a refresh rate of 24.8 frames per second (FPS), all within a compact model size of 256 megabytes (MB). The chosen model achieves an inference speed of 40.32 milliseconds per frame when tested on new images. This performance was obtained on an NVIDIA Quadro P1000 GPU.

Keywords:
Deep Learning; Neural Architecture Search; CNN; UAV; Real-time detection

HIGHLIGHTS

Proposed innovative approach for robust surveillance and monitoring of Vicia faba crops.

Developed a novel deep-learning model based on neural architecture search to identify growth stages.

Evaluated the performance of the real-time crop monitoring and classification system.

The method outperforms alternative models in recognition on our dataset, yielding compelling experimental results.

INTRODUCTION

One of the world's first cultivated grain legumes, Vicia faba L., also known as the fava bean, is well known for its exceptional yield potential and rapid production rates. It is a staple food in diets around the world due to its many nutritional qualities, including its high protein composition (26.1%), abundant carbohydrates (58.3%), fiber (25.0%) [1], and vitamins and minerals [2]. A versatile food, Vicia faba L. can be consumed in several ways, including dried, fresh, frozen, or canned. National markets also serve as a vital source of income for farmers and their respective nations. Vicia faba also contributes significantly to soil fertility restoration by developing nodules on its roots: the bacteria that live inside the nodules draw nitrogen from the air to fertilize the soil while balancing atmospheric nitrogen levels. This process is essential for crop rotation and intercropping, which enhances soil fertility for the sustainable cultivation of cereal crops, primarily wheat, alternating with vegetable crops [3]. Vicia faba is a global crop integrated into various farming systems, including dry grains, green grains, and nitrogen-fixing legumes [4]. Currently, Vicia faba L. ranks fourth among the essential seasonal fresh food legumes, followed by chickpeas, lentils, and peas. In 2019, it was grown on 2.57 million hectares, with a total production of 5.4 million tons (FAOSTAT database, 2019). However, managing and surveilling Vicia faba crops can be difficult, requiring constant monitoring of development to maximize yields and reduce losses.

This situation has driven the development of advanced technologies such as computer vision and unmanned aerial vehicles (UAVs), which provide compelling alternatives for efficient crop growth monitoring. UAVs equipped with cameras enable the collection of high-resolution spatial and temporal plant data, and aerial imaging allows farmers and researchers to track plant growth in real time by supplying valuable information on crop growth, development, health, and yield.

In agriculture, deep learning, an artificial intelligence method, empowers robots to master challenging tasks, complementing the capabilities of UAVs [5]. This approach is harnessed to tackle diverse challenges such as crop categorization, plant disease detection, and production prediction. Modern optimization of deep neural networks has contributed to the development of techniques such as the Faster Region-Based Convolutional Neural Network (Faster R-CNN), the Visual Geometry Group network (VGG), the Mask Region-Based Convolutional Neural Network (Mask R-CNN), and You Only Look Once (YOLO). YOLO is frequently employed for real-time recognition of targets, objects, or individuals. Over time, the model has grown denser, more robust, and more complex, emerging as one of the most effective models for target identification; it exhibits invariance to geometric changes, deformations, and variations in lighting that commonly challenge traditional feature extraction algorithms, making it highly effective in addressing the challenges posed by the dynamic appearance of Vicia faba L. crops.

Deep learning, image processing, and innovative technologies like IoT and UAVs have revolutionized crop monitoring in a transformative agricultural shift. S-YOLO excelled in apple blossom monitoring, outperforming YOLOX-s by 7.94%-8.05% across four blossom states [6]. Meanwhile, timely agricultural practices and plant phenology were highlighted in [7], where YOLO v4 outperformed other architectures. For coconut maturity, Parvathi and Tamil Selvi's Faster R-CNN model outperformed SSD and YOLO-V3 [8]. Kumar and coauthors' GL-CNN, integrated with IoT and UAVs, achieved 95.96% accuracy in monitoring palm seedlings [9]. The study [10] demonstrated that YOLO V5s excelled in detecting apples before thinning, achieving an accuracy of 95.8%. The YOLO-Deepsort network proposed by [11] accurately monitored tomato growth stages, with mean average precision rates of 93.1%-97.9% for flowers and tomatoes in a greenhouse scenario. These studies highlight the transformative power of deep learning and IoT in elevating detection accuracy for informed decision-making, improving crop yields, and ensuring superior fruit quality.

In this paper, we conducted the first-ever study for the real-time monitoring of Vicia faba L. crop growth. Our approach involved the integration of computer vision and UAV sensing technologies. The study encompassed a comparative analysis of Faster RCNN, RetinaNet, various YOLO versions, and our proposed model incorporating neural architecture search (NAS). We focused on capturing the dynamics of Vicia faba growth across three pivotal phases: germination, flowering, and complete pod fruiting. We evaluated the models' performance across various input image resolutions to elucidate potential trade-offs between speed and accuracy as influenced by model type and image size. Additionally, we assessed and compared the models' inference capabilities under varying field conditions, encompassing different levels of illumination and shading, various stages of Vicia faba growth, and different row orientations. Our findings demonstrate highly efficient monitoring of crop development with a notable degree of detection accuracy. This methodology holds promise for enhanced crop monitoring, potentially aiding farmers in making informed management decisions. These approaches have found applications in diverse fields, including the transport sector [12], security [13], and biomedical [14].

The rest of this paper is structured as follows: The subsequent section provides an extensive literature review, exploring the integration of object detectors with UAVs for monitoring agricultural development. Following this, we detail the structure of the experimental setup, elucidating the key components and methodologies applied in our study. The findings are then presented and analyzed in the following section. The concluding section offers recommendations for advancing our application.

MATERIAL AND METHODS

This section outlines the materials and methods for integrating UAVs and deep learning algorithms to monitor Vicia faba L. crop growth. It provides an in-depth exploration of the background and functionality of the proposed algorithms, aiming to identify the model with optimal efficiency. The study's flowchart is illustrated in Figure 1. Subsequently, the phenotyping system and tools are detailed, followed by a comprehensive description of the experimental site. The acquisition parameters and database collection methodology are thoroughly explained, and the annotation of images is discussed. Additionally, the application of data augmentation techniques is covered. Finally, the NAS architecture is introduced in the last part of this section, along with an elucidation of the metrics employed to assess the study's outcomes.

Figure 1
The proposed study's flowchart.

Experimental site and dataset construction

The experimental site, located at 35°04'24.9"N 2°48'58.8"W on an agricultural farm in the commune of Bouarg, within the Nador province of the Oriental region of Morocco, served as the research location from mid-January to early April 2023. The area has an average temperature of 18.2°C and receives an annual rainfall of 456 mm. Our research focused on the Vicia faba crop, and 72 plots were selected randomly for the study. High-resolution images were captured using a DJI Mavic 3 equipped with a Hasselblad camera with a 20-megapixel 4/3 CMOS sensor, producing images with a maximum resolution of 5280×3956 pixels and allowing us to document the various stages of growth effectively. Acquiring the dataset presented a significant challenge in our study, and obtaining our unique dataset required a substantial effort. From January to April 2023, 2530 high-resolution RGB (red, green, blue) images of Vicia faba fields were meticulously taken at altitudes ranging between 1.5 and 2 meters. These images were captured 25, 29, and 60 days after planting. The photos encompassed a variety of lighting conditions and documented distinct phases of Vicia faba growth, including germination, flowering, and fruiting (full pod) development (the original dataset is provided as a supplementary material file). All images were saved in Joint Photographic Experts Group (JPG) format.

The training and validation sets were divided in a 4:1 ratio. Figure 2 features several graphs illustrating the distribution of various dataset parameters. These graphs provide insights into the distribution of object heights and widths at different positions within the images and the spatial distribution of objects. It is worth noting that all object positions (x, y), heights, and widths are normalized relative to the size of each respective image. The histograms in Figure 2 depict the distribution of different crop growth stages within the dataset, highlighting their varying height and width dimensions and their x and y coordinate positions in the images. Each scatter plot within the figure also provides information on the number of Vicia faba crop stages and their corresponding x-axis and y-axis characteristics. It is evident that most objects span less than 30% of the image width and 25% of the image height, respectively. Moreover, the identified growth stages are frequently located near the center of the images.
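As a practical illustration of the 4:1 split described above, the following is a minimal Python sketch; the directory names and random seed are illustrative assumptions rather than values used in the study.

```python
# Minimal sketch of the 4:1 (80/20) train/validation split described above.
# Paths and the random seed are illustrative assumptions, not values from the paper.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir: str, out_dir: str, train_ratio: float = 0.8, seed: int = 42) -> None:
    """Copy JPG images (and matching YOLO .txt labels) into train/ and val/ folders."""
    random.seed(seed)
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.shuffle(images)
    cut = int(len(images) * train_ratio)
    for subset, files in (("train", images[:cut]), ("val", images[cut:])):
        for sub in ("images", "labels"):
            (Path(out_dir) / subset / sub).mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, Path(out_dir) / subset / "images" / img.name)
            label = img.with_suffix(".txt")
            if label.exists():
                shutil.copy(label, Path(out_dir) / subset / "labels" / label.name)

split_dataset("vicia_faba_images", "dataset", train_ratio=0.8)
```

Splitting at the image level, rather than per annotation, keeps all bounding boxes of a given frame in the same subset.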

All the bounding boxes in the dataset are visually represented by the green boxes in the right plot of Figure 3. The source code is available from the author upon request by email. The dataset can be downloaded at the following link: http://u.pc.cd/9yHctalK.

Figure 2
Distribution of object locations according to their size.

Image annotation

A critical phase in the training of deep learning models is image annotation. Annotation is crucial in improving the proposed models' performance and accuracy, and the quality of labeling directly influences a model's ability to detect objects within images effectively. High-quality annotations enable models to grasp the distinguishing characteristics of various items, hasten the comprehension of object outlines and features, and facilitate generalization to new objects. In this study, we employed the Python-based graphical annotation tool 'LabelImg' because of its user-friendly nature and its ability to store annotations in (.txt) format. During the annotation process, we carefully generated bounding boxes around regions of interest to minimize the inclusion of background pixels within the boxes. Additionally, we assigned object classes to small rectangular or polygon boxes, depending on the model trained, to represent the different growth phases of the Vicia faba crop. After accurately annotating all images in our database, we obtained a text file (.txt) as the output for each image. The text files encode the three distinct classes, Germination, Flowering, and Full Pod, assigned the integer values 0, 1, and 2, respectively. The object classes used in our study are detailed in Table 1. The dataset was split into two subsets, training and validation; Table 2 displays the dataset and the distribution of labels.
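For reference, a minimal sketch of how one LabelImg-produced YOLO-format label file can be parsed is shown below; the file path in the usage comment is a hypothetical example.

```python
# Minimal sketch for reading one YOLO-format label file produced by LabelImg.
# Class IDs follow Table 1: 0 = Germination, 1 = Flowering, 2 = Full Pod.
from typing import List, Tuple

CLASS_NAMES = {0: "Germination", 1: "Flowering", 2: "Full Pod"}

def read_yolo_labels(path: str) -> List[Tuple[str, float, float, float, float]]:
    """Return (class_name, x_center, y_center, width, height); coordinates are normalized to [0, 1]."""
    boxes = []
    with open(path) as f:
        for line in f:
            class_id, xc, yc, w, h = line.split()
            boxes.append((CLASS_NAMES[int(class_id)], float(xc), float(yc), float(w), float(h)))
    return boxes

# Example (hypothetical path): print(read_yolo_labels("dataset/train/labels/plot_001.txt"))
```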

Table 1
All classes were used in the study.

Figure 3
Bounding boxes and scatter plots display the dimensions and locations of the objects in the frames.

Experimental Environment

Neural Architecture Search (NAS) automates neural network design by using optimization algorithms to tailor structures to specific tasks. Combined with YOLO (You Only Look Once), it yields YOLO-NAS, which improves on YOLOv6 and YOLOv8 with a quantization-friendly basic block, advanced training schemes, and AutoNAC optimization. YOLO-NAS identifies small objects well and optimizes performance for real-time edge-device applications. The model is open-source, enhancing accessibility for research [15]. YOLO-NAS incorporates novel features, including Quantization-Aware Searchable Pruning (QSP) and Quantization-Constrained Inference (QCI) modules, as depicted in Figure 4, minimizing accuracy loss during post-training quantization. The combination of NAS and YOLO in YOLO-NAS offers a powerful solution, unlocking new possibilities for efficient and accurate object detection in various applications.
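As an illustration of the open-source tooling mentioned above, the following is a minimal sketch that loads a pretrained YOLO-NAS model through the super-gradients library referenced in [15][16] and runs a single prediction; the image path and confidence threshold are illustrative assumptions, and fine-tuning on the Vicia faba classes is not shown.

```python
# Hedged sketch: loading an open-source YOLO-NAS model with super-gradients and
# running one prediction. The image path and confidence value are illustrative.
from super_gradients.training import models

# "yolo_nas_l" is the large variant; "yolo_nas_s" and "yolo_nas_m" are also available.
model = models.get("yolo_nas_l", pretrained_weights="coco")

# Run inference on one UAV image and visualize the predicted bounding boxes.
predictions = model.predict("uav_frame.jpg", conf=0.25)
predictions.show()
```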

Table 2
All classes and labels were used in the study.

YOLO-NAS leverages Deci's AutoNAC technology for automatic neural architecture generation, enhancing model efficiency. It adopts a hybrid quantization approach, selectively quantizing a subset of the model to balance latency and accuracy [16]. YOLO-NAS's pre-training regimen includes self-distillation methods, extensive dataset integration, and automatically labeled data to enhance model robustness. Moreover, it is accessible under an open-source licensing agreement. The AutoNAC system performs well across various tasks, data characteristics, inference contexts, and performance aims, considering factors such as compilers, quantization techniques, data characteristics, and hardware configurations [17]. For compatibility with post-training quantization (PTQ), RepVGG blocks were incorporated into the model architecture during the NAS process, leading to three distinct YOLO-NAS variants: YOLO-NASS, YOLO-NASM, and YOLO-NASL, denoting the small, medium, and large architectures, as depicted in Figure 5.

Figure 4
QSP and QCI Modules.

Owing to its computational efficiency and flexibility, the YOLO-NAS model uses RepVGG as its backbone network [18]. Derived from the VGG model, RepVGG excels in inference speed and memory economy, which is vital for real-time object recognition [19]. Its re-parameterization capability allows training with a multi-branch architecture that is switched to a single-branch form for inference, maintaining feature representation while accelerating inference. RepVGG's integration enhances computing resource optimization and overall efficiency in YOLO-NAS. At inference time, RepVGG uses 3x3 convolutions and ReLU activations, while training employs a ResNet-like multi-branch topology with identity and 1x1 branches, as depicted in Figure 6.
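To make the re-parameterization idea concrete, the following simplified PyTorch sketch collapses a training-time block with 3×3, 1×1, and identity branches into a single 3×3 convolution; batch-normalization fusion, which the actual RepVGG blocks also perform, is omitted for brevity, and the block is illustrative rather than the exact YOLO-NAS implementation.

```python
# Simplified sketch of RepVGG-style structural re-parameterization: the multi-branch
# training block (3x3 + 1x1 + identity) is folded into one equivalent 3x3 convolution
# for inference. BatchNorm fusion is omitted here for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=True)

    def forward(self, x):
        # Multi-branch topology used during training.
        return F.relu(self.conv3x3(x) + self.conv1x1(x) + x)

    def reparameterize(self) -> nn.Conv2d:
        """Fold the 1x1 and identity branches into one equivalent 3x3 convolution."""
        fused = nn.Conv2d(self.conv3x3.in_channels, self.conv3x3.out_channels, 3, padding=1, bias=True)
        weight = self.conv3x3.weight.data.clone()
        # Pad the 1x1 kernel to 3x3 and add it at the centre of the 3x3 kernel.
        weight += F.pad(self.conv1x1.weight.data, [1, 1, 1, 1])
        # The identity branch is a 3x3 kernel with 1 at the centre of each channel's own filter.
        identity = torch.zeros_like(weight)
        for c in range(weight.shape[0]):
            identity[c, c, 1, 1] = 1.0
        fused.weight.data = weight + identity
        fused.bias.data = self.conv3x3.bias.data + self.conv1x1.bias.data
        return fused

# Sanity check: the fused single-branch conv matches the multi-branch block.
block = RepBlock(8).eval()
x = torch.randn(1, 8, 32, 32)
fused = block.reparameterize()
assert torch.allclose(F.relu(fused(x)), block(x), atol=1e-5)
```

Because the fused operator is a plain 3×3 convolution, the inference graph stays single-branch and memory-friendly, which is the property the text above attributes to RepVGG.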

RESULTS

Experimental Setup, Parameters, and Model Performance Evaluation

A Windows 10 Professional desktop PC served as the study's computing environment. Torch version 2.0.1, CUDA version 11.8, and Python version 3.10.12 formed the software stack. The hardware comprised an Intel® Xeon® W-2223 CPU with a 3.6 GHz clock speed, an NVIDIA Quadro P1000 graphics card, and 16 GB of RAM.
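A short sanity-check script of the kind commonly run before training, assuming the software stack listed above, is shown below.

```python
# Quick check of the reported software stack (Python 3.10.12, Torch 2.0.1, CUDA 11.8)
# before launching training.
import sys
import torch

print("Python:", sys.version.split()[0])
print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))   # e.g. Quadro P1000
    print("CUDA runtime:", torch.version.cuda)     # expected 11.8
```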

The dataset was split into 4:1 training and validation sets and manually annotated for Vicia faba's growth stages. Eight neural network architectures, including YOLOv5x6+TTA, YOLOv5L6, Faster RCNN, YOLOv8L, YOLOv8M, RetinaNet, YOLO-NASM, and YOLO-NASL, were trained using this dataset. Training used ten batches of eight images each. The research aimed to evaluate and identify the most effective model for Vicia faba growth stage identification. The trained model was saved post-training and evaluated with new test data. The hyperparameters of the best-performing model, obtained after comparing preset and evolved hyperparameters, are detailed in Table 3. A classification model has four prediction scenarios: true positive, true negative, false positive, and false negative; the connection between these four situations is illustrated in Table 4. Precision, whose calculation is given in Equation (1), is the percentage of predicted positives that are truly positive. Recall, F1 score, average precision (AP), mean average precision (mAP@0.5 and mAP@0.5:0.95), and processing speed in frames per second (FPS) were also used to measure model performance. The mAP is the average of the AP obtained over all classes, where N is the number of categories (in our case, N equals 3). The mAP calculated at an intersection-over-union (IoU) threshold of 0.5 is denoted "mAP@0.5". The value mAP@0.5:0.95 denotes the mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The formulas for recall, F1 score, AP, and mAP are given in Equations (2), (3), (4), and (5), respectively [20][21].

$Precision = \frac{TP}{TP + FP} \times 100\%$ (1)

$Recall = \frac{TP}{TP + FN} \times 100\%$ (2)

$F1\,Score = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%$ (3)

$AP = \int_{0}^{1} Precision(Recall)\, d(Recall) \times 100\%$ (4)

$mAP = \frac{1}{3} \sum_{i=1}^{3} AP_i \times 100\%$ (5)
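To make Equations (1)-(3) and (5) concrete, the following minimal sketch computes them from per-class counts and per-class AP values; the numbers are placeholders, not results from this study, and the AP integral of Equation (4) is assumed to be computed elsewhere (e.g., by the evaluation toolkit).

```python
# Hedged sketch implementing Eqs. (1)-(3) and (5); all numeric inputs are placeholders.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) * 100.0                       # Eq. (1)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) * 100.0                       # Eq. (2)

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r)                          # Eq. (3), inputs already in %

def mean_average_precision(ap_per_class: list[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class)        # Eq. (5) with N = 3 classes

# Illustrative values only:
p, r = precision(tp=95, fp=5), recall(tp=95, fn=2)
print(f"Precision {p:.1f}%, Recall {r:.1f}%, F1 {f1_score(p, r):.1f}%")
print(f"mAP {mean_average_precision([96.0, 94.5, 95.9]):.1f}%")
```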

Figure 5
An In-Depth Overview of the YOLO-NASL Architecture.

Accuracy, the mAP and AP of single-class objects, computational efficiency, the precision-recall curve, model detection speed, and network weight were all considered as evaluation indicators in this study. In object detection, the loss function expresses the degree of divergence between the model's predicted and actual values. Bounding box loss (box_loss), object loss (obj_loss), and classification loss (cls_loss) are the three main elements that make up the loss function. Box_loss is the bounding box regression loss, which assesses the precision of the predicted bounding box on the target object. Obj_loss is the objectness loss, which assesses the degree of certainty that an item is located inside the suggested region of interest. Cls_loss is the classification loss.

$Box_{loss} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{i,j}^{obj} \left( 2 - w_i \times h_i \right) \left[ (x_i - \hat{x}_i^j)^2 + (y_i - \hat{y}_i^j)^2 + (w_i - \hat{w}_i^j)^2 + (h_i - \hat{h}_i^j)^2 \right]$ (6)

$Obj_{loss} = \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{i,j}^{noobj} (c_i - \hat{c}_j)^2 + \lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{i,j}^{obj} (c_i - \hat{c}_j)^2$ (7)

$Cls_{loss} = \lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{i,j}^{obj} \sum_{c \in classes} p_i(c) \log(\hat{p}_i(c))$ (8)

Figure 6
RepVGG architecture, topologies for Training, and Inference [18].

Understanding the variables and coefficients used in these equations is essential for interpreting them correctly. The image is divided into grid cells, and bounding boxes within each cell enable precise localization: S² is the number of grid cells, and B is the number of bounding boxes per cell. The coefficients λcoord and λnoobj weight the position loss and the terms accounting for the absence of objects in the bounding boxes, respectively. The ground-truth box is described by its center location (x, y), width (w), and height (h), and the corresponding predictions are (x̂, ŷ, ŵ, ĥ). The confidence loss of the predicted bounding boxes is evaluated using the actual and predicted confidence values (ci and ĉj), and λclass is the class loss coefficient. Finally, the class probabilities pi(c) and their predicted counterparts p̂i(c) are fundamental to classifying detected objects. Combining all the equations, the total loss is calculated as follows:

$Loss = Box_{loss} + Obj_{loss} + Cls_{loss}$
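As an illustration only, the following sketch evaluates the box regression term of Equation (6) for a single responsible predictor and sums the three components into the total loss; the coefficient λcoord and all numeric values are illustrative assumptions, not the values used during training.

```python
# Hedged sketch: the box regression term of Eq. (6) for one responsible predictor
# (I_ij^obj = 1) and the total loss Loss = Box_loss + Obj_loss + Cls_loss.
# lambda_coord and all numbers below are illustrative, not taken from the paper.
def box_term(pred, target, lambda_coord=5.0):
    """pred/target are (x, y, w, h) tuples normalized to the image size."""
    x, y, w, h = pred
    xt, yt, wt, ht = target
    scale = 2.0 - wt * ht  # weights errors on small boxes more heavily
    return lambda_coord * scale * ((x - xt) ** 2 + (y - yt) ** 2
                                   + (w - wt) ** 2 + (h - ht) ** 2)

def total_loss(box_loss, obj_loss, cls_loss):
    return box_loss + obj_loss + cls_loss

box = box_term(pred=(0.52, 0.48, 0.20, 0.25), target=(0.50, 0.50, 0.22, 0.24))
print(f"box term: {box:.4f}, total: {total_loss(box, 0.40, 0.27):.4f}")
```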

Table 3
Hyperparameter settings.

Table 4
The link between prediction possibilities.

Assessing model performance across varied attention techniques

This section presents a comprehensive analysis of the performance of the various detection models introduced in our research, focusing on their ability to identify the different growth stages of Vicia faba crops. Impressively, all eight models showed remarkable accuracy in this task. The mean average precision (mAP) at an IoU threshold of 0.5 ranged from 90.1% for RetinaNet to an impressive 95.5% for YOLO-NASL. Over the 0.5 to 0.95 IoU range, all models except Faster RCNN, which scored 69.7%, achieved between 70.8% and 73.7%. Remarkably, six out of the eight models exceeded an accuracy rate of 92%. Figure 7 showcases examples of image predictions during the “Germination” phase of the Vicia faba crop, generated by YOLO-NASL and YOLOv8M. These models consistently provided accurate predictions for images with diverse or cluttered backgrounds, even when densely populated grasses surrounded Vicia faba crop branches. Additionally, we compiled the predictions of our models into videos for visual representation. Our results underscore the effectiveness of the YOLO-NASL object detector selected for the detection of multi-class Vicia faba growth stages. Table 5 summarizes the experimental performance results.

Performance Evaluation of the Training Process

After completing the training process, we meticulously analyzed the loss curves on the training dataset for the different YOLO architectures proposed in our study. As shown in Figure 8(a), during the first training epochs (from 0 to 10), we observed a rapid and pronounced reduction in loss values. This phase corresponds to the models' rapid acquisition of low-level features and a preliminary understanding of the dataset. Subsequently, the loss reduction rate decelerated noticeably, indicating that the models had entered a fine-tuning phase. During this phase, the models focused on capturing more complex patterns and more nuanced information in the dataset. This iterative learning process continued over several epochs, gradually enabling the models to optimize their predictive capabilities. Notably, the point of convergence, where the loss values stabilized, was reached after 80 epochs. By this stage, the models had gained considerable refinement, with the loss stabilizing at 1.1179. This stable state underlines the models' robustness and ability to generalize efficiently, confirming the effectiveness of our training method.

Figure 7
Visualization of predicted bounding boxes during the germination phase of Vicia faba crop.

To rigorously evaluate the improved detection capabilities of the refined algorithm, a comprehensive evaluation was carried out on a new test dataset. This evaluation specifically used the proposed algorithm, and the results are as follows: the mean average precision (mAP) reached an exceptional 95.5%, while precision reached an impressive 95.8%. Moreover, an integral performance measure, the frame rate, reached a highly commendable 24.8 frames per second (FPS), as Figure 8(b) clearly shows. The complete evaluation results confirm the robustness and efficiency of the YOLO-NASL model. In particular, the mAP of 95.5% attests to the model's accuracy in identifying patterns in the test data, and the accompanying precision of 95.8% confirms its ability to deliver reliable classifications. Above all, the achieved frame rate underscores the model's ability to provide fast, real-time inference, an essential requirement in many real-world applications. In short, the confluence of these performance figures positions the YOLO-NASL model as a high-precision solution well aligned with the needs of sophisticated detection and classification tasks.

Figure 8
Training and Testing Performance Metrics: (a) Training dataset loss curve; (b) Testing Dataset precision and mAP Statistics.

Table 5
The detection performance evaluation across eight models on the entire test dataset: precision, recall, mAP@0.5, and mAP@0.5:0.95, expressed as percentages.

Comparative algorithm analysis

We conducted a detailed comparative analysis of several detection networks to assess model performance comprehensively. Our study included the YOLO-NASL model, prominent networks such as YOLOv8M and YOLOv5x6+TTA, and the notable YOLO-NASM, which had shown promise in preliminary analyses. To provide a level playing field, each network underwent extensive training of 80 epochs to reach optimal convergence. Our results demonstrate the superiority of the YOLO-NASL model, which regularly outperformed its competitors in recall while retaining competitive accuracy. Additionally, the YOLO-NASL model consistently displays greater precision at the same recall rates, as summarized in Table 5, confirming its superiority in both recall and precision. Beyond these criteria, the YOLO-NASL model regularly produced lower loss values, a sign of improved learning performance and convergence, as shown graphically in Figure 8(a). This suggests that the YOLO-NASL network not only performs admirably in accuracy and recall but also has a greater learning capacity, resulting in a noticeably enhanced overall performance profile. According to several performance criteria, the YOLO-NASL model performs well beyond recall and precision, including greater overall accuracy and improved pattern recognition abilities. We conducted a systematic series of experiments on a consistent dataset using several models to thoroughly examine feasibility and outline constraints. The findings of this comparative investigation, shown in Figure 9, highlight the YOLO-NASL model's considerable advantages in object detection. Figures 9(a) and 9(b) clearly show that the YOLOv5x6+TTA and YOLOv8M models struggled to identify targets such as “A” and “B” during the flowering and fruiting (full pod) phases, resulting in a considerable number of missed detections, and YOLOv8M also tended to misclassify feature “C” in the germination phase. By comparison, YOLO-NASL produced far fewer missed detections, demonstrating its robustness. Figure 9(d) shows that YOLO-NASL correctly classified the majority of growth phases and achieved the highest confidence scores of the eight techniques examined, demonstrating its potential for accurate categorization. In summary, the YOLO-NASL detection algorithm demonstrates a remarkable ability to recognize the different growth phases of Vicia faba crops. Its performance surpasses the other models, maintaining consistently high levels of accuracy while significantly reducing missed detections and incorrect categorizations.

Figure 9
Assessing model accuracy, (a) YOLOv5x6+TTA, (b) YOLOv8M, (c) YOLO-NASM, and (d) YOLO-NASL.

In this study, all the models used were compared in terms of precision, number of parameters, and frame rate, as shown in Figure 10. This comparison will serve as a reference point for other crop growth studies. A comprehensive analysis of the figure, which presents performance metrics for the various object detection models, yields valuable insights into their characteristics. Precision values, representing the models' accuracy in object detection, vary from 91.6% (YOLOv5L6) to 95.8% (YOLO-NASL), with an average precision of 93.1%. In terms of model complexity, as measured by the number of parameters (Params(M)), substantial variation is observed, ranging from 25.9 million (YOLOv8M) to 140 million (YOLOv5x6+TTA), averaging approximately 68.9 million parameters. Real-time applicability is quantified by frames per second (FPS), with speeds ranging from 18.9 FPS (YOLOv5x6+TTA) to 46.3 FPS (YOLOv8M) and an average of 32.7 FPS. A nuanced examination hints at a potential trade-off between precision and speed: faster models, like YOLOv8M, exhibit slightly lower precision, while models with more parameters tend to be slower. Nevertheless, these relationships are intricate and may be influenced by factors such as architectural design and optimization. This understanding underscores the importance of thoughtful model selection, emphasizing the need to balance precision and computational efficiency against specific task requirements. Further exploration through visualizations and advanced statistical techniques could provide deeper insights into these relationships and guide optimal model choices in practical applications.
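For reproducibility, a hedged sketch of how per-frame latency and FPS figures of the kind reported in Figure 10 can be measured is given below; model loading follows the earlier super-gradients example, and the image path and number of timed runs are illustrative assumptions.

```python
# Hedged sketch for measuring per-frame latency and FPS; the image path and the
# number of timed runs are illustrative assumptions, not values from the paper.
import time
from super_gradients.training import models

model = models.get("yolo_nas_l", pretrained_weights="coco")

def measure_fps(image_path: str, runs: int = 50) -> tuple[float, float]:
    model.predict(image_path)                    # warm-up run (caching, lazy init)
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(image_path)
    ms_per_frame = (time.perf_counter() - start) / runs * 1000.0
    return ms_per_frame, 1000.0 / ms_per_frame

ms, fps = measure_fps("uav_frame.jpg")
print(f"{ms:.2f} ms/frame ≈ {fps:.1f} FPS")
```

Note that this timing includes pre- and post-processing, so figures will differ from pure GPU inference time and from the hardware used in the study.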

Figure 10
Comparative analysis of Vicia faba growth models.

We evaluated the impact of the individual loss functions during model training, particularly cls_loss (Figure 11(a)) and box_loss (Figure 11(b)), aiming to determine how these losses affected localization precision and model correctness. Through rigorous testing, we found that box_loss mainly increased the precision of object localization, while cls_loss significantly improved object classification accuracy. These findings highlight the significance of designing loss functions for particular object detection and localization tasks, offering helpful guidance for streamlining model training procedures and enhancing performance in related applications. Figure 11 displays the experimental findings.

Exploring challenges in identifying Vicia faba growth stages

In the dataset procured during our investigation, we observed a recurring challenge in the "germination" phase, which frequently resembles weed species visually. Consequently, this led to limited distinction and occasional image overlaps within the collected dataset. Concurrently, during the "fruiting" phase, the acquisition of aerial imagery introduced complexities in delineating Vicia faba pods, which were often obscured by interposing branches and the foliage of bean plants. These intricacies in spatial recognition imposed inherent difficulties in achieving precise object identification and classification. Furthermore, manually annotating growth phases within the Vicia faba crop dataset introduced susceptibility to human error, and such inadvertent inaccuracies can discernibly influence the overall efficiency of recognition systems. These inherent limitations in Vicia faba growth phase identification warrant judicious consideration when interpreting the outcomes of our study and when contemplating the practical implications of the YOLO-NASL model for real-world applications. Acknowledging and addressing these challenges will help refine recognition accuracy and analytical efficacy in identifying growth phases within the Vicia faba crop, thereby advancing the applicability and reliability of the YOLO-NASL model for agronomic and agricultural research.

Figure 11
Evaluating loss functions in models: (a) classification loss function; (b) box loss function.

DISCUSSION

Observing and managing Vicia faba crop growth is vital for sustainable agriculture and food security, as it is a critical dietary component globally. Accurate monitoring of Vicia faba growth stages can enhance yields, conserve resources, and mitigate pest and disease impacts. Precision agricultural methods can further optimize resource use. However, traditional video surveillance methods for growth stage monitoring are limited by manual identification and often result in poor identification rates and slow detection due to the inability to extract essential plant characteristics. Thus, there is a need for more sophisticated and efficient monitoring methods.

Recent advancements in deep learning-based detection techniques have shown promise in solving these problems. Notably, two-stage target detection models have attracted interest, with the Faster R-CNN architecture serving as an example. These models do have certain limits, though, despite their potential. Their lengthy training process is a noteworthy drawback that creates practical challenges, especially in agricultural situations with constantly changing crop dynamics. Furthermore, because such models' detection processes take a long time, they are unsuitable for real-time monitoring applications where quick reaction is crucial. A consideration in this context is the trade-off between detection speed and accuracy, a challenge that has spurred research into single-stage target detection techniques. While capable of meeting real-time requirements, these methods often exhibit a modest reduction in detection accuracy. One standout solution in the crop growth stage detection domain is the YOLO (You Only Look Once) series, particularly YOLO-NAS (YOLO Neural Architecture Search). This approach boasts several advantages, including a compact design, cost-effective deployment, and rapid detection, attributes that render it a compelling candidate for the surveillance and control of bean crop growth. The proposed model, YOLO-NASL, builds upon the foundation of YOLOv6 and YOLOv8 and incorporates the RepVGG backbone network module. Empirical experimentation has substantiated the practical applicability of this model in recognizing bean growth stages of various types and sizes. Notably, the YOLO-NASL model combines high average accuracy with rapid execution, establishing it as a promising tool for precision agriculture applications. The results support the claim that the proposed method outperforms existing target identification algorithms: it achieves an accuracy rate of 95.8%, a mAP@0.5 of 95.5%, and a recall of 98.8%, and it exhibits a remarkable detection speed, reaching a peak frame rate of 24.8 frames per second, which sufficiently satisfies the requirements of real-time applications.

The dataset used in this research, with 2530 images, could benefit from expansion to improve the precision and robustness of Vicia faba growth stage detection. The complex relationship between Vicia faba crops and surrounding weeds requires further exploration for nuanced detection methods. Future research aims to develop real-time video detection applications for more accurate and detailed bean crop growth detection. This forward-thinking approach reflects the commitment of agricultural researchers to leverage technology for sustainable and efficient farming practices. Continued innovation and refinement of detection methods are essential to meet the changing demands of modern agriculture and ensure global food security.

CONCLUSION

This research introduces a groundbreaking object detection algorithm tailored for precisely monitoring and identifying diverse growth stages in Vicia faba crops, ranging from germination to flowering and full pod fruition. Leveraging UAV technology, this study represents a pioneering endeavor in crop monitoring, exploring uncharted research territory. Traditional object detection models often struggle to accurately discern various crop growth phases, necessitating innovative solutions. Addressing this challenge, the proposed model based on neural architecture search (NAS) incorporates structural concepts from RepVGG, demonstrating exceptional efficiency in detecting and monitoring crop growth stages. The success in growth stage detection is attributed to the meticulous curation of a comprehensive dataset featuring precisely labeled instances of diverse Vicia faba crop states and rigorous model development efforts. Experimental outcomes unequivocally affirm the superiority of the proposed model over a spectrum of alternative networks. This superiority is underscored by the model's remarkable Precision rate and commendable Mean Average Precision (mAP). Consequently, the proposed model emerges as an effective and robust tool for crop monitoring in agriculture, addressing the challenge of detecting crop growth stages. This, in turn, positively impacts early-stage monitoring and harvest optimization of this crop. Future endeavors will involve deploying these models on GPU modules mounted on UAVs for field experiments and evaluating their performance using real-time field data. Additionally, expanding the dataset to encompass a more comprehensive array of field conditions is planned, fortifying the model's robustness and extending its applicability in precision agriculture. This research contributes significantly to the advancement of crop monitoring techniques, showcasing its potential for transformative applications in agricultural practices.

REFERENCES

  • 1 Dhull SB, Kidwai MK, Noor R, Chawla P, Rose PK. A review of the nutritional profile and processing of faba bean (Vicia faba L.). Legume Science. 2022 Sep;4(3):129.
  • 2 Duc G, Bao S, Baum M, Redden B, Sadiki M, Suso MJ, et al. Diversity maintenance and use of Vicia faba L. genetic resources. Field Crops Research. 2010 Feb;115(3):270-8.
  • 3 Wang G, Bei S, Li J, Bao X, Zhang J, Schultz PA, et al. Soil microbial legacy drives crop diversity advantage: Linking ecological plant-soil feedback with agricultural intercropping. J. Appl Ecol. 2021 Mar;58(3):496-506.
  • 4 Allito BB, Ewusi-Mensah N, Logah V. Legume-rhizobium strain specificity enhances nutrition and nitrogen fixation in faba bean (Vicia faba L.). Agronomy. 2020 Jun;10(6):826.
  • 5 Slimani H, Mhamdi JE, Jilbab A. Drone-Assisted Plant Disease Identification Using Artificial Intelligence: A Critical Review. Int. J. Comput. Digit. Syst. 2023 Oct;14(1):10433-46.
  • 6 Zhou X, Sun Gr, Xu N, Zhang X, Cai J, Yuan Y, et al. A Method of Modern Standardized Apple Orchard Flowering Monitoring Based on S-YOLO. Agriculture. 2023 Feb;13(2):380.
  • 7 Rodrigues L, Magalhães SA, da Silva DQ, dos Santos FN, Cunha M. Computer Vision and Deep Learning as Tools for Leveraging Dynamic Phenological Classification in Vegetable Crops. Agronomy. 2023 Feb;13(2):463.
  • 8 Parvathi S, Tamil Selvi S. Detection of maturity stages of coconuts in complex background using Faster R-CNN model. Biosyst. Eng. 2021 Feb;202:119-32.
  • 9 Kumar TA, Rajmohan R, Ajagbe SA, Gaber T, Zeng XJ, Masmoudi F. A novel CNN gap layer for growth prediction of palm tree plantlings. Plos one. 2023 Aug;18(8):0289963.
  • 10 Wang D, He D. Channel pruned YOLO V5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning. Biosyst. Eng. 2021 Oct;210:271-81.
  • 11 Ge Y, Lin S, Zhang Y, Li Z, Cheng H, Dong J, et al. Tracking and Counting of Tomato at Different Growth Period Using an Improving YOLO-Deepsort Network for Inspection Robot. Machines. 2022 Jun;10(6):489.
  • 12 Rahman T, Siregar MAL, Kurniawan A, Juniastuti S, Yuniarno EM. Vehicle Speed Calculation from UAV Video Based on Deep Learning. In: Intern Conf on Comp Engin, Network, and Intel Multim (CENIM);2020 Dec 24; Surabaya, Indonesia. IEEE; c2021. p. 229-33.
  • 13 Madasamy K, Shanmuganathan V, Kandasamy V, Lee MY, Thangadurai M. OSDDY: embedded system-based object surveillance detection system with small UAV using deep YOLO. EURASIP J. Image and Video Process. 2021 May;2021(1):1-14.
  • 14 Slimani H, Mhamdi JE, Jilbab A. Assessing the advancement of artificial intelligence and drones’ integration in agriculture through a bibliometric study. Inter Journal of Elec and Comp Eng. 2024 Feb;14(1):878-890.
  • 15 Deci.ai. Yolo-nas [Internet]; 2023 Sep 1 [updated 2023 Dec 20; cited 2023 Nov 19]. Available from: https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md
    » https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md
  • 16 Aharon S, Louis-Dupont, Ofri Masad, Yurkova K, Lotem Fridman, Lkdci, et al. Super-Gradients [Internet]. 2021 [cited 2023 Nov 22]; Available from: https://docs.ultralytics.com/fr/models/yolo-nas
    » https://docs.ultralytics.com/fr/models/yolo-nas
  • 17 Terven J, Cordova-Esparza D. A Comprehensive Review of YOLO: From YOLOv1 and Beyond [Internet]. 2023 [cited 2023 Nov 15]; Available from: http://arxiv.org/abs/2304.00501
    » http://arxiv.org/abs/2304.00501
  • 18 Ding X, Zhang X, Ma N, Han J, Ding G, Sun J. RepVGG: Making VGG-style ConvNets Great Again. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2021;13733-42. Available from: https://arxiv.org/abs/2101.03697
    » https://arxiv.org/abs/2101.03697
  • 19 Weng K, Chu X, Xu X, Huang J, Wei X. EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design [Internet]. 2023 Feb [cited 2023 Oct 10]; Available from: http://arxiv.org/abs/2302.00386
    » http://arxiv.org/abs/2302.00386
  • 20 Slimani H, Mhamdi JE, Jilbab A. Artificial Intelligence-based Detection of Fava Bean Rust Disease in Agricultural Settings: An Innovative Approach. Int. J. Adv. Comput Sci Appl. 2023 May;14(6).
  • 21 Alagarsamy S, James V, Raj RSP. An experimental analysis of optimal hybrid word embedding methods for text classification using a movie review dataset. Braz Arch Biol Technol. 2022 Aug;65.

Edited by

  • Editor-in-Chief:
    Alexandre Rasi Aoki
  • Associate Editor:
    Alexandre Rasi Aoki

Publication Dates

  • Publication in this collection
    30 Sept 2024
  • Date of issue
    2024

History

  • Received
    20 Nov 2023
  • Accepted
    16 May 2024