NAVIGATION STRATEGY FOR MOBILE ROBOT BASED ON COMPUTER VISION AND YOLOV5 NETWORK IN THE UNKNOWN ENVIRONMENT

The capacity to navigate effectively in complex environments is a crucial prerequisite for mobile robots. In this study, the YOLOv5 model is used to identify objects and help the mobile robot determine its movement conditions. Deep learning models trained on insufficient data can recognize objects inaccurately in unforeseen scenarios; to address this limitation, a computer vision technique that detects lanes in real time is introduced. By combining the deep learning model with this computer vision technique, the robot can identify different types of objects, estimate the distance to them, and adjust its speed accordingly. Additionally, the paper investigates recognition reliability under varying light intensities. When the illumination increases from 300 lux to 1000 lux, the reliability of the recognition model on different objects improves from about 75% to 98%. The findings of this study offer promising directions for future breakthroughs in mobile robot navigation.


INTRODUCTION
A mobile robot consists of a chassis, or body, that houses the robot's hardware components. The chassis is typically designed to be sturdy, lightweight, and capable of accommodating various sensors and actuators. In addition, the software of a mobile robot encompasses the algorithms, control systems, and operating systems that enable it to function. Cameras also play a very important role: they provide visual perception and enable the robot to gather information about its environment. Thus, vision-based mobile robot navigation includes the design of robots, cameras, object recognition algorithms, and control systems. Intelligent mobile robots must possess the ability to navigate in complex environments. The field of mobile robot navigation is continuously evolving, with various technologies being developed. Its primary focus is the development of intelligent robots capable of autonomous navigation in unknown environments. It is a multidisciplinary field that combines techniques and technologies from robotics (Hichri et al., 2021; Nguyen & Vu, 2022; Thuan et al., 2023), computer vision (Marroquín et al., 2023), machine learning (Geng et al., 2018; Xiao et al., 2022), and control theory (Ren et al., 2022; Shang et al., 2021). The ultimate objective of mobile robot navigation research is to create automated systems that can operate safely and efficiently in real-world environments, performing tasks such as exploration, surveillance, delivery, and transportation. This field has gained significant attention from researchers in recent years, with numerous studies focused on advancing mobile robot navigation techniques. A novel navigation model has been proposed that not only establishes two-way communication between the cerebellum and basal ganglia but also allows for their co-development. This approach enabled the agent to autonomously develop its intelligence through hybrid learning techniques.
An artificial potential field (APF) control strategy driven by brain signals through a brain-robot interface (BRI), in conjunction with simultaneous localization and mapping (SLAM) techniques, was developed in (Liu et al., 2020). This combination established a relationship between the strength of EEG signals and the intensity of the potential field, facilitating the navigation and control of a mobile robot in uncertain environments. A framework for self-improving lifelong learning designed for a mobile robot was introduced in (Liu et al., 2021). This method could enable agents to operate in various environments. An approach for guiding mobile robots was proposed by (Ajeil et al., 2020). The core of this approach was the hybridization of Particle Swarm Optimization with the Modified Frequency Bat algorithm. The researchers in (Iqbal et al., 2020) discussed a mobile field robot that runs on the Robot Operating System (ROS) and is capable of navigating through occluded crop rows while simultaneously performing various phenotyping tasks. The authors in (Gharajeh et al., 2020) proposed a method for collision-free navigation of autonomous mobile robots using a hybrid GPS-ANFIS approach. In the study (Ran et al., 2021), the authors developed a Convolutional Neural Network that exhibited superior scene classification accuracy and efficiency in processing monocular images. Lagaza et al. (Lagaza, Kashyap, & Pandey, 2020) and Khatib et al. (Khatib et al., 2020) identified navigation algorithms that can efficiently address path optimization challenges while minimizing the required time. A novel approach integrated with a knowledge-based neural fuzzy controller (KNFC) was proposed to control the navigation of mobile robots in the study (Chen et al., 2021).
The authors in (Chen et al., 2021) proposed a multivariable, event-triggered, generalized super-twisting sliding-mode algorithm for the safe navigation of nonholonomic mobile robots in unknown indoor environments. The authors in (Wang et al., 2021) focused on the autonomous navigation of a wheeled mobile robot in a dynamic environment, utilizing a 3D point cloud map and the Creative Mechanism Design Methodology.
Inspired by the research mentioned above, this paper proposes a detection framework for mobile robot navigation based on computer vision and deep learning. Specifically, a segmentation algorithm is designed to extract the movement lane for the robot. The YOLOv5 network (Luo et al., 2021) is then built and integrated on the robot controller, so that the robot can detect and cluster the information needed for navigation. The experimental results indicate the working performance of the proposed method under different circumstances.
The rest of this paper is organized as follows. The next section describes the detailed structure of the proposed approach. In the third section, experiments are conducted and the results are analyzed in different scenarios. Finally, the conclusion is given in Section 4.

THE PROPOSED METHOD
In general, the proposed method is designed based on the following criteria: the ability to identify lanes; the ability to detect different groups of objects and control reactions corresponding to them; and the ability to modify instantaneous speed. The sub-sections below analyze in depth all the mentioned criteria.

Lane detection algorithm
To determine the movement line for the robot model, a real-time image processing system is designed as shown in Fig. 1. The procedure has six steps in total. To begin with, the system takes images from the cameras mounted at the front of the robot model. After that, the system determines the color threshold of the lane and refines the lane by reducing noise in the binary images. More specifically, histogram equalization is used to eliminate the outline pixels from the photos, and a morphological filter is then employed for the hole-filling process (Cho et al., 2022). The latency of the computing phase, from taking the photo to sending data to the drives, is estimated at 1 second on average. As a result, the region for the robot motion is clearly separated from the input images, as shown in Fig. 2. In the next step, the contours of the binary images are extracted to estimate how much the lanes curve. The average steering angle for the robot is used to determine the value of the lane curvature. In addition, to conserve computing resources, the authors considerably reduce the lane region that is processed in the image: when evaluating the road's curve, the algorithm examines only the curvature immediately in front of the robot model. The binary images are therefore cropped to a new image bounded by the orange dot shown in Fig. 3. The lane curvature is then computed by summing pixels. As indicated in Fig. 4, the lane consists of the pixels with a value of 255. All pixels of this type on the left and right sides of the y-axis, which is marked by the red line, are summed, and the yellow boundaries are drawn to display the lane in which the robot moves. When the robot runs through roadways with varying curves and complexity, the numbers of pixels to the left and right of the y-axis differ.
For instance, when turning left, the number of pixels on the left is greater than that on the right. The pixels are evenly distributed on both sides only when the robot is running straight.

Fig. 5. The lane with yellow dot instructions
To ensure the continuity and precision of the robot's movement decision at every time step, the signal for the DC motor is unified and represented by the yellow point shown in Fig. 5. When the lane bends to the right, the yellow dot deviates to the right; when the robot moves in a straight line, the yellow dot is centered on the screen. The final results of lane determination are presented in Fig. 6.
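The pixel-summing step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `crop_near_field` and `lane_offset`, the `keep_fraction` value, and the normalization of the result to the range [-1, 1] are assumptions made for illustration. The binary mask is represented as a plain list of rows so that the example has no external dependencies.

```python
def crop_near_field(binary_rows, keep_fraction=0.5):
    """Keep only the bottom rows of the mask, i.e. the area just ahead of the robot.

    keep_fraction is a hypothetical tuning value, not taken from the paper.
    """
    keep = max(1, int(len(binary_rows) * keep_fraction))
    return binary_rows[-keep:]


def lane_offset(binary_rows, lane_value=255):
    """Return a steering offset in [-1, 1] from a binary lane mask.

    Lane pixels (value 255) are counted on each side of the vertical
    centre line. A negative result means more lane pixels on the left
    (lane bends left), a positive result means more on the right, and
    0 means the lane is balanced (straight ahead).
    """
    if not binary_rows or not binary_rows[0]:
        raise ValueError("empty image")
    width = len(binary_rows[0])
    mid = width // 2  # position of the vertical centre line
    left = sum(1 for row in binary_rows for v in row[:mid] if v == lane_value)
    # For odd widths the centre column is skipped so both halves have equal size.
    right = sum(1 for row in binary_rows for v in row[width - mid:] if v == lane_value)
    total = left + right
    if total == 0:
        return 0.0  # assumed fallback when no lane pixels are visible
    return (right - left) / total


straight = [[0, 255, 255, 0]] * 4
bend_left = [[255, 255, 255, 0]] * 4
print(lane_offset(crop_near_field(straight)))   # balanced mask
print(lane_offset(crop_near_field(bend_left)))  # more pixels on the left
```

The signed offset plays the role of the yellow dot: it sits at zero on a straight lane and moves toward the side where more lane pixels are found.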

Object Detection by Applying YOLOv5 Network
In this study, a YOLOv5 network is designed to detect several types of objects: static objects (traffic lights, speed indicators) and dynamic objects (human targets). The detailed structure of the proposed model is illustrated in Fig. 7. In total, 1200 images are captured for the mentioned target groups. The data set is then imported and split for training the YOLOv5 model at the default ratio: 70% of the data for the training phase and 30% for the evaluation phase.
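The 7:3 split can be sketched as follows. The file names, the fixed random seed, and the helper name `split_dataset` are hypothetical and serve only to make the example reproducible; the paper does not describe how the split was performed.

```python
import random


def split_dataset(items, train_ratio=0.7, seed=0):
    """Shuffle the items and split them into training and evaluation subsets."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed (an assumption) for a repeatable split
    rng.shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# Hypothetical file names standing in for the 1200 captured images.
image_ids = [f"img_{i:04d}.jpg" for i in range(1200)]
train, val = split_dataset(image_ids)
print(len(train), len(val))  # 840 360
```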
The architecture of the YOLOv5 network is composed of three main components: the Backbone, the Neck and the Head. More specifically, the Backbone combines the Cross Stage Partial network (Lin & Yang, 2022) with the Darknet53 network. In this combination, the Cross Stage Partial network addresses the redundant gradient problem by truncating the gradient flow, while Darknet53 addresses the vanishing gradient issue. Thus, the number of calculated parameters is decreased, and the inference speed, a key metric in real-time object detection models, is substantially increased. The second component of the proposed network, the Neck, is built on the Path Aggregation Network (PANet) (Zhou et al., 2022) to enhance the information flow and help in the accurate localization of pixels in the process of mask prediction. Hence, the ability to build features from previously unseen data is improved. Finally, the Head consists of a convolutional sub-network that generates predictions from the anchor boxes for the detection process. To evaluate the accuracy of the classified objects in the real-time video stream, a synthetic loss function is established as the sum of three losses: the confidence loss, the localization loss, and the classification loss. The formula of the synthetic loss function is written as follows:

$$L_{Synthetic} = L_{Confidence} + L_{Localization} + L_{Classification} \quad (1)$$

where $L_{Synthetic}$ is the synthetic loss function; $L_{Confidence}$ is the confidence loss; $L_{Localization}$ is the localization loss; and $L_{Classification}$ is the classification loss. According to (Hu et al., 2022), the confidence loss, the localization loss, and the classification loss are calculated as in Eq. (2) to Eq. (4), where $R_{ij}^{obj}$ indicates whether there is an object center in the grid cell of the image.
If the grid cell contains an object center, it is responsible for predicting the category probability of the object. $C_i$ represents the confidence score, $\hat{C}_i$ represents the intersection of the predicted bounding box and the ground truth, and $\lambda_{coord}$ represents the weight of the classification error.
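As a small illustration of the quantities above, the sketch below computes the overlap between a predicted box and a ground-truth box, and sums three loss terms into a synthetic loss. It assumes that the overlap is measured as intersection over union (IoU), which is the usual choice in YOLO-style detectors but is not stated explicitly in the text. The box format `(x_min, y_min, x_max, y_max)` and both function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def synthetic_loss(confidence_loss, localization_loss, classification_loss):
    """Sum of the three loss terms, as described in the text."""
    return confidence_loss + localization_loss + classification_loss


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(iou(predicted, ground_truth))  # overlap area 1 over union area 7
```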
For the activation stage of the proposed network, the SiLU function (Elfwing et al., 2018) is applied in the hidden layers, while the Sigmoid function (Alexandris et al., 2019) is used in the output layers. To save computational time, the Adam optimizer (Cao et al., 2023) is applied, which is the default in the original YOLOv5 network.
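For reference, the two activation functions have simple closed forms: the sigmoid is 1 / (1 + exp(-x)) and SiLU is x multiplied by the sigmoid of x. The scalar sketch below is only illustrative; a real network applies these element-wise on tensors inside a deep learning framework.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x):
    """SiLU activation: the input scaled by its own sigmoid."""
    return x * sigmoid(x)


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), silu(value))
```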

RESULTS AND DISCUSSION
To examine the stability and working performance of the proposed method, a differential wheeled robot with a smartphone mounted at the front is designed. The complete model is shown in Fig. 8. An HC-SR04 ultrasonic sensor is used to estimate the distance from the robot to the object. Based on this distance, the system transmits control signals to navigate the robot in different situations.
The case studies are then investigated in several scenarios: human detection, traffic signal classification, and navigation based on the line and the types of traffic light.
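An ultrasonic sensor such as the HC-SR04 reports distance indirectly, as the duration of an echo pulse: the sound travels to the object and back, so the distance is half the round-trip time multiplied by the speed of sound. The sketch below shows only this conversion. The constant of 343 m/s assumes dry air at roughly 20 °C, and the function name is hypothetical; reading the pulse from the hardware is outside the scope of the example.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in dry air at about 20 degrees C


def echo_to_distance_m(echo_pulse_s):
    """Convert an echo pulse duration in seconds to a one-way distance in meters."""
    if echo_pulse_s < 0:
        raise ValueError("pulse duration must be non-negative")
    # The pulse covers the path to the object and back, hence the division by 2.
    return echo_pulse_s * SPEED_OF_SOUND_M_S / 2.0


print(echo_to_distance_m(0.002915))  # roughly 0.5 m
```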

Case study 1: Human detection
The ability of the proposed approach to detect several human targets is shown in Fig. 9. The model performs well even with the varied attire of individuals dressed in white or black suits. In addition, the distance to each target is computed and displayed in real time.
In the case of a stop sign, the robot begins to detect the sign and decelerate when the estimated distance to the sign reaches 0.5 meters. About 0.2 meters away from the sign, the robot comes to a complete stop. In the case of speed indicator signs, the robot computes and updates the motor velocity in order to maintain the permitted speed indicated by the signs. If the maximum speed is higher than the current speed, the robot can increase its velocity; otherwise, the robot reduces its speed.
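The sign-handling rules can be sketched as two small functions. The thresholds of 0.5 m and 0.2 m come from the text; the linear deceleration ramp between them, the speed step of 0.05 m/s, and the function names are assumptions made for illustration, since the paper does not specify the deceleration profile.

```python
SLOW_DOWN_DISTANCE_M = 0.5  # the robot starts decelerating here (from the text)
STOP_DISTANCE_M = 0.2       # the robot is fully stopped here (from the text)


def speed_near_stop_sign(distance_m, cruise_speed):
    """Commanded speed while approaching a stop sign.

    Assumes a linear ramp from cruise_speed at 0.5 m down to zero at 0.2 m.
    """
    if distance_m <= STOP_DISTANCE_M:
        return 0.0
    if distance_m >= SLOW_DOWN_DISTANCE_M:
        return cruise_speed
    fraction = (distance_m - STOP_DISTANCE_M) / (SLOW_DOWN_DISTANCE_M - STOP_DISTANCE_M)
    return cruise_speed * fraction


def speed_for_limit(current_speed, limit_speed, step=0.05):
    """Move the current speed one step toward the limit shown on a speed sign."""
    if current_speed < limit_speed:
        return min(limit_speed, current_speed + step)
    return max(limit_speed, current_speed - step)


for d in (0.6, 0.35, 0.2):
    print(d, speed_near_stop_sign(d, cruise_speed=0.2))
```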

Case study 3: Navigation in complicated traffic situations
In case study 3, the hypothetical situation is that the traffic light is green but some people are still crossing the road. The control commands are executed in order of priority, with human objects taking precedence over light signals. Hence, the robot has to stop at a safe distance in this situation, as shown in Fig. 11.
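The precedence rule described above can be expressed as a short decision function. This is a minimal sketch: the function name, the string labels for the light states and actions, and the handling of a yellow light are hypothetical and are not taken from the paper. Only the ordering, with humans checked before light signals, follows the text.

```python
def decide_action(human_in_path, light_state):
    """Choose a motion command, giving detected humans priority over traffic lights.

    light_state is one of "red", "yellow", "green" (hypothetical labels).
    """
    if human_in_path:
        return "stop"  # highest priority: a person is crossing
    if light_state == "red":
        return "stop"
    if light_state == "yellow":
        return "slow"  # assumed behaviour, not specified in the paper
    return "go"


# Case study 3: green light, but people are still crossing the road.
print(decide_action(human_in_path=True, light_state="green"))  # stop
```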

Analyze the experimental statistics with various levels of illumination
For robots to be useful, they need to perceive what is going on around them and make decisions based on what they see. Object recognition is one of the critical tasks that robots must perform to function correctly. In this sub-section, all case studies mentioned above are conducted at different illumination levels (full brightness: 1000 lux; normal brightness: 750 lux; weak light: 300 lux). The speed of the robot is set at 0.2 m/s and 0.5 m/s. Tab. 1 to Tab. 6 show the results of detection under different illumination conditions. The x-y figure generated from the tests revealed that the proposed algorithm had an accuracy rate of over 75% in recognizing objects. The algorithm performed best in recognizing traffic lights, with an accuracy rate between 86.5% and 98.4%. The recognition of human objects had a lower accuracy rate than that of traffic lights, with values ranging from 81.8% to 94.1%. The lowest accuracy rate was observed in the recognition of signs, with values ranging from 76.1% to 93%. The results showed that as the light intensity increased, the robot's ability to identify objects improved. Overall, the recognition ability of the robot remained stable. Interestingly, the robot was less accurate in recognizing objects when it moved at a speed of 0.5 m/s than when it moved at a speed of 0.2 m/s. This finding suggests that the target recognition of the robot model improves when it moves slowly. Additionally, the largest fluctuation in accuracy was found in the recognition of the stop sign at a light intensity of 1000 lux in both speed scenarios. Furthermore, when the robot encountered the speed indicator 20, it tended to slow down, which led to an improvement in identification accuracy. The results demonstrate that environmental conditions, such as light intensity levels, can impact object recognition ability in general.
As the light intensity increased, the robot's identification ability improved. This finding is consistent with previous studies on object recognition, where an increase in light intensity improved the recognition ability of algorithms. From the obtained results, the following points can be summarized:
- To achieve path planning, a lane detection method based on binary images is proposed to provide exact trajectories.
- The results indicate that environmental conditions such as light intensity can affect the object recognition of the robot. As the light intensity increased, the recognition ability of the robot model improved. This finding is consistent with previous studies in the same field.
- The precision of recognition on multiple targets is better when the robot model moves at the slower speed of 0.2 m/s instead of 0.5 m/s. This implies that in practical scenarios, it may be necessary to slow down the robot to enhance the detection accuracy.
- In complicated traffic situations, it is necessary to design rules of precedence that give the mobile robot appropriate motion behaviors, so that the safety of both the robot and the target can be ensured. This is demonstrated in Case Study 3.

CONCLUSION
In this paper, a detection framework for mobile robot navigation based on computer vision and deep learning is proposed. This study aims to automatically navigate the mobile robot in complex traffic situations by combining several algorithms with the YOLOv5 model. The experimental environment is configured with appropriate conditions to evaluate the functionality of the proposed approach. In conclusion, this study contributes to the understanding of the factors that affect the performance of an object recognition algorithm. Moreover, the proposed framework not only assures obstacle avoidance, but also stabilizes the robot in its desired position and orientation despite slippage. These findings have important implications for the development of robotics and automation technologies in various fields, including manufacturing, transportation, and healthcare. Future research can focus on exploring the impact of other environmental factors, such as temperature, humidity, and noise, on mobile robots in real-time applications.