Object Detection

Object detection algorithms in computer vision locate and identify objects within digital images or video frames. One fast detection algorithm is YOLO (You Only Look Once), which is known for real-time performance and has several versions, including YOLOv8 released in 2023. YOLO detects multiple objects in an image or video frame in a single forward pass of the neural network. It divides the input image into a grid and predicts bounding boxes and class probabilities for each grid cell.
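
The grid idea can be illustrated with a short sketch. It is not part of the ultralytics package. The function name grid_cell and the 7x7 grid size (the value used in the original YOLO paper) are assumptions chosen for illustration, and YOLOv8 itself predicts on several feature-map scales rather than on one fixed grid. The sketch only shows how a box center maps to the grid cell that would be responsible for predicting it.

```python
def grid_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy       : box center in pixels
    img_w, img_h : image size in pixels
    S            : number of grid cells per side (illustrative value)
    """
    # Scale the center to grid units and truncate to an integer index.
    # min() keeps a center on the right or bottom edge inside the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# A box centered in a 640x480 image falls in the middle cell of a 7x7 grid
print(grid_cell(320, 240, 640, 480))
```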

YOLO is popular for real-time applications such as self-driving cars, surveillance, and robotics. Install YOLOv8 with the ultralytics package using pip:

pip install ultralytics

Below is an example code snippet to perform object detection from an image file:

from ultralytics import YOLO
import matplotlib.pyplot as plt

# Sample image URL (the model call fetches the image)
file = 'street.png'
url = 'http://apmonitor.com/dde/uploads/Main/'+file

# Load pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Prediction
results = model(url)

# Plot the detections on the image
fr = results[0].plot()
# Convert BGR to RGB and show
plt.imshow(fr[:,:,[2,1,0]])
plt.show()

The results variable is a list with one entry per image. Each entry contains boxes with the bounding box coordinates, class, and confidence for each detection.
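
A minimal sketch of reading those values is shown below. The helper name list_detections and the 0.25 confidence cutoff are assumptions for illustration. The attributes boxes.xyxy, boxes.conf, boxes.cls, and names are taken from the ultralytics results object.

```python
def list_detections(result, min_conf=0.25):
    """Return (label, confidence, [x1, y1, x2, y2]) for each detection.

    result   : one entry of the results list, such as results[0]
    min_conf : hypothetical cutoff, detections below it are skipped
    """
    boxes = result.boxes
    rows = []
    for xyxy, conf, cls in zip(boxes.xyxy.tolist(),
                               boxes.conf.tolist(),
                               boxes.cls.tolist()):
        if conf >= min_conf:
            label = result.names[int(cls)]   # class index to class name
            corners = [round(v) for v in xyxy]  # pixel corner coordinates
            rows.append((label, round(conf, 2), corners))
    return rows

# Usage with the script above:
# for label, conf, box in list_detections(results[0]):
#     print(label, conf, box)
```

The coordinates are the upper left and lower right corners of each box in pixels.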

In this image, the detection correctly identifies objects such as the car and people, but it misses the airplane, the bus, and other handbags or briefcases. YOLOv8 can be used with a pretrained model or trained using a custom labeled image set.

Activity: Identify Objects from Video

Extend the single image object detection to the example from Video Data Analysis. Test the object detection with the runner video sample.

Detect Objects from Webcam Video

Modify the script to read from a webcam instead of a video file source. Change the source to cv2.VideoCapture(0) for the first webcam or cv2.VideoCapture(1) for the second webcam if there are multiple webcams. Remove the 0.2 sec time delay, but add a time limit of 30 seconds. Pressing q also quits the video collection. There is no need to save the video. If the video is saved, set the output dimensions of the video writer to match the frame resolution to avoid an error.
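
One possible structure for the webcam loop is sketched below. It is a starting point under stated assumptions, not the solution from Video Data Analysis. The names run_detection, on_frame, and main are hypothetical. The loop stops at the time limit, when no frame is returned, or when the on_frame callback returns True (used here for the q key).

```python
import time

def run_detection(cap, model, time_limit=30.0, on_frame=None):
    """Run detection on frames from cap until the time limit is reached.

    cap      : object with read() returning (ok, frame), such as cv2.VideoCapture
    model    : callable returning a list of results with a plot() method
    on_frame : optional callback given the annotated frame, return True to quit
    Returns the number of frames processed.
    """
    start = time.monotonic()
    n = 0
    while time.monotonic() - start < time_limit:
        ok, frame = cap.read()
        if not ok:
            break                      # no frame available
        annotated = model(frame)[0].plot()
        n += 1
        if on_frame is not None and on_frame(annotated):
            break                      # quit requested
    return n

def main():
    # Imports are placed here so run_detection can be reused without a webcam
    import cv2
    from ultralytics import YOLO

    model = YOLO('yolov8n.pt')
    cap = cv2.VideoCapture(0)          # 1st webcam

    def show(img):
        cv2.imshow('YOLOv8', img)      # plot() returns BGR, as imshow expects
        return cv2.waitKey(1) & 0xFF == ord('q')

    try:
        run_detection(cap, model, 30.0, show)
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Call main() to start the webcam detection.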


Data-Driven Engineering