📦 Part II: Multi-Object Tracking & Spatial Logic (ByteTrack + OpenCV)

Advanced Spatial Logic: Real-Time Object Tracking & Counting on Edge Devices

Building memory-driven, spatial intelligence layers on top of raw deep learning predictions to enable conveyor belt pick-and-place automation.

Python · OpenCV · Ultralytics YOLOv8 · ByteTrack · Spatial Mathematics · Edge Deployment

01: Project Context & Challenge

Advanced Spatial Logic: Throughput Counting & Zone Monitoring

Two-Part Vision System (Part II of II): This project is the decision-making layer of a two-part edge vision system. Part II (this page) covers temporal tracking and custom 2D polygon boundaries; Part I covers dataset creation and custom training of the detection network. Explore 🎯 Part I, Custom YOLOv8 Object Detection.

Deep learning detectors like YOLO are excellent at finding objects, but they treat every frame as an independent snapshot. They have no temporal memory. To automate a physical conveyor belt, knowing that an item exists in a single frame is not enough. The automation layer needs to verify that an object identified in Frame A is the same piece in Frame B, measure the total volume of objects moving past a threshold (throughput), and pinpoint the moment an item enters the robot's pick zone.

To bridge the gap between raw bounding boxes and mechanical action, we engineered an advanced spatial logic pipeline. Combining the lightweight YOLOv8 detector with the high-performance ByteTrack multi-object tracking engine and native OpenCV vector operations, we built a low-latency, modular system that converts raw video feeds into actionable, coordinate-level automation triggers.

The Spatial Challenge: Deep learning models provide detections, but spatial logic provides decisions. By giving the AI "memory" and mathematical boundaries, we can reliably guide a robotic arm to execute pick-and-place routines on moving items.

30+ FPS on Edge Hardware

< 10 ms Processing Overhead

100% Identity Cohesion

02: The Tracking Engine

Giving AI "Memory" via Multi-Object Tracking

The foundation of spatial logic is identity tracking. We integrated ByteTrack directly with our fine-tuned YOLOv8 model. Many trackers discard low-confidence detection boxes, which often causes track loss when items are partially occluded or shadowed. ByteTrack instead associates every detection box: a Kalman filter predicts where each existing track should appear in the new frame, high-confidence detections are matched to those predictions first, and the remaining low-confidence detections are then matched against the tracks that are still unmatched.

By comparing the Intersection over Union (IoU) of predicted and detected boxes across frames, ByteTrack assigns a persistent, unique ID to every detected item. The ID stays tied to the physical object while it remains tracked within the camera's field of view. This "memory" is what prevents the throughput counter from double-counting the same item, and it provides a smooth, continuous path of coordinates that a robotic arm can follow.

# Integrating multi-object tracking with YOLOv8 using ByteTrack
from ultralytics import YOLO

# Load fine-tuned conveyor model weights
model = YOLO("best.pt")

# stream=True yields one result per frame instead of buffering the whole video;
# persist=True keeps track state alive between calls
results = model.track(source="conveyor_feed.mp4", stream=True, persist=True, tracker="bytetrack.yaml")
for result in results:
    for box in result.boxes:
        # box.id is None when the tracker has not assigned an ID to this box
        track_id = int(box.id[0]) if box.id is not None else None
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"Object ID: {track_id} | Coordinates: ({x1:.1f}, {y1:.1f})")
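
The association idea described above can be illustrated with a small, dependency-free sketch. It is not ByteTrack's actual implementation: it uses greedy matching where ByteTrack uses an optimal assignment step, it skips the Kalman prediction, and the function names and thresholds (`associate`, `high_conf`, `iou_thresh`) are illustrative assumptions.

```python
# Simplified sketch of ByteTrack-style two-stage association.
# Assumptions: greedy matching instead of optimal assignment, no Kalman
# prediction step; all names and thresholds here are illustrative.

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, iou_thresh):
    """Match each track to its best unused detection at or above iou_thresh."""
    matches, used = {}, set()
    for track_id, track_box in tracks.items():
        best_idx, best_iou = None, iou_thresh
        for idx, det in enumerate(detections):
            if idx in used:
                continue
            score = iou(track_box, det["box"])
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matches[track_id] = detections[best_idx]["box"]
            used.add(best_idx)
    return matches


def associate(tracks, detections, high_conf=0.5, iou_thresh=0.3):
    """Stage 1 uses high-confidence boxes; stage 2 rescues leftover tracks."""
    high = [d for d in detections if d["conf"] >= high_conf]
    low = [d for d in detections if d["conf"] < high_conf]
    matches = greedy_match(tracks, high, iou_thresh)
    leftover = {t: b for t, b in tracks.items() if t not in matches}
    matches.update(greedy_match(leftover, low, iou_thresh))
    return matches


tracks = {1: (100, 100, 150, 150), 2: (300, 100, 350, 150)}
detections = [
    {"box": (104, 100, 154, 150), "conf": 0.90},  # clear view of item 1
    {"box": (305, 102, 355, 152), "conf": 0.20},  # item 2, partly shadowed
]
print(associate(tracks, detections))
```

In this toy frame, track 2 only has a low-confidence detection. A tracker that dropped low-score boxes would lose it, while the second stage keeps its ID alive.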

03: Application 1: Zone-Based Monitoring

Polygon Boundary Logic for Robotic Workspace Gates

With unique IDs assigned, we implemented virtual zones to simulate industrial robotic workspace boundaries. Rather than restricting zones to simple rectangles, we added support for arbitrary 2D polygons to model complex, real-world pick zones or machine exclusion spaces.

Using OpenCV, we defined a coordinate array representing the green "Pick Zone." In each frame, the centroid $(C_x, C_y)$ of each active track's bounding box is computed as $C_x = (x_1 + x_2)/2$, $C_y = (y_1 + y_2)/2$. We then call the cv2.pointPolygonTest function:

import cv2

# Check if object centroid lies within custom pick polygon.
# polygon_coords is a NumPy array of polygon vertices (int32 or float32).
# With measureDist=False the result is +1 (inside), 0 (on edge), or -1 (outside).
dist = cv2.pointPolygonTest(polygon_coords, (float(centroid_x), float(centroid_y)), measureDist=False)
if dist >= 0:
    # Target object is inside pick zone, trigger robotic pickup event
    trigger_pick_action(track_id, centroid_x, centroid_y)

This allows the system to register precisely when objects enter the pick area, incrementing a live 'IN ZONE' monitor and feeding exact localized coordinates directly to the robot arm's actuation system.

Real-time zone tracking. The system draws a virtual green polygon and successfully increments the 'IN ZONE' counter only when the tracked centroids of the fine-tuned classes enter the designated area.
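
One way to turn the per-frame inside/outside test into a single entry event per object is to remember which track IDs are already in the zone. The sketch below is a minimal illustration under stated assumptions: `point_in_polygon` is a pure-Python ray-casting stand-in for cv2.pointPolygonTest (it does not treat points exactly on an edge specially), and `ZoneMonitor`, its methods, and the polygon coordinates are hypothetical rather than taken from the project code.

```python
# Sketch of 'IN ZONE' entry detection. Assumptions: point_in_polygon is a
# pure-Python ray-casting stand-in for cv2.pointPolygonTest, and ZoneMonitor
# and its method names are hypothetical, not taken from the project.

def box_centroid(x1, y1, x2, y2):
    """Centre point (C_x, C_y) of an axis-aligned bounding box."""
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def point_in_polygon(point, polygon):
    """Ray casting: an odd number of edge crossings means the point is inside."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % len(polygon)]
        if (ya > y) != (yb > y):  # edge spans the horizontal ray at height y
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


class ZoneMonitor:
    def __init__(self, polygon):
        self.polygon = polygon
        self.ids_in_zone = set()  # track IDs currently inside the zone
        self.entry_count = 0      # number of outside-to-inside transitions

    def update(self, track_id, box):
        """Return True only on the frame where the track enters the zone."""
        inside = point_in_polygon(box_centroid(*box), self.polygon)
        entered = inside and track_id not in self.ids_in_zone
        if entered:
            self.entry_count += 1
        if inside:
            self.ids_in_zone.add(track_id)
        else:
            self.ids_in_zone.discard(track_id)
        return entered


pick_zone = [(200, 50), (400, 80), (380, 300), (180, 260)]
monitor = ZoneMonitor(pick_zone)
# Three consecutive boxes for the same track moving left to right
frames = [(100, 140, 160, 200), (230, 140, 290, 200), (260, 140, 320, 200)]
events = [monitor.update(7, box) for box in frames]
print(events, monitor.entry_count)
```

Firing on the transition rather than on every inside frame means a pickup would be requested once per entry, not once per frame while the item sits in the zone.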

04: Application 2: Throughput Measurement

Tripwire Logic for High-Speed Production Analytics

The second application is a high-speed **tripwire throughput counter**. This is designed to replace traditional optoelectronic sensors, with the added capability of classifying exactly what kind of product crossed the line.

The system defines a linear threshold using two points. As items move from left to right along the conveyor belt, the tracking engine records the history of their centroid coordinates. By comparing an object's previous and current X-coordinates against the X-coordinate of the tripwire, the system detects a crossing: the object was left of the line in the previous frame and is on or past it in the current one. Once crossed, the object's ID is added to a "counted set" to prevent duplicate counting, and the global throughput count is updated immediately.

Real-time throughput counting. A virtual red tripwire is established. As the multi-object tracker maintains the ID of the moving items, the top-left 'COUNT' updates precisely as each centroid crosses the threshold.
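
The crossing test and the counted set can be sketched in a few lines. This is a minimal illustration rather than the project's code: it assumes a vertical tripwire at a fixed X position and left-to-right motion, and `TripwireCounter`, its method names, and the class labels are hypothetical.

```python
from collections import Counter

# Sketch of the tripwire logic. Assumptions: a vertical tripwire at x = line_x,
# left-to-right motion, and hypothetical names (TripwireCounter, update).

class TripwireCounter:
    def __init__(self, line_x):
        self.line_x = line_x
        self.last_x = {}           # track_id -> centroid X in the previous frame
        self.counted_ids = set()   # IDs that have already been counted
        self.per_class = Counter() # crossings grouped by class label

    def update(self, track_id, centroid_x, class_name):
        """Return True if this update counted the track crossing the tripwire."""
        prev_x = self.last_x.get(track_id)
        self.last_x[track_id] = centroid_x
        crossed = (
            prev_x is not None
            and prev_x < self.line_x <= centroid_x
            and track_id not in self.counted_ids
        )
        if crossed:
            self.counted_ids.add(track_id)
            self.per_class[class_name] += 1
        return crossed

    @property
    def total(self):
        return sum(self.per_class.values())


counter = TripwireCounter(line_x=320)
observations = [
    (1, 300, "circle"), (2, 250, "square"),
    (1, 325, "circle"), (2, 290, "square"),  # ID 1 crosses here
    (1, 318, "circle"), (2, 330, "square"),  # ID 1 jitters back, ID 2 crosses
    (1, 326, "circle"),                      # ID 1 re-crosses but is not recounted
]
for track_id, cx, name in observations:
    counter.update(track_id, cx, name)
print(counter.total, dict(counter.per_class))
```

The counted set is what keeps detection jitter around the line from inflating the total, and the per-class tally reflects the advantage over a plain beam sensor: each crossing carries a class label.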

05: Key Learnings & Scalability

Industrial Efficacy and Edge Deployment Potential

This project highlights that the real value of deep learning in industrial contexts is unlocked by clever post-processing algorithms. By keeping our vision pipeline lightweight and computationally efficient, we achieve robust spatial intelligence that scales easily to standard hardware.

High Edge Efficiency: Layering tracking and coordinate logic on top of a YOLO pipeline is highly modular and runs at 30+ FPS on edge hardware, with under 10 ms of processing overhead.
Industrial Production Analytics: The tripwire throughput system can easily be deployed on packaging lines, logistics hubs, or traffic analysis stations to classify and measure flow rates.
Safety & Danger Zone Enforcement: The polygon zone logic directly translates to safety systems (PPE compliance areas, heavy machinery lockouts, and personnel separation barriers) to protect workers in automated plants.

Open-Source Project Repository & Notebooks: All source files, Jupyter Notebooks, and training weights for this custom fine-tuned model are open-sourced on GitHub at vision-yolo-finetune. Specifically, this repository provides:

Google Colab Notebook: The fully documented Jupyter/Colab notebook used to set up the environment, import the Roboflow dataset, and fine-tune YOLOv8 on our 4 custom shape classes.
Local Inference Script: A standalone production script used to execute the trained YOLOv8 model locally on a live camera video feed to test, debug, and validate inference accuracy and real-time latency before deployment.