AI Vision Tracking System
A real-time computer vision system built with Python, OpenCV, and YOLOv8 that detects and tracks people through a webcam feed with stable IDs and live overlays.
Overview
The system reads a webcam feed, detects people with YOLOv8, assigns each person a consistent ID, and draws live tracking information directly on the video output. It is written in Python, with OpenCV handling video capture and rendering.
The main focus of the project is stable, real-time detection with minimal jitter and consistent identification of multiple people in a scene.
Problem It Solves
Object detectors that treat each frame independently are often unstable from one frame to the next. Boxes can flicker, lose their label, or be re-identified as new targets, which makes it hard to track movement accurately over time.
This system addresses those issues by adding tracking persistence, filtering, and smoothing to maintain consistent results.
How It Works
- The webcam stream is read frame by frame with OpenCV at the target FPS.
- YOLOv8 runs inference on each frame, and the results are filtered to people only.
- The built-in tracker assigns persistent IDs across frames to keep each identity consistent.
- Bounding boxes, IDs, counts, and offsets are rendered live and streamed out over serial.
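
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes the Ultralytics `YOLO` class and its `track()` call, the `yolov8n.pt` weights file, and camera index 0. The helper names `person_label`, `center_offset`, and `run` are hypothetical. Serial output is left out because the message format is not described here.

```python
def person_label(track_id):
    """Build the on-screen label for a tracker ID, e.g. 3 -> 'Person 3'."""
    return f"Person {track_id}"


def center_offset(box, frame_w, frame_h):
    """Return (dx, dy) in pixels from the frame center to the box center."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2 - frame_w / 2, (y1 + y2) / 2 - frame_h / 2


def run(camera_index=0, weights="yolov8n.pt"):
    """Hypothetical main loop: capture, track people, draw overlays."""
    # Imported here so the helpers above work without a camera or model.
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights)          # assumed weights file
    cap = cv2.VideoCapture(camera_index)
    font = cv2.FONT_HERSHEY_SIMPLEX
    green = (0, 255, 0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            h, w = frame.shape[:2]
            # classes=[0] keeps only the COCO "person" class;
            # persist=True carries tracker state between calls.
            result = model.track(frame, persist=True, classes=[0],
                                 verbose=False)[0]
            boxes = result.boxes
            count = 0
            if boxes.id is not None:   # None when nothing is tracked
                ids = boxes.id.int().cpu().tolist()
                for tid, xyxy in zip(ids, boxes.xyxy.cpu().tolist()):
                    x1, y1, x2, y2 = map(int, xyxy)
                    dx, dy = center_offset(xyxy, w, h)
                    text = f"{person_label(tid)} ({dx:+.0f}, {dy:+.0f})"
                    cv2.rectangle(frame, (x1, y1), (x2, y2), green, 2)
                    cv2.putText(frame, text, (x1, max(y1 - 8, 12)),
                                font, 0.6, green, 2)
                    count += 1
            cv2.putText(frame, f"Count: {count}", (10, 24),
                        font, 0.8, green, 2)
            cv2.imshow("tracking", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `run()` opens the preview window. The two helpers are kept separate from the loop so they can be checked without hardware.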


Key Features
- Detects only people in real time using YOLOv8.
- Assigns stable IDs such as Person 1, Person 2, etc.
- Displays bounding boxes and labels on each detection.
- Shows live count of people in the frame.
- Maintains consistent tracking across movement.
- Visual debugging overlays: center points, offsets, and IDs.
Example Behavior
When one person enters the frame, they are labeled as Person 1 and the counter shows 1. If another person enters, they are labeled Person 2 and the count updates to 2.
If a person leaves and returns later, the system attempts to maintain identity consistency rather than creating a new label each time.
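
One way the Person N labels could be derived is to map raw tracker IDs to display numbers in first-seen order. The sketch below assumes tracker IDs are arbitrary integers that may not start at 1 or be contiguous; the `LabelBook` class is hypothetical. A returning person keeps their label only if the tracker itself hands back the same ID, which is not guaranteed.

```python
class LabelBook:
    """Map raw tracker IDs to stable display labels in first-seen order."""

    def __init__(self):
        self._numbers = {}  # tracker ID -> display number

    def label(self, track_id):
        # A tracker ID seen before reuses its number; a new one gets the next.
        if track_id not in self._numbers:
            self._numbers[track_id] = len(self._numbers) + 1
        return f"Person {self._numbers[track_id]}"


book = LabelBook()
first = book.label(7)    # first tracker ID seen
second = book.label(12)  # second tracker ID seen
again = book.label(7)    # same tracker ID as the first call
```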
Challenges & Learnings
Object detection stability
At the start, the detection kept flickering and shifting even when the object barely moved. The system was too sensitive to small changes in lighting and noise, which made tracking feel unstable. I had to tune thresholds and filtering so it only reacted to strong, consistent signals instead of random variations.
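
As an illustration of reacting only to strong, consistent signals, the sketch below combines a confidence threshold with a requirement that a track stay strong for several frames in a row. The `ConfirmationFilter` class and its default values are assumptions chosen for the example, not the values used in this project.

```python
class ConfirmationFilter:
    """Confirm a track only after several consecutive strong detections."""

    def __init__(self, min_conf=0.5, min_hits=3):
        self.min_conf = min_conf  # assumed confidence threshold
        self.min_hits = min_hits  # assumed consecutive frames required
        self._hits = {}           # track ID -> consecutive strong frames

    def update(self, detections):
        """detections maps track ID -> confidence for the current frame."""
        strong = {tid for tid, conf in detections.items()
                  if conf >= self.min_conf}
        # Tracks that are missing or weak this frame drop out and restart.
        self._hits = {tid: self._hits.get(tid, 0) + 1 for tid in strong}
        return {tid for tid, n in self._hits.items() if n >= self.min_hits}
```

Raising `min_hits` suppresses more flicker but delays the moment a new person first appears on screen.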
Balancing smooth tracking and responsiveness
Making the tracking movement feel natural was tricky. Too smooth and it lagged behind the target. Too responsive and it became jittery and overshot. I tested different scaling values and smoothing techniques until it could follow movement quickly while staying stable.
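
The trade-off can be illustrated with an exponential moving average, one common smoothing technique. It is shown here as an example, not necessarily the method used in this project, and the `ExponentialSmoother` class is hypothetical. A small `alpha` gives smoother but laggier output, while a value near 1 follows the raw input closely and keeps its jitter.

```python
class ExponentialSmoother:
    """Exponential moving average over points such as (x, y) centers."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, point):
        if self.state is None:
            # The first sample initializes the filter.
            self.state = tuple(float(v) for v in point)
        else:
            # Blend the new sample with the previous smoothed value.
            self.state = tuple(
                self.alpha * v + (1.0 - self.alpha) * s
                for v, s in zip(point, self.state)
            )
        return self.state


smoother = ExponentialSmoother(alpha=0.5)
p0 = smoother.update((0, 0))
p1 = smoother.update((10, 20))  # moves halfway toward the new point
p2 = smoother.update((10, 20))  # moves halfway again
```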
Camera calibration and movement mapping
Translating pixel distance into real movement was not straightforward. Early versions made the system either move too aggressively or barely react at all. I adjusted how the center offset was calculated and scaled so movement matched real-world position more accurately.
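
A rough sketch of this kind of mapping: normalize the horizontal offset from the frame center, ignore small offsets inside a dead zone, then apply a gain and clamp the result. The function name `offset_to_command`, the output range of -1 to 1, and the default gain and dead zone are all assumptions for illustration.

```python
def offset_to_command(center_x, frame_w, gain=1.0, deadband=0.05, limit=1.0):
    """Map a target's x position to a movement command in [-limit, limit]."""
    half = frame_w / 2
    # Normalized offset: -1 at the left edge, 0 at center, +1 at the right.
    norm = (center_x - half) / half
    if abs(norm) < deadband:
        return 0.0  # close enough to center, do not move
    # Scale by the gain, then clamp so large offsets cannot over-drive.
    return max(-limit, min(limit, gain * norm))
```

A gain that is too high gives the aggressive movement described above, and one that is too low makes the system barely react. The dead zone keeps small detection noise near the center from producing movement.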
Use Cases
Possible applications include crowd monitoring in public spaces, smart surveillance, and basic people counting in controlled environments.
It also serves as a foundation for robotics applications, where real-time human detection and tracking can be used for interaction, navigation, or targeting systems.
Technical Approach
The system uses YOLOv8 for object detection and OpenCV for video processing. Tracking uses YOLOv8's built-in tracking features with persistence enabled, so track state carries over from one frame to the next.
Filtering is applied to ensure only human detections are processed. Additional smoothing is used to reduce instability caused by frame-to-frame variation.
Summary
This project demonstrates practical experience in computer vision, real-time processing, and AI-based tracking systems. It focuses on stability, responsiveness, and usable real-world output rather than just detection accuracy.
It can be extended into more advanced systems such as predictive tracking, multi-camera setups, or integration with robotic control systems.