Layers 2-4: Detection & Tracking

The detection layers use a dual-stage approach: YOLOv8 detects people and ByteTrack links those detections into persistent tracks, then InsightFace SCRFD detects faces in the full frame. Person tracks keep identities stable across frames, while the detected faces supply the embeddings used for recognition.

Overview

Layer	Component	Technology	Purpose
2	Person Detection	YOLOv8	Detect all people in frame
3	Tracking	ByteTrack	Assign persistent IDs
4	Face Detection	InsightFace SCRFD	Locate faces in full frame

Layer 2: Person Detection (YOLOv8)

YOLOv8 from Ultralytics provides fast and accurate person detection.

Key Features

Feature	Description
Real-time Detection	20-30+ FPS on modern hardware
High Accuracy	State-of-the-art object detection
Person Only	Filters to class 0 (person)
Confidence Scores	Detection confidence per person

Detection Process

Frame is passed to YOLOv8 model
Model detects all objects in frame
Filter to class 0 (person) only
Return bounding boxes with confidence scores
Pass to ByteTrack for tracking
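The filtering step can be sketched in plain Python. This is a minimal illustration, not the project's code: `Detection` and `filter_person_detections` are hypothetical names, and the defaults mirror the Layer 2 configuration below (confidence 0.8, classes `[0]`).

```python
from dataclasses import dataclass

PERSON_CLASS_ID = 0  # COCO class index for "person"


@dataclass(frozen=True)
class Detection:
    """Hypothetical raw detection: one box from the object detector."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float                       # detector confidence, 0-1
    class_id: int                           # COCO class index


def filter_person_detections(detections, conf_threshold=0.8, classes=(PERSON_CLASS_ID,)):
    """Keep detections whose class is allowed and whose confidence meets the threshold."""
    return [
        d for d in detections
        if d.class_id in classes and d.confidence >= conf_threshold
    ]
```

With Ultralytics, the same filtering is normally requested at inference time, e.g. `YOLO("yolov8n.pt")(frame, classes=[0], conf=0.8)`, and the kept boxes are read from `results[0].boxes` (`xyxy`, `conf`, `cls`).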

Model Options

Model	Speed	Accuracy	Size	Use Case
`yolov8n.pt`	Fastest	Good	~6MB	Real-time (default)
`yolov8s.pt`	Fast	Better	~22MB	Balanced
`yolov8m.pt`	Medium	High	~52MB	Higher accuracy

Configuration

Setting	Description	Default
YOLO Model	Detection model	yolov8n.pt
YOLO Confidence	Minimum confidence	0.8
YOLO Classes	Object classes	[0] (person)

Layer 3: Tracking (ByteTrack)

ByteTrack provides persistent tracking IDs across video frames.

Key Features

Feature	Description
Persistent IDs	Stable track IDs across frames
Motion Prediction	Handles temporary occlusions
Track Recovery	Resumes a lost track's ID if the person reappears within the tracker's buffer window (`track_buffer`); no appearance-based re-ID
Multi-object	Tracks multiple people simultaneously

Tracking Process

Receive person detections from YOLOv8
Match detections to existing tracks using IoU and motion
Assign new track IDs to unmatched detections
Predict locations for temporarily lost tracks
Return TrackedPerson objects with stable IDs
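The core of ByteTrack's association can be sketched as below. This is a simplified, hypothetical illustration rather than the Ultralytics implementation: it keeps ByteTrack's two-stage idea (match high-score detections first, then try low-score detections against the tracks still unmatched) but replaces Kalman-filter motion prediction and Hungarian matching with last-known boxes and greedy IoU matching. `SimpleByteTracker`, its thresholds, and `max_lost` are assumptions.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    lost_frames: int = 0


def greedy_match(tracks, dets, min_iou):
    """Greedy one-to-one matching by descending IoU.

    Returns (matches, unmatched_track_indices, unmatched_det_indices).
    """
    candidates = sorted(
        ((iou(t.box, d[0]), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < min_iou:
            break  # sorted descending: remaining pairs overlap even less
        if ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(dets)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


class SimpleByteTracker:
    def __init__(self, high_thresh=0.5, min_iou=0.3, max_lost=30):
        self.high_thresh = high_thresh  # splits detections into high/low score sets
        self.min_iou = min_iou          # minimum overlap to accept a match
        self.max_lost = max_lost        # frames a lost track is kept before deletion
        self.tracks: list[Track] = []
        self._next_id = 1

    def update(self, detections):
        """detections: list of (box, score). Returns {track_id: box} for tracks seen this frame."""
        high = [d for d in detections if d[1] >= self.high_thresh]
        low = [d for d in detections if d[1] < self.high_thresh]
        output = {}

        # Stage 1: all tracks vs. high-score detections.
        m1, rest_t, new_d = greedy_match(self.tracks, high, self.min_iou)
        for ti, di in m1:
            self._refresh(self.tracks[ti], high[di][0], output)

        # Stage 2: still-unmatched tracks vs. low-score detections (e.g. occluded people).
        remaining = [self.tracks[i] for i in rest_t]
        m2, still_lost, _ = greedy_match(remaining, low, self.min_iou)
        for ri, di in m2:
            self._refresh(remaining[ri], low[di][0], output)
        for ri in still_lost:
            remaining[ri].lost_frames += 1  # no motion model here: keep the last box

        # Unmatched high-score detections start new tracks; low-score leftovers are dropped.
        for di in new_d:
            track = Track(self._next_id, high[di][0])
            self._next_id += 1
            self.tracks.append(track)
            output[track.track_id] = track.box

        self.tracks = [t for t in self.tracks if t.lost_frames <= self.max_lost]
        return output

    @staticmethod
    def _refresh(track, box, output):
        track.box = box
        track.lost_frames = 0
        output[track.track_id] = box
```

The second stage is what distinguishes ByteTrack: a partially occluded person often produces a low-confidence box, and matching it to an existing track keeps the ID alive instead of discarding the detection.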

Tracker Options

Tracker	Description
`bytetrack.yaml`	Default, fast and accurate
`botsort.yaml`	Alternative (BoT-SORT) that adds camera-motion compensation and optional appearance re-ID

Configuration

Setting	Description	Default
YOLO Tracker	Tracker config file	bytetrack.yaml
Track Mode	Track everyone (`all`) or only recognized identities (`known_only`)	all
Show Track ID	Display IDs in output	True
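The tracking settings map naturally onto Ultralytics' `YOLO.track()` arguments (`conf`, `classes`, `tracker`, `persist`). The sketch below assumes that mapping; `TrackingConfig` and `select_tracks` are hypothetical names, and the `known_only` behaviour (drop tracks with no recognized identity) is an assumption about what the mode means.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrackingConfig:
    """Hypothetical container for the Layer 2/3 settings listed above."""
    yolo_model: str = "yolov8n.pt"
    yolo_confidence: float = 0.8
    yolo_classes: list[int] = field(default_factory=lambda: [0])  # 0 = person
    yolo_tracker: str = "bytetrack.yaml"
    track_mode: str = "all"  # "all" or "known_only"
    show_track_id: bool = True

    def track_kwargs(self) -> dict:
        """Keyword arguments intended for Ultralytics `YOLO.track()`."""
        return {
            "conf": self.yolo_confidence,
            "classes": list(self.yolo_classes),
            "tracker": self.yolo_tracker,
            "persist": True,  # keep tracker state between per-frame calls
        }


def select_tracks(identities: dict[int, Optional[str]], mode: str) -> dict[int, Optional[str]]:
    """Filter {track_id: identity or None} according to the track mode (assumed semantics)."""
    if mode == "all":
        return dict(identities)
    if mode == "known_only":
        return {tid: name for tid, name in identities.items() if name is not None}
    raise ValueError(f"unknown track mode: {mode!r}")
```

A frame loop would then call `model.track(frame, **cfg.track_kwargs())`; `persist=True` keeps tracker state between calls, and track IDs appear in `results[0].boxes.id` (which is `None` when nothing is being tracked).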

Layer 4: Face Detection (InsightFace SCRFD)

InsightFace SCRFD detects faces in the full frame rather than in per-person crops.

Key Features

Feature	Description
Multi-face Detection	Detect multiple faces in single frame
High Accuracy	Works with varied angles and lighting
Landmarks	5-point facial landmarks included
Embeddings	512-d embedding per face, computed by the model pack's ArcFace recognition model (not by SCRFD itself)
Confidence Score	Detection confidence (det_score)

Detection Process

Full frame is passed to InsightFace (not cropped to person boxes)
SCRFD detects all faces in the image
Each face includes bbox, det_score, landmarks (kps), and embedding
Quality filtering removes low-quality detections
Valid faces proceed to recognition and IoU matching

Why Full-Frame Detection?

Higher accuracy - SCRFD optimized for full-frame detection
Better small face detection - Not constrained by person crop
Faster - Single detection pass instead of per-person crops
IoU matching - Links faces to tracked persons by spatial overlap

Quality Filtering

Not all detected faces are suitable for recognition. The system filters by:

Filter	Threshold	Purpose
Min Face Size	50 pixels	Ensure enough facial detail for recognition
Quality Threshold	0.5	Reject detections with a low detection score (det_score)
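The two filters can be expressed as a small predicate. This sketch assumes "face size" means the shorter side of the bounding box and that each face exposes `bbox` and `det_score` attributes, as InsightFace's `Face` objects do; the function names are hypothetical.

```python
def passes_quality(face, min_face_size=50, quality_threshold=0.5) -> bool:
    """True if the face is large enough and confidently detected (assumed rules)."""
    x1, y1, x2, y2 = (float(v) for v in face.bbox[:4])
    shorter_side = min(x2 - x1, y2 - y1)  # assumption: size = shorter bbox side
    return shorter_side >= min_face_size and float(face.det_score) >= quality_threshold


def filter_faces(faces, min_face_size=50, quality_threshold=0.5):
    """Keep only faces suitable for recognition."""
    return [f for f in faces if passes_quality(f, min_face_size, quality_threshold)]
```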

Configuration

Setting	Description	Default
Model Name	InsightFace model	buffalo_l
Detection Size	Input size for detector	640x640
GPU Support	Enable CUDA	Off
Min Face Size	Minimum face pixels	50
Quality Threshold	Minimum detection confidence	0.5
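These settings correspond to arguments of InsightFace's `FaceAnalysis` constructor and its `prepare()` method. The dataclass below is a hypothetical sketch of that mapping; it assumes ONNX Runtime execution providers select the device and that a negative `ctx_id` means CPU, which is InsightFace's convention.

```python
from dataclasses import dataclass


@dataclass
class FaceDetectionConfig:
    """Hypothetical container for the Layer 4 settings listed above."""
    model_name: str = "buffalo_l"
    det_size: tuple[int, int] = (640, 640)
    use_gpu: bool = False
    min_face_size: int = 50
    quality_threshold: float = 0.5

    def providers(self) -> list[str]:
        """ONNX Runtime execution providers, GPU first when enabled."""
        if self.use_gpu:
            return ["CUDAExecutionProvider", "CPUExecutionProvider"]
        return ["CPUExecutionProvider"]

    def ctx_id(self) -> int:
        """Device index for FaceAnalysis.prepare(): 0 = first GPU, -1 = CPU."""
        return 0 if self.use_gpu else -1
```

Usage would look like `app = FaceAnalysis(name=cfg.model_name, providers=cfg.providers())`, then `app.prepare(ctx_id=cfg.ctx_id(), det_size=cfg.det_size)` and `faces = app.get(frame_bgr)`.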

Model Options

Model	Detection Speed	Accuracy	Size
buffalo_l	Medium	Highest	~400MB
buffalo_m	Fast	High	~200MB
buffalo_s	Fastest	Good	~100MB

Detection Size

Size	Speed	Small Face Detection
320x320	Fast	Poor
640x640	Medium	Good (default)
1280x1280	Slow	Excellent
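The effect of detection size is easiest to see with arithmetic. Assuming the frame is resized to fit inside the detection size with its aspect ratio preserved (as InsightFace's SCRFD wrapper does before inference), the sketch below estimates how many pixels a face occupies at the detector input; the function names are hypothetical.

```python
def detector_scale(frame_w: int, frame_h: int, det_w: int, det_h: int) -> float:
    """Scale factor when the frame is resized to fit inside det_w x det_h, keeping aspect ratio."""
    return min(det_w / frame_w, det_h / frame_h)


def face_size_at_detector(face_px: float, frame_w: int, frame_h: int, det_size: tuple[int, int]) -> float:
    """Approximate face size, in pixels, as seen by the detector."""
    det_w, det_h = det_size
    return face_px * detector_scale(frame_w, frame_h, det_w, det_h)
```

For a 1920x1080 frame, a 50-pixel face shrinks to about 8 px at 320x320, about 17 px at 640x640, and about 33 px at 1280x1280, which is why small faces are missed at the smaller sizes.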

IoU Matching

After person tracking and face detection, IoU (Intersection over Union) matching links faces to tracked persons.

Matching Process

For each tracked person bounding box
Calculate IoU with each detected face bounding box
Assign face with highest IoU to person
Link face identity to person track ID
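The matching steps translate into a short function. One assumption is added beyond the steps above: each face is assigned to at most one person, resolved greedily from the highest IoU down. Names are hypothetical. When a face box lies entirely inside a person box, IoU equals the face area divided by the person area, so even correct matches score low; the sketch therefore accepts any positive overlap by default.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_faces_to_persons(person_boxes: dict[int, Box], face_boxes: list[Box],
                           min_iou: float = 0.0) -> dict[int, int]:
    """Return {track_id: face_index}; each person and each face is used at most once."""
    candidates = sorted(
        ((iou(p, f), tid, fi) for tid, p in person_boxes.items() for fi, f in enumerate(face_boxes)),
        reverse=True,
    )
    matched: dict[int, int] = {}
    used_faces: set[int] = set()
    for score, tid, fi in candidates:
        if score <= min_iou:
            break  # sorted descending, so nothing better follows
        if tid in matched or fi in used_faces:
            continue
        matched[tid] = fi
        used_faces.add(fi)
    return matched
```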

Benefits

Stable identity - Face identity linked to tracked person
Identity caching - Last known identity preserved per track
Handles occlusions - Identity survives temporary face loss
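Identity caching can be as simple as a dictionary keyed by track ID. The class below is a hypothetical sketch: it keeps the last recognized name for each track while the track stays active, and forgets a track after it has been absent for more than `max_missing_frames` frames (an assumed parameter).

```python
from typing import Iterable, Optional


class IdentityCache:
    """Hypothetical per-track identity cache."""

    def __init__(self, max_missing_frames: int = 90):
        self.max_missing_frames = max_missing_frames
        self._frame = 0
        self._identity: dict[int, str] = {}
        self._last_seen: dict[int, int] = {}

    def update(self, active_track_ids: Iterable[int], recognized: dict[int, str]) -> dict[int, Optional[str]]:
        """Process one frame.

        active_track_ids: track IDs present this frame.
        recognized: track IDs whose face was matched and recognized this frame.
        Returns the cached identity (or None) for every active track.
        """
        self._frame += 1
        active = list(active_track_ids)
        for tid in active:
            self._last_seen[tid] = self._frame
        for tid, name in recognized.items():
            self._identity[tid] = name  # newest recognition wins
            self._last_seen[tid] = self._frame
        # Forget tracks that have been absent for too long.
        stale = [tid for tid, seen in self._last_seen.items()
                 if self._frame - seen > self.max_missing_frames]
        for tid in stale:
            del self._last_seen[tid]
            self._identity.pop(tid, None)
        return {tid: self._identity.get(tid) for tid in active}
```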

Face Object Properties

Each detected face object contains:

Property	Type	InsightFace attribute	Description
Bounding Box	Array	bbox	[x1, y1, x2, y2] coordinates
Detection Score	Float	det_score	Detection confidence (0-1)
Embedding	Array	embedding	512-d face embedding
Landmarks	Array	kps	5 (x, y) points: eyes, nose tip, mouth corners
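A typed container makes these properties explicit downstream. `DetectedFace` and `from_insightface` are hypothetical names; the sketch assumes the InsightFace attribute names `bbox`, `det_score`, `embedding` and `kps`, and converts them to plain Python lists so the rest of the pipeline does not depend on NumPy.

```python
from dataclasses import dataclass


@dataclass
class DetectedFace:
    """Hypothetical plain-Python view of one detected face."""
    bbox: list[float]                      # [x1, y1, x2, y2]
    det_score: float                       # detection confidence, 0-1
    embedding: list[float]                 # 512-d face embedding
    landmarks: list[tuple[float, float]]   # 5 (x, y) facial landmarks

    @classmethod
    def from_insightface(cls, face) -> "DetectedFace":
        """Build from an object exposing bbox, det_score, embedding and kps attributes."""
        bbox = [float(v) for v in face.bbox[:4]]
        embedding = [float(v) for v in face.embedding]
        landmarks = [(float(p[0]), float(p[1])) for p in face.kps]
        if len(embedding) != 512:
            raise ValueError(f"expected a 512-d embedding, got {len(embedding)}")
        if len(landmarks) != 5:
            raise ValueError(f"expected 5 landmarks, got {len(landmarks)}")
        return cls(bbox, float(face.det_score), embedding, landmarks)
```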