VisionLog

Layers 2-4: Detection & Tracking

Person detection with YOLOv8 and face detection with InsightFace SCRFD

The detection layers use a two-stage approach: YOLOv8 (with ByteTrack) detects and tracks people, then InsightFace SCRFD detects faces. Detected faces are linked to tracked persons, so recognition results stay attached to a stable track ID.

Overview

Layer | Component | Technology | Purpose
2 | Person Detection | YOLOv8 | Detect all people in frame
3 | Tracking | ByteTrack | Assign persistent IDs
4 | Face Detection | InsightFace SCRFD | Locate faces in full frame

Layer 2: Person Detection (YOLOv8)

YOLOv8 from Ultralytics provides fast and accurate person detection.

Key Features

Feature | Description
Real-time Detection | 20-30+ FPS on modern hardware
High Accuracy | State-of-the-art object detection
Person Only | Filters to class 0 (person)
Confidence Scores | Detection confidence per person

Detection Process

  1. Frame is passed to YOLOv8 model
  2. Model detects all objects in frame
  3. Filter to class 0 (person) only
  4. Return bounding boxes with confidence scores
  5. Pass to ByteTrack for tracking
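
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Detection` tuple and the `filter_persons` helper are hypothetical names, and the raw detections are assumed to already be available as (box, confidence, class ID) values. With Ultralytics, the same filtering can also be requested through the `classes` and `conf` arguments of the prediction call.

```python
from typing import List, NamedTuple, Tuple

PERSON_CLASS_ID = 0  # class index the page filters on (person)


class Detection(NamedTuple):
    """Hypothetical container for one raw detection."""
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels
    confidence: float
    class_id: int


def filter_persons(detections: List[Detection],
                   min_confidence: float = 0.8) -> List[Detection]:
    """Keep only person detections at or above the confidence threshold."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= min_confidence
    ]


raw = [
    Detection((10, 20, 110, 320), 0.93, 0),     # confident person, kept
    Detection((200, 40, 260, 200), 0.55, 0),    # person below threshold
    Detection((300, 300, 380, 360), 0.97, 16),  # different class, dropped
]
persons = filter_persons(raw)
print(persons)
```

The default of 0.8 mirrors the YOLO Confidence setting; lowering it keeps more borderline detections for the tracker at the cost of more false positives.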

Model Options

Model | Speed | Accuracy | Size | Use Case
yolov8n.pt | Fastest | Good | ~6MB | Real-time (default)
yolov8s.pt | Fast | Better | ~22MB | Balanced
yolov8m.pt | Medium | High | ~52MB | Higher accuracy

Configuration

Setting | Description | Default
YOLO Model | Detection model | yolov8n.pt
YOLO Confidence | Minimum confidence | 0.8
YOLO Classes | Object classes | [0] (person)

Layer 3: Tracking (ByteTrack)

ByteTrack provides persistent tracking IDs across video frames.

Key Features

Feature | Description
Persistent IDs | Stable track IDs across frames
Motion Prediction | Handles temporary occlusions
Re-identification | Recovers the ID when a person reappears within the tracker's lost-track buffer
Multi-object | Tracks multiple people simultaneously

Tracking Process

  1. Receive person detections from YOLOv8
  2. Match detections to existing tracks using IoU and motion
  3. Assign new track IDs to unmatched detections
  4. Predict locations for temporarily lost tracks
  5. Return TrackedPerson objects with stable IDs
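
Steps 2 to 5 can be illustrated with a deliberately simplified tracker. This is not ByteTrack: it has no Kalman-filter motion prediction and no second association pass for low-confidence detections. It only shows greedy IoU matching, new-ID assignment, and keeping lost tracks alive for a few frames. `GreedyIouTracker`, `iou_threshold`, and `max_missed` are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Toy tracker: greedy IoU association against each track's last box."""

    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 30):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed      # frames a lost track is kept
        self.tracks: Dict[int, Box] = {}  # track_id -> last known box
        self.missed: Dict[int, int] = {}  # track_id -> frames unseen
        self._next_id = 1

    def update(self, boxes: List[Box]) -> Dict[int, Box]:
        """Return {track_id: box} for the detections in this frame."""
        assigned: Dict[int, Box] = {}
        unmatched = list(boxes)
        # Match existing tracks to the best-overlapping detection.
        for track_id, last_box in self.tracks.items():
            if not unmatched:
                break
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= self.iou_threshold:
                assigned[track_id] = best
                unmatched.remove(best)
        # Unmatched detections start new tracks.
        for box in unmatched:
            assigned[self._next_id] = box
            self._next_id += 1
        # Age tracks that were not seen; drop them after max_missed frames.
        for track_id in list(self.tracks):
            if track_id not in assigned:
                self.missed[track_id] += 1
                if self.missed[track_id] > self.max_missed:
                    del self.tracks[track_id]
                    del self.missed[track_id]
        for track_id, box in assigned.items():
            self.tracks[track_id] = box
            self.missed[track_id] = 0
        return assigned


tracker = GreedyIouTracker()
print(tracker.update([(0, 0, 100, 200), (300, 0, 400, 200)]))
print(tracker.update([(305, 0, 405, 200), (5, 0, 105, 200)]))
```

Because lost tracks keep their last box for up to `max_missed` frames, a person who disappears briefly and returns near the same place gets the same ID back. A real tracker improves on this by predicting where the lost track has moved.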

Tracker Options

Tracker | Description
bytetrack.yaml | Default, fast and accurate
botsort.yaml | Alternative with different motion model

Configuration

Setting | Description | Default
YOLO Tracker | Tracker config file | bytetrack.yaml
Track Mode | all or known_only | all
Show Track ID | Display IDs in output | True

Layer 4: Face Detection (InsightFace SCRFD)

InsightFace SCRFD detects faces in the full frame for maximum accuracy.

Key Features

Feature | Description
Multi-face Detection | Detect multiple faces in single frame
High Accuracy | Works with varied angles and lighting
Landmarks | 5-point facial landmarks included
Embeddings | 512-d embedding extracted per face
Confidence Score | Detection confidence (det_score)

Detection Process

  1. Full frame is passed to InsightFace (not cropped to person boxes)
  2. SCRFD detects all faces in the image
  3. Each face includes bbox, det_score, landmarks, embedding
  4. Quality filtering removes low-quality detections
  5. Valid faces proceed to recognition and IoU matching
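
Steps 1 to 3 might look like the sketch below, assuming the InsightFace Python package and its `FaceAnalysis` wrapper. The helper names `load_face_app`, `detect_faces`, and `face_to_record` are hypothetical, the frame is assumed to be a BGR image array (for example from OpenCV), and the loader is CPU-only to match the GPU Support default of Off.

```python
from typing import Any, Dict, List, Tuple


def load_face_app(model_name: str = "buffalo_l",
                  det_size: Tuple[int, int] = (640, 640)) -> Any:
    """Build a CPU-only InsightFace pipeline; create it once and reuse it."""
    # Imported lazily so the helpers below work without insightface installed.
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(name=model_name, providers=["CPUExecutionProvider"])
    app.prepare(ctx_id=-1, det_size=det_size)  # ctx_id=-1 selects the CPU
    return app


def face_to_record(face: Any) -> Dict[str, Any]:
    """Copy the fields this page relies on out of one detected face."""
    return {
        "bbox": [float(v) for v in face.bbox],  # x1, y1, x2, y2
        "det_score": float(face.det_score),     # detection confidence
        "landmarks": [[float(x), float(y)] for x, y in face.kps],
        "embedding": face.embedding,            # 512-d vector
    }


def detect_faces(app: Any, frame: Any) -> List[Dict[str, Any]]:
    """Run detection on the full frame, not on per-person crops."""
    return [face_to_record(face) for face in app.get(frame)]
```

A typical call would be `detect_faces(load_face_app(), frame)`; the returned records can then go through quality filtering before recognition.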

Why Full-Frame Detection?

  • Higher accuracy - SCRFD optimized for full-frame detection
  • Better small face detection - Not constrained by person crop
  • Faster - Single detection pass instead of per-person crops
  • IoU matching - Links faces to tracked persons by spatial overlap

Quality Filtering

Not all detected faces are suitable for recognition. The system filters by:

Filter | Threshold | Purpose
Min Face Size | 50 pixels | Ensure face detail
Quality Threshold | 0.5 | Detection confidence
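
The two filters can be expressed as a single predicate. This is a sketch under one stated assumption: the page does not say which dimension "50 pixels" refers to, so the code below measures the shorter side of the face bounding box. `passes_quality` is a hypothetical helper name.

```python
from typing import Sequence


def passes_quality(bbox: Sequence[float],
                   det_score: float,
                   min_face_size: float = 50,
                   quality_threshold: float = 0.5) -> bool:
    """Return True if a face is large and confident enough for recognition."""
    x1, y1, x2, y2 = bbox
    # Assumption: face size is the shorter side of the bounding box.
    shorter_side = min(x2 - x1, y2 - y1)
    return shorter_side >= min_face_size and det_score >= quality_threshold


faces = [
    {"bbox": [100, 100, 180, 200], "det_score": 0.92},  # large and confident
    {"bbox": [300, 120, 330, 160], "det_score": 0.88},  # too small
    {"bbox": [400, 100, 480, 200], "det_score": 0.31},  # low confidence
]
valid = [f for f in faces if passes_quality(f["bbox"], f["det_score"])]
print(len(valid))
```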

Configuration

Setting | Description | Default
Model Name | InsightFace model | buffalo_l
Detection Size | Input size for detector | 640x640
GPU Support | Enable CUDA | Off
Min Face Size | Minimum face pixels | 50
Quality Threshold | Minimum detection confidence | 0.5

Model Options

Model | Detection Speed | Accuracy | Size
buffalo_l | Medium | Highest | ~400MB
buffalo_m | Fast | High | ~200MB
buffalo_s | Fastest | Good | ~100MB

Detection Size

Size | Speed | Small Face Detection
320x320 | Fast | Poor
640x640 | Medium | Good (default)
1280x1280 | Slow | Excellent

IoU Matching

After person tracking and face detection, IoU (Intersection over Union) matching links faces to tracked persons.

Matching Process

  1. Iterate over the tracked person bounding boxes
  2. Calculate the IoU with each detected face bounding box
  3. Assign the face with the highest IoU to the person
  4. Link the face identity to the person's track ID
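
The matching steps can be sketched as below. This follows the four steps literally and is not taken from the project's code; `match_faces_to_persons` is a hypothetical name, persons are assumed to be a mapping from track ID to box, and faces a list of boxes.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_faces_to_persons(persons: Dict[int, Box],
                           faces: List[Box]) -> Dict[int, int]:
    """Map each track ID to the index of its highest-IoU face, if any."""
    matches: Dict[int, int] = {}
    for track_id, person_box in persons.items():
        best_index, best_iou = None, 0.0
        for index, face_box in enumerate(faces):
            score = iou(person_box, face_box)
            if score > best_iou:
                best_index, best_iou = index, score
        if best_index is not None:  # persons with no overlapping face are skipped
            matches[track_id] = best_index
    return matches


persons = {1: (0, 0, 100, 300), 2: (200, 0, 300, 300), 3: (400, 0, 500, 300)}
faces = [(220, 10, 270, 70), (30, 10, 80, 70)]
print(match_faces_to_persons(persons, faces))
```

Note that a face box is much smaller than a person box, so the IoU values are low even for correct pairs; what matters is the ranking, not the absolute score. Followed literally, the steps also allow one face to be assigned to two overlapping persons, so a production version may want to enforce one-to-one assignment.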

Benefits

  • Stable identity - Face identity linked to tracked person
  • Identity caching - Last known identity preserved per track
  • Handles occlusions - Identity survives temporary face loss
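
Identity caching can be pictured as a small per-track lookup. The sketch below is an assumption about how such a cache could work, not the project's implementation; `IdentityCache` and its methods are hypothetical names.

```python
from typing import Dict, Optional


class IdentityCache:
    """Remember the last recognized name for each track ID."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def update(self, track_id: int, name: Optional[str] = None) -> Optional[str]:
        """Store a fresh recognition result, or fall back to the cached one."""
        if name is not None:
            self._names[track_id] = name
        return self._names.get(track_id)

    def drop(self, track_id: int) -> None:
        """Forget a track once the tracker has removed it."""
        self._names.pop(track_id, None)


cache = IdentityCache()
cache.update(7, "Alice")  # face visible and recognized
print(cache.update(7))    # face hidden this frame; cached name is returned
```

Keying the cache on the track ID is what lets an identity survive frames where the face is turned away or occluded, as long as the person track itself persists.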

Face Object Properties

Each detected face object contains:

Property | Type | Description
Bounding Box | Array | [x1, y1, x2, y2] coordinates
Detection Score | Float | Detection confidence (0-1)
Embedding | Array | 512-d face embedding
Landmarks (5) | Array | 5-point facial landmarks

Landmarks

5-point landmarks include:

  • Left eye center
  • Right eye center
  • Nose tip
  • Left mouth corner
  • Right mouth corner

GPU Acceleration

Enable GPU for faster detection on both YOLOv8 and InsightFace.
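
How the GPU switch maps onto the two libraries is not described on this page. One plausible mapping, assuming the Ultralytics `device` argument, ONNX Runtime execution provider names, and the InsightFace `ctx_id` argument, is sketched below. `runtime_settings` and its dictionary keys are hypothetical.

```python
from typing import Any, Dict


def runtime_settings(use_gpu: bool) -> Dict[str, Any]:
    """Translate a single GPU on/off switch into per-library settings."""
    if use_gpu:
        return {
            "yolo_device": 0,  # first CUDA device for Ultralytics
            # The CPU provider is listed second as a fallback.
            "onnx_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"],
            "insightface_ctx_id": 0,
        }
    return {
        "yolo_device": "cpu",
        "onnx_providers": ["CPUExecutionProvider"],
        "insightface_ctx_id": -1,  # negative ctx_id selects the CPU
    }


print(runtime_settings(use_gpu=False))
```

Under these assumptions, the values would be passed as `device=` to the YOLO call, `providers=` to `FaceAnalysis`, and `ctx_id=` to its `prepare` method.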

Performance

Hardware | YOLOv8 | InsightFace | Combined
CPU (i7) | ~50ms | ~100ms | ~150ms per frame
GPU (RTX 3060) | ~10ms | ~20ms | ~30ms per frame
GPU (RTX 4090) | ~3ms | ~5ms | ~8ms per frame
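
To relate these latencies to frame rates, the combined time can be converted to frames per second. The sketch assumes the two detectors run back to back on every frame, which is what the Combined column implies; `frames_per_second` is a hypothetical helper.

```python
# Approximate per-frame latencies in milliseconds, copied from the table above.
latency_ms = {
    "CPU (i7)": (50, 100),
    "GPU (RTX 3060)": (10, 20),
    "GPU (RTX 4090)": (3, 5),
}


def frames_per_second(yolo_ms: float, face_ms: float) -> float:
    """Throughput when both detectors run sequentially on each frame."""
    return 1000.0 / (yolo_ms + face_ms)


for hardware, (yolo_ms, face_ms) in latency_ms.items():
    print(f"{hardware}: {frames_per_second(yolo_ms, face_ms):.1f} FPS")
```

This gives roughly 6.7 FPS on the CPU, 33.3 FPS on the RTX 3060, and 125 FPS on the RTX 4090, so real-time rates for the full pipeline depend on GPU acceleration.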

Visualization

Box Colors

Status | Color
Known person | Green
Unknown | Red

Display Elements

  • Person bounding box (YOLOv8)
  • Track ID (e.g., "#1")
  • Person name
  • Confidence score

Best Practices

Image Quality

  • Minimum resolution: 640x480
  • Faces should be at least 50 pixels in size (the Min Face Size default)
  • Even lighting preferred
  • Avoid heavy shadows

Multiple People

  • The system handles multiple people per frame
  • Each person is tracked independently
  • Each face is matched to the person whose box has the highest IoU with it

Edge Cases

  • Profile faces may not be detected reliably
  • Occluded faces (e.g., masks) may have lower detection scores
  • Very small faces are filtered out
  • A person without a visible face is still tracked (the cached identity is kept)
