VisionLog

Video Processing

Batch processing CCTV recordings, video files, and RTSP streams for face recognition

The video processor batch-processes CCTV recordings, video files, and RTSP streams. It combines YOLOv8 person tracking with face detection and recognition to support attendance tracking.

Features

  • Process single video or folder of videos
  • YOLOv8 person detection with ByteTrack tracking
  • Persistent track IDs across frames
  • Frame skipping for performance optimization
  • Save annotated output video with bounding boxes
  • CSV logging with frame, timestamp, track ID, name, confidence
  • Session summary with unique tracks and identified persons
  • Zero-lag RTSP engine for IP cameras

Supported Formats

  • MP4
  • AVI
  • MKV
  • MOV
  • WMV
  • RTSP streams (IP cameras)

Technical Workflow Diagram

The following diagram illustrates the multi-stage pipeline used for batch video processing and RTSP streaming:

[Figure: Video Processing Logic]

Processing Pipeline

For each frame, the video processor runs:

| Step | Component | Description |
| --- | --- | --- |
| 1 | YOLOv8 | Detect all people in frame |
| 2 | ByteTrack | Assign/maintain track IDs |
| 3 | InsightFace SCRFD | Detect faces in full frame |
| 4 | InsightFace ArcFace | Extract embeddings, match identity |
| 5 | IoU Matching | Link faces to tracked persons |
| 6 | Identity Cache | Preserve last known identity |
| 7 | Logging | Write to CSV and annotate frame |
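
Step 5 is the least obvious part of the pipeline, so here is a minimal sketch of how faces could be linked to tracked persons by IoU. It assumes boxes are `(x1, y1, x2, y2)` tuples and uses a greedy best-overlap assignment. The function names and the `min_iou` threshold are hypothetical, and the actual implementation may differ.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(faces, tracks, min_iou=0.01):
    """Map track IDs to the name of the face that overlaps them most.

    faces:  list of (face_box, name) pairs from the face recognizer
    tracks: dict of track_id -> person box from the tracker
    min_iou is an assumed threshold, not a documented setting.
    """
    links = {}
    for face_box, name in faces:
        best_id, best_score = None, min_iou
        for track_id, person_box in tracks.items():
            score = iou(face_box, person_box)
            if score > best_score:
                best_id, best_score = track_id, score
        if best_id is not None:
            links[best_id] = name
    return links


tracks = {1: (100, 50, 200, 350), 2: (300, 60, 400, 360)}
faces = [((130, 60, 170, 110), "John")]
links = link_faces_to_tracks(faces, tracks)
```

A face box is much smaller than the person box that contains it, so the IoU of a correct pair is low. That is why the sketch picks the highest-scoring track above a small threshold rather than requiring a large overlap.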

Processing Options

Frame Skipping

Process only every Nth frame to trade temporal coverage for speed.

| Skip Value | Use Case |
| --- | --- |
| 1 | Maximum accuracy (default) |
| 5 | Balanced |
| 10 | Fast processing |
| 30 | Quick scan |
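
As an illustration of what the skip value means, the sketch below lists which frame indices would be analyzed. It assumes that frames whose index is a multiple of the skip value are processed; `frames_to_process` is a hypothetical helper, not part of the tool.

```python
def frames_to_process(total_frames, skip=1):
    """Yield the indices of the frames analyzed for a given skip value."""
    if skip < 1:
        raise ValueError("skip must be >= 1")
    # Assumption: every frame whose index is a multiple of `skip` is analyzed.
    yield from range(0, total_frames, skip)


# A 1-hour video at 30 FPS has 108,000 frames.
quick_scan = sum(1 for _ in frames_to_process(108_000, skip=30))
```

With a skip value of 30 on a 30 FPS video, roughly one frame per second is analyzed, which is why that setting suits a quick scan.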

Tracking Modes

| Mode | Description |
| --- | --- |
| all | Track and display all detected persons (default) |
| known_only | Only show persons with recognized faces |
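
The difference between the two modes can be sketched as a simple filter. This assumes each track carries a name that is "Unknown" until a face is matched; `visible_tracks` is a hypothetical helper used only for illustration.

```python
def visible_tracks(tracks, mode="all"):
    """Return the tracks to display for the given tracking mode.

    tracks: dict of track_id -> name ("Unknown" when no face was matched)
    """
    if mode == "all":
        return dict(tracks)
    if mode == "known_only":
        return {tid: name for tid, name in tracks.items() if name != "Unknown"}
    raise ValueError(f"unsupported track mode: {mode!r}")


tracks = {1: "John", 2: "Unknown", 3: "Jane"}
shown = visible_tracks(tracks, mode="known_only")
```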

RTSP Zero-Lag Engine

For IP camera streams, the system uses a multi-threaded zero-lag engine:

Features:

  • Multi-threaded reading - Frames decoded on background thread
  • Always latest frame - No buffering, always processes most recent frame
  • Auto-reconnect - Detects connection drops, retries every 5 seconds
  • 24/7 monitoring - Designed for continuous operation

How it works:

  1. A daemon thread continuously reads frames from the RTSP stream
  2. The main thread always takes the latest frame rather than a buffered one
  3. Because stale frames are discarded, lag does not accumulate as it can with standard video capture
  4. Network instability triggers a reconnect instead of a crash
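
The steps above can be sketched as a small reader class. This is a minimal illustration, not the tool's actual engine: `LatestFrameReader` and its parameters are hypothetical, and `open_source` is assumed to be any callable that returns an object with `read()` and `release()` methods shaped like OpenCV's `cv2.VideoCapture` (for example `lambda: cv2.VideoCapture(rtsp_url)`).

```python
import threading


class LatestFrameReader:
    """Keep only the most recent frame from a stream, reconnecting on failure."""

    def __init__(self, open_source, retry_delay=5.0):
        self._open_source = open_source    # callable returning a capture-like object
        self._retry_delay = retry_delay    # seconds to wait before reconnecting
        self._lock = threading.Lock()
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        source = None
        while not self._stop.is_set():
            if source is None:
                source = self._open_source()
            ok, frame = source.read()
            if ok:
                # Overwrite instead of queueing, so older frames never pile up.
                with self._lock:
                    self._latest = frame
            else:
                # Treat a failed read as a dropped connection and retry later.
                source.release()
                source = None
                self._stop.wait(self._retry_delay)
        if source is not None:
            source.release()

    def latest(self):
        """Return the newest frame, or None if nothing has been read yet."""
        with self._lock:
            return self._latest

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The main loop would call `latest()` whenever it is ready for more work. If detection takes longer than one frame interval, the frames that arrived in the meantime are simply overwritten, which keeps the processed frame close to real time.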

Output Files

Annotated Video

When video output is enabled, the processor writes an annotated video containing:

  • Person bounding boxes (from YOLOv8)
  • Track IDs (e.g., "#1", "#2")
  • Person names and confidence scores
  • Timestamp overlay
  • Progress percentage

CSV Log

The processor automatically generates a CSV log with the following columns:

| Column | Description |
| --- | --- |
| Frame | Frame number |
| Timestamp | Time in video (HH:MM:SS) |
| Track_ID | Persistent person tracking ID |
| Name | Recognized person name or "Unknown" |
| Confidence | Recognition confidence (0-1) |
| X1, Y1, X2, Y2 | Person bounding box coordinates |
| Detection_Score | YOLOv8 detection confidence |

Example CSV:

Frame,Timestamp,Track_ID,Name,Confidence,X1,Y1,X2,Y2,Detection_Score
1,00:00:00,1,John,0.8923,100,50,200,150,0.95
2,00:00:00,1,John,0.8920,102,52,202,152,0.94
3,00:00:01,2,Unknown,0.0000,300,60,400,160,0.91
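
A row in this format could be produced with Python's standard `csv` module, as in the sketch below. The column order comes from the example above; the helper names, the number formatting, and the use of an in-memory buffer in place of a log file are assumptions made for illustration.

```python
import csv
import io

COLUMNS = ["Frame", "Timestamp", "Track_ID", "Name", "Confidence",
           "X1", "Y1", "X2", "Y2", "Detection_Score"]


def format_timestamp(seconds):
    """Format a position in the video as HH:MM:SS."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def write_row(writer, frame, seconds, track_id, name, confidence, box, score):
    """Append one detection to the log in the documented column order."""
    writer.writerow([frame, format_timestamp(seconds), track_id, name,
                     f"{confidence:.4f}", *box, f"{score:.2f}"])


buffer = io.StringIO()  # stands in for an open log file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(COLUMNS)
write_row(writer, 1, 0.0, 1, "John", 0.8923, (100, 50, 200, 150), 0.95)
log_lines = buffer.getvalue().splitlines()
```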

Session Summary

After processing completes, the processor prints a summary:

============================================================
PROCESSING COMPLETE
============================================================
Frames: 1500/1500
Unique Tracks: 12
Identified Persons: 8
Names: Bob, Jane, John, Mike, Sarah

Preview Controls

During processing (when preview is enabled):

| Key | Action |
| --- | --- |
| ESC / Q | Stop processing and show summary |

Identity Caching

The system maintains an identity cache per track ID:

  • Purpose: Preserve identity when the face is temporarily not visible
  • Behavior: Updates only on confirmed (non-Unknown) matches
  • Benefit: Identity stays stable even when a person briefly turns away
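
The caching behavior described above can be sketched with a dictionary keyed by track ID. `IdentityCache` is a hypothetical class written for illustration, and the fallback of `("Unknown", 0.0)` for tracks that were never identified is an assumption.

```python
class IdentityCache:
    """Remember the last confirmed identity for each track ID."""

    def __init__(self):
        self._names = {}  # track_id -> (name, confidence)

    def update(self, track_id, name, confidence):
        """Store confirmed matches and return the identity to display."""
        if name != "Unknown":
            self._names[track_id] = (name, confidence)
        # An "Unknown" result never overwrites an earlier confirmed match.
        return self._names.get(track_id, ("Unknown", 0.0))


cache = IdentityCache()
first = cache.update(1, "John", 0.89)      # face visible and recognized
second = cache.update(1, "Unknown", 0.0)   # person turns away
```

Here `second` still reports John, because the track ID is unchanged and the cache only changes on a confirmed match.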

Configuration

| Setting | Description | Default |
| --- | --- | --- |
| Frame Skip | Process every Nth frame | 1 |
| Output Directory | Annotated video location | output/ |
| YOLO Model | Person detection model | yolov8n.pt |
| YOLO Tracker | Tracking algorithm | bytetrack.yaml |
| YOLO Confidence | Person detection threshold | 0.8 |
| Track Mode | all or known_only | all |
| Show Track ID | Display track IDs | Enabled |

Performance Tips

Processing Speed

| Factor | Impact |
| --- | --- |
| Frame skip value | Higher = faster |
| Video resolution | Lower = faster |
| GPU enabled | Significant speedup |
| Preview disabled | Slight speedup |
| YOLO model | yolov8n (fastest) vs yolov8m (slower) |

Estimated Processing Time

For a 1-hour video at 30 FPS (108,000 frames):

| Skip | Frames Processed | Approx. Time (CPU) |
| --- | --- | --- |
| 1 | 108,000 | ~60 minutes |
| 5 | 21,600 | ~12 minutes |
| 10 | 10,800 | ~6 minutes |
| 30 | 3,600 | ~2 minutes |

Actual times vary by hardware and video content.
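
The figures in the table follow from simple arithmetic: 108,000 frames in about 60 minutes implies a CPU throughput of roughly 30 analyzed frames per second. The sketch below reproduces the table under that assumption; `estimate_minutes` and `throughput_fps` are hypothetical names, and the throughput on your hardware will differ.

```python
def estimate_minutes(duration_s, fps=30, skip=1, throughput_fps=30.0):
    """Estimate processing time from the number of frames actually analyzed.

    throughput_fps is the assumed number of frames analyzed per second.
    """
    processed = int(duration_s * fps) // skip
    return processed, processed / throughput_fps / 60


for skip in (1, 5, 10, 30):
    frames, minutes = estimate_minutes(3600, skip=skip)
    print(f"skip={skip:>2}  frames={frames:>7,}  ~{minutes:.0f} min")
```

To adapt the estimate, time a short clip at skip 1, divide the frame count by the elapsed seconds, and pass the result as `throughput_fps`.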

Use Cases

CCTV Recording Analysis

Process daily CCTV recordings with all output options enabled.

Real-time IP Camera Monitoring

Use RTSP mode with the zero-lag engine for live monitoring:

  • Auto-reconnect handles network issues
  • Always processes latest frame
  • Session summary on exit

Finding a Specific Person

To check whether a specific person appears in a video, process it with a high frame skip value and search the CSV log for their name.

Building Attendance from Video

  1. Process video to identify all persons
  2. Check CSV for track IDs and first-seen times
  3. Import CSV to attendance system
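
Step 2 can be automated by reading the CSV log and keeping the first timestamp for each recognized name. The sketch below uses only the standard library and the column names documented for the CSV log. `first_seen` is a hypothetical helper, and it assumes rows are written in chronological order.

```python
import csv
import io


def first_seen(csv_file):
    """Return {name: first timestamp} for every recognized person."""
    seen = {}
    for row in csv.DictReader(csv_file):
        name = row["Name"]
        # Rows are assumed to be chronological, so the first hit is the earliest.
        if name != "Unknown" and name not in seen:
            seen[name] = row["Timestamp"]
    return seen


# In practice this would be an open log file; the rows are the documented example.
sample = io.StringIO(
    "Frame,Timestamp,Track_ID,Name,Confidence,X1,Y1,X2,Y2,Detection_Score\n"
    "1,00:00:00,1,John,0.8923,100,50,200,150,0.95\n"
    "2,00:00:00,1,John,0.8920,102,52,202,152,0.94\n"
    "3,00:00:01,2,Unknown,0.0000,300,60,400,160,0.91\n"
)
attendance = first_seen(sample)
```

Grouping by name rather than track ID means a person who leaves and re-enters under a new track ID is still counted once.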

Troubleshooting

Video won't open

  • Check file path is correct
  • Verify video file isn't corrupted
  • Install video codecs if needed

RTSP stream disconnects

  • System auto-reconnects every 5 seconds
  • Check network stability
  • Verify RTSP URL is correct

Out of memory

  • Increase frame skip value
  • Process shorter video segments
  • Enable GPU to offload processing

Slow processing

  • Use headless mode (no preview)
  • Increase frame skip value
  • Enable GPU if available
  • Process on faster storage (SSD)
  • Use yolov8n.pt instead of larger models

Track IDs changing unexpectedly

  • Adjust YOLO confidence threshold
  • Try the botsort.yaml tracker, which uses a different motion model
  • Ensure good lighting conditions
