VisionLog

Real-time Person Tracking via CCTV

Technical workflow for real-time video file analysis and person identification.

The Identification Scenario is a high-performance video analysis module designed to simulate real-time surveillance processing. It unifies person tracking with facial recognition to maintain persistent identities even when faces are temporarily obscured.

Core AI Components

Layer            | Model              | Description
-----------------|--------------------|------------------------------------------------------------------------------
Object Detection | YOLOv8 (Nano)      | High-speed person detection optimized for real-time batch throughput.
Tracking         | ByteTrack          | Maintains persistent "Track IDs" by analyzing motion patterns between frames.
Face Detection   | InsightFace SCRFD  | High-precision face localization within full video frames.
Recognition      | ArcFace            | Generates 512-D embeddings to match faces against the known database.

Technical Logic & Heuristics

1. Identity Caching

To prevent ID flickering, the system uses an Identity Cache. When a person is tracked but their face is not visible (e.g., turned away), the system "remembers" their last known identity based on their Track ID.
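The caching behavior described above can be sketched as a small lookup keyed by Track ID. This is a minimal illustration, not the module's actual code: the class name `IdentityCache`, the method `resolve`, and the use of plain strings for identities are all assumptions made for the example.

```python
from typing import Dict, Optional


class IdentityCache:
    """Hypothetical cache mapping Track IDs to the last recognized identity."""

    def __init__(self) -> None:
        self._last_known: Dict[int, str] = {}

    def resolve(self, track_id: int, recognized: Optional[str]) -> Optional[str]:
        """Return the identity to display for this track in the current frame.

        `recognized` is the name matched from a visible face, or None when
        no face was found (or no match cleared the similarity threshold).
        """
        if recognized is not None:
            # Fresh recognition: refresh the cached identity for this track.
            self._last_known[track_id] = recognized
            return recognized
        # No face this frame: fall back to the cached identity, if any.
        return self._last_known.get(track_id)


cache = IdentityCache()
labels = [
    cache.resolve(7, "alice"),  # face visible and matched
    cache.resolve(7, None),     # same track, person turned away
    cache.resolve(9, None),     # new track that was never recognized
]
print(labels)  # ['alice', 'alice', None]
```

Keying on the Track ID means the label survives only as long as the tracker keeps the same ID; if the tracker assigns a new ID after a long occlusion, the person stays unlabeled until a face is matched again.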

2. Automated Cropping Logic

For the frontend "Identified Feed," the system generates real-time thumbnails. The cropping logic uses a hierarchy of precision:

  • Precise Face Crop: If a face is actively detected in the current frame, the system crops exactly to the face bounding box.
  • Heuristic Head Crop: If a person is tracked but no face is found (stale identity), the system applies a 25% Top-Vertical Heuristic: it crops the top 25% of the body bounding box, which approximates the head region, so that the "head" fills the thumbnail.
  • Safety Constraints: All crops are clamped to the frame dimensions, compressed as low-quality JPEGs, and then base64-encoded to minimize SSE payload size.
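The crop hierarchy above can be expressed as a pure function over bounding boxes. The sketch below assumes boxes are `(x1, y1, x2, y2)` pixel tuples; the names `clamp_box`, `thumbnail_box`, and `HEAD_FRACTION` are hypothetical and chosen only for illustration. JPEG compression and base64 encoding are left out to keep the example dependency-free.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, assumed layout

HEAD_FRACTION = 0.25  # the 25% Top-Vertical Heuristic


def clamp_box(box: Box, frame_w: int, frame_h: int) -> Box:
    """Clamp a box so every corner lies inside the frame."""
    x1, y1, x2, y2 = box
    return (
        max(0, min(x1, frame_w)),
        max(0, min(y1, frame_h)),
        max(0, min(x2, frame_w)),
        max(0, min(y2, frame_h)),
    )


def thumbnail_box(body_box: Box, face_box: Optional[Box],
                  frame_w: int, frame_h: int) -> Box:
    """Pick the crop region: precise face box if available, else head heuristic."""
    if face_box is not None:
        # Precise Face Crop: use the detected face box directly.
        return clamp_box(face_box, frame_w, frame_h)
    # Heuristic Head Crop: keep the full width, but only the top 25% of the height.
    x1, y1, x2, y2 = body_box
    head_bottom = y1 + round((y2 - y1) * HEAD_FRACTION)
    return clamp_box((x1, y1, x2, head_bottom), frame_w, frame_h)


with_face = thumbnail_box((100, 40, 200, 440), (120, 50, 180, 110), 640, 480)
no_face = thumbnail_box((100, 40, 200, 440), None, 640, 480)
at_edge = thumbnail_box((600, 400, 700, 520), None, 640, 480)
print(with_face)  # (120, 50, 180, 110)
print(no_face)    # (100, 40, 200, 140)
print(at_edge)    # (600, 400, 640, 430)
```

With OpenCV, the resulting region could then be sliced from the frame array, compressed with `cv2.imencode` at a low `cv2.IMWRITE_JPEG_QUALITY`, and passed through `base64.b64encode`; the exact quality setting used by the system is not stated here.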

3. Real-Time Communication

The backend utilizes Server-Sent Events (SSE) via FastAPI's StreamingResponse.

  • Metadata Event: Sent once on start (Total frames, FPS).
  • Frame Event: Sent for each processed frame. A JSON payload containing the base64-encoded frame, current detections, unique track count, and progress percentage.
  • Completion Event: Final summary of all identified individuals and forensic logs.
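The three-event sequence above might look like the following generator. It is a sketch under stated assumptions: the event names (`metadata`, `frame`, `complete`), the JSON field names, and the placeholder detection logic are invented for illustration and are not taken from the actual backend.

```python
import json
from typing import Any, Dict, Iterator


def sse_message(event: str, payload: Dict[str, Any]) -> str:
    """Serialize one SSE message: event name, one data line, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def event_stream(total_frames: int, fps: float) -> Iterator[str]:
    """Yield a metadata event, one frame event per frame, then a completion event."""
    yield sse_message("metadata", {"total_frames": total_frames, "fps": fps})

    seen_tracks = set()
    for index in range(total_frames):
        # Placeholder for detection and tracking; real Track IDs come from the tracker.
        track_id = index % 2
        seen_tracks.add(track_id)
        yield sse_message("frame", {
            "frame": "",  # the base64-encoded JPEG would go here
            "detections": [{"track_id": track_id, "name": None}],
            "unique_tracks": len(seen_tracks),
            "progress": round(100 * (index + 1) / total_frames, 1),
        })

    yield sse_message("complete", {"identified": [], "logs": []})


messages = list(event_stream(total_frames=4, fps=25.0))
print(len(messages))  # 6: one metadata, four frame, one complete
print(messages[0], end="")
```

In a FastAPI route, a generator like this can be returned as `StreamingResponse(event_stream(...), media_type="text/event-stream")`, which is the media type browsers expect for an `EventSource` connection.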

Performance Thresholds

  • Similarity Threshold: Default 0.6, chosen to favor high precision (fewer false matches).
  • Frame Skipping: Configurable (Default: 2) to balance CPU utilization against identification accuracy.
  • Parallelism: Utilizes a ThreadPoolExecutor (4 workers) for concurrent facial embedding generation.
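The sketch below shows how these three settings could interact. Several details are assumptions rather than documented behavior: similarity is taken to be cosine similarity, a frame skip of 2 is read as "process every second frame", and `fake_embed` is a stand-in for the real embedding model. Toy 3-D vectors replace the 512-D embeddings for readability.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.6  # default from the settings above
FRAME_SKIP = 2              # assumed meaning: process every 2nd frame
MAX_WORKERS = 4             # worker count from the settings above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed metric); returns 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding: Sequence[float], database: Dict[str, Sequence[float]],
               threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the best-scoring known identity, or None if none reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in database.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def fake_embed(face_crop: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the embedding model; returns its input unchanged."""
    return list(face_crop)


def identify_faces(face_crops: List[Sequence[float]],
                   database: Dict[str, Sequence[float]]) -> List[Optional[str]]:
    """Embed all face crops concurrently, then match each against the database."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        embeddings = list(pool.map(fake_embed, face_crops))  # preserves input order
    return [best_match(e, database) for e in embeddings]


database = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
crops = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.7], [0.1, 0.95, 0.0]]
names = identify_faces(crops, database)
processed = [i for i in range(10) if i % FRAME_SKIP == 0]
print(names)      # ['alice', None, 'bob']
print(processed)  # [0, 2, 4, 6, 8]
```

The middle crop scores about 0.50 against both known identities, so it stays unidentified at the 0.6 default; lowering the threshold would trade that precision for more matches.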
