Real-time Person Tracking via CCTV
Technical workflow for real-time video file analysis and person identification.
The Identification Scenario is a high-performance video analysis module that processes a recorded video file as if it were a live surveillance feed. It combines person tracking with facial recognition so that each person keeps a persistent identity even when their face is temporarily obscured.
Core AI Components
| Layer | Model | Description |
|---|---|---|
| Object Detection | YOLOv8 (Nano) | Fast person detection; the Nano variant is the smallest YOLOv8 model, trading some accuracy for throughput. |
| Tracking | ByteTrack | Maintains persistent "Track IDs" by associating detections across frames using Kalman-filter motion prediction and box overlap (IoU), including low-confidence detections. |
| Face Detection | InsightFace SCRFD | High-precision face localization within full video frames. |
| Recognition | ArcFace | Generates 512-D embeddings to match faces against the known database. |
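A minimal sketch of the matching step, assuming faces are compared by cosine similarity between ArcFace embeddings and that a match is accepted only at or above the `0.6` default listed under Performance Thresholds. The names `cosine_similarity`, `match_face`, and `gallery` are hypothetical; a production version would typically vectorize this with NumPy over the full 512-D database.

```python
import math
from typing import Dict, List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.6  # default from the Performance Thresholds section


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (norm_a * norm_b)


def match_face(
    query: List[float],
    gallery: Dict[str, List[float]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Tuple[Optional[str], float]:
    """Return (name, score) of the best match, or (None, score) if below threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # unknown person
    return best_name, best_score
```

Raising the threshold reduces false identifications at the cost of more faces being reported as unknown.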
Technical Logic & Heuristics
1. Identity Caching
To prevent ID flickering, the system uses an Identity Cache. When a person is still tracked but their face is not visible (e.g., they have turned away), the system "remembers" the last identity recognized for that Track ID.
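One way the Identity Cache could work, sketched under assumptions: it is a plain dictionary keyed by ByteTrack track ID, a fresh face match always overwrites the cached entry, and entries are dropped once their track is no longer active. `IdentityCache`, `resolve`, and `prune` are hypothetical names, not the module's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class CachedIdentity:
    name: str
    score: float    # similarity score of the match that set this entry
    frame_idx: int  # frame in which the face was last recognized


class IdentityCache:
    """Hypothetical cache mapping track IDs to the last recognized identity."""

    def __init__(self) -> None:
        self._entries: Dict[int, CachedIdentity] = {}

    def resolve(
        self, track_id: int, match: Optional[Tuple[str, float]], frame_idx: int
    ) -> Tuple[Optional[str], bool]:
        """Return (identity, is_stale) for a tracked person in the current frame.

        `match` is a fresh (name, score) face match, or None when no face was
        recognized for this track in this frame.
        """
        if match is not None:
            name, score = match
            self._entries[track_id] = CachedIdentity(name, score, frame_idx)
            return name, False
        entry = self._entries.get(track_id)
        if entry is None:
            return None, False  # never recognized: unknown person
        return entry.name, True  # face not visible: reuse last known identity

    def prune(self, active_track_ids: Iterable[int]) -> None:
        """Forget identities whose tracks are no longer active."""
        active = set(active_track_ids)
        for track_id in list(self._entries):
            if track_id not in active:
                del self._entries[track_id]
```

In this sketch the `is_stale` flag is what would select the heuristic head crop instead of the precise face crop when building thumbnails.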
2. Automated Cropping Logic
For the frontend "Identified Feed," the system generates real-time thumbnails. The cropping logic uses a hierarchy of precision:
- Precise Face Crop: If a face is actively detected in the current frame, the system crops exactly to the face bounding box.
- Heuristic Head Crop: If a person is tracked but no face is found (stale identity), the system applies a 25% Top-Vertical Heuristic: it crops the top 25% of the body bounding box, where the head is expected to be, so the thumbnail stays centered on the head.
- Safety Constraints: All crops are clamped to the frame dimensions, then encoded as low-quality JPEGs and base64-encoded to minimize SSE payload size.
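The crop hierarchy can be sketched as a pure function that returns a pixel box. Assumptions: boxes are `(x1, y1, x2, y2)` tuples in pixel coordinates, the head crop keeps the full width of the person box, and `thumbnail_box` and `clamp_box` are hypothetical names.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

HEAD_FRACTION = 0.25  # top 25% of the person box, per the heuristic above


def clamp_box(box: Box, frame_w: int, frame_h: int) -> Box:
    """Clamp every coordinate into the frame so slicing never goes out of bounds."""
    x1, y1, x2, y2 = box
    return (
        max(0, min(x1, frame_w)),
        max(0, min(y1, frame_h)),
        max(0, min(x2, frame_w)),
        max(0, min(y2, frame_h)),
    )


def thumbnail_box(
    person_box: Box, face_box: Optional[Box], frame_w: int, frame_h: int
) -> Box:
    """Pick the thumbnail region: exact face box if available, else head heuristic."""
    if face_box is not None:
        # Precise face crop: the detected face box, clamped to the frame.
        return clamp_box(face_box, frame_w, frame_h)
    # Heuristic head crop: full box width, top 25% of the body height.
    x1, y1, x2, y2 = person_box
    head_h = int((y2 - y1) * HEAD_FRACTION)
    return clamp_box((x1, y1, x2, y1 + head_h), frame_w, frame_h)
```

The returned box would be sliced from the frame as `frame[y1:y2, x1:x2]` and could then be encoded with OpenCV's `cv2.imencode(".jpg", crop, [cv2.IMWRITE_JPEG_QUALITY, 50])` followed by `base64.b64encode`; the quality value of 50 is illustrative, not the module's setting. A box whose width or height clamps to zero should be skipped rather than encoded.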
3. Real-Time Communication
The backend streams results to the frontend with Server-Sent Events (SSE) via FastAPI's StreamingResponse.
- Metadata Event: Sent once on start (Total frames, FPS).
- Frame Event: A JSON payload containing the base64-encoded frame, current detections, unique track count, and progress percentage.
- Completion Event: Final summary of all identified individuals and forensic logs.
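A framework-agnostic sketch of the three event types, assuming each event is sent as an SSE `event:` line plus a single JSON `data:` line. The event names, payload field names, and the `analysis_stream` generator are hypothetical; only the SSE wire format itself (newline-separated fields, each event terminated by a blank line) is standard.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# (frame_idx, base64 JPEG, detections) for each analyzed frame; hypothetical shape.
FrameResult = Tuple[int, str, List[Dict[str, Any]]]


def sse_event(event: str, data: Dict[str, Any]) -> str:
    """Serialize one Server-Sent Event; default json.dumps output has no raw newlines."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def analysis_stream(
    total_frames: int, fps: float, results: Iterable[FrameResult]
) -> Iterator[str]:
    # Metadata event: sent once, before any frames.
    yield sse_event("metadata", {"total_frames": total_frames, "fps": fps})

    seen_tracks = set()
    identified = set()
    for frame_idx, image_b64, detections in results:
        for det in detections:
            seen_tracks.add(det["track_id"])
            if det.get("name"):
                identified.add(det["name"])
        # Frame event: image, detections, unique track count, and progress %.
        yield sse_event("frame", {
            "frame": image_b64,
            "detections": detections,
            "unique_tracks": len(seen_tracks),
            "progress": round(100 * (frame_idx + 1) / total_frames, 1),
        })

    # Completion event: final summary of everyone identified.
    yield sse_event("complete", {
        "identified": sorted(identified),
        "unique_tracks": len(seen_tracks),
    })
```

In FastAPI the generator can be returned as `StreamingResponse(analysis_stream(...), media_type="text/event-stream")`; Starlette iterates a synchronous generator in a thread pool, so a blocking per-frame pipeline does not stall the event loop.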
Performance Thresholds
- Similarity Threshold: Default `0.6` for high precision.
- Frame Skipping: Configurable (default: `2`) to balance CPU utilization against identification accuracy.
- Parallelism: Uses a `ThreadPoolExecutor` (4 workers) for concurrent facial embedding generation.
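A sketch of the two throughput settings, assuming "Frame Skipping: 2" means every second frame is analyzed, and that one executor is created per video and reused for every frame instead of being rebuilt per call. `frames_to_process` and `embed_faces_concurrently` are hypothetical helpers; `embed_fn` stands in for the ArcFace embedding call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Crop = TypeVar("Crop")
Embedding = TypeVar("Embedding")

FRAME_SKIP = 2   # documented default; interpreted here as "analyze every 2nd frame"
MAX_WORKERS = 4  # documented ThreadPoolExecutor size


def frames_to_process(total_frames: int, skip: int = FRAME_SKIP) -> List[int]:
    """Indices of the frames that are analyzed; skip=1 (or less) analyzes every frame."""
    return list(range(0, total_frames, max(1, skip)))


def embed_faces_concurrently(
    crops: Sequence[Crop],
    embed_fn: Callable[[Crop], Embedding],
    executor: ThreadPoolExecutor,
) -> List[Embedding]:
    """Embed all face crops of one frame in parallel; output order matches input order."""
    return list(executor.map(embed_fn, crops))
```

Usage would look like `with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:` wrapping the per-frame loop. Threads only speed this up if `embed_fn` spends most of its time in native code that releases the GIL, as inference runtimes and NumPy typically do; pure-Python work would gain little.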