System Architecture
Understanding the modular architecture of VisionLog - AI Attendance
The VisionLog - AI Attendance module is built on a modular architecture designed for efficient face recognition and attendance tracking. The system uses a dual-stage pipeline: YOLOv8 for person detection and tracking, combined with InsightFace (buffalo_l model) for face detection and recognition.
Architecture Overview
The system follows a 6-layer processing pipeline:
| Layer | Component | Technology |
|---|---|---|
| 1 | Input | OpenCV, RTSP |
| 2 | Person Detection | YOLOv8 |
| 3 | Tracking | ByteTrack |
| 4 | Face Detection + Recognition | InsightFace (SCRFD + ArcFace) |
| 5 | Attendance | Identity caching, logging |
| 6 | Storage | Pickle database, CSV logs |
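Layers 2 to 5 can be read as one per-frame loop, with storage (layer 6) happening afterwards. The sketch below wires them together with plain callables so the data flow is visible. It is a minimal illustration under assumptions, not the module's API. The stage callables (`detect_and_track`, `identify_faces`, `link`) and the `TrackedPerson` type are hypothetical stand-ins for the YOLOv8/ByteTrack and InsightFace calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class TrackedPerson:
    track_id: int
    box: Box
    name: Optional[str] = None  # filled in by identity assignment


def process_frame(
    frame,
    detect_and_track: Callable[[object], List[Tuple[int, Box]]],          # layers 2-3 (hypothetical)
    identify_faces: Callable[[object], List[Tuple[Box, Optional[str]]]],  # layer 4 (hypothetical)
    link: Callable[[Box, List[TrackedPerson]], Optional[TrackedPerson]],  # face -> person link
    cache: Dict[int, str],                                                # layer 5: identity per track ID
) -> List[TrackedPerson]:
    """Run one frame through detection, tracking, recognition and identity caching."""
    people = [TrackedPerson(tid, box) for tid, box in detect_and_track(frame)]
    for face_box, name in identify_faces(frame):
        person = link(face_box, people)
        if person is not None and name is not None:
            cache[person.track_id] = name  # remember the identity for later frames
    for person in people:
        # Reuse the cached identity even when the face is not visible in this frame.
        person.name = cache.get(person.track_id)
    return people
```

Because the cache is keyed by track ID, a person who turns away from the camera keeps their name for as long as the tracker keeps their ID.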
Core Components
Person Detection & Tracking (YOLOv8 + ByteTrack)
The first stage uses YOLOv8 for robust person detection with ByteTrack for persistent tracking across frames.
Key Features:
| Feature | Description |
|---|---|
| Person Detection | YOLOv8 detects all people in frame with bounding boxes |
| Persistent Tracking | ByteTrack assigns stable IDs across frames |
| Motion Prediction | Kalman-filter motion prediction keeps track IDs stable through short occlusions and brief re-entries |
| Configurable Tracker | Support for ByteTrack or BoT-SORT |
Tracking Configuration:
| Setting | Default | Description |
|---|---|---|
| YOLO Model | yolov8n.pt | Nano model (fast), also supports yolov8s, yolov8m |
| Tracker | bytetrack.yaml | ByteTrack or botsort.yaml |
| Confidence | 0.8 | Person detection confidence threshold |
| Track Mode | all | Track every detected person (`all`) or only recognized persons (`known_only`) |
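The tracking settings map naturally onto a small config object. The sketch below is a hypothetical `TrackingConfig` that validates the table's options and builds keyword arguments for Ultralytics' `YOLO(...).track()`, which accepts `tracker`, `conf`, `classes` and `persist`. It assumes `track_mode` is applied by the application after tracking, since Ultralytics has no such option.

```python
from dataclasses import dataclass

PERSON_CLASS_ID = 0  # "person" is class 0 in the COCO-trained YOLOv8 weights


@dataclass
class TrackingConfig:
    yolo_model: str = "yolov8n.pt"   # or "yolov8s.pt", "yolov8m.pt"
    tracker: str = "bytetrack.yaml"  # or "botsort.yaml"
    confidence: float = 0.8          # person detection threshold
    track_mode: str = "all"          # "all" or "known_only" (assumed to be app-level filtering)

    def __post_init__(self) -> None:
        if self.tracker not in ("bytetrack.yaml", "botsort.yaml"):
            raise ValueError(f"unsupported tracker: {self.tracker}")
        if self.track_mode not in ("all", "known_only"):
            raise ValueError(f"unsupported track mode: {self.track_mode}")

    def track_kwargs(self) -> dict:
        """Keyword arguments for ultralytics YOLO(...).track()."""
        return {
            "tracker": self.tracker,
            "conf": self.confidence,
            "classes": [PERSON_CLASS_ID],  # detect persons only
            "persist": True,               # keep track IDs between successive per-frame calls
        }
```

In use, this would look like `YOLO(cfg.yolo_model).track(frame, **cfg.track_kwargs())`, called once per frame. `persist=True` is what keeps ByteTrack's IDs continuous across those calls.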
Face Engine (InsightFace)
The core of the system, responsible for face detection and recognition.
Key Functions:
| Function | Description |
|---|---|
| Face Detection | SCRFD detects all faces in the full frame |
| Face Recognition | ArcFace extracts 512-d embeddings and matches against database |
| IoU Matching | Links detected faces to tracked persons by spatial overlap |
| Identity Caching | Preserves last known identity per track ID |
Enrollment System
Folder-based batch enrollment for adding faces to the database.
| Feature | Description |
|---|---|
| Batch Processing | Enroll multiple persons at once |
| Folder Structure | One folder per person with multiple images |
| Automatic Averaging | Embeddings from a person's images are averaged into one reference embedding for robustness |
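One plausible way to implement the averaging step is sketched below; the module's exact method is not documented here. It L2-normalizes each per-image embedding, averages them, and re-normalizes the mean. The input is assumed to be a `{person_name: [embedding, ...]}` dict already extracted from the per-person folders. The folder walk and the InsightFace call that produces each 512-d vector are omitted, and `enroll` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence

Embedding = List[float]


def l2_normalize(vec: Sequence[float]) -> Embedding:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embedding(embeddings: Sequence[Sequence[float]]) -> Embedding:
    """Average one person's embeddings (one per enrollment image) into a single unit vector."""
    if not embeddings:
        raise ValueError("no embeddings to average")
    # Normalize each embedding first so no single image dominates the mean.
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)  # re-normalize so cosine similarity reduces to a dot product


def enroll(per_person: Dict[str, List[Sequence[float]]]) -> Dict[str, Embedding]:
    """Build the name -> embedding database, skipping persons with no usable images."""
    return {name: average_embedding(embs) for name, embs in per_person.items() if embs}
```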
Recognition System
Real-time face recognition from multiple input sources.
Input Modes:
- Camera feed (live recognition)
- Single image file
- Video file processing
- RTSP streams (IP cameras)
Features:
- FPS counter display
- Screenshot capture
- Database reload
- Confidence display toggle
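The four input modes have to be told apart before a source is opened. The hypothetical helper below shows one way to do it. The extension lists and the digit-string-as-camera-index rule are assumptions, not documented behaviour. OpenCV's `cv2.VideoCapture` accepts a device index, a file path or an RTSP URL, so only the single-image case needs different handling.

```python
from pathlib import Path
from typing import Union

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed list
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov"}   # assumed list


def classify_source(source: Union[int, str]) -> str:
    """Map a user-supplied source to one of the four input modes."""
    if isinstance(source, int) or source.isdigit():
        return "camera"  # OpenCV device index, e.g. 0 for the default webcam
    if source.lower().startswith("rtsp://"):
        return "rtsp"
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unrecognized source: {source!r}")
```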
Video Processor
Batch processing of CCTV recordings, video files, and RTSP streams.
Capabilities:
- Frame skipping for performance (configurable)
- YOLOv8 person tracking with persistent IDs
- Annotated video output with bounding boxes
- CSV logging with frame, timestamp, track ID, name, confidence
- Session summary with unique tracks and identified persons
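Frame skipping and CSV logging can be sketched with the standard library alone. Two parts of this sketch are assumptions:

- Frame skip is read as "process every Nth frame", matching the default of 1 for all frames.
- The timestamp is computed as seconds from the start of the video (`frame / fps`). The label `Unknown` for unrecognized tracks is also an assumption.

```python
import csv
import io
from typing import Iterable, Optional, TextIO, Tuple

CSV_FIELDS = ["frame", "timestamp", "track_id", "name", "confidence"]


def should_process(frame_idx: int, frame_skip: int) -> bool:
    """frame_skip=1 processes every frame; frame_skip=N processes every Nth frame."""
    return frame_idx % max(1, frame_skip) == 0


def write_detections(
    out: TextIO,
    rows: Iterable[Tuple[int, int, Optional[str], float]],  # (frame_idx, track_id, name, confidence)
    fps: float,
) -> None:
    writer = csv.DictWriter(out, fieldnames=CSV_FIELDS)
    writer.writeheader()
    for frame_idx, track_id, name, conf in rows:
        writer.writerow({
            "frame": frame_idx,
            "timestamp": f"{frame_idx / fps:.2f}",  # seconds from the start of the video
            "track_id": track_id,
            "name": name or "Unknown",
            "confidence": f"{conf:.3f}",
        })


if __name__ == "__main__":
    buf = io.StringIO()
    write_detections(buf, [(0, 1, "alice", 0.91234), (30, 2, None, 0.4)], fps=30.0)
    print(buf.getvalue())
```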
Zero-Lag RTSP Engine:
- Multi-threaded frame reading (always processes latest frame)
- Auto-reconnect on connection loss (5-second retry)
- Designed for 24/7 continuous monitoring
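The "always process the latest frame" behaviour is usually built as a reader thread that overwrites a single shared slot. The sketch below combines that with reconnect-after-delay. `LatestFrameReader` is a hypothetical name, and the capture factory is injected so the class can be tested without a camera. In production it would be something like `lambda: cv2.VideoCapture(rtsp_url)`. A capture that fails to open simply returns `(False, None)` from `read()`, which takes the same retry path as a dropped connection.

```python
import threading
from typing import Any, Callable, Optional


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the newest one.

    open_capture: factory returning an object with read() -> (ok, frame) and release().
    """

    def __init__(self, open_capture: Callable[[], Any], retry_delay: float = 5.0) -> None:
        self._open_capture = open_capture
        self._retry_delay = retry_delay
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "LatestFrameReader":
        self._thread.start()
        return self

    def _run(self) -> None:
        cap = None
        while not self._stopped.is_set():
            if cap is None:
                cap = self._open_capture()
            ok, frame = cap.read()
            if not ok:
                # Connection lost (or never opened): release, wait, then reconnect.
                cap.release()
                cap = None
                self._stopped.wait(self._retry_delay)  # returns early if stop() is called
                continue
            with self._lock:
                self._frame = frame  # older unprocessed frames are simply overwritten
        if cap is not None:
            cap.release()

    def latest(self) -> Optional[Any]:
        """Newest frame read so far, or None if nothing has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self) -> None:
        self._stopped.set()
        self._thread.join(timeout=self._retry_delay + 1.0)
```

Dropping stale frames trades completeness for latency. That suits live attendance, but not the batch processing of recorded video, where every frame should be read.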
Attendance Tracker
Real-time attendance logging with duplicate prevention.
Features:
- Identity caching per track ID
- Duplicate prevention mechanism
- CSV export with timestamps
- Filter by date, person, or camera
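The duplicate-prevention rule is not specified above. The sketch below assumes one attendance entry per person, per camera, per day, which is a common choice; the module may instead use a time-based cooldown. `AttendanceLog` and its methods are hypothetical names.

```python
import csv
from datetime import date, datetime
from typing import List, Optional, Set, TextIO, Tuple

Record = Tuple[str, str, datetime]  # (name, camera, timestamp)


class AttendanceLog:
    """In-memory attendance log that records each person once per camera per day."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self._seen: Set[Tuple[str, str, date]] = set()

    def mark(self, name: str, camera: str, when: datetime) -> bool:
        """Record attendance; return False if this is a duplicate for that day and camera."""
        key = (name, camera, when.date())
        if key in self._seen:
            return False  # the same person was already logged today on this camera
        self._seen.add(key)
        self.records.append((name, camera, when))
        return True

    def filter(self, day: Optional[date] = None, person: Optional[str] = None,
               camera: Optional[str] = None) -> List[Record]:
        return [r for r in self.records
                if (day is None or r[2].date() == day)
                and (person is None or r[0] == person)
                and (camera is None or r[1] == camera)]

    def export_csv(self, out: TextIO) -> None:
        writer = csv.writer(out)
        writer.writerow(["name", "camera", "timestamp"])
        for name, camera, when in self.records:
            writer.writerow([name, camera, when.isoformat(timespec="seconds")])
```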
Processing Pipeline
1. Person Detection (YOLOv8)
YOLOv8 detects all people in the frame, returning bounding boxes with confidence scores.
2. Tracking (ByteTrack)
ByteTrack assigns a persistent track ID to each person and keeps it across frames, even through short occlusions.
3. Face Detection (SCRFD)
InsightFace SCRFD detects faces on the full frame for maximum accuracy.
Quality Filters:
- Minimum face size: 50 pixels
- Detection confidence threshold: 0.5
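The two quality filters amount to a simple predicate on each detected face. InsightFace's `Face` objects expose `bbox` (x1, y1, x2, y2) and `det_score`, so in practice the call would look like `[f for f in app.get(frame) if passes_quality(f.bbox, f.det_score)]`. Reading "face size" as the shorter side of the box is an assumption.

```python
from typing import Sequence

MIN_FACE_SIZE = 50   # pixels
MIN_DET_SCORE = 0.5  # SCRFD detection confidence


def passes_quality(bbox: Sequence[float], det_score: float,
                   min_size: int = MIN_FACE_SIZE, min_score: float = MIN_DET_SCORE) -> bool:
    """Keep a face only if it is large enough and confidently detected."""
    x1, y1, x2, y2 = bbox
    width, height = x2 - x1, y2 - y1
    # Assumption: "face size" means the shorter side of the bounding box.
    return min(width, height) >= min_size and det_score >= min_score
```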
4. Face Recognition (ArcFace)
Each detected face is converted to a 512-dimensional embedding vector using the ArcFace model.
5. IoU Matching
Faces are matched to tracked persons using Intersection over Union (IoU) of bounding boxes.
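A minimal sketch of the IoU step follows. Each face goes to the tracked person whose box overlaps it most. Because a face box is much smaller than a person box, the IoU of a face fully inside a person box is only the ratio of their areas. Any minimum-IoU threshold therefore has to be small; `min_iou` here is a hypothetical parameter defaulting to "any overlap".

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_face_to_track(face: Box, persons: Dict[int, Box], min_iou: float = 0.0) -> Optional[int]:
    """Return the track ID whose person box overlaps the face box most, or None."""
    best_id, best_iou = None, min_iou
    for track_id, person_box in persons.items():
        score = iou(face, person_box)
        if score > best_iou:
            best_id, best_iou = track_id, score
    return best_id
```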
6. Identity Assignment
Recognition uses cosine similarity between embeddings.
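For a query embedding $\mathbf{q}$ and an enrolled embedding $\mathbf{e}$, both 512-dimensional, cosine similarity is defined below. It lies in $[-1, 1]$. When both vectors are L2-normalized, it reduces to the dot product $\mathbf{q} \cdot \mathbf{e}$.

$$
\operatorname{sim}(\mathbf{q}, \mathbf{e}) = \frac{\mathbf{q} \cdot \mathbf{e}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{e} \rVert} = \frac{\sum_{i=1}^{512} q_i e_i}{\sqrt{\sum_{i=1}^{512} q_i^2}\,\sqrt{\sum_{i=1}^{512} e_i^2}}
$$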
Matching Process:
- Compare query embedding against all enrolled embeddings
- Find the highest similarity score
- Accept match if score >= threshold (default: 0.35)
- Cache successful identities per track ID
Special Handling:
- Names ending with `_Twins` use a stricter 0.7 threshold
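The matching rules above can be sketched as a linear scan over the database. The sketch assumes the `_Twins` rule applies to the name of the best match: that match must clear 0.7 instead of the default 0.35. The function and constant names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

DEFAULT_THRESHOLD = 0.35
TWINS_THRESHOLD = 0.7
TWINS_SUFFIX = "_Twins"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def identify(query: Sequence[float], database: Dict[str, Sequence[float]],
             threshold: float = DEFAULT_THRESHOLD) -> Tuple[Optional[str], float]:
    """Return (name, score) for the best match, or (None, score) if it is rejected."""
    best_name, best_score = None, -1.0
    for name, enrolled in database.items():
        score = cosine_similarity(query, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, best_score  # empty database
    # Enrolled names ending in "_Twins" must clear the stricter threshold.
    required = TWINS_THRESHOLD if best_name.endswith(TWINS_SUFFIX) else threshold
    return (best_name, best_score) if best_score >= required else (None, best_score)
```

A successful result would then be written to the per-track identity cache, so recognition does not need to succeed on every frame.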
Configuration
Centralized settings for the entire system:
| Setting | Description | Default |
|---|---|---|
| Model Name | InsightFace model | buffalo_l |
| Detection Size | Input resolution | 640x640 |
| GPU Support | Enable CUDA acceleration | Off |
| Similarity Threshold | Match acceptance threshold | 0.35 |
| Min Face Size | Minimum detectable face | 50 pixels |
| Quality Threshold | Detection confidence minimum | 0.5 |
| Frame Skip | Process every Nth frame (video processing optimization) | 1 (all frames) |
| YOLO Model | Person detection model | yolov8n.pt |
| YOLO Tracker | Tracking algorithm | bytetrack.yaml |
| YOLO Confidence | Person detection threshold | 0.8 |
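The defaults in the table can be centralized in one immutable settings object. The sketch below is a hypothetical `VisionLogConfig`; field names are illustrative. The `ctx_id` helper reflects how InsightFace's `FaceAnalysis.prepare(ctx_id=..., det_size=...)` selects a device: 0 for the first GPU, a negative value for CPU.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class VisionLogConfig:
    model_name: str = "buffalo_l"           # InsightFace model pack
    det_size: Tuple[int, int] = (640, 640)  # detector input resolution
    use_gpu: bool = False                   # CUDA acceleration
    similarity_threshold: float = 0.35      # cosine-similarity match threshold
    min_face_size: int = 50                 # pixels
    quality_threshold: float = 0.5          # minimum detection confidence
    frame_skip: int = 1                     # 1 = process every frame
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8

    def insightface_ctx_id(self) -> int:
        # FaceAnalysis.prepare() uses ctx_id=0 for the first GPU and -1 for CPU.
        return 0 if self.use_gpu else -1
```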
Storage
| Component | Purpose |
|---|---|
| embeddings.pkl | Face embeddings database (Pickle format) |
| CSV Logs | Detection logs with frame, timestamp, track ID, name, confidence |
| Source Images | Original images organized by person |
| Output | Processed video files |
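Loading and saving `embeddings.pkl` takes only a few lines. The sketch below assumes the pickle holds a plain `{name: embedding}` dict. The write-to-temp-then-`os.replace` step is an added safeguard that keeps a crash mid-save from corrupting the database; it is not documented module behaviour. `load_db` also serves the "Database reload" feature, since reloading is just calling it again.

```python
import os
import pickle
import tempfile
from typing import Dict, List

EmbeddingDB = Dict[str, List[float]]


def save_db(db: EmbeddingDB, path: str = "embeddings.pkl") -> None:
    # Write to a temporary file in the same directory, then atomically replace the target.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(db, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_db(path: str = "embeddings.pkl") -> EmbeddingDB:
    """Load the database; a missing file means nobody is enrolled yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)  # only load files you created: unpickling can execute code
```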
Technology Summary
| Layer | Technology |
|---|---|
| Input Sources | Camera, Video Files, Images, RTSP Streams |
| Person Detection | YOLOv8 (Ultralytics) |
| Person Tracking | ByteTrack / BoT-SORT |
| Face Detection | InsightFace SCRFD |
| Face Recognition | InsightFace ArcFace (buffalo_l) |
| Matching | Cosine Similarity + IoU |
| Database | Pickle (embeddings.pkl) |
| Logging | CSV format |