AI Attendance Architecture Layers
Detailed documentation for each layer of the system architecture
VisionLog uses a 6-layer architecture for face recognition attendance tracking. It combines YOLOv8 person detection and ByteTrack tracking with InsightFace face detection (SCRFD) and face recognition (ArcFace).
Layer Overview
| Layer | Purpose | Technology |
|---|---|---|
| 1. Input | Video/image capture | OpenCV, RTSP |
| 2. Person Detection | Detect people in frame | YOLOv8 |
| 3. Tracking | Persistent IDs across frames | ByteTrack |
| 4. Face Detection | Find faces in frame | InsightFace SCRFD |
| 5. Face Recognition | Identify person | InsightFace ArcFace |
| 6. Attendance | Log with identity caching | CSV, Pickle |
Data Flow
| Step | Layer | Action |
|---|---|---|
| 1 | Input | Capture video frame from camera/video/RTSP |
| 2 | Person Detection | YOLOv8 detects all people in frame |
| 3 | Tracking | ByteTrack assigns persistent track IDs |
| 4 | Face Detection | InsightFace SCRFD locates faces in full frame |
| 5 | Face Recognition | ArcFace extracts embeddings, matches identity |
| 6 | Attendance | IoU matching links faces to tracked persons by overlap |
| 7 | Attendance | Records entry with track ID and timestamp |
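The IoU matching in step 6 could look roughly like the sketch below. It assumes boxes are `(x1, y1, x2, y2)` pixel tuples and that each face is linked to the person box with the highest overlap. The names `iou`, `link_faces_to_tracks`, and `min_iou` are illustrative and not taken from the VisionLog code.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(
    faces: List[Box],
    tracks: Dict[int, Box],
    min_iou: float = 0.0,  # hypothetical cutoff; the real value is not documented
) -> List[Optional[int]]:
    """For each face box, return the track ID with the highest IoU, or None."""
    links: List[Optional[int]] = []
    for face in faces:
        best_id, best_score = None, min_iou
        for track_id, person_box in tracks.items():
            score = iou(face, person_box)
            if score > best_score:
                best_id, best_score = track_id, score
        links.append(best_id)
    return links
```

A face box is much smaller than the person box that contains it, so the raw IoU values stay low even for a correct link. This is why the sketch only compares scores against each other instead of requiring a high absolute overlap.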
Layer Details
1. Input Layer
- Webcam capture
- Video file processing (MP4, AVI, MKV)
- RTSP stream processing (IP cameras)
- Zero-lag multi-threaded reading for RTSP
- Folder-based enrollment images
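One common way to get low-latency RTSP reading is a background thread that keeps only the most recent frame, so the processing loop never works through a backlog of buffered frames. The sketch below shows that pattern and is not necessarily how VisionLog implements it. `LatestFrameReader` is a hypothetical name, and `capture` is assumed to be any object whose `read()` returns `(ok, frame)`, as `cv2.VideoCapture` does.

```python
import threading


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the newest one."""

    def __init__(self, capture):
        self._capture = capture
        self._lock = threading.Lock()
        self._frame = None
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Drain the source as fast as it delivers; older frames are overwritten.
        while self._running:
            ok, frame = self._capture.read()
            if not ok:
                break  # stream ended or connection dropped
            with self._lock:
                self._frame = frame

    def read(self):
        """Return the most recent frame, or None if nothing has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)
```

With OpenCV installed, usage might be `reader = LatestFrameReader(cv2.VideoCapture(url))`, where `url` is the camera's RTSP address.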
2. Person Detection Layer
- YOLOv8 object detection (Ultralytics)
- Detects class 0 (person) only
- Returns bounding boxes with confidence scores
- Configurable model: yolov8n, yolov8s, yolov8m
- Confidence threshold: 0.8 (default)
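The class and confidence filtering above can be sketched in plain Python. The `Detection` tuple and `keep_people` helper are hypothetical and stand in for whatever structure the detector output is converted to.

```python
from typing import List, NamedTuple, Tuple

PERSON_CLASS_ID = 0  # class 0 is "person" in the COCO label set used by YOLOv8


class Detection(NamedTuple):
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    class_id: int
    confidence: float


def keep_people(detections: List[Detection], min_confidence: float = 0.8) -> List[Detection]:
    """Keep only person detections at or above the confidence threshold."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= min_confidence
    ]
```

In practice the same filtering can usually be pushed into the detector call itself. Ultralytics prediction accepts `classes` and `conf` arguments for this purpose.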
3. Tracking Layer
- ByteTrack multi-object tracking algorithm
- Persistent track IDs maintained across frames
- Motion prediction for occlusion handling
- Alternative: BoT-SORT tracker
- Track mode: all or known_only
4. Face Detection Layer
- InsightFace SCRFD model
- Full-frame face detection (not cropped to person boxes)
- Returns bounding boxes, landmarks, and confidence scores
- Quality filtering: minimum face size 50 pixels, detection confidence >= 0.5
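A minimal sketch of the quality filter follows. It assumes the 50-pixel minimum applies to the shorter side of the face bounding box, which the documentation does not state explicitly. `Face` and `passes_quality` are illustrative names.

```python
from typing import NamedTuple, Tuple


class Face(NamedTuple):
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float


def passes_quality(face: Face, min_size: float = 50, min_confidence: float = 0.5) -> bool:
    """Reject faces that are too small or detected with low confidence."""
    width = face.box[2] - face.box[0]
    height = face.box[3] - face.box[1]
    # Assumption: "min 50 pixels" is checked against the shorter side.
    return min(width, height) >= min_size and face.confidence >= min_confidence
```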
5. Face Recognition Layer
- ArcFace face embedding extraction (512-d vectors)
- Cosine similarity matching against database
- Configurable match threshold (default: 0.35)
- Special _Twins handling (0.7 threshold)
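Cosine similarity matching against the database can be sketched as below. The database is assumed to be a plain mapping from name to embedding, and the helper names are hypothetical. The stricter 0.7 threshold for `_Twins` is not modeled here, because its exact rules are not described above. A caller could pass a higher `threshold` for those identities.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.35,
) -> Tuple[Optional[str], float]:
    """Return (name, score) of the closest enrolled identity.

    The name is None when the best score falls below the threshold.
    """
    best_name, best_score = None, -1.0
    for name, reference in database.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

The examples in this sketch work with short vectors for readability. Real ArcFace embeddings would be 512-dimensional, and a NumPy implementation would be the natural choice at that size.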
6. Attendance Layer
- IoU matching links faces to tracked persons
- Identity caching per track ID
- Records: frame, timestamp, track ID, name, confidence, bbox
- CSV logging with session summary
- Duplicate prevention mechanism
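Identity caching and duplicate prevention might fit together as in the sketch below. It makes two assumptions that the list above does not confirm: the first recognized name for a track ID is kept for the life of that track, and each name is logged at most once per session. `AttendanceLog` and its methods are hypothetical names.

```python
import csv
from datetime import datetime
from typing import Dict, Optional, Set, TextIO, Tuple


class AttendanceLog:
    FIELDS = ["frame", "timestamp", "track_id", "name", "confidence", "bbox"]

    def __init__(self, stream: TextIO):
        self._writer = csv.DictWriter(stream, fieldnames=self.FIELDS)
        self._writer.writeheader()
        self._identity_by_track: Dict[int, str] = {}
        self._logged_names: Set[str] = set()

    def resolve(self, track_id: int, recognized_name: Optional[str]) -> Optional[str]:
        """Cache the first recognized name per track and reuse it afterwards."""
        if recognized_name is not None:
            self._identity_by_track.setdefault(track_id, recognized_name)
        return self._identity_by_track.get(track_id)

    def record(
        self,
        frame: int,
        track_id: int,
        name: str,
        confidence: float,
        bbox: Tuple[int, int, int, int],
    ) -> bool:
        """Write one CSV row; return False if this name was already logged."""
        if name in self._logged_names:
            return False
        self._logged_names.add(name)
        self._writer.writerow({
            "frame": frame,
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "track_id": track_id,
            "name": name,
            "confidence": round(confidence, 3),
            "bbox": ",".join(str(v) for v in bbox),
        })
        return True
```

Caching by track ID means a person stays identified on frames where the face is turned away or too small to recognize, as long as the tracker keeps the same ID.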