AI Attendance Architecture Layers

Detailed documentation for each layer of the system architecture

VisionLog uses a 6-layer architecture for face recognition attendance tracking, combining YOLOv8 person tracking with InsightFace face recognition.

Layer Overview

Layer	Purpose	Technology
1. Input	Video/image capture	OpenCV, RTSP
2. Person Detection	Detect people in frame	YOLOv8
3. Tracking	Persistent IDs across frames	ByteTrack
4. Face Detection	Find faces in frame	InsightFace SCRFD
5. Face Recognition	Identify person	InsightFace ArcFace
6. Attendance	Log with identity caching	CSV, Pickle

Data Flow

Step	Layer	Action
1	Input	Capture video frame from camera/video/RTSP
2	Person Detection	YOLOv8 detects all people in frame
3	Tracking	ByteTrack assigns persistent track IDs
4	Face Detection	InsightFace SCRFD locates faces in full frame
5	Face Recognition	ArcFace extracts embeddings, matches identity
6	Attendance	IoU matching links faces to tracked persons by bounding-box overlap
7	Attendance	Records entry with track ID and timestamp
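
The overlap test in step 6 can be sketched as intersection over union (IoU) between a face box and each tracked person box. This is a minimal sketch, assuming boxes are (x1, y1, x2, y2) pixel tuples; the names `iou` and `match_face_to_track` and the `min_iou` parameter are illustrative and not taken from VisionLog. Because a face box is much smaller than the person box that contains it, IoU values are small, so the sketch picks the best-overlapping track rather than applying a high cutoff.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_face_to_track(
    face: Box, tracks: Dict[int, Box], min_iou: float = 0.0
) -> Optional[int]:
    """Return the track ID whose person box overlaps the face most, or None."""
    best_id, best_iou = None, min_iou
    for track_id, person_box in tracks.items():
        score = iou(face, person_box)
        if score > best_iou:  # strictly greater, so zero overlap never matches
            best_id, best_iou = track_id, score
    return best_id
```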

Layer Details

1. Input Layer

Webcam capture
Video file processing (MP4, AVI, MKV)
RTSP stream processing (IP cameras)
Low-latency multi-threaded reading for RTSP streams
Folder-based enrollment images

2. Person Detection Layer

YOLOv8 object detection (Ultralytics)
Detects class 0 (person) only
Returns bounding boxes with confidence scores
Configurable model: yolov8n, yolov8s, yolov8m
Confidence threshold: 0.8 (default)
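
The class and confidence filter described above can be sketched in plain Python. The `Detection` type and `keep_persons` function are illustrative names, not part of VisionLog or Ultralytics; only the class ID (0) and the default threshold (0.8) come from the list above.

```python
from dataclasses import dataclass
from typing import List, Tuple

PERSON_CLASS_ID = 0  # class 0 is "person", per the list above


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    class_id: int
    confidence: float


def keep_persons(
    detections: List[Detection], conf_threshold: float = 0.8
) -> List[Detection]:
    """Keep only person detections at or above the confidence threshold."""
    return [
        d
        for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= conf_threshold
    ]
```

In Ultralytics, the `classes` and `conf` prediction arguments apply the same kind of filter inside the model call, so a separate pass like this is mainly useful when detections come from another source.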

3. Tracking Layer

ByteTrack multi-object tracking algorithm
Persistent track IDs maintained across frames
Motion prediction for occlusion handling
Alternative: BoT-SORT tracker
Track mode: all or known_only

4. Face Detection Layer

InsightFace SCRFD model
Full-frame face detection (not cropped to person boxes)
Returns bounding boxes, landmarks, and confidence scores
Quality filtering: minimum face size of 50 pixels, detection confidence >= 0.5
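
The quality filter above can be sketched as a simple predicate. This assumes the 50-pixel minimum applies to the shorter side of the face box, which the text does not specify; `Face` and `passes_quality` are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Face:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence in [0, 1]


def passes_quality(face: Face, min_size: float = 50, min_score: float = 0.5) -> bool:
    """True if the face is large enough and detected confidently enough."""
    x1, y1, x2, y2 = face.box
    # Assumption: the size limit is checked against the shorter side.
    shorter_side = min(x2 - x1, y2 - y1)
    return shorter_side >= min_size and face.score >= min_score
```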

5. Face Recognition Layer

ArcFace face embedding extraction (512-d vectors)
Cosine similarity matching against database
Configurable match threshold (default: 0.35)
Special handling for _Twins identities (stricter 0.7 threshold)
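
Matching by cosine similarity with a per-identity threshold can be sketched as follows. The defaults (0.35 and 0.7) come from the list above; the rule that an identity counts as a twin when its name contains `_Twins` is an assumption, and `cosine_similarity` and `identify` are illustrative names. Real embeddings are 512-d, but the logic is the same for any length.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(
    embedding: Sequence[float],
    database: Dict[str, Sequence[float]],
    default_threshold: float = 0.35,
    twins_threshold: float = 0.7,
) -> Tuple[Optional[str], float]:
    """Return (name, similarity) of the best match, or (None, 0.0)."""
    best_name, best_score = None, -1.0
    for name, reference in database.items():
        score = cosine_similarity(embedding, reference)
        # Assumption: names containing "_Twins" use the stricter threshold.
        threshold = twins_threshold if "_Twins" in name else default_threshold
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, 0.0
    return best_name, best_score
```

A higher threshold for twins trades more "unknown" results for fewer confusions between people whose embeddings lie close together.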

6. Attendance Layer

IoU matching links faces to tracked persons
Identity caching per track ID
Records: frame, timestamp, track ID, name, confidence, bbox
CSV logging with session summary
Duplicate prevention mechanism
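
The caching and logging behavior listed above can be sketched with the standard `csv` module. The column names follow the record fields in the list; everything else is an assumption made for illustration, in particular the `AttendanceLog` class itself and the rule that duplicate prevention means one row per name per session.

```python
import csv
from datetime import datetime
from typing import Dict, Optional, Sequence, Set, TextIO

FIELDS = ["frame", "timestamp", "track_id", "name", "confidence", "bbox"]


class AttendanceLog:
    """Writes attendance rows to CSV, caching identities per track ID."""

    def __init__(self, out: TextIO):
        self._writer = csv.DictWriter(out, fieldnames=FIELDS)
        self._writer.writeheader()
        self._identity_by_track: Dict[int, str] = {}  # identity cache
        self._logged_names: Set[str] = set()  # duplicate prevention

    def resolve(self, track_id: int, recognized: Optional[str]) -> Optional[str]:
        """Cache a newly recognized name; otherwise reuse the cached one."""
        if recognized is not None:
            self._identity_by_track[track_id] = recognized
        return self._identity_by_track.get(track_id)

    def record(
        self,
        frame: int,
        track_id: int,
        name: str,
        confidence: float,
        bbox: Sequence[float],
        timestamp: Optional[str] = None,
    ) -> bool:
        """Write one row; return False if this name was already logged."""
        if name in self._logged_names:
            return False
        self._writer.writerow(
            {
                "frame": frame,
                "timestamp": timestamp or datetime.now().isoformat(timespec="seconds"),
                "track_id": track_id,
                "name": name,
                "confidence": f"{confidence:.3f}",
                "bbox": " ".join(str(v) for v in bbox),
            }
        )
        self._logged_names.add(name)
        return True

    def summary(self) -> Dict[str, int]:
        """Session summary: number of distinct people logged."""
        return {"people_logged": len(self._logged_names)}
```

Caching the name per track ID lets a person stay identified on frames where the face is turned away or fails the quality filter, as long as the tracker keeps the same ID.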

