System Architecture

The VisionLog - AI Attendance module is built on a modular architecture designed for efficient face recognition and attendance tracking. The system uses a dual-stage pipeline: YOLOv8 for person detection and tracking, combined with InsightFace (buffalo_l model) for face detection and recognition.

Architecture Overview

The system follows a 6-layer processing pipeline:

Layer	Component	Technology
1	Input	OpenCV, RTSP
2	Person Detection	YOLOv8
3	Tracking	ByteTrack
4	Face Detection + Recognition	InsightFace (SCRFD + ArcFace)
5	Attendance	Identity caching, logging
6	Storage	Pickle database, CSV logs

Core Components

Person Detection & Tracking (YOLOv8 + ByteTrack)

The first stage uses YOLOv8 for robust person detection with ByteTrack for persistent tracking across frames.

Key Features:

Feature	Description
Person Detection	YOLOv8 detects all people in frame with bounding boxes
Persistent Tracking	ByteTrack assigns stable IDs across frames
Motion Prediction	Handles occlusions and re-entries
Configurable Tracker	Support for ByteTrack or BoT-SORT

Tracking Configuration:

Setting	Default	Description
YOLO Model	yolov8n.pt	Nano model (fast), also supports yolov8s, yolov8m
Tracker	bytetrack.yaml	ByteTrack or botsort.yaml
Confidence	0.8	Person detection confidence threshold
Track Mode	all	Track all persons or known_only

Face Engine (InsightFace)

The face engine is the core of the system: it handles all face detection and recognition operations.

Key Functions:

Function	Description
Face Detection	SCRFD detects all faces in full frame
Face Recognition	ArcFace extracts 512-d embeddings and matches against database
IoU Matching	Links detected faces to tracked persons by spatial overlap
Identity Caching	Preserves last known identity per track ID
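
The IoU matching step can be illustrated with a small, dependency-free sketch. The box format (x1, y1, x2, y2) and the function names iou and match_face_to_track are assumptions made for this example, not the module's actual API.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_face_to_track(face: Box, tracks: Dict[int, Box]) -> Optional[int]:
    """Return the track ID whose person box overlaps the face most, or None."""
    best_id, best_iou = None, 0.0
    for track_id, person_box in tracks.items():
        score = iou(face, person_box)
        if score > best_iou:  # faces with zero overlap stay unmatched
            best_id, best_iou = track_id, score
    return best_id
```

Because a face box is much smaller than a person box, the raw IoU values are small even for a correct match, so this sketch only compares them against each other rather than against a fixed cutoff.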

Enrollment System

Folder-based batch enrollment for adding faces to the database.

Feature	Description
Batch Processing	Enroll multiple persons at once
Folder Structure	One folder per person with multiple images
Automatic Averaging	Multiple images averaged for robustness
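
As a rough illustration of automatic averaging, the sketch below combines several embeddings per person into one reference vector. It assumes that each embedding is L2-normalized before averaging and that the mean is normalized again; the source does not state the exact procedure, and the names average_embedding and enroll are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Normalize each embedding, average element-wise, then re-normalize."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def enroll(per_person: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Build a name -> reference embedding mapping (one entry per folder)."""
    return {name: average_embedding(embs) for name, embs in per_person.items()}
```

In practice the input lists would hold 512-d ArcFace embeddings, one per image in a person's folder; the two-dimensional vectors used for testing are only for readability.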

Recognition System

Real-time face recognition from multiple input sources.

Input Modes:

Camera feed (live recognition)
Single image file
Video file processing
RTSP streams (IP cameras)

Features:

FPS counter display
Screenshot capture
Database reload
Confidence display toggle

Video Processor

Batch processing of CCTV recordings, video files, and RTSP streams.

Capabilities:

Frame skipping for performance (configurable)
YOLOv8 person tracking with persistent IDs
Annotated video output with bounding boxes
CSV logging with frame, timestamp, track ID, name, confidence
Session summary with unique tracks and identified persons

Zero-Lag RTSP Engine:

Multi-threaded frame reading (always processes latest frame)
Auto-reconnect on connection loss (5-second retry)
Designed for 24/7 continuous monitoring
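
The latest-frame and auto-reconnect behaviour can be sketched with the standard library alone. The class LatestFrameReader and its open_source callback are hypothetical; the sketch assumes the callback opens the stream and returns a read function that yields a frame, or None once the connection is lost.

```python
import threading
from typing import Any, Callable, Optional


class LatestFrameReader:
    """Background reader that keeps only the newest frame from a source."""

    def __init__(self, open_source: Callable[[], Callable[[], Any]],
                 retry_delay: float = 5.0) -> None:
        self._open_source = open_source      # hypothetical: opens the stream
        self._retry_delay = retry_delay      # 5-second retry, as described above
        self._lock = threading.Lock()
        self._latest: Optional[Any] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "LatestFrameReader":
        self._thread.start()
        return self

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                read = self._open_source()
                while not self._stop.is_set():
                    frame = read()
                    if frame is None:        # stream ended or connection lost
                        break
                    with self._lock:
                        self._latest = frame  # overwrite: older frames are dropped
            except ConnectionError:
                pass                          # fall through to the retry wait
            self._stop.wait(self._retry_delay)  # returns early when stopped

    def latest(self) -> Optional[Any]:
        """Return the most recent frame, or None if nothing was read yet."""
        with self._lock:
            return self._latest

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

With OpenCV, open_source would typically wrap cv2.VideoCapture on the RTSP URL and return None whenever read() reports failure. The consumer calls latest() at its own pace, so slow recognition never builds up a backlog of stale frames.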

Attendance Tracker

Real-time attendance logging with duplicate prevention.

Features:

Identity caching per track ID
Duplicate prevention mechanism
CSV export with timestamps
Filter by date, person, or camera
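
A minimal sketch of these features follows. It assumes that "duplicate" means the same person logged more than once on the same calendar day, which the source does not specify; the class name, the column names, and the cam0 default are likewise illustrative.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple

FIELDS = ["timestamp", "track_id", "name", "camera"]  # assumed column layout


class AttendanceTracker:
    def __init__(self) -> None:
        self._identity_by_track: Dict[int, str] = {}   # identity cache
        self._logged: Set[Tuple[str, str]] = set()      # (name, ISO date)
        self.rows: List[Dict[str, str]] = []

    def update(self, track_id: int, name: Optional[str], when: datetime,
               camera: str = "cam0") -> bool:
        """Record a sighting; return True only when a new row is added."""
        if name is None:
            # Face not recognized in this frame: fall back to the cached identity.
            name = self._identity_by_track.get(track_id)
            if name is None:
                return False
        else:
            self._identity_by_track[track_id] = name
        key = (name, when.date().isoformat())
        if key in self._logged:                          # duplicate prevention
            return False
        self._logged.add(key)
        self.rows.append({
            "timestamp": when.isoformat(timespec="seconds"),
            "track_id": str(track_id),
            "name": name,
            "camera": camera,
        })
        return True

    def filter_rows(self, date: Optional[str] = None, name: Optional[str] = None,
                    camera: Optional[str] = None) -> List[Dict[str, str]]:
        """Filter by ISO date (YYYY-MM-DD), person, or camera."""
        return [r for r in self.rows
                if (date is None or r["timestamp"][:10] == date)
                and (name is None or r["name"] == name)
                and (camera is None or r["camera"] == camera)]

    def to_csv(self) -> str:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()
```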

Processing Pipeline

1. Person Detection (YOLOv8)

YOLOv8 detects all people in the frame, returning bounding boxes with confidence scores.

2. Tracking (ByteTrack)

ByteTrack assigns persistent track IDs to each person, maintaining identity across frames even through occlusions.

3. Face Detection (SCRFD)

InsightFace SCRFD detects faces on the full frame for maximum accuracy.

Quality Filters:

Minimum face size: 50 pixels
Detection confidence threshold: 0.5
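
The two filters can be sketched as a simple predicate. The example assumes that "minimum face size" refers to the shorter side of the face bounding box; the DetectedFace tuple and the function names are hypothetical.

```python
from typing import List, NamedTuple

MIN_FACE_SIZE = 50    # pixels, default from the list above
MIN_DET_SCORE = 0.5   # detection confidence threshold


class DetectedFace(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    det_score: float


def passes_quality(face: DetectedFace,
                   min_size: float = MIN_FACE_SIZE,
                   min_score: float = MIN_DET_SCORE) -> bool:
    """Keep a face only if it is large enough and confidently detected."""
    width = face.x2 - face.x1
    height = face.y2 - face.y1
    # Assumption: the size limit applies to the shorter side of the box.
    return min(width, height) >= min_size and face.det_score >= min_score


def filter_faces(faces: List[DetectedFace]) -> List[DetectedFace]:
    return [f for f in faces if passes_quality(f)]
```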

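4. Face Recognition (ArcFace)

ArcFace extracts a 512-d embedding for each detected face, which is then matched against the database:

Compare query embedding against all enrolled embeddings
Find the highest similarity score
Accept match if score >= threshold (default: 0.35)
Cache successful identities per track ID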

Special Handling:

Names ending with _Twins use a stricter threshold of 0.7
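
The matching rule, including the stricter _Twins threshold, can be sketched as follows. It assumes cosine similarity as the score (as listed in the Technology Summary) and a database that maps each name to one reference embedding; the function identify is a hypothetical name.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             threshold: float = 0.35,
             twins_threshold: float = 0.7) -> Tuple[Optional[str], float]:
    """Return (name, score) for the best match, or (None, score) if rejected."""
    best_name, best_score = None, -1.0
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:          # empty database
        return None, best_score
    # Names ending with _Twins must clear the stricter threshold.
    required = twins_threshold if best_name.endswith("_Twins") else threshold
    if best_score >= required:
        return best_name, best_score
    return None, best_score
```

A query that scores about 0.5 against an ordinary entry is accepted, while the same score against a _Twins entry is rejected and the face stays unidentified.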

Configuration

Centralized settings for the entire system:

Setting	Description	Default
Model Name	InsightFace model	buffalo_l
Detection Size	Input resolution	640x640
GPU Support	Enable CUDA acceleration	Off
Similarity Threshold	Match acceptance threshold	0.35
Min Face Size	Minimum detectable face	50 pixels
Quality Threshold	Detection confidence minimum	0.5
Frame Skip	Video processing optimization	1 (all frames)
YOLO Model	Person detection model	yolov8n.pt
YOLO Tracker	Tracking algorithm	bytetrack.yaml
YOLO Confidence	Person detection threshold	0.8
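
One way to hold these settings in code is a frozen dataclass whose defaults mirror the table. The class and field names below are assumptions for illustration; the actual configuration module may be organized differently.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Config:
    model_name: str = "buffalo_l"
    det_size: Tuple[int, int] = (640, 640)
    use_gpu: bool = False
    similarity_threshold: float = 0.35
    min_face_size: int = 50          # pixels
    quality_threshold: float = 0.5
    frame_skip: int = 1              # 1 = process all frames
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8

    def __post_init__(self) -> None:
        # Basic sanity checks (illustrative, not taken from the source).
        if self.frame_skip < 1:
            raise ValueError("frame_skip must be >= 1")
        for field_name in ("similarity_threshold", "quality_threshold",
                           "yolo_confidence"):
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field_name} must be between 0 and 1")


# Override individual settings without mutating the defaults.
gpu_config = replace(Config(), use_gpu=True, yolo_model="yolov8s.pt")
```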

Storage

Component	Purpose
embeddings.pkl	Face embeddings database (Pickle format)
CSV Logs	Detection logs with frame, timestamp, track ID, name
Source Images	Original images organized by person
Output	Processed video files
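
Reading and writing the embeddings database could look like the sketch below. The internal layout of embeddings.pkl is not documented here, so the example assumes a plain dictionary mapping each name to its embedding; save_embeddings and load_embeddings are hypothetical helpers.

```python
import pickle
from pathlib import Path
from typing import Dict, List, Union

EmbeddingDB = Dict[str, List[float]]  # assumed layout: name -> embedding


def save_embeddings(db: EmbeddingDB, path: Union[str, Path]) -> None:
    """Write the database to disk in Pickle format."""
    with open(Path(path), "wb") as fh:
        pickle.dump(db, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_embeddings(path: Union[str, Path]) -> EmbeddingDB:
    """Load the database, or return an empty one if the file does not exist."""
    p = Path(path)
    if not p.exists():
        return {}
    with open(p, "rb") as fh:
        return pickle.load(fh)
```

Unpickling can execute arbitrary code, so the database file should only be loaded from a trusted location.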

Technology Summary

Layer	Technology
Input Sources	Camera, Video Files, Images, RTSP Streams
Person Detection	YOLOv8 (Ultralytics)
Person Tracking	ByteTrack / BoT-SORT
Face Detection	InsightFace SCRFD
Face Recognition	InsightFace ArcFace (buffalo_l)
Matching	Cosine Similarity + IoU
Database	Pickle (embeddings.pkl)
Logging	CSV format
