Processing Pipeline

Step-by-step processing flow from enrollment to attendance logging in VisionLog - AI Attendance

The VisionLog - AI Attendance module follows a two-stage processing pipeline: YOLOv8 detects people and ByteTrack tracks them across frames, while InsightFace handles face detection and recognition.

Pipeline Overview

Step	Stage	Technology	Description
1	Enrollment	InsightFace	Create face database from organized folders
2	Video Input	OpenCV	Capture from camera, video file, or RTSP stream
3	Person Detection	YOLOv8	Detect all people in frame
4	Tracking	ByteTrack	Assign persistent IDs across frames
5	Face Detection	InsightFace SCRFD	Locate faces in full frame
6	Face Recognition	InsightFace ArcFace	Extract 512-d embeddings, match identity
7	IoU Matching	NumPy	Link faces to tracked persons
8	Attendance	CSV/Pickle	Log with identity caching

Step-by-Step Details

Step 1: Face Enrollment

Create the face database using folder-based batch enrollment.

Folder Structure:

One folder per person
Folder name = person identifier
Multiple images per folder (3-5 recommended)

Enrollment Process:

Scan enrollment folder for person subfolders
Load all images from each person's folder
Detect face in each image (SCRFD)
Extract 512-d embedding (ArcFace)
Average embeddings for robustness
Save to embeddings.pkl database
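
The enrollment steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `embed_fn` is a hypothetical callable standing in for the SCRFD + ArcFace step, the set of accepted image extensions is assumed, and normalizing before and after averaging is an assumption (the guide only says the embeddings are averaged). The dictionary layout written to embeddings.pkl is likewise assumed.

```python
import math
import pickle
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted image types


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_embedding(embeddings):
    """Normalize each embedding, take the element-wise mean, normalize again."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return l2_normalize(mean)


def enroll_folder(root, embed_fn, out_path="embeddings.pkl"):
    """Build {person_name: averaged_embedding} from one subfolder per person.

    embed_fn(image_path) is a hypothetical callable that returns an embedding
    (a list of floats) or None when no face is found in the image.
    """
    database = {}
    for person_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(f for f in person_dir.iterdir()
                        if f.suffix.lower() in IMAGE_EXTS)
        vectors = [v for v in (embed_fn(img) for img in images) if v is not None]
        if vectors:  # skip people whose images yielded no usable face
            database[person_dir.name] = average_embedding(vectors)
    with open(out_path, "wb") as fh:
        pickle.dump(database, fh)
    return database
```

Averaging unit vectors and re-normalizing keeps the stored template on the same scale as a single embedding, so one image with an unusually large norm does not dominate the average.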

Step 2: Video Input

Capture video or image input for processing.

Input Type	Description
Webcam	Live camera feed for real-time recognition
Video File	Batch process recorded video (MP4, AVI, MKV)
RTSP Stream	IP camera feed with zero-lag engine
Image	Single image recognition
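
As an illustration of how the four input types in the table might be told apart, here is a small sketch. The function name `classify_source` and the list of image extensions are assumptions made for this example; only the video formats (MP4, AVI, MKV) come from the table above.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".avi", ".mkv"}           # formats listed in the table
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed; not specified in this guide


def classify_source(source):
    """Return 'webcam', 'rtsp', 'video', or 'image' for an input specifier."""
    text = str(source).strip()
    if text.isdigit():  # e.g. 0 for the first local camera
        return "webcam"
    if text.lower().startswith(("rtsp://", "rtsps://")):
        return "rtsp"
    suffix = Path(text).suffix.lower()
    if suffix in VIDEO_EXTS:
        return "video"
    if suffix in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unrecognized input source: {source!r}")
```

For example, `classify_source(0)` yields `"webcam"` and `classify_source("rtsp://camera/stream")` yields `"rtsp"`.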

Step 3: Person Detection

Component: YOLOv8

Detect all people in the frame before face processing.

Detection Output:

Property	Description
Bounding Box	Person location [x1, y1, x2, y2]
Confidence	Detection confidence (threshold: 0.8)
Class	Person (class 0)
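
The detection output above can be filtered with a few lines of code. The sketch below assumes each detection is a plain tuple `(x1, y1, x2, y2, confidence, class_id)`; that layout and the name `filter_persons` are hypothetical, and treating the 0.8 threshold as inclusive is also an assumption.

```python
PERSON_CLASS = 0       # class index for "person", as listed in the table
PERSON_CONF = 0.8      # default person confidence threshold


def filter_persons(detections, min_conf=PERSON_CONF):
    """Keep only person detections whose confidence is at least min_conf.

    Each detection is assumed to be (x1, y1, x2, y2, confidence, class_id).
    """
    return [d for d in detections
            if d[5] == PERSON_CLASS and d[4] >= min_conf]
```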

Why Person Detection First?

YOLOv8 is fast and robust for detecting people at various distances
Enables persistent tracking across frames
Provides stable IDs even when faces aren't visible

Step 4: Tracking

Component: ByteTrack

Assign persistent track IDs to each detected person.

Tracking Features:

Persistent IDs maintained across frames
Motion prediction handles occlusions
Re-identification when person re-enters frame
Stable IDs even during partial occlusions

Setting	Default	Description
Tracker	bytetrack.yaml	Tracking algorithm config
Alternative	botsort.yaml	BoT-SORT tracker option

Step 5: Face Detection

Component: InsightFace SCRFD

Detect all faces in the full frame (not cropped to person boxes).

Detection Output:

Property	Description
Bounding Box	Face location coordinates
Detection Score	Confidence level (0-1)
Landmarks	5-point facial landmarks

Quality Filters:

Minimum face size: 50 pixels
Detection confidence: >= 0.5
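
A possible way to apply these two quality filters is shown below. The guide does not say whether the 50-pixel minimum refers to width, height, or both, so this sketch assumes it applies to the shorter side of the face box; `passes_quality` is a hypothetical helper name.

```python
def passes_quality(box, score, min_size=50, min_score=0.5):
    """Return True if a face box is large enough and confidently detected.

    box is (x1, y1, x2, y2) in pixels; score is the detection score in [0, 1].
    The size check uses the shorter side of the box (an assumption).
    """
    x1, y1, x2, y2 = box
    shorter_side = min(x2 - x1, y2 - y1)
    return shorter_side >= min_size and score >= min_score
```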

Step 6: Face Recognition

Component: InsightFace ArcFace

Extract 512-dimensional face embedding for each detected face.

Embedding Properties:

Dimension: 512
Normalized: Unit vector
Model: ArcFace (buffalo_l)

Step 7: IoU Matching

Component: Intersection over Union

Match detected faces to tracked persons based on spatial overlap.

Matching Process:

For each tracked person bounding box
Calculate IoU with each detected face bounding box
Assign face with highest IoU (if > threshold)
Link face identity to person track ID
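
The matching process above might look like the following sketch. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the IoU threshold value is not given in this guide, so the `0.01` default here is only a placeholder. Note that a face box lying entirely inside a person box has an IoU equal to the ratio of the two areas, which is typically small, so any real threshold would need to be low.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(tracks, faces, threshold=0.01):
    """Map each track ID to the index of its best-overlapping face.

    tracks: {track_id: person_box}; faces: list of face boxes.
    A track is linked only if its best IoU is strictly above the threshold.
    """
    links = {}
    for track_id, person_box in tracks.items():
        best_idx, best_iou = None, threshold
        for idx, face_box in enumerate(faces):
            score = iou(person_box, face_box)
            if score > best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            links[track_id] = best_idx
    return links
```

This simple version lets two overlapping tracks claim the same face; a stricter variant could remove a face from the candidate list once it has been assigned.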

Step 8: Identity Assignment & Attendance

Compare query embedding against enrolled database and log attendance.

Matching Process:

Compare query embedding against all enrolled embeddings
Find the highest similarity score
Accept match if score >= threshold
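
As a rough sketch of this matching step, the code below compares a query embedding against every enrolled embedding and applies the threshold. It assumes the similarity score is cosine similarity, which is the usual choice for ArcFace embeddings but is not stated explicitly in this guide; the function names and the `{name: embedding}` database layout are also assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def identify(query, database, threshold=0.35):
    """Return (name, score) for the best match, or ("Unknown", score).

    database maps person name to enrolled embedding. The match is accepted
    only when the highest similarity is at least the threshold.
    """
    best_name, best_score = "Unknown", -1.0
    for name, reference in database.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return "Unknown", best_score
```

Because the embeddings are unit vectors, the cosine similarity reduces to a plain dot product; the explicit norms are kept here so the sketch also works on unnormalized input.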

Threshold Guidelines:

Threshold	Security	Use Case
0.35	Default	Balanced (current default)
0.45	Medium	Stricter matching
0.55	High	Fewer false positives
0.70	Very High	Twin detection (_Twins suffix)

Identity Caching:

Last known identity stored per track ID
Survives temporary face occlusions
Only updates on confirmed (non-Unknown) matches
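
The caching behavior described above can be illustrated with a small class. `IdentityCache` and its method name are hypothetical; the sketch only assumes that unrecognized faces are reported with the label `"Unknown"`.

```python
class IdentityCache:
    """Remember the last confirmed identity for each track ID."""

    def __init__(self):
        self._names = {}

    def update(self, track_id, name):
        """Record name for track_id unless it is "Unknown"; return the
        identity currently associated with the track."""
        if name != "Unknown":
            self._names[track_id] = name
        return self._names.get(track_id, "Unknown")
```

A track that was recognized once keeps its name while the face is temporarily hidden, because later `"Unknown"` results do not overwrite the cached value.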

Attendance Logging:

Frame number
Timestamp
Track ID
Person name
Confidence score
Bounding box coordinates
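
One way to write a record with the fields listed above is with Python's standard `csv` module. The column names, timestamp format, and four-decimal confidence are assumptions made for this sketch; the actual log layout may differ.

```python
import csv
import io
from datetime import datetime

# Assumed column names, one per field listed above.
FIELDS = ["frame", "timestamp", "track_id", "name", "confidence",
          "x1", "y1", "x2", "y2"]


def open_log(stream):
    """Create a CSV writer on an open text stream and write the header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    return writer


def log_attendance(writer, frame, timestamp, track_id, name, confidence, box):
    """Append one attendance record; box is (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    writer.writerow({
        "frame": frame,
        "timestamp": timestamp.isoformat(timespec="seconds"),
        "track_id": track_id,
        "name": name,
        "confidence": f"{confidence:.4f}",
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
    })
```

When writing to a real file rather than an in-memory stream, open it with `newline=""` as the `csv` module documentation recommends.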

Configuration

All settings are centralized for easy customization:

Setting	Description	Default
Model Name	InsightFace model	buffalo_l
Detection Size	Input resolution	640x640
GPU Support	CUDA acceleration	Off
Similarity Threshold	Match acceptance	0.35
Min Face Size	Minimum detectable face size	50 pixels
Quality Threshold	Minimum face detection confidence	0.5
Frame Skip	Video optimization	1 (all frames)
YOLO Model	Person detection	yolov8n.pt
YOLO Tracker	Tracking algorithm	bytetrack.yaml
YOLO Confidence	Person threshold	0.8
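
For illustration, the defaults in the table could be gathered into a single configuration object like the one below. The class and field names are hypothetical, not the module's actual settings names. The `should_process` helper assumes that a frame skip of N means every Nth frame is processed, which matches the table's note that 1 means all frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Defaults copied from the settings table; field names are assumed."""
    model_name: str = "buffalo_l"
    det_size: tuple = (640, 640)
    use_gpu: bool = False
    similarity_threshold: float = 0.35
    min_face_size: int = 50
    quality_threshold: float = 0.5
    frame_skip: int = 1
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8


def should_process(frame_index, frame_skip):
    """Return True for every frame_skip-th frame (1 means every frame)."""
    return frame_index % max(1, frame_skip) == 0
```

Overriding a single setting is then a matter of passing a keyword argument, for example `PipelineConfig(similarity_threshold=0.45)` for stricter matching.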

Data Flow Summary

Stage	Input	Output
Enrollment	Person image folders	embeddings.pkl
Video Input	Camera/Video/RTSP	Video frames
Person Detection	Frame	Person bounding boxes
Tracking	Person boxes	Track IDs
Face Detection	Frame	Face locations
Face Recognition	Face crop	512-d embedding
IoU Matching	Person + Face boxes	Face-Person links
Attendance	Identity + Track ID	CSV log record
