Processing Pipeline
Step-by-step processing flow from enrollment to attendance logging in VisionLog - AI Attendance
The VisionLog - AI Attendance module uses a dual-stage processing pipeline: YOLOv8 with ByteTrack handles person detection and tracking, and InsightFace handles face detection and recognition.
Pipeline Overview
| Step | Stage | Technology | Description |
|---|---|---|---|
| 1 | Enrollment | InsightFace | Create face database from organized folders |
| 2 | Video Input | OpenCV | Capture from camera, video file, or RTSP stream |
| 3 | Person Detection | YOLOv8 | Detect all people in frame |
| 4 | Tracking | ByteTrack | Assign persistent IDs across frames |
| 5 | Face Detection | InsightFace SCRFD | Locate faces in full frame |
| 6 | Face Recognition | InsightFace ArcFace | Extract 512-d embeddings, match identity |
| 7 | IoU Matching | NumPy | Link faces to tracked persons |
| 8 | Attendance | CSV/Pickle | Log with identity caching |
Step-by-Step Details
Step 1: Face Enrollment
Create the face database using folder-based batch enrollment.
Folder Structure:
- One folder per person
- Folder name = person identifier
- Multiple images per folder (3-5 recommended)
Enrollment Process:
- Scan enrollment folder for person subfolders
- Load all images from each person's folder
- Detect face in each image (SCRFD)
- Extract 512-d embedding (ArcFace)
- Average embeddings for robustness
- Save to embeddings.pkl database
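The averaging and saving steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names (`l2_normalize`, `average_embedding`, `build_database`, `save_database`) are hypothetical, embeddings are assumed to be already extracted, and plain Python lists stand in for the arrays a real pipeline would likely use.

```python
import math
import pickle
from pathlib import Path


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_embedding(embeddings):
    """Average several embeddings of one person, then re-normalize the mean."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def build_database(per_person):
    """Map person name -> averaged embedding; people with no usable faces are skipped."""
    return {name: average_embedding(embs) for name, embs in per_person.items() if embs}


def save_database(db, path="embeddings.pkl"):
    """Pickle the database to disk (file name taken from the enrollment steps)."""
    with Path(path).open("wb") as f:
        pickle.dump(db, f)


# Toy 3-d vectors stand in for 512-d ArcFace embeddings.
db = build_database({"alice": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "bob": []})
```

Re-normalizing after averaging keeps the stored vector a unit vector, so it can be compared with query embeddings on the same scale.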
Step 2: Video Input
Capture video or image input for processing.
| Input Type | Description |
|---|---|
| Webcam | Live camera feed for real-time recognition |
| Video File | Batch process recorded video (MP4, AVI, MKV) |
| RTSP Stream | IP camera feed with zero-lag engine |
| Image | Single image recognition |
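One way to route the four input types is to classify the source string before opening it. The sketch below is an assumption about how this could be done, not the module's implementation: `classify_source` is a hypothetical helper, and the image extensions are guesses since the source only lists video formats. In practice, OpenCV's `cv2.VideoCapture` accepts either a camera index or a file path/URL, so the result could decide which of those to pass.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".avi", ".mkv"}           # formats named in the table above
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed; not listed in the source


def classify_source(source):
    """Return 'webcam', 'rtsp', 'video', or 'image' for an input specification."""
    if isinstance(source, int) or str(source).isdigit():
        return "webcam"  # camera index such as 0
    text = str(source)
    if text.lower().startswith(("rtsp://", "rtsps://")):
        return "rtsp"
    ext = Path(text).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unsupported input source: {source!r}")
```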
Step 3: Person Detection
Component: YOLOv8
Detect all people in the frame before face processing.
Detection Output:
| Property | Description |
|---|---|
| Bounding Box | Person location [x1, y1, x2, y2] |
| Confidence | Detection confidence (threshold: 0.8) |
| Class | Person (class 0) |
Why Person Detection First?
- YOLOv8 is fast and robust for detecting people at various distances
- Enables persistent tracking across frames
- Provides stable IDs even when faces aren't visible
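The filtering implied by the detection output table (class 0, confidence at or above 0.8) can be sketched as below. The `Detection` container and `keep_people` helper are hypothetical names introduced for illustration; a real detector returns its own result objects.

```python
from dataclasses import dataclass

PERSON_CLASS = 0   # "Person (class 0)" from the table above
PERSON_CONF = 0.8  # detection confidence threshold from the table above


@dataclass
class Detection:
    box: tuple         # (x1, y1, x2, y2)
    confidence: float
    class_id: int


def keep_people(detections, min_conf=PERSON_CONF):
    """Keep only person detections at or above the confidence threshold."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS and d.confidence >= min_conf
    ]


raw = [
    Detection((10, 10, 110, 310), 0.93, 0),   # confident person: kept
    Detection((200, 40, 260, 200), 0.55, 0),  # low-confidence person: dropped
    Detection((300, 50, 380, 120), 0.97, 16), # non-person class: dropped
]
people = keep_people(raw)
```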
Step 4: Tracking
Component: ByteTrack
Assign persistent track IDs to each detected person.
Tracking Features:
- Persistent IDs maintained across frames
- Motion prediction handles occlusions
- Re-identification when person re-enters frame
- Stable IDs even during partial occlusions
| Setting | Default | Description |
|---|---|---|
| Tracker | bytetrack.yaml | Tracking algorithm config |
| Alternative | botsort.yaml | BoT-SORT tracker option |
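A tracking step might look like the sketch below, assuming the Ultralytics package is what provides YOLOv8 here and that its `model.track(...)` call is used with `persist=True` to keep IDs across frames. The helper names `tracking_kwargs` and `track_people` are hypothetical, and the model is passed in so the function itself has no import-time dependency.

```python
def tracking_kwargs(tracker="bytetrack.yaml", conf=0.8):
    """Keyword arguments for one tracking call (defaults from the tables above)."""
    return {
        "persist": True,     # keep track state between successive calls
        "tracker": tracker,  # "bytetrack.yaml" or "botsort.yaml"
        "classes": [0],      # person class only
        "conf": conf,
        "verbose": False,
    }


def track_people(model, frame, **overrides):
    """Run one tracking step and return a list of (track_id, [x1, y1, x2, y2])."""
    results = model.track(frame, **{**tracking_kwargs(), **overrides})
    boxes = results[0].boxes
    if boxes.id is None:  # no confirmed tracks on this frame
        return []
    return list(zip(boxes.id.int().tolist(), boxes.xyxy.tolist()))


# Assumed usage with Ultralytics installed:
#   from ultralytics import YOLO
#   model = YOLO("yolov8n.pt")
#   tracks = track_people(model, frame)
#   tracks = track_people(model, frame, tracker="botsort.yaml")
```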
Step 5: Face Detection
Component: InsightFace SCRFD
Detect all faces in the full frame (not cropped to person boxes).
Detection Output:
| Property | Description |
|---|---|
| Bounding Box | Face location coordinates |
| Detection Score | Confidence level (0-1) |
| Landmarks | 5-point facial landmarks |
Quality Filters:
- Minimum face size: 50 pixels
- Detection confidence: >= 0.5
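The two quality filters can be expressed as a single predicate. This sketch assumes "minimum face size" refers to the shorter side of the face box in pixels, which the source does not specify; `passes_quality` is a hypothetical helper name.

```python
MIN_FACE_SIZE = 50   # pixels, from the quality filters above
MIN_DET_SCORE = 0.5  # detection confidence, from the quality filters above


def passes_quality(box, score, min_size=MIN_FACE_SIZE, min_score=MIN_DET_SCORE):
    """Return True if a face box is large enough and confidently detected.

    Assumption: size is measured on the shorter side of the box.
    """
    x1, y1, x2, y2 = box
    return min(x2 - x1, y2 - y1) >= min_size and score >= min_score
```

For example, a 60x70 face with score 0.9 passes, while a 40x70 face or a face with score 0.3 is discarded.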
Step 6: Face Recognition
Component: InsightFace ArcFace
Extract 512-dimensional face embedding for each detected face.
Embedding Properties:
- Dimension: 512
- Normalized: Unit vector
- Model: ArcFace (buffalo_l)
Step 7: IoU Matching
Component: Intersection over Union
Match detected faces to tracked persons based on spatial overlap.
Matching Process:
- For each tracked person bounding box
- Calculate IoU with each detected face bounding box
- Assign the face with the highest IoU, provided it exceeds the IoU threshold
- Link face identity to person track ID
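The matching process above can be sketched as follows. The function names and the `min_iou` default are assumptions, since the source does not state the threshold value. Note that a face box is much smaller than the person box containing it, so the IoU between them is inherently low (a 40x50 face inside a 100x300 person box gives about 0.067); the threshold therefore has to be small.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(tracks, faces, min_iou=0.01):
    """Link each track to its best-overlapping face.

    tracks: {track_id: person_box}; faces: list of face boxes.
    Returns {track_id: index into faces}. min_iou is an assumed value.
    """
    links = {}
    for track_id, person_box in tracks.items():
        best_idx, best_score = None, min_iou
        for idx, face_box in enumerate(faces):
            score = iou(person_box, face_box)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            links[track_id] = best_idx
    return links


tracks = {7: (0, 0, 100, 300), 9: (200, 0, 300, 300)}
faces = [(230, 20, 270, 70), (30, 20, 70, 70)]
links = link_faces_to_tracks(tracks, faces)
```

This simple version lets one face be linked to two overlapping person boxes; a stricter variant could enforce one-to-one assignment.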
Step 8: Identity Assignment & Attendance
Compare the query embedding against the enrolled database, assign an identity, and log attendance.
Matching Process:
- Compare query embedding against all enrolled embeddings
- Find the highest similarity score
- Accept match if score >= threshold
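A minimal sketch of this matching step is shown below. It assumes the similarity score is cosine similarity, which is the usual choice for ArcFace embeddings but is not stated in the source; `identify` is a hypothetical helper, and the `"Unknown"` label follows the wording used in the identity caching notes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def identify(query, database, threshold=0.35):
    """Return (name, score) for the best match, or ("Unknown", score) below threshold."""
    best_name, best_score = "Unknown", -1.0
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score


# Toy 2-d vectors stand in for 512-d embeddings.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
name, score = identify([0.9, 0.1], enrolled)
```

Because enrolled and query embeddings are unit vectors, the cosine similarity reduces to a plain dot product, which is cheap to compute against the whole database.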
Threshold Guidelines:
| Threshold | Security | Use Case |
|---|---|---|
| 0.35 | Balanced | Default setting |
| 0.45 | Medium | Stricter matching |
| 0.55 | High | Fewer false positives |
| 0.70 | Very High | Twin detection (_Twins suffix) |
Identity Caching:
- Last known identity stored per track ID
- Survives temporary face occlusions
- Only updates on confirmed (non-Unknown) matches
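The caching rules above can be sketched with a small dictionary-backed class. `IdentityCache` is a hypothetical name, and this is only one way the behavior could be realized.

```python
class IdentityCache:
    """Remember the last confirmed identity for each track ID."""

    def __init__(self):
        self._names = {}

    def update(self, track_id, name):
        """Store confirmed names; return the best known name for this track."""
        if name != "Unknown":
            self._names[track_id] = name
        return self._names.get(track_id, "Unknown")


cache = IdentityCache()
first = cache.update(7, "alice")     # face recognized
second = cache.update(7, "Unknown")  # face occluded: cached name is reused
third = cache.update(9, "Unknown")   # never recognized: stays Unknown
```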
Attendance Logging (fields recorded per entry):
- Frame number
- Timestamp
- Track ID
- Person name
- Confidence score
- Bounding box coordinates
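A log record with these fields could be written with the standard `csv` module as sketched below. The column names, the timestamp format, and the three-decimal rounding are assumptions; the source lists the fields but not their exact layout.

```python
import csv
from datetime import datetime

# Assumed column names for the fields listed above.
FIELDS = ["frame", "timestamp", "track_id", "name", "confidence",
          "x1", "y1", "x2", "y2"]


def make_writer(stream):
    """Create a CSV writer on an open text stream and emit the header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    return writer


def write_record(writer, frame, track_id, name, confidence, box, timestamp=None):
    """Append one attendance row; the current time is used if none is given."""
    ts = timestamp or datetime.now()
    x1, y1, x2, y2 = box
    writer.writerow({
        "frame": frame,
        "timestamp": ts.isoformat(timespec="seconds"),
        "track_id": track_id,
        "name": name,
        "confidence": f"{confidence:.3f}",
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
    })
```

When writing to a file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("attendance.csv", "w", newline="")` (the file name here is illustrative).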
Configuration
All settings are centralized for easy customization:
| Setting | Description | Default |
|---|---|---|
| Model Name | InsightFace model | buffalo_l |
| Detection Size | Input resolution | 640x640 |
| GPU Support | CUDA acceleration | Off |
| Similarity Threshold | Match acceptance | 0.35 |
| Min Face Size | Minimum detectable | 50 pixels |
| Quality Threshold | Confidence minimum | 0.5 |
| Frame Skip | Video optimization | 1 (all frames) |
| YOLO Model | Person detection | yolov8n.pt |
| YOLO Tracker | Tracking algorithm | bytetrack.yaml |
| YOLO Confidence | Person threshold | 0.8 |
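One way to centralize these settings is a frozen dataclass whose defaults mirror the table. The class and field names below are hypothetical, as is the interpretation of Frame Skip as "process every Nth frame".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    """Defaults copied from the configuration table; names are illustrative."""
    model_name: str = "buffalo_l"
    det_size: tuple = (640, 640)
    use_gpu: bool = False
    similarity_threshold: float = 0.35
    min_face_size: int = 50
    quality_threshold: float = 0.5
    frame_skip: int = 1
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8


def should_process(frame_index, cfg):
    """Assumed semantics: frame_skip=1 processes all frames, 3 every third."""
    return frame_index % cfg.frame_skip == 0


default_cfg = PipelineConfig()
# Derive a stricter, faster variant without mutating the defaults.
strict_cfg = replace(default_cfg, similarity_threshold=0.55, frame_skip=3)
```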
Data Flow Summary
| Stage | Input | Output |
|---|---|---|
| Enrollment | Person image folders | embeddings.pkl |
| Input | Camera/Video/RTSP | Video frames |
| Person Detection | Frame | Person bounding boxes |
| Tracking | Person boxes | Track IDs |
| Face Detection | Frame | Face locations |
| Face Recognition | Face crop | 512-d embedding |
| IoU Matching | Person + Face boxes | Face-Person links |
| Attendance | Identity + Track ID | CSV log record |