
Processing Pipeline

Step-by-step processing flow from enrollment to attendance logging in VisionLog - AI Attendance

The VisionLog - AI Attendance module runs a two-stage pipeline: YOLOv8 detects and tracks people, and InsightFace detects and recognizes faces. The two results are then linked so that each tracked person carries a recognized identity.

Pipeline Overview

| Step | Stage | Technology | Description |
|------|-------|------------|-------------|
| 1 | Enrollment | InsightFace | Create face database from organized folders |
| 2 | Video Input | OpenCV | Capture from camera, video file, or RTSP stream |
| 3 | Person Detection | YOLOv8 | Detect all people in frame |
| 4 | Tracking | ByteTrack | Assign persistent IDs across frames |
| 5 | Face Detection | InsightFace SCRFD | Locate faces in full frame |
| 6 | Face Recognition | InsightFace ArcFace | Extract 512-d embeddings, match identity |
| 7 | IoU Matching | NumPy | Link faces to tracked persons |
| 8 | Attendance | CSV/Pickle | Log with identity caching |

Step-by-Step Details

Step 1: Face Enrollment

Create the face database using folder-based batch enrollment.

Folder Structure:

  • One folder per person
  • Folder name = person identifier
  • Multiple images per folder (3-5 recommended)

Enrollment Process:

  1. Scan enrollment folder for person subfolders
  2. Load all images from each person's folder
  3. Detect face in each image (SCRFD)
  4. Extract 512-d embedding (ArcFace)
  5. Average embeddings for robustness
  6. Save to embeddings.pkl database
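
The averaging and saving steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes the per-image embeddings have already been extracted, that the averaged vector is re-normalized to unit length, and that the database is a plain dict pickled to disk. The function names are hypothetical.

```python
import math
import pickle


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_embeddings(embeddings):
    """Average several per-image embeddings into one unit-length template."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)  # re-normalizing is an assumption of this sketch


def build_database(per_person_embeddings, out_path):
    """Map each person identifier to an averaged embedding and pickle the result.

    People with no usable embeddings (no face found in any image) are skipped.
    """
    database = {name: average_embeddings(embs)
                for name, embs in per_person_embeddings.items() if embs}
    with open(out_path, "wb") as fh:
        pickle.dump(database, fh)
    return database
```

Averaging several images per person smooths out pose and lighting differences between individual photos, which is why 3-5 images per folder are recommended. A production pipeline would more likely hold the embeddings in NumPy arrays.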

Step 2: Video Input

Capture video or image input for processing.

| Input Type | Description |
|------------|-------------|
| Webcam | Live camera feed for real-time recognition |
| Video File | Batch process recorded video (MP4, AVI, MKV) |
| RTSP Stream | IP camera feed with zero-lag engine |
| Image | Single image recognition |
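
One way to route a source string to the four input types is sketched below. The helper name `classify_source` and the list of image extensions are assumptions for illustration; only the video extensions come from the table above.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed; not listed in the source


def classify_source(source):
    """Return 'webcam', 'rtsp', 'video', or 'image' for a source value."""
    text = str(source).strip()
    if text.isdigit():
        return "webcam"  # a bare number is treated as a camera index, e.g. 0
    if text.lower().startswith("rtsp://"):
        return "rtsp"
    suffix = Path(text).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"unrecognized input source: {source!r}")
```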

Step 3: Person Detection

Component: YOLOv8

Detect all people in the frame before face processing.

Detection Output:

| Property | Description |
|----------|-------------|
| Bounding Box | Person location [x1, y1, x2, y2] |
| Confidence | Detection confidence (threshold: 0.8) |
| Class | Person (class 0) |

Why Person Detection First?

  • YOLOv8 is fast and robust for detecting people at various distances
  • Enables persistent tracking across frames
  • Provides stable IDs even when faces aren't visible

Step 4: Tracking

Component: ByteTrack

Assign persistent track IDs to each detected person.

Tracking Features:

  • Persistent IDs maintained across frames
  • Motion prediction handles occlusions
  • Re-identification when person re-enters frame
  • Stable IDs even during partial occlusions

| Setting | Default | Description |
|---------|---------|-------------|
| Tracker | bytetrack.yaml | Tracking algorithm config |
| Alternative | botsort.yaml | BoT-SORT tracker option |

Step 5: Face Detection

Component: InsightFace SCRFD

Detect all faces in the full frame (not cropped to person boxes).

Detection Output:

| Property | Description |
|----------|-------------|
| Bounding Box | Face location coordinates |
| Detection Score | Confidence level (0-1) |
| Landmarks | 5-point facial landmarks |

Quality Filters:

  • Minimum face size: 50 pixels
  • Detection confidence: >= 0.5
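
The two quality filters can be expressed as a single predicate. This sketch assumes the face box is given as [x1, y1, x2, y2] and that the 50-pixel limit applies to the shorter side of the box; the source does not say which dimension is measured.

```python
def passes_quality_filters(face_box, det_score, min_face_size=50, min_score=0.5):
    """Return True if the face is large enough and confidently detected.

    Measuring the shorter side of the box is an assumption of this sketch.
    """
    x1, y1, x2, y2 = face_box
    shorter_side = min(x2 - x1, y2 - y1)
    return shorter_side >= min_face_size and det_score >= min_score
```

Faces that fail either check would be dropped before embedding extraction, since small or low-confidence detections tend to produce unreliable embeddings.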

Step 6: Face Recognition

Component: InsightFace ArcFace

Extract 512-dimensional face embedding for each detected face.

Embedding Properties:

  • Dimension: 512
  • Normalized: Unit vector
  • Model: ArcFace (buffalo_l)

Step 7: IoU Matching

Component: Intersection over Union

Match detected faces to tracked persons based on spatial overlap.

Matching Process:

  1. Take each tracked person bounding box in turn
  2. Calculate its IoU with every detected face bounding box
  3. Assign the face with the highest IoU, provided it exceeds the threshold
  4. Link that face's identity to the person's track ID
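
The matching process can be sketched in plain Python as follows. The data layout (`tracks` as a dict, `faces` as a list of pairs) and the default threshold value are assumptions; the source does not state the IoU threshold it uses.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(tracks, faces, iou_threshold=0.01):
    """For each track, pick the face with the highest IoU above the threshold.

    `tracks` maps track ID -> person box; `faces` is a list of
    (identity, face_box) pairs. Returns {track_id: identity}.
    The default threshold is a placeholder, not a value from the source.
    """
    links = {}
    for track_id, person_box in tracks.items():
        best_identity, best_iou = None, iou_threshold
        for identity, face_box in faces:
            score = iou(person_box, face_box)
            if score > best_iou:
                best_identity, best_iou = identity, score
        if best_identity is not None:
            links[track_id] = best_identity
    return links
```

Because a face box is much smaller than the person box that contains it, the IoU between the two stays low even for a correct pairing, so any threshold used here has to be correspondingly small.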

Step 8: Identity Assignment & Attendance

Compare query embedding against enrolled database and log attendance.

Matching Process:

  1. Compare query embedding against all enrolled embeddings
  2. Find the highest similarity score
  3. Accept match if score >= threshold

Threshold Guidelines:

| Threshold | Security | Use Case |
|-----------|----------|----------|
| 0.35 | Default | Balanced (current default) |
| 0.45 | Medium | Stricter matching |
| 0.55 | High | Fewer false positives |
| 0.70 | Very High | Twin detection (_Twins suffix) |

Identity Caching:

  • Last known identity stored per track ID
  • Survives temporary face occlusions
  • Only updates on confirmed (non-Unknown) matches
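
The caching behavior described above fits in a few lines. The class name and the use of the literal string "Unknown" for unmatched faces are assumptions of this sketch.

```python
class IdentityCache:
    """Remember the last confirmed identity for each track ID."""

    def __init__(self):
        self._by_track = {}

    def update(self, track_id, name):
        """Store `name` unless it is 'Unknown'; return the identity to report."""
        if name != "Unknown":
            self._by_track[track_id] = name
        return self._by_track.get(track_id, "Unknown")
```

For example, if track 7 is recognized as "alice" in one frame and the face is hidden in the next, `update(7, "Unknown")` still returns "alice".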

Attendance Logging:

  • Frame number
  • Timestamp
  • Track ID
  • Person name
  • Confidence score
  • Bounding box coordinates
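
A record with these fields could be written with Python's csv module as sketched below. The column names, the ISO timestamp format, and the three-decimal confidence are assumptions; the source lists only which fields are logged.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names covering the fields listed above.
FIELDS = ["frame", "timestamp", "track_id", "name", "confidence",
          "x1", "y1", "x2", "y2"]


def write_attendance_row(writer, frame, track_id, name, confidence, box,
                         timestamp=None):
    """Append one attendance record through a csv.DictWriter."""
    timestamp = timestamp or datetime.now()
    x1, y1, x2, y2 = box
    writer.writerow({
        "frame": frame,
        "timestamp": timestamp.isoformat(timespec="seconds"),
        "track_id": track_id,
        "name": name,
        "confidence": f"{confidence:.3f}",
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
    })


# Usage: write to an in-memory buffer; a real run would open a .csv file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
write_attendance_row(writer, 12, 7, "alice", 0.9123, [30, 10, 70, 60])
```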

Configuration

All settings are centralized for easy customization:

| Setting | Description | Default |
|---------|-------------|---------|
| Model Name | InsightFace model | buffalo_l |
| Detection Size | Input resolution | 640x640 |
| GPU Support | CUDA acceleration | Off |
| Similarity Threshold | Match acceptance | 0.35 |
| Min Face Size | Minimum detectable | 50 pixels |
| Quality Threshold | Confidence minimum | 0.5 |
| Frame Skip | Video optimization | 1 (all frames) |
| YOLO Model | Person detection | yolov8n.pt |
| YOLO Tracker | Tracking algorithm | bytetrack.yaml |
| YOLO Confidence | Person threshold | 0.8 |
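
As an illustration, these defaults could be gathered in one dataclass. The class and field names are hypothetical, and reading Frame Skip as "process every Nth frame" is an assumption that is consistent with the default of 1 meaning all frames.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    """Default values taken from the settings table; names are illustrative."""
    model_name: str = "buffalo_l"
    detection_size: tuple = (640, 640)
    use_gpu: bool = False
    similarity_threshold: float = 0.35
    min_face_size: int = 50
    quality_threshold: float = 0.5
    frame_skip: int = 1
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8

    def should_process(self, frame_index):
        """With frame_skip=1 every frame is processed; with N, every Nth frame."""
        return frame_index % self.frame_skip == 0
```

Overriding a single value then looks like `PipelineConfig(similarity_threshold=0.55)`, leaving the other defaults untouched.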

Data Flow Summary

| Stage | Input | Output |
|-------|-------|--------|
| Enrollment | Person image folders | embeddings.pkl |
| Input | Camera/Video/RTSP | Video frames |
| Person Detection | Frame | Person bounding boxes |
| Tracking | Person boxes | Track IDs |
| Face Detection | Frame | Face locations |
| Face Recognition | Face crop | 512-d embedding |
| IoU Matching | Person + Face boxes | Face-Person links |
| Attendance | Identity + Track ID | CSV log record |
