VisionLog

AI Attendance Architecture Layers

Detailed documentation for each layer of the system architecture

VisionLog uses a 6-layer architecture for face recognition attendance tracking. It combines YOLOv8 person detection and ByteTrack tracking with InsightFace face detection and recognition.

Layer Overview

| Layer | Purpose | Technology |
|---|---|---|
| 1. Input | Video/image capture | OpenCV, RTSP |
| 2. Person Detection | Detect people in frame | YOLOv8 |
| 3. Tracking | Persistent IDs across frames | ByteTrack |
| 4. Face Detection | Find faces in frame | InsightFace SCRFD |
| 5. Face Recognition | Identify person | InsightFace ArcFace |
| 6. Attendance | Log with identity caching | CSV, Pickle |

Data Flow

| Step | Layer | Action |
|---|---|---|
| 1 | Input | Capture video frame from camera/video/RTSP |
| 2 | Person Detection | YOLOv8 detects all people in frame |
| 3 | Tracking | ByteTrack assigns persistent track IDs |
| 4 | Face Detection | InsightFace SCRFD locates faces in full frame |
| 5 | Face Recognition | ArcFace extracts embeddings, matches identity |
| 6 | IoU Matching | Links faces to tracked persons by overlap |
| 7 | Attendance | Records entry with track ID and timestamp |
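
Step 6 is the glue between the face pipeline and the person tracker, so a small illustration may help. The following is a minimal sketch, not VisionLog's actual code: the function names `iou` and `link_faces_to_tracks` and the `min_iou` parameter are hypothetical. Because a face box is much smaller than the person box around it, raw IoU values are small, so the sketch simply picks the person with the highest overlap instead of requiring a high threshold.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces_to_tracks(
    faces: List[Box],
    tracks: Dict[int, Box],
    min_iou: float = 0.0,  # hypothetical cut-off; overlap must exceed it
) -> Dict[int, Optional[int]]:
    """Map each face index to the track ID with the highest IoU, or None."""
    links: Dict[int, Optional[int]] = {}
    for face_idx, face_box in enumerate(faces):
        best_id: Optional[int] = None
        best_score = min_iou
        for track_id, person_box in tracks.items():
            score = iou(face_box, person_box)
            if score > best_score:
                best_id, best_score = track_id, score
        links[face_idx] = best_id
    return links
```

A face that overlaps no tracked person maps to `None`, which lets the caller skip it rather than attach an identity to the wrong track.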

Layer Details

1. Input Layer

  • Webcam capture
  • Video file processing (MP4, AVI, MKV)
  • RTSP stream processing (IP cameras)
  • Zero-lag multi-threaded reading for RTSP
  • Folder-based enrollment images
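
One common way to get low-latency RTSP reading is to drain the stream on a background thread and keep only the newest frame, so the processing loop never works through a backlog. The sketch below illustrates that pattern under assumptions: the class name `LatestFrameReader` and its methods are hypothetical, and it takes any `read_fn` that returns an `(ok, frame)` pair rather than depending on a specific capture library.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the most recent one."""

    def __init__(self, read_fn: Callable[[], Tuple[bool, Any]]):
        self._read_fn = read_fn          # e.g. the read method of a capture object
        self._lock = threading.Lock()
        self._latest: Optional[Any] = None
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> "LatestFrameReader":
        self._thread.start()
        return self

    def _loop(self) -> None:
        # Keep pulling frames; older frames are overwritten, never queued.
        while not self._stop_requested.is_set():
            ok, frame = self._read_fn()
            if not ok:                   # stream ended or read failed
                break
            with self._lock:
                self._latest = frame

    def read(self) -> Optional[Any]:
        """Return the newest frame seen so far, or None before the first frame."""
        with self._lock:
            return self._latest

    def join(self, timeout: Optional[float] = None) -> None:
        """Wait for the reader thread to finish (for example at end of stream)."""
        self._thread.join(timeout)

    def stop(self) -> None:
        self._stop_requested.set()
        self._thread.join(timeout=1.0)
```

With OpenCV, `read_fn` could be the `read` method of a `cv2.VideoCapture` opened on the RTSP URL, since it returns the same `(ok, frame)` shape. The trade-off is that frames are dropped whenever processing is slower than the camera, which is usually acceptable for attendance tracking.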

2. Person Detection Layer

  • YOLOv8 object detection (Ultralytics)
  • Detects class 0 (person) only
  • Returns bounding boxes with confidence scores
  • Configurable model: yolov8n, yolov8s, yolov8m
  • Confidence threshold: 0.8 (default)
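
The class and confidence filtering described above can be sketched as a plain post-processing step. This is an illustration only: the `Detection` type and `filter_persons` function are hypothetical, and treating the threshold as inclusive is an assumption. With Ultralytics, the same filter can usually be expressed through the `classes` and `conf` prediction arguments instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

PERSON_CLASS_ID = 0  # class 0 is "person", as stated in the list above


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    class_id: int
    confidence: float


def filter_persons(
    detections: List[Detection],
    conf_threshold: float = 0.8,  # documented default
) -> List[Detection]:
    """Keep only person detections at or above the confidence threshold."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= conf_threshold
    ]
```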

3. Tracking Layer

  • ByteTrack multi-object tracking algorithm
  • Persistent track IDs maintained across frames
  • Motion prediction for occlusion handling
  • Alternative: BoT-SORT tracker
  • Track mode: all or known_only
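
The source does not define the two track modes, so the following is only a guess at their meaning: it assumes `all` keeps every tracked person while `known_only` keeps only tracks that already have a recognised identity. The function name `select_tracks` and the `identities` mapping are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_tracks(
    track_ids: Iterable[int],
    identities: Dict[int, Optional[str]],  # track ID -> name, None if unknown
    mode: str = "all",
) -> List[int]:
    """Return the track IDs to keep under the given track mode (assumed semantics)."""
    if mode == "all":
        return list(track_ids)
    if mode == "known_only":
        return [t for t in track_ids if identities.get(t) is not None]
    raise ValueError(f"unsupported track mode: {mode!r}")
```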

4. Face Detection Layer

  • InsightFace SCRFD model
  • Full-frame face detection (not cropped to person boxes)
  • Returns bounding boxes, landmarks, and confidence scores
  • Quality filtering: minimum face size 50 pixels, confidence >= 0.5
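
The quality filter can be sketched as a single predicate. The source does not say which dimension the 50-pixel minimum applies to; this sketch assumes it is the shorter side of the face box. The function name `passes_quality` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def passes_quality(
    box: Box,
    score: float,
    min_size: float = 50,   # documented minimum, assumed to mean the shorter side
    min_conf: float = 0.5,  # documented confidence floor
) -> bool:
    """Return True if a detected face is large and confident enough to recognise."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    return min(width, height) >= min_size and score >= min_conf
```

Dropping small or low-confidence faces before embedding extraction avoids spending recognition time on crops that are unlikely to match reliably.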

5. Face Recognition Layer

  • ArcFace face embedding extraction (512-d vectors)
  • Cosine similarity matching against database
  • Configurable match threshold (default: 0.35)
  • Special _Twins handling (0.7 threshold)
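
Cosine-similarity matching with a threshold can be sketched as follows. This is a minimal illustration, not VisionLog's implementation: `match_identity` and its parameters are hypothetical names, and short 3-d vectors would stand in for the 512-d ArcFace embeddings in a toy run. The source does not say how `_Twins` entries are marked, so the sketch only accepts an optional per-name threshold override that a caller could set to 0.7 for those entries.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_identity(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],  # name -> enrolled embedding
    threshold: float = 0.35,               # documented default
    overrides: Optional[Dict[str, float]] = None,  # hypothetical per-name thresholds
) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the best accepted match, or None."""
    overrides = overrides or {}
    best: Optional[Tuple[str, float]] = None
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        required = overrides.get(name, threshold)
        if score >= required and (best is None or score > best[1]):
            best = (name, score)
    return best
```

A stricter threshold for look-alike identities trades some missed matches for fewer confusions between them.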

6. Attendance Layer

  • IoU matching links faces to tracked persons
  • Identity caching per track ID
  • Records: frame, timestamp, track ID, name, confidence, bbox
  • CSV logging with session summary
  • Duplicate prevention mechanism
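
The caching, record fields, and duplicate prevention listed above can be sketched together. Several details here are assumptions rather than documented behaviour: the `AttendanceLog` class is hypothetical, duplicate prevention is interpreted as "one row per name per session", and the bounding box is written as space-separated values in a single CSV column.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Optional, Sequence, Set, TextIO


class AttendanceLog:
    """Writes one CSV row per recognised person and caches identity per track ID."""

    FIELDS = ["frame", "timestamp", "track_id", "name", "confidence", "bbox"]

    def __init__(self, out: TextIO):
        self._writer = csv.writer(out)
        self._writer.writerow(self.FIELDS)
        self._identity_by_track: Dict[int, str] = {}  # identity cache
        self._logged_names: Set[str] = set()          # duplicate prevention

    def identity_for(self, track_id: int) -> Optional[str]:
        """Return the cached name for a track, or None if not yet recognised."""
        return self._identity_by_track.get(track_id)

    def record(
        self,
        frame: int,
        track_id: int,
        name: str,
        confidence: float,
        bbox: Sequence[float],
        timestamp: Optional[str] = None,
    ) -> bool:
        """Cache the identity and log it once; return True if a row was written."""
        self._identity_by_track[track_id] = name
        if name in self._logged_names:
            return False  # already logged this session (assumed policy)
        self._logged_names.add(name)
        ts = timestamp or datetime.now().isoformat(timespec="seconds")
        bbox_text = " ".join(str(v) for v in bbox)
        self._writer.writerow([frame, ts, track_id, name, f"{confidence:.3f}", bbox_text])
        return True


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for an open CSV file
    log = AttendanceLog(buffer)
    log.record(1, 7, "alice", 0.62, (10, 20, 60, 80))
    log.record(2, 7, "alice", 0.64, (12, 21, 62, 81))  # skipped as a duplicate
    print(buffer.getvalue())
```

Caching the name per track ID means later frames can reuse the identity without rerunning recognition, as long as the tracker keeps the same ID for that person.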
