System Architecture

Understanding the modular architecture of VisionLog - AI Attendance

The VisionLog - AI Attendance module is built on a modular architecture for face recognition and attendance tracking. The system uses a two-stage pipeline: YOLOv8 with ByteTrack detects and tracks people, and InsightFace (buffalo_l model) detects and recognizes faces.

Architecture Overview

The system follows a 6-layer processing pipeline:

| Layer | Component | Technology |
|-------|-----------|------------|
| 1 | Input | OpenCV, RTSP |
| 2 | Person Detection | YOLOv8 |
| 3 | Tracking | ByteTrack |
| 4 | Face Detection + Recognition | InsightFace (SCRFD + ArcFace) |
| 5 | Attendance | Identity caching, logging |
| 6 | Storage | Pickle database, CSV logs |

Core Components

Person Detection & Tracking (YOLOv8 + ByteTrack)

The first stage uses YOLOv8 for robust person detection with ByteTrack for persistent tracking across frames.

Key Features:

| Feature | Description |
|---------|-------------|
| Person Detection | YOLOv8 detects all people in frame with bounding boxes |
| Persistent Tracking | ByteTrack assigns stable IDs across frames |
| Motion Prediction | Handles occlusions and re-entries |
| Configurable Tracker | Support for ByteTrack or BoT-SORT |

Tracking Configuration:

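| Setting | Default | Description |
|---------|---------|-------------|
| YOLO Model | yolov8n.pt | Nano model (fast); yolov8s and yolov8m are also supported |
| Tracker | bytetrack.yaml | ByteTrack (bytetrack.yaml) or BoT-SORT (botsort.yaml) |
| Confidence | 0.8 | Person detection confidence threshold |
| Track Mode | all | Track all persons (all) or only known persons (known_only) |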
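
As a rough illustration, the settings above could be passed to the Ultralytics tracking call as sketched below. The helper names `track_kwargs` and `track_frame` are hypothetical, and restricting detection to class 0 (person in the COCO label set) is an assumption about how the module filters detections, not something the documentation states.

```python
def track_kwargs(tracker="bytetrack.yaml", conf=0.8):
    """Build keyword arguments for Ultralytics `YOLO.track` from the table defaults."""
    return {
        "persist": True,     # keep track IDs alive between successive calls
        "tracker": tracker,  # "bytetrack.yaml" or "botsort.yaml"
        "conf": conf,        # person detection confidence threshold
        "classes": [0],      # assumption: COCO class 0 = person
        "verbose": False,
    }


def track_frame(model, frame, **overrides):
    """Run one tracking step; `model` is expected to be an ultralytics.YOLO instance.

    Returns a list of (track_id, [x1, y1, x2, y2]) pairs.
    """
    results = model.track(frame, **{**track_kwargs(), **overrides})
    boxes = results[0].boxes
    if boxes.id is None:     # no confirmed tracks on this frame
        return []
    return list(zip(boxes.id.int().tolist(), boxes.xyxy.tolist()))


# Typical use (requires the ultralytics package and a frame from OpenCV):
#   from ultralytics import YOLO
#   model = YOLO("yolov8n.pt")
#   tracks = track_frame(model, frame, tracker="botsort.yaml")
```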

Face Engine (InsightFace)

The face engine is the heart of the system: it handles all face detection and recognition operations.

Key Functions:

| Function | Description |
|----------|-------------|
| Face Detection | SCRFD detects all faces in full frame |
| Face Recognition | ArcFace extracts 512-d embeddings and matches against database |
| IoU Matching | Links detected faces to tracked persons by spatial overlap |
| Identity Caching | Preserves last known identity per track ID |

Enrollment System

Folder-based batch enrollment for adding faces to the database.

| Feature | Description |
|---------|-------------|
| Batch Processing | Enroll multiple persons at once |
| Folder Structure | One folder per person with multiple images |
| Automatic Averaging | Multiple images averaged for robustness |
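
The averaging step might look like the sketch below. It assumes each image yields one embedding as a list of floats, and that embeddings are L2-normalized before and after averaging. The documentation only says the images are averaged, so the normalization and the function names are assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embedding(embeddings):
    """Combine several embeddings of one person into a single unit-length template."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)
```

Keeping the stored template at unit length means a later cosine similarity reduces to a dot product.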

Recognition System

Real-time face recognition from multiple input sources.

Input Modes:

  • Camera feed (live recognition)
  • Single image file
  • Video file processing
  • RTSP streams (IP cameras)

Features:

  • FPS counter display
  • Screenshot capture
  • Database reload
  • Confidence display toggle

Video Processor

Batch processing of CCTV recordings, video files, and RTSP streams.

Capabilities:

  • Frame skipping for performance (configurable)
  • YOLOv8 person tracking with persistent IDs
  • Annotated video output with bounding boxes
  • CSV logging with frame, timestamp, track ID, name, confidence
  • Session summary with unique tracks and identified persons
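
Frame skipping and CSV logging can be sketched as follows. The column names and the reading of frame skip as "process every Nth frame" are assumptions; the documentation only lists the logged fields and says a value of 1 processes all frames.

```python
import csv

# Assumed column names, following the fields listed above.
FIELDS = ["frame", "timestamp", "track_id", "name", "confidence"]


def should_process(frame_index, frame_skip=1):
    """Return True for every `frame_skip`-th frame; 1 means every frame."""
    return frame_index % max(1, frame_skip) == 0


def write_log(rows, stream):
    """Write detection rows (dicts keyed by FIELDS) to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a file, open it with `newline=""` so the `csv` module controls line endings.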

Zero-Lag RTSP Engine:

  • Multi-threaded frame reading (always processes latest frame)
  • Auto-reconnect on connection loss (5-second retry)
  • Designed for 24/7 continuous monitoring
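
The "always processes latest frame" behavior can be illustrated with a background thread that overwrites a single frame slot. This is a simplified sketch, not the module's implementation: `LatestFrameReader` is a hypothetical name, `read_fn` stands in for something like `cv2.VideoCapture.read`, and the 5-second reconnect loop is omitted.

```python
import threading


class LatestFrameReader:
    """Keep only the most recent frame produced by a blocking `read_fn`."""

    def __init__(self, read_fn):
        self._read_fn = read_fn        # returns (ok, frame), like cv2.VideoCapture.read
        self._lock = threading.Lock()
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            ok, frame = self._read_fn()
            if not ok:                 # stream ended or connection dropped
                break
            with self._lock:
                self._latest = frame   # overwrite: older frames are discarded

    def latest(self):
        """Return the newest frame seen so far, or None if none has arrived."""
        with self._lock:
            return self._latest

    def stop(self):
        self._stop.set()
        self._thread.join(timeout=1.0)
```

Because the reader drains the stream continuously, a slow recognition step never works through a backlog of stale frames. It simply picks up whatever `latest()` returns next.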

Attendance Tracker

Real-time attendance logging with duplicate prevention.

Features:

  • Identity caching per track ID
  • Duplicate prevention mechanism
  • CSV export with timestamps
  • Filter by date, person, or camera
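
A minimal sketch of identity caching with duplicate prevention follows. The documentation does not say how duplicates are defined, so the rule used here (one record per person per calendar day) is an assumption, as are the class and field names.

```python
from datetime import datetime


class AttendanceTracker:
    def __init__(self):
        self.identity_by_track = {}  # track ID -> last known name
        self.logged = set()          # (name, date) pairs already recorded
        self.records = []

    def update(self, track_id, name, when):
        """Record a sighting; pass name=None when recognition failed on this frame."""
        if name is not None:
            self.identity_by_track[track_id] = name
        name = self.identity_by_track.get(track_id)  # fall back to cached identity
        if name is None:
            return None
        key = (name, when.date())
        if key in self.logged:       # assumed rule: one record per person per day
            return None
        self.logged.add(key)
        record = {"name": name, "track_id": track_id, "timestamp": when.isoformat()}
        self.records.append(record)
        return record
```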

Processing Pipeline

1. Person Detection (YOLOv8)

YOLOv8 detects all people in the frame, returning bounding boxes with confidence scores.

2. Tracking (ByteTrack)

ByteTrack assigns persistent track IDs to each person, maintaining identity across frames even through occlusions.

3. Face Detection (SCRFD)

InsightFace SCRFD detects faces on the full frame for maximum accuracy.

Quality Filters:

  • Minimum face size: 50 pixels
  • Detection confidence threshold: 0.5
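
The two filters could be applied as in the sketch below. It assumes the minimum face size refers to the shorter side of the bounding box, which the documentation does not specify.

```python
def passes_quality(bbox, det_score, min_face_size=50, min_score=0.5):
    """Check a detected face against the size and confidence filters.

    bbox is (x1, y1, x2, y2) in pixels; det_score is the detector confidence.
    """
    x1, y1, x2, y2 = bbox
    shorter_side = min(x2 - x1, y2 - y1)  # assumption: size = shorter box side
    return shorter_side >= min_face_size and det_score >= min_score
```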

4. Face Recognition (ArcFace)

Each detected face is converted to a 512-dimensional embedding vector using the ArcFace model.

5. IoU Matching

Faces are matched to tracked persons using Intersection over Union (IoU) of bounding boxes.
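
For reference, IoU and a simple best-overlap assignment can be written as below. Boxes are assumed to be (x1, y1, x2, y2) tuples, and `match_face_to_track` is a hypothetical helper. Picking the track with the highest non-zero IoU is one plausible rule, not necessarily the one the module uses.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_face_to_track(face_box, tracks):
    """Return the track ID whose person box overlaps the face most, or None.

    `tracks` maps track ID -> person bounding box.
    """
    best_id, best_score = None, 0.0
    for track_id, person_box in tracks.items():
        score = iou(face_box, person_box)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id
```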

6. Identity Assignment

Recognition uses cosine similarity between embeddings.

Matching Process:

  1. Compare query embedding against all enrolled embeddings
  2. Find the highest similarity score
  3. Accept match if score >= threshold (default: 0.35)
  4. Cache successful identities per track ID

Special Handling:

  • Names ending with _Twins use a stricter threshold of 0.7
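
The matching process and the _Twins rule can be sketched together. The database is assumed to be a dict mapping each enrolled name to one embedding, and `identify` is a hypothetical function name. Track-ID caching is left out here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(query, database, threshold=0.35, twins_threshold=0.7):
    """Return (name, score); name is None if the best match is below its threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, best_score
    # Names ending with "_Twins" must clear the stricter threshold.
    required = twins_threshold if best_name.endswith("_Twins") else threshold
    if best_score >= required:
        return best_name, best_score
    return None, best_score
```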

Configuration

Centralized settings for the entire system:

| Setting | Description | Default |
|---------|-------------|---------|
| Model Name | InsightFace model | buffalo_l |
| Detection Size | Input resolution | 640x640 |
| GPU Support | Enable CUDA acceleration | Off |
| Similarity Threshold | Match acceptance threshold | 0.35 |
| Min Face Size | Minimum detectable face | 50 pixels |
| Quality Threshold | Detection confidence minimum | 0.5 |
| Frame Skip | Video processing optimization | 1 (all frames) |
| YOLO Model | Person detection model | yolov8n.pt |
| YOLO Tracker | Tracking algorithm | bytetrack.yaml |
| YOLO Confidence | Person detection threshold | 0.8 |
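
One way to hold these settings in code is a frozen dataclass, as in this sketch. The class and field names are hypothetical; only the default values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionLogConfig:
    """Illustrative container for the documented defaults (names are assumed)."""
    model_name: str = "buffalo_l"          # InsightFace model
    det_size: tuple = (640, 640)           # detection input resolution
    use_gpu: bool = False                  # CUDA acceleration off by default
    similarity_threshold: float = 0.35     # match acceptance threshold
    min_face_size: int = 50                # pixels
    quality_threshold: float = 0.5         # detection confidence minimum
    frame_skip: int = 1                    # 1 = process all frames
    yolo_model: str = "yolov8n.pt"
    yolo_tracker: str = "bytetrack.yaml"
    yolo_confidence: float = 0.8
```

Because the dataclass is frozen, a variant is created with `dataclasses.replace(cfg, use_gpu=True)` rather than by mutating fields in place.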

Storage

| Component | Purpose |
|-----------|---------|
| embeddings.pkl | Face embeddings database (Pickle format) |
| CSV Logs | Detection logs with frame, timestamp, track ID, name |
| Source Images | Original images organized by person |
| Output | Processed video files |
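
Reading and writing the embeddings database could look like the sketch below. The on-disk layout is an assumption: a dict mapping each person's name to an embedding list. The actual structure of embeddings.pkl is not documented here.

```python
import pickle


def save_embeddings(database, path):
    """Serialize a {name: embedding} dict to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(database, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_embeddings(path):
    """Load the embeddings dict; only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```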

Technology Summary

| Layer | Technology |
|-------|------------|
| Input Sources | Camera, Video Files, Images, RTSP Streams |
| Person Detection | YOLOv8 (Ultralytics) |
| Person Tracking | ByteTrack / BoT-SORT |
| Face Detection | InsightFace SCRFD |
| Face Recognition | InsightFace ArcFace (buffalo_l) |
| Matching | Cosine Similarity + IoU |
| Database | Pickle (embeddings.pkl) |
| Logging | CSV format |
