
Tech Stack

Technologies used in the VisionLog AI Attendance system

This page provides an overview of all technologies used to build the VisionLog AI Attendance application, explaining what each technology does and how it's utilized.

Overview

VisionLog AI Attendance is built in Python on open-source libraries for computer vision and deep learning. The system uses a two-stage pipeline: YOLOv8 with ByteTrack detects and tracks people, and InsightFace detects and recognizes their faces.

Layer | Technology
Input | OpenCV
Person Detection | YOLOv8 (Ultralytics)
Person Tracking | ByteTrack
Face Detection | InsightFace SCRFD
Face Recognition | InsightFace ArcFace
Processing | NumPy, ONNX Runtime
Storage | Pickle, CSV

Hardware

Component | Current Setup | GPU Support
Platform | Windows, Linux | -
Processor | CPU | NVIDIA CUDA
Inference | ONNX Runtime (CPU) | ONNX Runtime (GPU)

Current Configuration:

The system has been tested on both Windows and Linux machines and currently runs all inference on the CPU. This keeps the setup cross-platform and avoids the need for GPU drivers or a CUDA installation.

GPU Acceleration:

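Both YOLOv8 and InsightFace support GPU acceleration through CUDA. InsightFace runs on ONNX Runtime, while YOLOv8 runs on PyTorch, so each needs its own GPU-enabled package. To enable GPU support:

  1. Install NVIDIA GPU drivers
  2. Install a CUDA Toolkit version compatible with your onnxruntime-gpu and PyTorch releases
  3. Install onnxruntime-gpu instead of onnxruntime (used by InsightFace)
  4. Install a CUDA-enabled PyTorch build (used by YOLOv8)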
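Once the packages are installed, one way to confirm that ONNX Runtime can actually see the GPU is to list its execution providers. The sketch below is illustrative only: `choose_providers` and `detect_providers` are hypothetical helper names, not part of VisionLog, while `onnxruntime.get_available_providers()` is the library call that reports what the installed build supports.

```python
def choose_providers(available, prefer_gpu=True):
    """Return an ordered ONNX Runtime provider list, always ending with the CPU fallback."""
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


def detect_providers(prefer_gpu=True):
    """Ask the installed ONNX Runtime build which providers it offers."""
    # Imported lazily so choose_providers() stays usable without ONNX Runtime installed.
    import onnxruntime as ort

    return choose_providers(ort.get_available_providers(), prefer_gpu)


# Example (requires onnxruntime or onnxruntime-gpu):
#   providers = detect_providers()
```

If `CUDAExecutionProvider` is missing from the result after installing onnxruntime-gpu, a driver or CUDA Toolkit version mismatch is a likely cause.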

Performance Expectations:

Mode | Typical FPS | Best For
CPU | 5-10 FPS | Development, small-scale deployments
GPU (CUDA) | 30-100+ FPS | Production, real-time processing

YOLOv8 (Ultralytics)

Property | Value
Type | Object Detection & Tracking
Version | >= 8.0.0
License | AGPL-3.0

What it is: YOLOv8 is a version of the YOLO (You Only Look Once) object detection model family from Ultralytics, released in 2023. It provides real-time object detection with built-in tracking support.

How we use it:

  • Person Detection - Detects all people in each video frame, returning bounding boxes with confidence scores. We use class 0 (person) only.

  • Person Tracking - YOLOv8's built-in tracking with ByteTrack assigns persistent IDs to each detected person, maintaining identity across frames.
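The two uses above map onto a single Ultralytics call. The following is a minimal sketch, not the project's actual code: `track_people` is a hypothetical wrapper, and `model` is assumed to be an `ultralytics.YOLO` instance such as `YOLO("yolov8n.pt")`. The `track()` arguments shown (`persist`, `classes`, `tracker`, `verbose`) are Ultralytics options.

```python
def track_people(model, frame, tracker="bytetrack.yaml"):
    """Run one tracking step and return a list of tracked-person dicts.

    `model` is assumed to be an ultralytics.YOLO instance, e.g. YOLO("yolov8n.pt").
    """
    results = model.track(
        frame,
        persist=True,     # keep tracker state between consecutive calls
        classes=[0],      # COCO class 0 = person
        tracker=tracker,  # "bytetrack.yaml" or "botsort.yaml"
        verbose=False,
    )
    boxes = results[0].boxes
    if boxes.id is None:  # the tracker has not assigned any IDs in this frame
        return []
    return [
        {"track_id": int(tid), "box": [float(v) for v in xyxy], "conf": float(conf)}
        for tid, xyxy, conf in zip(
            boxes.id.tolist(), boxes.xyxy.tolist(), boxes.conf.tolist()
        )
    ]
```

Calling this once per frame with the same `model` object is what lets the tracker carry IDs forward; creating a new model per frame would reset them.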

Model Options:

Model | Speed | Accuracy | Size | Use Case
yolov8n.pt | Fastest | Good | ~6MB | Real-time (our default)
yolov8s.pt | Fast | Better | ~22MB | Balanced
yolov8m.pt | Medium | High | ~52MB | Higher accuracy

ByteTrack

Property | Value
Type | Multi-Object Tracking Algorithm
Integration | Built into Ultralytics

What it is: ByteTrack is a simple, effective, and generic association method for multi-object tracking. It associates almost every detection box, including low-score ones, instead of only the high-score ones.

How we use it:

  • Persistent Tracking - Assigns stable track IDs to each person across video frames
  • Motion Prediction - Handles temporary occlusions and re-entries
  • Identity Preservation - Maintains consistent IDs even when people overlap or temporarily leave frame

Alternative Tracker:

Tracker | Description
bytetrack.yaml | Default, fast and accurate
botsort.yaml | Alternative with different motion model

InsightFace

Property | Value
Type | Face Analysis Library
Version | >= 0.7.0
License | MIT

What it is: InsightFace is an open-source 2D & 3D deep face analysis library that provides state-of-the-art face detection, recognition, and analysis models. It consistently ranks among the top performers on face recognition benchmarks.

How we use it:

  • Face Detection (SCRFD) - Detects all faces in a frame using SCRFD (Sample and Computation Redistribution for Face Detection), returning bounding box coordinates and facial landmarks (eyes, nose, mouth corners). Handles multiple faces simultaneously with high accuracy even in challenging conditions.

  • Face Recognition (ArcFace) - Generates 512-dimensional embedding vectors that uniquely represent each face. These embeddings are compared using cosine similarity for identity matching.
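Both steps are exposed through InsightFace's `FaceAnalysis` class. The sketch below shows one plausible way to load a model pack and collect boxes, landmarks, and embeddings. `load_face_app` and `describe_faces` are hypothetical helper names, and the `det_size` of 640x640 is an assumed setting rather than a documented VisionLog value.

```python
def load_face_app(model_name="buffalo_l", use_gpu=False):
    """Create and prepare an InsightFace FaceAnalysis pipeline (SCRFD + ArcFace)."""
    # Imported lazily so describe_faces() can be exercised without InsightFace installed.
    from insightface.app import FaceAnalysis

    providers = (
        ["CUDAExecutionProvider", "CPUExecutionProvider"]
        if use_gpu
        else ["CPUExecutionProvider"]
    )
    app = FaceAnalysis(name=model_name, providers=providers)
    # A negative ctx_id selects the CPU; det_size is the detector input resolution.
    app.prepare(ctx_id=0 if use_gpu else -1, det_size=(640, 640))
    return app


def describe_faces(app, image_bgr):
    """Detect faces and return box, landmarks, and normalised embedding for each."""
    return [
        {
            "bbox": face.bbox.tolist(),           # [x1, y1, x2, y2]
            "landmarks": face.kps.tolist(),       # 5 points: eyes, nose, mouth corners
            "embedding": face.normed_embedding,   # 512-d, unit length
        }
        for face in app.get(image_bgr)
    ]
```

Using the normalised embedding means cosine similarity between two faces reduces to a plain dot product.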

Model Options:

Model | Accuracy | Speed | Size | Use Case
buffalo_l | Highest (99.83% LFW) | Medium | ~400MB | Production (our choice)
buffalo_m | High | Fast | ~200MB | Balanced
buffalo_s | Good | Fastest | ~100MB | Edge devices

OpenCV

Property | Value
Type | Computer Vision Library
Version | >= 4.5.0
License | Apache 2.0

What it is: OpenCV (Open Source Computer Vision Library) is one of the most widely used computer vision libraries, providing more than 2500 optimized algorithms for image and video processing.

How we use it:

  • Video/Camera Input - Capturing frames from webcams, video files (MP4, AVI, MKV), and RTSP streams using VideoCapture.

  • Image I/O - Reading enrollment images.

  • Visualization - Drawing bounding boxes around detected persons and faces, displaying names, track IDs, confidence scores, FPS, and status information in real-time.
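The capture and drawing steps above can be sketched with a few OpenCV calls. This is an illustrative outline rather than the application's code: `parse_source`, `read_frames`, and `draw_person` are hypothetical helpers, and the colour, font, and line thickness are arbitrary choices.

```python
def parse_source(source):
    """Map a source string to what cv2.VideoCapture expects.

    Digit-only strings become webcam indices; file paths and rtsp:// URLs
    are passed through unchanged.
    """
    text = str(source)
    return int(text) if text.isdigit() else text


def read_frames(source):
    """Yield BGR frames until the stream ends or a read fails."""
    import cv2  # lazy import keeps parse_source() usable without OpenCV

    cap = cv2.VideoCapture(parse_source(source))
    if not cap.isOpened():
        raise RuntimeError(f"Could not open video source: {source!r}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()


def draw_person(frame, box, label, color=(0, 255, 0)):
    """Draw a bounding box with a text label (name, track ID, score) above it."""
    import cv2

    x1, y1, x2, y2 = (int(v) for v in box)
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
    cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```

For example, `for frame in read_frames("0"):` would iterate over the default webcam, while passing an MP4 path or an RTSP URL reads from that source instead.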


ONNX Runtime

Property | Value
Type | ML Inference Engine
Version | >= 1.10.0
License | MIT

What it is: ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX (Open Neural Network Exchange) format. It provides optimized execution across different hardware platforms.

How we use it:

InsightFace models are stored in ONNX format, and ONNX Runtime handles the actual neural network inference for both face detection and recognition. It enables seamless switching between CPU and GPU execution.

Performance Comparison:

Provider | Hardware | Typical FPS
CPU | Intel i7 | 5-10 FPS
CUDA GPU | RTX 3060 | 30-50 FPS
CUDA GPU | RTX 4090 | 100+ FPS

NumPy

Property | Value
Type | Numerical Computing Library
Version | >= 1.21.0
License | BSD

What it is: NumPy is the fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on them.

How we use it:

  • Embedding Storage - Face embeddings are stored as 512-dimensional NumPy arrays, enabling efficient memory management and serialization.

  • Similarity Calculation - Computing cosine similarity between face embeddings using vectorized dot product and norm operations.

  • Batch Operations - Efficiently comparing a query face against all enrolled faces in the database using vectorized array operations.

  • IoU Calculation - Computing Intersection over Union for matching faces to tracked persons.
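The similarity, batch-comparison, and IoU operations listed above can be written in a few vectorized lines. The sketch below is one possible formulation under stated assumptions: `best_match` and `iou_one_to_many` are hypothetical names, boxes are assumed to be in `[x1, y1, x2, y2]` format, and the 0.4 similarity threshold is an assumed placeholder, not a documented VisionLog setting.

```python
import numpy as np


def best_match(query, gallery, names, threshold=0.4):
    """Compare one embedding against all enrolled embeddings at once.

    Returns (name, score) for the best match, or (None, score) if the best
    cosine similarity is below `threshold`.
    """
    query = np.asarray(query, dtype=np.float32)
    gallery = np.asarray(gallery, dtype=np.float32)  # shape (N, 512)
    if gallery.size == 0:
        return None, 0.0
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(gallery, axis=1) * np.linalg.norm(query) + 1e-12
    scores = gallery @ query / norms
    idx = int(np.argmax(scores))
    score = float(scores[idx])
    return (names[idx] if score >= threshold else None), score


def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    box = np.asarray(box, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    # Corners of the intersection rectangle for every pair.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter + 1e-12)
```

To attach a face to a tracked person, one option is `np.argmax(iou_one_to_many(face_box, person_boxes))`. Because a face box covers only a small part of a person box, the absolute IoU values will be low, so the ranking matters more than the magnitude.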


Python

Property | Value
Type | Programming Language
Version | >= 3.8
License | PSF License

What it is: Python is a high-level, general-purpose programming language known for its readability and extensive ecosystem of libraries for data science and machine learning.

How we use it:

Python serves as the primary language for the entire AI attendance application, chosen for its excellent support for machine learning frameworks, computer vision libraries, and rapid development capabilities.


Dependencies Summary

Package | Version | Purpose
numpy | >= 1.21.0 | Numerical computing & embeddings
opencv-python | >= 4.5.0 | Computer vision & video I/O
insightface | >= 0.7.0 | Face detection & recognition
ultralytics | >= 8.0.0 | YOLOv8 person detection & tracking
onnxruntime | >= 1.10.0 | ML inference (CPU)
onnxruntime-gpu | >= 1.10.0 | ML inference (GPU, optional)

Technology Selection Rationale

Technology | Why We Chose It
YOLOv8 | State-of-the-art object detection, built-in tracking, actively maintained
ByteTrack | Simple yet effective tracking, handles occlusions well
InsightFace | State-of-the-art accuracy (99.83% LFW), open-source, actively maintained
OpenCV | Industry standard, extensive documentation, cross-platform support
ONNX Runtime | Hardware-agnostic inference, easy CPU/GPU switching, optimized performance
NumPy | Fast vectorized operations, seamless integration with ML libraries
