Tech Stack

This page provides an overview of all technologies used to build the VisionLog AI Attendance application, explaining what each technology does and how it's utilized.

Overview

VisionLog AI Attendance is written in Python and built on open-source computer vision and deep learning libraries. The system uses a two-stage pipeline: YOLOv8 (with ByteTrack) detects and tracks people, and InsightFace detects and recognizes their faces.

Layer	Technology
Input	OpenCV
Person Detection	YOLOv8 (Ultralytics)
Person Tracking	ByteTrack
Face Detection	InsightFace SCRFD
Face Recognition	InsightFace ArcFace
Processing	NumPy, ONNX Runtime
Storage	Pickle, CSV

Hardware

Component	Current Setup	GPU Support
Platform	Windows, Linux	-
Processor	CPU	NVIDIA CUDA
Inference	ONNX Runtime (CPU)	ONNX Runtime (GPU)

Current Configuration:

The system has been tested on both Windows and Linux machines, currently using CPU for all inference operations. This provides a straightforward cross-platform setup without requiring GPU drivers or CUDA installation.

GPU Acceleration:

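Both YOLOv8 and InsightFace support GPU acceleration through CUDA, but through different runtimes: InsightFace models run on ONNX Runtime, while YOLOv8 `.pt` weights run on PyTorch. To enable GPU support:

Install NVIDIA GPU drivers
Install a CUDA Toolkit version compatible with your onnxruntime-gpu and PyTorch builds
Install onnxruntime-gpu instead of onnxruntime (not alongside it)
Install a CUDA-enabled PyTorch build so that YOLOv8 also runs on the GPU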
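
After installation, it can help to confirm what each runtime actually sees. The following is a minimal sketch, not the application's own code: `pick_providers` and `report_gpu_support` are hypothetical helper names, and the sketch assumes `onnxruntime` (or `onnxruntime-gpu`) and `torch` are installed. PyTorch is included because YOLOv8 `.pt` weights run on it rather than on ONNX Runtime.

```python
def pick_providers(available):
    """Prefer CUDA when ONNX Runtime reports it, keeping CPU as the fallback.

    `available` is the list returned by onnxruntime.get_available_providers().
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


def report_gpu_support():
    """Print what ONNX Runtime and PyTorch can see on this machine."""
    # Imported lazily so pick_providers stays usable without these packages.
    import onnxruntime as ort
    import torch

    providers = pick_providers(ort.get_available_providers())
    print("ONNX Runtime providers:", providers)
    print("PyTorch CUDA available:", torch.cuda.is_available())
    return providers
```

If `CUDAExecutionProvider` is missing from the output after installing onnxruntime-gpu, a common cause is a CUDA or cuDNN version that does not match the installed package.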

Performance Expectations:

Mode	Typical FPS	Best For
CPU	5-10 FPS	Development, small-scale deployments
GPU (CUDA)	30-100+ FPS	Production, real-time processing

YOLOv8 (Ultralytics)

Property	Value
Type	Object Detection & Tracking
Version	>= 8.0.0
License	AGPL-3.0

What it is: YOLOv8 is a 2023 release in the YOLO (You Only Look Once) family of object detection models from Ultralytics; newer YOLO releases have followed since. It provides fast, accurate real-time object detection, and the Ultralytics package adds built-in tracking.

How we use it:

Person Detection - Detects all people in each video frame, returning bounding boxes with confidence scores. We use class 0 (person) only.
Person Tracking - YOLOv8's built-in tracking with ByteTrack assigns persistent IDs to each detected person, maintaining identity across frames.
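
The two uses above can be sketched with the Ultralytics `track` call. This is an illustrative sketch rather than the project's actual code: `to_track_records` and `track_frame` are hypothetical helper names, and `model` is assumed to be created elsewhere with `YOLO("yolov8n.pt")` from the `ultralytics` package.

```python
def to_track_records(ids, boxes, confs):
    """Pair each track ID with its box and confidence.

    `ids` is None when the tracker has not assigned any IDs in this frame.
    """
    if ids is None:
        return []
    return [
        {"track_id": int(i), "box": [float(v) for v in b], "conf": float(c)}
        for i, b, c in zip(ids, boxes, confs)
    ]


def track_frame(model, frame):
    """Run detection and tracking on one frame and return plain-dict records."""
    results = model.track(
        frame,
        persist=True,             # keep tracker state between successive calls
        classes=[0],              # class 0 is "person" in the COCO label set
        tracker="bytetrack.yaml",
        verbose=False,
    )
    boxes = results[0].boxes
    ids = None if boxes.id is None else boxes.id.tolist()
    return to_track_records(ids, boxes.xyxy.tolist(), boxes.conf.tolist())
```

Each record carries the persistent `track_id`, the box as `[x1, y1, x2, y2]` in pixels, and the detection confidence.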

Model Options:

Model	Speed	Accuracy	Size	Use Case
`yolov8n.pt`	Fastest	Good	~6MB	Real-time (our default)
`yolov8s.pt`	Fast	Better	~22MB	Balanced
`yolov8m.pt`	Medium	High	~52MB	Higher accuracy

ByteTrack

Property	Value
Type	Multi-Object Tracking Algorithm
Integration	Built into Ultralytics

What it is: ByteTrack is a simple, effective, and generic association method for multi-object tracking. It associates almost every detection box instead of only the high score ones.
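
The idea of keeping low-score boxes can be illustrated with a deliberately simplified two-pass association. This sketch is not the ByteTrack implementation: the real algorithm predicts track positions with a Kalman filter and solves the assignment with the Hungarian method, whereas this version uses greedy IoU matching on the last known boxes. All function names and thresholds here are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Greedily pair track IDs with detection indices, best IoU first."""
    pairs = sorted(
        ((iou(box, dets[j]["box"]), tid, j)
         for tid, box in tracks.items() for j in range(len(dets))),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    return matches


def associate(tracks, detections, high=0.5, low=0.1, min_iou=0.3):
    """Two-pass association: high-score detections first, then low-score ones."""
    strong = [d for d in detections if d["score"] >= high]
    weak = [d for d in detections if low <= d["score"] < high]

    # Pass 1: match every track against the confident detections.
    assigned = {tid: strong[j] for tid, j in greedy_match(tracks, strong, min_iou).items()}

    # Pass 2: tracks still unmatched get a chance with the low-score boxes,
    # which are often partially occluded people rather than false positives.
    leftover = {tid: box for tid, box in tracks.items() if tid not in assigned}
    for tid, j in greedy_match(leftover, weak, min_iou).items():
        assigned[tid] = weak[j]
    return assigned
```

A detector that only kept boxes above the high threshold would drop the occluded person in pass 2 and break the track; the second pass is what lets the ID survive.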

How we use it:

Persistent Tracking - Assigns stable track IDs to each person across video frames
Motion Prediction - Handles temporary occlusions and re-entries
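Identity Preservation - Maintains consistent IDs through brief overlaps and short absences from the frame. ByteTrack matches on motion rather than appearance, so a person who stays out of view longer than the tracker's buffer receives a new ID on return.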

Alternative Tracker:

Tracker	Description
`bytetrack.yaml`	Our default; fast and accurate
`botsort.yaml`	Alternative (the Ultralytics default); adds camera-motion compensation to the motion model

InsightFace

Property	Value
Type	Face Analysis Library
Version	>= 0.7.0
License	MIT (code); the pretrained models are released for non-commercial research use only

What it is: InsightFace is an open-source 2D & 3D deep face analysis library that provides face detection, recognition, and analysis models. Its models report strong results on standard face recognition benchmarks such as LFW.

How we use it:

Face Detection (SCRFD) - Detects all faces in a frame using SCRFD (Sample and Computation Redistribution for Face Detection), returning bounding box coordinates and facial landmarks (eyes, nose, mouth corners). Handles multiple faces simultaneously with high accuracy even in challenging conditions.
Face Recognition (ArcFace) - Generates 512-dimensional embedding vectors that uniquely represent each face. These embeddings are compared using cosine similarity for identity matching.
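
Both steps run through InsightFace's `FaceAnalysis` wrapper, which chains SCRFD and ArcFace in a single call. The sketch below is a hedged example, not the application's code: `load_face_app` and `describe_faces` are hypothetical helpers, and the 640x640 detection size is an assumed setting.

```python
def load_face_app(model_name="buffalo_l", use_gpu=False):
    """Create and prepare an InsightFace FaceAnalysis pipeline."""
    from insightface.app import FaceAnalysis  # lazy import: heavy dependency

    providers = (
        ["CUDAExecutionProvider", "CPUExecutionProvider"]
        if use_gpu
        else ["CPUExecutionProvider"]
    )
    app = FaceAnalysis(name=model_name, providers=providers)
    # ctx_id=0 selects the first GPU; -1 selects the CPU.
    app.prepare(ctx_id=0 if use_gpu else -1, det_size=(640, 640))
    return app


def describe_faces(faces):
    """Reduce InsightFace results to plain dicts for downstream matching."""
    return [
        {
            "box": [float(v) for v in f.bbox],                      # x1, y1, x2, y2
            "landmarks": [[float(x), float(y)] for x, y in f.kps],  # 5 facial points
            "det_score": float(f.det_score),
            "embedding": f.normed_embedding,                        # L2-normalized, 512-d
        }
        for f in faces
    ]
```

Typical use would be `faces = load_face_app().get(image)` followed by `describe_faces(faces)`, where `image` is a BGR array such as the one returned by `cv2.imread`.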

Model Options:

Model	Accuracy	Speed	Size	Use Case
`buffalo_l`	Highest (99.83% LFW)	Medium	~400MB	Production (our choice)
`buffalo_m`	High	Fast	~200MB	Balanced
`buffalo_s`	Good	Fastest	~100MB	Edge devices

OpenCV

Property	Value
Type	Computer Vision Library
Version	>= 4.5.0
License	Apache 2.0

What it is: OpenCV (Open Source Computer Vision Library) is one of the most widely used computer vision libraries, providing more than 2,500 optimized algorithms for image and video processing.

How we use it:

Video/Camera Input - Capturing frames from webcams, video files (MP4, AVI, MKV), and RTSP streams using VideoCapture.
Image I/O - Reading enrollment images.
Visualization - Drawing bounding boxes around detected persons and faces, displaying names, track IDs, confidence scores, FPS, and status information in real-time.
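
The capture and drawing steps above can be sketched as follows. This is a minimal example under stated assumptions: `parse_source`, `read_frames`, and `draw_label` are hypothetical helper names, and the colors and font settings are arbitrary choices.

```python
def parse_source(source):
    """Webcam indices arrive as digit strings ("0"); files and RTSP URLs stay strings."""
    text = str(source).strip()
    return int(text) if text.isdigit() else text


def read_frames(source):
    """Yield BGR frames from a webcam index, a video file, or an RTSP URL."""
    import cv2  # lazy import so parse_source works without OpenCV installed

    cap = cv2.VideoCapture(parse_source(source))
    if not cap.isOpened():
        raise RuntimeError(f"Could not open video source: {source!r}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or dropped stream
                break
            yield frame
    finally:
        cap.release()


def draw_label(frame, box, text, color=(0, 255, 0)):
    """Draw a bounding box with a text label just above its top-left corner."""
    import cv2

    x1, y1, x2, y2 = (int(v) for v in box)
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
    cv2.putText(frame, text, (x1, max(y1 - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```

Releasing the capture in a `finally` block frees the camera or stream even if the consumer stops iterating early.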

ONNX Runtime

Property	Value
Type	ML Inference Engine
Version	>= 1.10.0
License	MIT

What it is: ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX (Open Neural Network Exchange) format. It provides optimized execution across different hardware platforms.

How we use it:

InsightFace models are stored in ONNX format, and ONNX Runtime handles the actual neural network inference for both face detection and recognition. It enables seamless switching between CPU and GPU execution.

Performance Comparison:

Provider	Hardware	Typical FPS
CPU	Intel i7	5-10 FPS
CUDA GPU	RTX 3060	30-50 FPS
CUDA GPU	RTX 4090	100+ FPS
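
Figures like these vary with resolution, model choice, and the number of faces per frame, so it is worth measuring on your own hardware. The following is a small timing sketch; `measure_fps` is a hypothetical helper and `process_frame` stands in for whatever per-frame pipeline you want to time.

```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Return frames per second for process_frame, skipping warm-up calls.

    The first few inferences are usually slower (model loading, memory
    allocation, GPU kernel setup), so they are run but not timed.
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)

    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")

    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")
```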

NumPy

Property	Value
Type	Numerical Computing Library
Version	>= 1.21.0
License	BSD

What it is: NumPy is the fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on them.

How we use it:

Embedding Storage - Face embeddings are stored as 512-dimensional NumPy arrays, enabling efficient memory management and serialization.
Similarity Calculation - Computing cosine similarity between face embeddings using vectorized dot product and norm operations.
Batch Operations - Efficiently comparing a query face against all enrolled faces in the database using vectorized array operations.
IoU Calculation - Computing Intersection over Union for matching faces to tracked persons.
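
The similarity, batch comparison, and IoU operations above can be sketched in vectorized NumPy. This is an illustrative sketch, not the project's implementation: the function names are hypothetical and the 0.4 match threshold is an assumed value that should be tuned on real data.

```python
import numpy as np


def cosine_similarities(query, gallery):
    """Cosine similarity between one embedding (D,) and a gallery (N, D)."""
    query = np.asarray(query, dtype=np.float32)
    gallery = np.asarray(gallery, dtype=np.float32)
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q  # one dot product per enrolled face, shape (N,)


def best_match(query, gallery, names, threshold=0.4):
    """Return (name, score); name is None when the best score is below threshold."""
    scores = cosine_similarities(query, gallery)
    idx = int(np.argmax(scores))
    score = float(scores[idx])
    return (names[idx] if score >= threshold else None), score


def iou_matrix(face_boxes, person_boxes):
    """Pairwise IoU between (F, 4) and (P, 4) boxes in x1, y1, x2, y2 form."""
    f = np.asarray(face_boxes, dtype=np.float32)[:, None, :]    # (F, 1, 4)
    p = np.asarray(person_boxes, dtype=np.float32)[None, :, :]  # (1, P, 4)
    top_left = np.maximum(f[..., :2], p[..., :2])
    bottom_right = np.minimum(f[..., 2:], p[..., 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero when boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]
    area_f = (f[..., 2] - f[..., 0]) * (f[..., 3] - f[..., 1])
    area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
    return inter / np.maximum(area_f + area_p - inter, 1e-9)    # (F, P)
```

Taking `np.argmax` along each row of the IoU matrix gives, for every face, the tracked person whose box it overlaps most.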

Python

Property	Value
Type	Programming Language
Version	>= 3.8
License	PSF License

What it is: Python is a high-level, general-purpose programming language known for its readability and extensive ecosystem of libraries for data science and machine learning.

How we use it:

Python serves as the primary language for the entire AI attendance application, chosen for its excellent support for machine learning frameworks, computer vision libraries, and rapid development capabilities.

Dependencies Summary

Package	Version	Purpose
`numpy`	>= 1.21.0	Numerical computing & embeddings
`opencv-python`	>= 4.5.0	Computer vision & video I/O
`insightface`	>= 0.7.0	Face detection & recognition
`ultralytics`	>= 8.0.0	YOLOv8 person detection & tracking
`onnxruntime`	>= 1.10.0	ML inference (CPU)
`onnxruntime-gpu`	>= 1.10.0	ML inference (GPU, optional; install in place of `onnxruntime`)

Technology Selection Rationale

Technology	Why We Chose It
YOLOv8	State-of-the-art object detection, built-in tracking, actively maintained
ByteTrack	Simple yet effective tracking, handles occlusions well
InsightFace	State-of-the-art accuracy (99.83% LFW), open-source, actively maintained
OpenCV	Industry standard, extensive documentation, cross-platform support
ONNX Runtime	Hardware-agnostic inference, easy CPU/GPU switching, optimized performance
NumPy	Fast vectorized operations, seamless integration with ML libraries
