Tech Stack
Technologies used in VisionLog AI Attendance system
This page provides an overview of all technologies used to build the VisionLog AI Attendance application, explaining what each technology does and how it's utilized.
Overview
VisionLog AI Attendance is built with Python using industry-leading open-source libraries for computer vision and deep learning. The system uses a dual-stage pipeline: YOLOv8 for person tracking and InsightFace for face recognition.
| Layer | Technology |
|---|---|
| Input | OpenCV |
| Person Detection | YOLOv8 (Ultralytics) |
| Person Tracking | ByteTrack |
| Face Detection | InsightFace SCRFD |
| Face Recognition | InsightFace ArcFace |
| Processing | NumPy, ONNX Runtime |
| Storage | Pickle, CSV |
Hardware
| Component | Current Setup | GPU Support |
|---|---|---|
| Platform | Windows, Linux | - |
| Processor | CPU | NVIDIA CUDA |
| Inference | ONNX Runtime (CPU) | ONNX Runtime (GPU) |
Current Configuration:
The system has been tested on both Windows and Linux machines, currently using CPU for all inference operations. This provides a straightforward cross-platform setup without requiring GPU drivers or CUDA installation.
GPU Acceleration:
Both YOLOv8 and InsightFace support GPU acceleration through CUDA. To enable GPU support:
- Install NVIDIA GPU drivers
- Install CUDA Toolkit (compatible version)
- Install `onnxruntime-gpu` instead of `onnxruntime`
Performance Expectations:
| Mode | Typical FPS | Best For |
|---|---|---|
| CPU | 5-10 FPS | Development, small-scale deployments |
| GPU (CUDA) | 30-100+ FPS | Production, real-time processing |
YOLOv8 (Ultralytics)
| Property | Value |
|---|---|
| Type | Object Detection & Tracking |
| Version | >= 8.0.0 |
| License | AGPL-3.0 |
What it is: YOLOv8 is a real-time object detection model from Ultralytics in the YOLO (You Only Look Once) family. It combines fast, accurate detection with built-in tracking support.
How we use it:
- Person Detection - Detects all people in each video frame, returning bounding boxes with confidence scores. We use class 0 (person) only.
- Person Tracking - YOLOv8's built-in tracking with ByteTrack assigns persistent IDs to each detected person, maintaining identity across frames.
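The two steps above can be expressed as a single Ultralytics tracking call. The following is a minimal sketch under stated assumptions, not the project's actual code: `track_people`, `load_default_model`, and the returned dictionary layout are hypothetical names chosen for illustration, while `model.track(...)` with `persist`, `classes`, and `tracker` is the standard Ultralytics interface.

```python
from typing import Any, Dict, List


def track_people(model: Any, frame: Any, tracker: str = "bytetrack.yaml") -> List[Dict[str, Any]]:
    """Run detection + tracking on one frame and return one dict per tracked person.

    `model` is expected to be an ultralytics.YOLO instance. This helper is a
    hypothetical illustration, not part of the VisionLog codebase.
    """
    results = model.track(
        frame,
        persist=True,     # keep tracker state between consecutive calls
        classes=[0],      # COCO class 0 = person
        tracker=tracker,  # "bytetrack.yaml" or "botsort.yaml"
        verbose=False,
    )
    boxes = results[0].boxes
    if boxes is None or boxes.id is None:
        return []  # no tracked persons in this frame

    ids = boxes.id.tolist()
    xyxy = boxes.xyxy.tolist()
    confs = boxes.conf.tolist()
    return [
        {"track_id": int(i), "bbox": [float(v) for v in b], "confidence": float(c)}
        for i, b, c in zip(ids, xyxy, confs)
    ]


def load_default_model() -> Any:
    """Load the nano model; imported lazily so this file loads without ultralytics."""
    from ultralytics import YOLO

    return YOLO("yolov8n.pt")
```

Passing the model in as an argument keeps the helper easy to exercise with a stub object, and switching to `botsort.yaml` only requires changing the `tracker` argument.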
Model Options:
| Model | Speed | Accuracy | Size | Use Case |
|---|---|---|---|---|
| `yolov8n.pt` | Fastest | Good | ~6MB | Real-time (our default) |
| `yolov8s.pt` | Fast | Better | ~22MB | Balanced |
| `yolov8m.pt` | Medium | High | ~52MB | Higher accuracy |
ByteTrack
| Property | Value |
|---|---|
| Type | Multi-Object Tracking Algorithm |
| Integration | Built into Ultralytics |
What it is: ByteTrack is a simple, generic association method for multi-object tracking. Rather than keeping only high-confidence detections, it associates nearly every detection box, including low-scoring ones, which helps it keep tracks alive through partial occlusion.
How we use it:
- Persistent Tracking - Assigns stable track IDs to each person across video frames
- Motion Prediction - Handles temporary occlusions and re-entries
- Identity Preservation - Maintains consistent IDs even when people overlap or temporarily leave frame
Alternative Tracker:
| Tracker | Description |
|---|---|
| `bytetrack.yaml` | Default, fast and accurate |
| `botsort.yaml` | Alternative with different motion model |
InsightFace
| Property | Value |
|---|---|
| Type | Face Analysis Library |
| Version | >= 0.7.0 |
| License | MIT (code); pretrained models are released for non-commercial research use |
What it is: InsightFace is an open-source 2D & 3D deep face analysis library that provides state-of-the-art face detection, recognition, and analysis models. It consistently ranks among the top performers on face recognition benchmarks.
How we use it:
- Face Detection (SCRFD) - Detects all faces in a frame using SCRFD (Sample and Computation Redistribution for Face Detection), returning bounding box coordinates and facial landmarks (eyes, nose, mouth corners). Handles multiple faces simultaneously with high accuracy even in challenging conditions.
- Face Recognition (ArcFace) - Generates 512-dimensional embedding vectors that uniquely represent each face. These embeddings are compared using cosine similarity for identity matching.
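Both steps above are exposed through InsightFace's `FaceAnalysis` wrapper, which runs detection and embedding extraction in one `get()` call. The sketch below assumes the `buffalo_l` model pack; `extract_faces`, `build_analyzer`, and the `min_score` cutoff are hypothetical names introduced here for illustration and are not taken from the project.

```python
from typing import Any, Dict, List

import numpy as np


def extract_faces(app: Any, image: np.ndarray, min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Return bbox, landmarks, and a unit-length embedding for each detected face.

    `app` is expected to be a prepared insightface.app.FaceAnalysis instance.
    `min_score` is an illustrative cutoff, not a value from the project.
    """
    records = []
    for face in app.get(image):  # SCRFD detection + ArcFace embedding
        if float(face.det_score) < min_score:
            continue
        records.append({
            "bbox": np.asarray(face.bbox, dtype=np.float32),      # x1, y1, x2, y2
            "landmarks": np.asarray(face.kps, dtype=np.float32),  # 5 points, (x, y)
            "embedding": np.asarray(face.normed_embedding, dtype=np.float32),
        })
    return records


def build_analyzer(use_gpu: bool = False) -> Any:
    """Create and prepare a FaceAnalysis instance (lazy import of insightface)."""
    from insightface.app import FaceAnalysis

    providers = ["CPUExecutionProvider"]
    if use_gpu:
        providers = ["CUDAExecutionProvider"] + providers
    app = FaceAnalysis(name="buffalo_l", providers=providers)
    # ctx_id=-1 selects CPU, 0 selects the first GPU
    app.prepare(ctx_id=0 if use_gpu else -1, det_size=(640, 640))
    return app
```

Using `normed_embedding` means the vectors are already L2-normalized, so a plain dot product between two embeddings gives their cosine similarity.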
Model Options:
| Model | Accuracy | Speed | Size | Use Case |
|---|---|---|---|---|
| `buffalo_l` | Highest (99.83% LFW) | Medium | ~400MB | Production (our choice) |
| `buffalo_m` | High | Fast | ~200MB | Balanced |
| `buffalo_s` | Good | Fastest | ~100MB | Edge devices |
OpenCV
| Property | Value |
|---|---|
| Type | Computer Vision Library |
| Version | >= 4.5.0 |
| License | Apache 2.0 |
What it is: OpenCV (Open Source Computer Vision Library) is the most widely used computer vision library, providing 2500+ optimized algorithms for image and video processing.
How we use it:
- Video/Camera Input - Capturing frames from webcams, video files (MP4, AVI, MKV), and RTSP streams using `VideoCapture`.
- Image I/O - Reading enrollment images.
- Visualization - Drawing bounding boxes around detected persons and faces, displaying names, track IDs, confidence scores, FPS, and status information in real-time.
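A capture-and-draw loop covering the input and visualization roles above might look like the following. This is a sketch under stated assumptions: `parse_source` and `preview` are hypothetical helpers, and the rectangle and text overlay are placeholders standing in for real detections. Only standard OpenCV calls are used.

```python
from typing import Union


def parse_source(arg: str) -> Union[int, str]:
    """Map a command-line style argument to something cv2.VideoCapture accepts.

    Digit-only strings become webcam indices; file paths and RTSP URLs pass through.
    """
    arg = arg.strip()
    return int(arg) if arg.isdigit() else arg


def preview(source: Union[int, str], window: str = "VisionLog preview") -> int:
    """Show frames until the stream ends or 'q' is pressed; return frames shown."""
    import cv2  # lazy import so parse_source works without OpenCV installed

    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open video source: {source!r}")

    shown = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file or dropped stream
            # Placeholder overlay; a real pipeline would draw detections here
            cv2.rectangle(frame, (20, 20), (200, 200), (0, 255, 0), 2)
            cv2.putText(frame, f"frame {shown}", (20, 15),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow(window, frame)
            shown += 1
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
    return shown
```

For example, `preview(parse_source("0"))` would open the first webcam, while passing a file path or an RTSP URL string opens that source instead.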
ONNX Runtime
| Property | Value |
|---|---|
| Type | ML Inference Engine |
| Version | >= 1.10.0 |
| License | MIT |
What it is: ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX (Open Neural Network Exchange) format. It provides optimized execution across different hardware platforms.
How we use it:
InsightFace models are stored in ONNX format, and ONNX Runtime handles the actual neural network inference for both face detection and recognition. It enables seamless switching between CPU and GPU execution.
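CPU/GPU switching in ONNX Runtime comes down to the ordered list of execution providers passed when a session is created. The sketch below shows one way to choose that list with a CPU fallback; `pick_providers` and `detect_providers` are hypothetical helpers, while `onnxruntime.get_available_providers()` and the provider names are part of the ONNX Runtime API.

```python
from typing import List

# Preference order: try CUDA first, fall back to CPU
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: List[str], want_gpu: bool = True) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    order = PREFERRED if want_gpu else ("CPUExecutionProvider",)
    chosen = [p for p in order if p in available]
    if not chosen:
        raise RuntimeError(f"No usable execution provider among: {available}")
    return chosen


def detect_providers(want_gpu: bool = True) -> List[str]:
    """Query the installed ONNX Runtime build (lazy import) and pick providers."""
    import onnxruntime as ort

    return pick_providers(ort.get_available_providers(), want_gpu)
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession` or of InsightFace's `FaceAnalysis`. With the CPU-only `onnxruntime` package installed, the CUDA provider is not reported as available, so the selection falls back to CPU.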
Performance Comparison:
| Provider | Hardware | Typical FPS |
|---|---|---|
| CPU | Intel i7 | 5-10 FPS |
| CUDA GPU | RTX 3060 | 30-50 FPS |
| CUDA GPU | RTX 4090 | 100+ FPS |
NumPy
| Property | Value |
|---|---|
| Type | Numerical Computing Library |
| Version | >= 1.21.0 |
| License | BSD |
What it is: NumPy is the fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on them.
How we use it:
- Embedding Storage - Face embeddings are stored as 512-dimensional NumPy arrays, enabling efficient memory management and serialization.
- Similarity Calculation - Computing cosine similarity between face embeddings using vectorized dot product and norm operations.
- Batch Operations - Efficiently comparing a query face against all enrolled faces in the database using vectorized array operations.
- IoU Calculation - Computing Intersection over Union for matching faces to tracked persons.
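The similarity, batch comparison, and IoU operations listed above can be sketched in a few lines of NumPy. This is an illustration rather than the project's implementation: the function names, the `(N, 512)` gallery layout, and the `0.4` match threshold are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np


def cosine_similarities(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity between one (D,) query and an (N, D) gallery, as (N,)."""
    query = np.asarray(query, dtype=np.float32)
    gallery = np.asarray(gallery, dtype=np.float32)
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    return g @ q  # one matrix-vector product compares against every enrolled face


def best_match(query: np.ndarray, gallery: np.ndarray, names: List[str],
               threshold: float = 0.4) -> Tuple[Optional[str], float]:
    """Return (name, score) for the closest enrolled face, or (None, score) below threshold.

    The 0.4 default is an illustrative value, not a tuned project setting.
    """
    if len(names) == 0:
        return None, 0.0
    sims = cosine_similarities(query, gallery)
    idx = int(np.argmax(sims))
    score = float(sims[idx])
    return (names[idx], score) if score >= threshold else (None, score)


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Keeping all enrolled embeddings in a single 2-D array lets one matrix-vector product replace a Python loop over the database.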
Python
| Property | Value |
|---|---|
| Type | Programming Language |
| Version | >= 3.8 |
| License | PSF License |
What it is: Python is a high-level, general-purpose programming language known for its readability and extensive ecosystem of libraries for data science and machine learning.
How we use it:
Python serves as the primary language for the entire AI attendance application, chosen for its excellent support for machine learning frameworks, computer vision libraries, and rapid development capabilities.
Dependencies Summary
| Package | Version | Purpose |
|---|---|---|
| `numpy` | >= 1.21.0 | Numerical computing & embeddings |
| `opencv-python` | >= 4.5.0 | Computer vision & video I/O |
| `insightface` | >= 0.7.0 | Face detection & recognition |
| `ultralytics` | >= 8.0.0 | YOLOv8 person detection & tracking |
| `onnxruntime` | >= 1.10.0 | ML inference (CPU) |
| `onnxruntime-gpu` | >= 1.10.0 | ML inference (GPU, optional) |
Technology Selection Rationale
| Technology | Why We Chose It |
|---|---|
| YOLOv8 | State-of-the-art object detection, built-in tracking, actively maintained |
| ByteTrack | Simple yet effective tracking, handles occlusions well |
| InsightFace | State-of-the-art accuracy (99.83% LFW), open-source, actively maintained |
| OpenCV | Industry standard, extensive documentation, cross-platform support |
| ONNX Runtime | Hardware-agnostic inference, easy CPU/GPU switching, optimized performance |
| NumPy | Fast vectorized operations, seamless integration with ML libraries |
See Also
- Architecture - System architecture overview
- Pipeline - Processing pipeline details
- Layers - Detailed layer-by-layer breakdown