VisionLog Overview

VisionLog is a modular AI platform for computer vision monitoring, forensic analysis, and real-time behavioral intelligence. Built on deep learning models such as YOLOv8 and InsightFace, it supports specialized operations including automated attendance tracking, security monitoring, and identity search.

AI Attendance Tracking

AI Attendance is a primary module within the VisionLog platform. It automates attendance logging using CCTV cameras, webcams, RTSP streams, or video files with AI-powered face recognition. The system uses a dual-stage pipeline: YOLOv8 for person detection and tracking, combined with InsightFace (buffalo_l model) for face detection and recognition.

Live Vision Operation

The Live Vision module offers real-time situational awareness by processing live streams from CCTV, IP cameras, or webcams. It displays an annotated video feed with bounding boxes, track IDs, and an identification sidebar showcasing recognized individuals alongside real-time analytics like total person count and active track count.

Person Search (Identity Search)

This forensic tool lets operators locate specific individuals across video archives or live feeds. After an operator uploads a reference photo, the system scans for matching faces using InsightFace ArcFace embeddings, raises real-time alerts for high-similarity detections, and keeps detailed track logs with timestamps.

Real-time Person Tracking

A unified pipeline that combines person detection (YOLOv8) and tracking (ByteTrack) with facial recognition. The system maintains persistent track IDs even when faces are temporarily obscured or turned away, and uses an Identity Cache to keep a recognised name attached to a track for as long as the track lasts.
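The Identity Cache idea can be sketched as a small mapping from track ID to the best face match seen so far. This is an illustrative sketch only: `IdentityCache`, `CachedIdentity`, and the `min_confidence` value are assumptions, not VisionLog's actual classes or defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CachedIdentity:
    name: str
    confidence: float


@dataclass
class IdentityCache:
    """Remembers the best face match seen so far for each track ID."""

    min_confidence: float = 0.5  # assumed threshold, not a documented default
    _entries: Dict[int, CachedIdentity] = field(default_factory=dict)

    def update(self, track_id: int, name: Optional[str],
               confidence: float = 0.0) -> Optional[str]:
        """Record a face match (if any) and return the name to show for the track."""
        current = self._entries.get(track_id)
        if name is not None and confidence >= self.min_confidence:
            # Only replace the cached name when the new match is stronger.
            if current is None or confidence > current.confidence:
                self._entries[track_id] = CachedIdentity(name, confidence)
        cached = self._entries.get(track_id)
        return cached.name if cached else None

    def drop(self, track_id: int) -> None:
        """Forget a track once the tracker reports it as lost."""
        self._entries.pop(track_id, None)
```

Calling `update(track_id, None)` on frames where no face is visible returns the cached name, which is what keeps the label stable while a person looks away.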

Key Features

Multi-Source Input - Supports live camera streams, webcams, RTSP streams, and offline video files (MP4, AVI, MKV)
YOLOv8 Person Tracking - Persistent track IDs across frames with ByteTrack
InsightFace Engine - Uses SCRFD for face detection and ArcFace for recognition (buffalo_l model)
Folder-Based Enrollment - Organize person images in folders named by person for easy batch enrollment
Real-time Recognition - Process camera feed or video files with live face detection
Attendance Tracking - Automatic logging with identity caching and duplicate prevention
Zero-Lag RTSP - Multi-threaded IP camera processing with auto-reconnect
Comprehensive Logging - CSV logs with frame, timestamp, track ID, name, and confidence
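
The multi-threaded RTSP handling with auto-reconnect listed above is commonly built as a background reader that keeps only the newest frame. The following is a minimal sketch of that pattern, not VisionLog's actual implementation: `LatestFrameReader`, `open_stream`, and `retry_delay` are hypothetical names, and the stream object is only assumed to expose `read()` returning `(ok, frame)` and `release()`, as OpenCV's `cv2.VideoCapture` does.

```python
import threading
import time
from typing import Any, Callable, Optional


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the newest one."""

    def __init__(self, open_stream: Callable[[], Any], retry_delay: float = 1.0) -> None:
        self._open_stream = open_stream  # factory that returns a fresh stream object
        self._retry_delay = retry_delay  # seconds to wait before reconnecting
        self._lock = threading.Lock()
        self._latest: Optional[Any] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.connections = 0  # how many times the stream has been opened

    def start(self) -> "LatestFrameReader":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def latest(self) -> Optional[Any]:
        """Return the most recent frame, or None if nothing has arrived yet."""
        with self._lock:
            return self._latest

    def _run(self) -> None:
        while not self._stop.is_set():
            stream = self._open_stream()
            self.connections += 1
            try:
                while not self._stop.is_set():
                    ok, frame = stream.read()
                    if not ok:
                        break  # read failed: fall through and reconnect
                    with self._lock:
                        self._latest = frame  # older frames are discarded
            finally:
                stream.release()
            # Pause before reconnecting; returns early if stop() is called.
            self._stop.wait(self._retry_delay)
```

With OpenCV installed, `open_stream` could be `lambda: cv2.VideoCapture(url)`. Discarding all but the newest frame trades completeness for latency, which suits live monitoring better than archival processing.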

How It Works

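The system follows a dual-stage pipeline (person detection and tracking, then face detection and recognition), which breaks down into the following steps: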

Step	Component	Technology	Description
1	Enrollment	InsightFace	Create face database by enrolling persons from folders
2	Video Input	OpenCV	Capture from camera, video files, or RTSP streams
3	Person Detection	YOLOv8	Detect all people in frame with bounding boxes
4	Tracking	ByteTrack	Assign persistent track IDs across frames
5	Face Detection	InsightFace SCRFD	Detect faces in full frame
6	Face Recognition	InsightFace ArcFace	Generate 512-d embeddings, match against database
7	IoU Matching	NumPy	Link detected faces to tracked persons
8	Attendance Logging	CSV	Log recognized persons with timestamps and track IDs
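
Step 7 can be illustrated with a small, dependency-free sketch. It assumes boxes are `(x1, y1, x2, y2)` pixel tuples and assigns each face to the tracked person box with the highest intersection-over-union. The names `iou` and `link_faces_to_tracks` are illustrative, and the real module (listed above as NumPy-based) may use a vectorised form or additional thresholds.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def link_faces_to_tracks(faces: List[Box],
                         tracks: Dict[int, Box]) -> List[Optional[int]]:
    """For each face box, return the track ID whose person box overlaps it most."""
    links: List[Optional[int]] = []
    for face in faces:
        best_id, best_iou = None, 0.0
        for track_id, person in tracks.items():
            score = iou(face, person)
            if score > best_iou:
                best_id, best_iou = track_id, score
        links.append(best_id)  # None when the face overlaps no tracked person
    return links
```

Because a face box is much smaller than a person box, the IoU values are small even for correct links, so this sketch ranks candidates rather than applying an absolute cut-off.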

System Architecture

The architecture consists of modular components for different functions:

Module	Purpose
Face Tracker	YOLOv8 person detection with ByteTrack
Face Engine	InsightFace detection & recognition
Enrollment	Folder-based batch face enrollment
Recognition	Real-time face recognition
Attendance	Attendance tracking with identity caching
Video Processor	Video/webcam/RTSP processing
Configuration	Centralized settings

Technology Stack

Component	Technology
Person Detection	YOLOv8 (Ultralytics)
Person Tracking	ByteTrack
Face Detection	InsightFace SCRFD
Face Recognition	InsightFace ArcFace (buffalo_l model)
Embedding Storage	Pickle (embeddings.pkl)
Image Processing	OpenCV
Logging	CSV format
GPU Support	CUDA (optional)

Experimental Lab

Beyond the core attendance and surveillance modules, VisionLog includes an Experimental Lab — a dedicated section for advanced computer vision features under active development. These use cases explore capabilities that extend the platform into broader AI-vision domains.

Access the Experimental Lab via the Lab button in the main sidebar, or navigate directly to /experimental.

Face Draw (Randomizer)

An AI-powered participant randomizer that detects all faces in an uploaded group photo and uses them as entries in an animated draw. The system analyzes the image via the recognition engine, extracts individual face crops, and lets operators run a fair random selection using one of five animated draw methods.

Draw Method	Description
Spin Wheel	Classic spinning wheel with each face as a segment
Spotlight	Sweeping spotlight that slows to land on the winner
Bubble Pop	Face bubbles eliminated one-by-one
Black Hole	Vortex pulls participants in until one remains
Cyber Lock	Futuristic targeting lock-on mechanism

External participants (not in the photo) can also be added manually.
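
Whatever animation is shown, the selection itself reduces to a uniform random choice over the entries. The sketch below shows one way to do that with Python's standard library. It is an assumption about how a fair draw could work, not a description of the actual draw code, and `draw_winner` and `elimination_order` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence


def draw_winner(participants: Sequence[str],
                rng: Optional[random.Random] = None) -> str:
    """Pick one participant uniformly at random."""
    if not participants:
        raise ValueError("at least one participant is required")
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness, not seedable
    return rng.choice(list(participants))


def elimination_order(participants: Sequence[str],
                      rng: Optional[random.Random] = None) -> List[str]:
    """Return a random order for one-by-one elimination; the last entry wins."""
    if rng is None:
        rng = random.SystemRandom()
    order = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(order)
    return order
```

Passing a seeded `random.Random` makes a draw reproducible for testing, while the default `SystemRandom` is a better fit for a live draw.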

Cross-Camera Tracker

Tracks a target individual across multiple simultaneous camera feeds using biometric identity matching. A reference image is used to scan live or recorded footage from different sources at once, enabling persistent identity tracking even when the subject moves between camera zones.

Key capabilities:

Multi-feed simultaneous identity search
Reference-image based biometric matching
Cross-zone continuity tracking

Sign to Text

A real-time American Sign Language (ASL) recognition system using the device webcam and MediaPipe hand landmark detection. The system captures hand gestures frame-by-frame and converts them into text output with a stability filter to prevent false commits.

Supports:

16 ASL letters — A, B, C, D, F, I, K, L, O, R, S, U, V, W, X, Y
7 word gestures — HELLO, GOOD, BAD, I LOVE YOU, STOP, ROCK ON, OK
Live detection history and clipboard export
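
A stability filter of the kind mentioned above can be sketched as a counter that commits a gesture only after it has been predicted for several consecutive frames. `StabilityFilter` and `required_frames` are hypothetical names, and the frame count is an assumed value rather than the one VisionLog uses.

```python
from typing import Optional


class StabilityFilter:
    """Commits a label only after it has been seen on N consecutive frames."""

    def __init__(self, required_frames: int = 8) -> None:
        self.required_frames = required_frames  # assumed value
        self._candidate: Optional[str] = None
        self._streak = 0

    def push(self, label: Optional[str]) -> Optional[str]:
        """Feed one per-frame prediction; return the label on the frame it commits."""
        if label != self._candidate:
            self._candidate = label
            self._streak = 1
        else:
            self._streak += 1
        # Fire exactly once per hold, when the streak first reaches the target.
        if label is not None and self._streak == self.required_frames:
            return label
        return None
```

Because the commit fires only when the streak first reaches the target, holding a gesture does not repeat the letter; the hand has to change or leave the frame before the same gesture can be committed again.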

Folder Person Search

Scans a server-side folder of images for a specific individual using a reference photo. The InsightFace ArcFace engine generates 512-dimensional embeddings to compare faces across all images in the dataset, returning matches ranked by similarity score.

Key capabilities:

Reference-photo based biometric search across an image archive
Configurable similarity threshold
Displays matched image paths and confidence scores
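
Ranking by similarity with a configurable threshold can be sketched as follows. The example uses plain Python lists in place of real 512-dimensional ArcFace embeddings, and `cosine_similarity`, `rank_matches`, and the default threshold of 0.4 are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(reference: Sequence[float],
                 gallery: Dict[str, Sequence[float]],
                 threshold: float = 0.4) -> List[Tuple[str, float]]:
    """Return (image_path, score) pairs at or above the threshold, best first."""
    scored = [(path, cosine_similarity(reference, emb))
              for path, emb in gallery.items()]
    hits = [(path, score) for path, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Raising the threshold returns fewer, more certain matches; lowering it surfaces more candidates for a human to review.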

Webcam Folder Search

An extension of Folder Person Search where the reference face is captured live from the webcam instead of an uploaded image. The operator takes a snapshot, which is then used as the search query against a pre-indexed server-side folder.

Key capabilities:

Live webcam capture as the identity reference
Same InsightFace ArcFace biometric engine as Folder Person Search
Useful for on-site identification without needing a stored reference image

Photo Batch Analysis (Folder Identity Lab)

Processes a folder of images to discover and cluster all unique individuals present across the entire collection. The system runs face detection and clustering on every image and produces a person-first view — showing every photo in which each unique face appears.

Supports two sources:

Local Folder — Upload images directly from the browser
Server Folder — Point to a directory already indexed on the backend

Output includes the unique person count, total face count, known-vs-unknown breakdown, and a per-person photo gallery.

Intelli Image Search

A natural-language image search engine powered by Google Gemini AI. Instead of face-matching, operators describe what they're looking for in plain text (e.g. "photo with the most people", "image with happiest faces") and the system scores every image in a server folder against the query.

Key capabilities:

Free-text semantic search over image archives
Google Gemini Vision API integration (requires API key)
Results ranked with a match score (0–10) and AI-generated reasoning per image

Automatic Focus Detection

Evaluates the sharpness of uploaded face images using a three-metric ensemble (Tenengrad, Crété-Roffet blur score, and FFT high-frequency ratio). It outputs a normalized 0–100 focus quality score designed to be robust to differences in image content and scale. Blurry enrollment photos are rejected automatically, and eye-level sharpness is prioritized.
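
Of the three metrics, Tenengrad is the simplest to illustrate: it is the mean squared Sobel gradient magnitude, which is larger for images with crisp edges. The sketch below computes it in plain Python on a grayscale image given as a list of rows. It covers only this one metric, and how VisionLog combines and normalises the three scores into the 0–100 range is not shown here.

```python
from typing import Sequence

GrayImage = Sequence[Sequence[float]]  # rows of pixel intensities


def tenengrad(gray: GrayImage) -> float:
    """Mean squared Sobel gradient magnitude over the interior pixels."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal Sobel response: right column minus left column.
            gx = ((gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]))
            # Vertical Sobel response: bottom row minus top row.
            gy = ((gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]))
            total += gx * gx + gy * gy
    return total / ((h - 2) * (w - 2))
```

A hard black-to-white edge scores higher than the same contrast spread over a gradual ramp, and a uniform image scores zero, which is the behaviour a blur detector relies on.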

AI Image Editor

An AI-assisted image editing tool for operations such as cropping, annotation, and face-aware adjustments. It is integrated into the VisionLog pipeline to allow lightweight in-browser editing before analysis or enrollment.

Emotion Detection

Evaluates facial expressions in real time or in static images using the DeepFace library. It combines face detection with expression inference to classify emotions (Happiness, Sadness, Anger, Surprise, Neutral) and appends the result as contextual metadata to standard biometric identity logs.

Duplicate Selector

An operational tool designed to refine biometric datasets by surfacing potential duplicate images. It identifies image clusters with exceptionally high cosine similarity (>0.95), allowing human operators to manually resolve identity collisions (e.g., merge identities, delete redundant files, or explicitly differentiate subjects like twins).
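
The duplicate check can be sketched as an all-pairs comparison that keeps only pairs above the 0.95 cosine-similarity mark. This is a simplified illustration with toy vectors: `find_duplicate_pairs` is a hypothetical name, and the real tool may group results into clusters rather than listing pairs.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicate_pairs(embeddings: Dict[str, Sequence[float]],
                         threshold: float = 0.95) -> List[Tuple[str, str, float]]:
    """Return (path_a, path_b, score) for every pair scoring above the threshold."""
    pairs = []
    for path_a, path_b in combinations(sorted(embeddings), 2):
        score = cosine_similarity(embeddings[path_a], embeddings[path_b])
        if score > threshold:
            pairs.append((path_a, path_b, score))
    # Most similar pairs first, so operators review the likeliest duplicates first.
    return sorted(pairs, key=lambda item: item[2], reverse=True)
```

The comparison is quadratic in the number of images, which is acceptable for a review tool over a modest dataset but would need indexing for very large archives.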

Quick Start

Scenario: An organisation wants to automate daily attendance for 50 employees using a single CCTV camera at the main entrance, and also be able to search footage when a visitor is reported missing.

Step 1 — Build the Face Database

Before any recognition can happen, every employee's face must be enrolled. Capture 3–5 clear photos of each person and organise them into a folder structure where each sub-folder is named after the person:

enrollment_photos/
├── Alice_Sharma/
│   ├── front.jpg
│   └── profile.jpg
├── Bob_Menon/
│   └── bob_office.jpg
└── ...

Navigate to Training & Database → Enroll Faces and point the system at this root folder. The enrollment module extracts 512-dimensional ArcFace embeddings for each detected face and stores them in embeddings.pkl. This only needs to be done once per person (or whenever their appearance changes significantly).

See the Enrollment guide for field configuration and tips on photo quality.
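
The folder convention above (one sub-folder per person) can be read with a few lines of standard-library Python. The sketch only gathers image paths per person and does not compute embeddings. `collect_enrollment_images` and the accepted suffixes are assumptions, since the supported image formats are not stated here, and the demo builds a throwaway copy of the example tree in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of accepted formats


def collect_enrollment_images(root: Path) -> Dict[str, List[Path]]:
    """Map each person (sub-folder name) to the image files inside that folder."""
    people: Dict[str, List[Path]] = {}
    for person_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = sorted(f for f in person_dir.iterdir()
                        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)
        if images:  # skip folders that contain no usable photos
            people[person_dir.name] = images
    return people


# Demo: recreate the example layout with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for rel in ("Alice_Sharma/front.jpg", "Alice_Sharma/profile.jpg",
                "Bob_Menon/bob_office.jpg", "Bob_Menon/notes.txt"):
        target = demo_root / rel
        target.parent.mkdir(exist_ok=True)
        target.touch()
    found = collect_enrollment_images(demo_root)
    for person, files in found.items():
        print(person, [f.name for f in files])
```

Non-image files such as `notes.txt` are ignored, so stray files in a person's folder do not break enrollment.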

Step 2 — Start the Live Attendance Feed

With the face database ready, go to Live Vision → Live Stream and connect the entrance CCTV (either via webcam device index or an RTSP URL):

rtsp://admin:password@192.168.1.64:554/stream1

The system begins the dual-stage pipeline immediately — YOLOv8 locates every person in each frame, ByteTrack assigns persistent track IDs, and InsightFace matches faces against the enrolled database. Recognised employees appear in the identification sidebar with their name, confidence score, and first-seen timestamp.

Attendance is logged automatically. Each recognised person is written to a CSV file (frame number, timestamp, track ID, name, confidence) with duplicate prevention — the same person won't be logged again until they leave and re-enter the frame.
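
The logging rule described above can be sketched with the standard `csv` module. `AttendanceLogger`, `seen`, and `left` are hypothetical names, and the column names are assumed from the fields listed. How VisionLog decides that someone has left the frame is not described here, so `left` is simply called by whatever component detects it.

```python
import csv
import io
from typing import Set, TextIO

FIELDS = ["frame", "timestamp", "track_id", "name", "confidence"]


class AttendanceLogger:
    """Writes one CSV row per person per continuous presence in the frame."""

    def __init__(self, stream: TextIO) -> None:
        self._writer = csv.DictWriter(stream, fieldnames=FIELDS)
        self._writer.writeheader()
        self._present: Set[str] = set()  # names currently considered in frame

    def seen(self, frame: int, timestamp: str, track_id: int,
             name: str, confidence: float) -> bool:
        """Log the sighting unless the person is already marked present."""
        if name in self._present:
            return False  # duplicate: already logged for this visit
        self._present.add(name)
        self._writer.writerow({
            "frame": frame, "timestamp": timestamp, "track_id": track_id,
            "name": name, "confidence": f"{confidence:.2f}",
        })
        return True

    def left(self, name: str) -> None:
        """Mark the person as gone so the next sighting is logged again."""
        self._present.discard(name)


# Demo with an in-memory buffer instead of a file on disk.
demo_buffer = io.StringIO()
demo_log = AttendanceLogger(demo_buffer)
demo_log.seen(1482, "09:03:14", 7, "Alice_Sharma", 0.91)
demo_log.seen(1483, "09:03:14", 7, "Alice_Sharma", 0.93)  # ignored as a duplicate
print(demo_buffer.getvalue())
```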

Step 3 — Review Attendance Logs

At the end of the day, open the attendance CSV from the configured output directory. Each row represents a unique sighting:

Frame	Timestamp	Track ID	Name	Confidence
1482	09:03:14	7	Alice_Sharma	0.91
2031	09:07:52	12	Bob_Menon	0.88

Cross-reference this with your HR system to generate a daily attendance report. Employees who do not appear in the log were not recognised at the entrance and can be treated as absent.
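
Cross-referencing the log with a roster can be sketched as below. The function assumes a CSV header with lowercase `name` and `timestamp` columns and `HH:MM:SS` timestamps, which may differ from the real file. `summarise_attendance` and the roster entry `Carol_Iyer` are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def summarise_attendance(csv_text: str,
                         roster: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (first-seen time per person, sorted list of roster names not seen)."""
    first_seen: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name, timestamp = row["name"], row["timestamp"]
        # HH:MM:SS strings sort chronologically, so plain comparison is enough.
        if name not in first_seen or timestamp < first_seen[name]:
            first_seen[name] = timestamp
    absent = sorted(name for name in roster if name not in first_seen)
    return first_seen, absent


log_text = (
    "frame,timestamp,track_id,name,confidence\n"
    "1482,09:03:14,7,Alice_Sharma,0.91\n"
    "2031,09:07:52,12,Bob_Menon,0.88\n"
)
arrivals, missing = summarise_attendance(
    log_text, ["Alice_Sharma", "Bob_Menon", "Carol_Iyer"])
print(arrivals)
print(missing)
```

A name on the absent list means only that the camera never recognised that person, so a manual check is still sensible before marking someone absent.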

Step 4 — Search for a Specific Individual (Forensic Use)

Suppose security needs to confirm when a particular visitor was on-site. Go to Video Search → Target Search, upload a reference photo of the individual, and point the system at the recorded footage file. VisionLog will scan every frame, surface all timestamps where that face appears, and display a track log with confidence scores.

For offline batch use across a folder of images, use ⚗ Lab → Folder Person Search — upload the reference and select the image archive. Results are ranked by similarity score.

For detailed configuration of models, thresholds, and RTSP settings, see the Architecture and Enrollment pages.
