System Architecture

Understanding the modular architecture of VisionLog - Object Detection

The VisionLog - Object Detection module is built on a modular architecture for efficient object detection and classification. It uses the Ultralytics implementation of YOLO (You Only Look Once) for real-time detection, and classifies each detected object by matching its embedding against enrolled classes.

Core Components

Detection Engine

The heart of the system: it handles all object detection and classification operations.

Key Functions:

| Function | Description |
| --- | --- |
| Object Detection | Detect all objects in a frame with confidence filtering |
| Feature Extraction | Extract embeddings from detected regions |
| Object Classification | Match an embedding against the enrolled database |
| Frame Processing | Combined detection + classification for a frame |

Enrollment System

Folder-based batch enrollment for adding object classes to the database.

| Feature | Description |
| --- | --- |
| Batch Processing | Enroll multiple object classes at once |
| Folder Structure | One folder per class with multiple reference images |
| Automatic Averaging | Multiple images averaged for robustness |
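
As an illustration of the one-folder-per-class layout, the sketch below walks an enrollment root and collects the reference images found in each class folder. The function name `discover_classes` and the set of accepted file extensions are assumptions made for this example, not part of VisionLog's documented interface.

```python
from pathlib import Path

# Assumed set of accepted image formats (hypothetical, for illustration only).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def discover_classes(root):
    """Map each class folder name to its sorted list of reference image paths."""
    classes = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # files placed directly in the root belong to no class
        images = sorted(
            p for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:  # folders without any reference image are skipped
            classes[class_dir.name] = images
    return classes
```

Each returned entry would then be passed to the embedding and averaging steps described under YOLO Pipeline.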

Recognition System

Real-time object detection from multiple input sources.

Input Modes:

  • Camera feed (live detection)
  • Single image file
  • Video file processing

Features:

  • FPS counter display
  • Screenshot capture
  • Database reload
  • Confidence display toggle

Video Processor

Batch processing of CCTV recordings and video files.

Capabilities:

  • Frame skipping for performance (configurable)
  • Unknown object cropping and storage
  • Annotated video output with bounding boxes
  • Report generation with detection summary
  • Progress tracking with ETA
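
Frame skipping and ETA tracking can be sketched as follows. This is a minimal illustration, assuming that "frame skip" means analysing every N-th frame and that the ETA is extrapolated from the average time per processed frame; the helper names are hypothetical.

```python
def frames_to_process(total_frames, frame_skip=5):
    """Indices of the frames to analyse when taking every `frame_skip`-th frame."""
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    return range(0, total_frames, frame_skip)


def estimate_eta(done, total, elapsed_seconds):
    """Remaining seconds, extrapolated from the average time per processed item.

    Returns None until at least one item has been processed.
    """
    if done == 0:
        return None
    return (total - done) * (elapsed_seconds / done)
```

With a skip of 5, a 12-frame clip is analysed at frames 0, 5, and 10, so detection cost drops roughly fivefold at the price of possibly missing objects that appear only briefly.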

Detection Logger

Real-time detection logging to PostgreSQL with duplicate prevention.

Features:

  • Duplicate prevention mechanism (cooldown-based)
  • PostgreSQL database storage
  • Query and export detection logs
  • Filter by date, class, or camera
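
One way a cooldown-based duplicate check could work is sketched below: a detection is logged only if the same class has not been logged for the same camera within the cooldown window. The class name `CooldownFilter`, the per-camera/per-class key, and the 10-second default are assumptions for illustration; the source does not state the actual cooldown value or key.

```python
class CooldownFilter:
    """Suppress repeated log entries for the same (camera, class) pair."""

    def __init__(self, cooldown_seconds=10.0):  # assumed default, not from the docs
        self.cooldown_seconds = cooldown_seconds
        self._last_logged = {}  # (camera_id, class_name) -> timestamp of last log

    def should_log(self, camera_id, class_name, timestamp):
        """Return True and record the timestamp if the detection should be logged."""
        key = (camera_id, class_name)
        last = self._last_logged.get(key)
        if last is not None and timestamp - last < self.cooldown_seconds:
            return False  # still inside the cooldown window: treat as duplicate
        self._last_logged[key] = timestamp
        return True
```

A caller would invoke `should_log` before each database insert and skip the write when it returns False.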

YOLO Pipeline

1. Object Detection (YOLOv8)

YOLO uses a single-pass architecture for efficient detection. For each detected object, the system extracts:

  • Bounding box coordinates
  • Detection confidence score
  • Class prediction (for pre-trained classes)

Quality Filters:

  • Minimum confidence threshold: 0.25
  • Non-max suppression IoU: 0.45
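
In Ultralytics, these two thresholds correspond to the `conf` and `iou` prediction arguments, and the library applies them internally. To show what the filters do, here is a simplified, class-agnostic sketch in plain Python; the function names and the `(box, score)` tuple layout are assumptions for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Apply confidence filtering, then greedy non-max suppression.

    `detections` is a list of (box, score) pairs; the result is ordered
    from highest to lowest score.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

Lowering the confidence threshold admits more candidate boxes, while lowering the IoU threshold suppresses overlapping boxes more aggressively.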

2. Feature Extraction

Each detected object region is cropped and converted to a 512-dimensional embedding vector using the backbone feature extractor.

3. Object Matching

Classification uses cosine similarity between embeddings.

Matching Process:

  1. Compare query embedding against all enrolled class embeddings
  2. Find the highest similarity score
  3. Accept match if score >= threshold (default: 0.6)
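
The matching steps above can be sketched in plain Python as follows. The function names and the dictionary layout of the enrolled database are assumptions for illustration; a production system would more likely use vectorised operations or a database-side search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def match_embedding(query, enrolled, threshold=0.6):
    """Return (class_name, best_score).

    `enrolled` maps class names to reference embeddings. class_name is None
    when the best score is below the threshold (an unknown object).
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

Returning the best score even on a rejected match makes it possible to display or log how close an unknown object came to an enrolled class.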

4. Embedding Averaging

For more robust enrollment, multiple reference images are averaged and normalized to create a single representative embedding per class.
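
A minimal sketch of this step, assuming the embeddings are averaged element-wise and the mean is then L2-normalized (whether individual embeddings are also normalized before averaging is not stated in the source):

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def average_embedding(embeddings):
    """Element-wise mean of several embeddings, rescaled to unit length."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)
```

Normalizing the averaged vector keeps every class embedding at unit length, so cosine similarity against it reduces to a dot product with a normalized query.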

Configuration

Centralized settings for the entire system:

| Setting | Description | Default |
| --- | --- | --- |
| Model Name | YOLO model variant | yolov8n |
| Input Size | Detection resolution | 640x640 |
| GPU Support | Enable CUDA acceleration | Off |
| Similarity Threshold | Match acceptance threshold | 0.6 |
| Confidence Threshold | Detection minimum | 0.25 |
| NMS Threshold | Non-max suppression IoU | 0.45 |
| Frame Skip | Video processing optimization | Every 5 frames |
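
For illustration, the defaults above could be captured in a single immutable settings object. The class and field names below are hypothetical; only the default values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionLogConfig:
    """Hypothetical container mirroring the documented defaults."""

    model_name: str = "yolov8n"
    input_size: int = 640            # square input, i.e. 640x640
    use_gpu: bool = False            # CUDA acceleration is off by default
    similarity_threshold: float = 0.6
    confidence_threshold: float = 0.25
    nms_threshold: float = 0.45
    frame_skip: int = 5              # process every 5th frame

    def __post_init__(self):
        # Reject values that cannot be valid scores or skip intervals.
        for name in ("similarity_threshold", "confidence_threshold", "nms_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if self.frame_skip < 1:
            raise ValueError("frame_skip must be >= 1")
```

Overriding a single setting is then a matter of `VisionLogConfig(frame_skip=10)`, with every other value keeping its default.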

Storage

| Component | Purpose |
| --- | --- |
| PostgreSQL | Object embeddings and detection logs |
| Reference Images | Original images organized by class |
| Unknown Objects | Cropped images of unclassified detections |
| Output | Processed video files and reports |

Technology Summary

| Layer | Technology |
| --- | --- |
| Input Sources | Camera, Video Files, Images |
| Object Detection | YOLO (Ultralytics) |
| Feature Backbone | CSPDarknet + PANet |
| Matching | Cosine Similarity |
| Database | PostgreSQL with pgvector |
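
Because pgvector provides a cosine distance operator (`<=>`), the nearest enrolled class could in principle be found inside the database. The sketch below shows how such a query might look. The table name `object_classes`, its columns, and the helper function are hypothetical, and the source does not say whether VisionLog matches in SQL or in application code. The `%s` placeholders follow the psycopg parameter style.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: one 512-dimensional embedding per enrolled class.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS object_classes (
    class_name TEXT PRIMARY KEY,
    embedding  vector(512) NOT NULL
);
"""

# `<=>` is cosine distance, so similarity is 1 minus the distance.
NEAREST_CLASS_SQL = """
SELECT class_name, 1 - (embedding <=> %s::vector) AS similarity
FROM object_classes
ORDER BY embedding <=> %s::vector
LIMIT 1;
"""
```

The same literal would be passed for both placeholders, and the returned similarity compared against the 0.6 threshold in application code.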