VisionLog

Object Detection

YOLO-based real-time object localization in video streams

The detection stage uses YOLO (You Only Look Once) to locate objects in video frames with bounding boxes and confidence scores.

Overview

YOLO processes each video frame to identify and localize objects, outputting bounding box coordinates for downstream matching.

YOLO Architecture

| Component | Description |
| --- | --- |
| Backbone | Feature extraction network (CSPDarknet) |
| Neck | Feature pyramid network (PANet) |
| Head | Detection output layer |

Model Variants

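| Model | Size | Speed | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| YOLOv8n | 6 MB | Fastest | Good | Edge devices, real-time |
| YOLOv8s | 22 MB | Fast | Better | Balanced performance |
| YOLOv8m | 52 MB | Medium | High | Server deployment |
| YOLOv8l | 87 MB | Slower | Higher | High accuracy needs |
| YOLOv8x | 137 MB | Slowest | Best | Maximum accuracy |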

Detection Pipeline

| Step | Action | Output |
| --- | --- | --- |
| 1 | Frame capture | Raw image (BGR) |
| 2 | Preprocessing | Resized, normalized tensor |
| 3 | YOLO inference | Raw predictions |
| 4 | Non-max suppression | Filtered detections |
| 5 | Output formatting | Bounding boxes + scores |
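
Step 4 is the least obvious part of the pipeline, so here is a minimal sketch of greedy non-max suppression in plain Python. It assumes corner-format boxes `(x1, y1, x2, y2)` and is class-agnostic; the function names `iou` and `nms` are illustrative and are not taken from this project, which may well rely on the NMS built into its YOLO runtime.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, which is above the 0.45 threshold, so only the higher-scoring one survives. The third box does not overlap anything and is kept.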

Detection Output

Each detection includes:

| Field | Type | Description |
| --- | --- | --- |
| bbox | [x1, y1, x2, y2] | Bounding box coordinates |
| confidence | float | Detection confidence (0 to 1) |
| class_id | int | YOLO class index |
| class_name | string | Human-readable label |
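
One way to model a single detection is a small dataclass that mirrors the four fields above. This is only a sketch: the class name `Detection`, the validation rules, and the `area` helper are assumptions added for illustration, and the sketch assumes `(x1, y1)` is the top-left corner and `(x2, y2)` the bottom-right corner.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    bbox: List[float]   # [x1, y1, x2, y2]
    confidence: float   # 0 to 1
    class_id: int       # YOLO class index
    class_name: str     # human-readable label

    def __post_init__(self) -> None:
        # Basic sanity checks (hypothetical; not specified by the source).
        if len(self.bbox) != 4:
            raise ValueError("bbox must have exactly four coordinates")
        x1, y1, x2, y2 = self.bbox
        if x2 < x1 or y2 < y1:
            raise ValueError("bbox corners must satisfy x1 <= x2 and y1 <= y2")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    @property
    def area(self) -> float:
        """Box area in the same units as the coordinates."""
        x1, y1, x2, y2 = self.bbox
        return (x2 - x1) * (y2 - y1)


det = Detection([10.0, 20.0, 110.0, 220.0], 0.87, 0, "person")
print(det.area)      # 20000.0
print(asdict(det))   # plain dict with the four fields
```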

Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| conf_threshold | 0.25 | Minimum confidence to keep |
| iou_threshold | 0.45 | NMS IoU threshold |
| max_detections | 300 | Maximum objects per frame |
| img_size | 640 | Input resolution |
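
The sketch below shows how `conf_threshold` and `max_detections` might act on a list of scored candidates. The `DetectorConfig` class and the `filter_by_confidence` helper are hypothetical names; only the parameter names and defaults come from the table above. Treating the threshold as inclusive (`>=`) is an assumption. `iou_threshold` and `img_size` are carried in the config but not used here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Scored = Tuple[str, float]  # (label, confidence), a simplified stand-in


@dataclass(frozen=True)
class DetectorConfig:
    conf_threshold: float = 0.25
    iou_threshold: float = 0.45
    max_detections: int = 300
    img_size: int = 640


def filter_by_confidence(scored: List[Scored], cfg: DetectorConfig) -> List[Scored]:
    """Drop low-confidence items, then keep the top max_detections by score."""
    kept = [item for item in scored if item[1] >= cfg.conf_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[: cfg.max_detections]


cfg = DetectorConfig(max_detections=2)
candidates = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]
print(filter_by_confidence(candidates, cfg))  # [('a', 0.9), ('c', 0.5)]
```

Raising `conf_threshold` trades recall for fewer false positives, while `max_detections` bounds the work done by downstream stages on crowded frames.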

Performance Optimization

GPU Acceleration

| Setting | Description |
| --- | --- |
| CUDA | NVIDIA GPU inference |
| TensorRT | Optimized NVIDIA inference |
| CoreML | Apple Silicon optimization |

Batching

| Approach | Latency | Throughput |
| --- | --- | --- |
| Single frame | Low | Lower |
| Batch processing | Higher | Higher |
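
To make the trade-off concrete, here is a small sketch of grouping an incoming frame stream into fixed-size batches. The helper name `batched` and its `batch_size` parameter are illustrative assumptions; the frames can be any objects.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size frames, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

With `batch_size=1` this degenerates to single-frame processing. Larger batches let the accelerator amortize per-call overhead, but the first frame in each batch has to wait until the batch fills, which is where the added latency comes from.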

Frame Processing

| Mode | Description | Use Case |
| --- | --- | --- |
| Stream | Continuous video processing | Live camera feeds |
| Batch | Process image directory | Offline analysis |
| Single | One image at a time | API requests |

Multi-scale Detection

| Feature | Description |
| --- | --- |
| FPN | Feature pyramid for small objects |
| Large objects | Detected on lower-resolution (coarser) feature maps |
| Small objects | Detected on higher-resolution (finer) feature maps |
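
As a rough illustration of what "resolution" means here, the sketch below computes the feature-map grid size at each detection scale. It assumes the three output strides (8, 16, 32) used by standard YOLOv8 detection models; a custom model may differ, and `grid_sizes` is a hypothetical helper name.

```python
from typing import Dict, Sequence

STRIDES = (8, 16, 32)  # assumed: standard YOLOv8 detection strides


def grid_sizes(img_size: int = 640, strides: Sequence[int] = STRIDES) -> Dict[int, int]:
    """Map each stride to the side length of its square feature-map grid."""
    if any(img_size % s for s in strides):
        raise ValueError("img_size must be divisible by every stride")
    return {s: img_size // s for s in strides}


sizes = grid_sizes(640)
print(sizes)                                # {8: 80, 16: 40, 32: 20}
print(sum(n * n for n in sizes.values()))   # 8400 grid cells in total
```

At a 640 input, the stride-8 map has 80 x 80 cells, each covering a small patch, which suits small objects. The stride-32 map has only 20 x 20 cells with large receptive fields, which suits large objects.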

Output Format

Detection results per frame:

| Key | Type | Description |
| --- | --- | --- |
| frame_id | int | Sequential frame number |
| timestamp | float | Frame timestamp |
| detections | array | List of detection objects |
| inference_time | float | Processing time (ms) |
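
A per-frame result with these keys could be assembled and serialized roughly as follows. The helper `build_frame_result` is hypothetical. The source does not state the unit of `timestamp`, so seconds are assumed here. Rounding `inference_time` to two decimals is a presentation choice made in this sketch only.

```python
import json
from typing import Any, Dict, List


def build_frame_result(frame_id: int, timestamp: float,
                       detections: List[Dict[str, Any]],
                       inference_time_ms: float) -> Dict[str, Any]:
    """Assemble one frame's result using the documented keys."""
    return {
        "frame_id": frame_id,
        "timestamp": timestamp,                        # assumed: seconds
        "detections": list(detections),
        "inference_time": round(inference_time_ms, 2),  # milliseconds
    }


detection = {
    "bbox": [10, 20, 110, 220],
    "confidence": 0.87,
    "class_id": 0,
    "class_name": "person",
}
result = build_frame_result(0, 0.0, [detection], 12.3456)
print(json.dumps(result, indent=2))
```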
