Live GPU Inference

Real-Time Multimodal Inference

Production-grade computer vision (detection, pose estimation, OCR, and optical flow) running live from your camera on a GPU backend.

Architecture

Four models, one pipeline

Camera frames stream over WebSocket to an EC2 GPU instance running Triton Inference Server. Each frame passes through YOLO Detection, YOLO Pose, and Farneback optical flow; OCR is event-triggered rather than run on every frame. Results are smoothed and rendered in real time.
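The page does not say how results are smoothed. As a rough illustration only, the sketch below applies an exponential moving average to one bounding box across frames, a common way to reduce jitter in rendered detections. The class name `BoxSmoother`, the `alpha` weight, and the `(x1, y1, x2, y2)` box layout are all assumptions, not details of this demo.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class BoxSmoother:
    """Exponential moving average over one tracked bounding box (hypothetical helper)."""

    alpha: float = 0.5  # assumed weight of the newest frame, 0 < alpha <= 1
    state: Optional[Box] = None

    def update(self, box: Box) -> Box:
        if self.state is None:
            # First observation: nothing to blend with yet.
            self.state = box
        else:
            # Blend each coordinate of the new box with the previous smoothed value.
            self.state = tuple(
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(box, self.state)
            )
        return self.state


smoother = BoxSmoother(alpha=0.5)
first = smoother.update((0.0, 0.0, 10.0, 10.0))
second = smoother.update((2.0, 2.0, 12.0, 12.0))
```

A smaller `alpha` would give steadier boxes at the cost of more lag behind fast motion; the real pipeline may well use a different filter.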

Configuration

Select models

YOLO Detection

Person & object detection

YOLO Pose

Human pose estimation

OCR

Text recognition

Farneback

Optical flow · motion
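One way the model selection above might translate into per-frame work is sketched below: the selected per-frame models run on every frame, while OCR runs only when some trigger fires. The model keys, the function `models_for_frame`, and the `ocr_event` flag are hypothetical; the page does not specify what event triggers OCR.

```python
from typing import List, Set

# Hypothetical keys for the models that run on every frame, in a fixed order.
PER_FRAME_MODELS = ("detection", "pose", "flow")


def models_for_frame(selected: Set[str], ocr_event: bool) -> List[str]:
    """Return the models to run on one frame, given the user's selection.

    `ocr_event` stands in for whatever condition triggers OCR in the
    real pipeline; that condition is not described on this page.
    """
    plan = [name for name in PER_FRAME_MODELS if name in selected]
    if "ocr" in selected and ocr_event:
        # OCR is appended only when it is both selected and triggered.
        plan.append("ocr")
    return plan


plan_idle = models_for_frame({"detection", "ocr"}, ocr_event=False)
plan_triggered = models_for_frame({"detection", "flow", "ocr"}, ocr_event=True)
```

Keeping OCR out of the per-frame path is a plausible reason for the event trigger, since text recognition is typically the most expensive of the four stages.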

Initializing server…

Starting the EC2 instance and backend. This can take up to 2 minutes. You will have 5 minutes to run inference; the session is time-limited to control EC2 costs.

Turn on your camera