TRAIN MODELS AND BUILD AGENTS at practical hardware limits

The open-source AgentOps platform. Train and fine-tune models with a native C++/CUDA engine — then build, deploy, and operate autonomous AI agents on your own infrastructure.

Apache 2.0 • C++/CUDA core • Autonomous Agents • Multi-GPU • Self-Hosted

What Surogate does

Near hardware-limit training

A native C++/CUDA engine that pushes your NVIDIA GPUs to their practical limits — from single-GPU rigs to multi-node clusters.

Autonomous agent runtime

Compose agents from skills, tools, MCP servers, and sub-agents. Agents reason, plan, and execute complex multi-step workflows.

Closed improvement loop

Production traces become training data. Fine-tune Specialized Language Models from agent trajectories. Agents get better automatically.

Full agent observability

Complete execution traces for every agent run — every LLM call, tool invocation, sub-agent step, and failure. Visual trace viewer included.

Self-hosted, full control

Runs on Kubernetes. Your infrastructure, your models, your data. On-premise or cloud. Air-gap capable for regulated environments.

What is Surogate?

Surogate is the open-source AgentOps platform by Invergent. It combines a high-performance LLM training engine with a full autonomous agent runtime — so you can train the models that power your agents, and operate those agents at production scale, all in one self-hosted system.

The training core is built in native C++/CUDA, engineered for near hardware-limit throughput on modern NVIDIA GPUs. The agent layer runs on Kubernetes, with full execution tracing, skill lifecycle management, and a continuous improvement loop from production data.

Pre-training + SFT • LoRA / QLoRA • BF16 • FP8 • NVFP4 • Autonomous Agents • Skill Framework • MCP Support • C++/CUDA core

Precision recipes

Choose a precision recipe that matches your hardware and goals — from maximum numerical stability to maximum speed-of-light (SOL) throughput.

BF16

Accuracy

Baseline recipe using bfloat16 GEMMs for maximum numerical accuracy. No quantization.

  • Best when stability matters most
  • Great for validation baselines

FP8

Performance

Native FP8 training (E4M3 for weights & activations, E5M2 for gradients) with delayed scaling for stability.

  • Excellent SOL on modern GPUs
  • Works well with QLoRA

NVFP4

Extreme

CUTLASS FP4 (E2M1) training with block scaling, stochastic rounding, and Hadamard transforms for stability.

  • Built for Blackwell‑class GPUs
  • Max memory efficiency + SOL

QLoRA

BitsAndBytes, FP8, and NVFP4 dynamic quantization help maximize SOL on Ampere, Hopper, and Blackwell hardware.

Learn more about QLoRA
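As a sketch of what enabling QLoRA looks like, the fragment below reuses the keys from the quickstart config example later on this page. Treating qlora_fp8 / qlora_fp4 as the switches for the FP8 and NVFP4 paths is an assumption based on that example — check the config reference for your version.

```yaml
# QLoRA sketch — keys taken from this page's quickstart config;
# exact semantics may differ, consult the config reference.
lora: true
lora_rank: 16

# Pick at most one quantization path for the base weights:
qlora_fp8: true    # FP8 dynamic quantization (hardware-dependent)
# qlora_fp4: true  # NVFP4 — Blackwell-class GPUs
```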

Surogate Studio

The full-stack AgentOps platform — built on the open-source core.


Autonomous Agent Runtime

Compose agents from skills, tools, MCP servers, and custom LLMs. Agents reason, coordinate sub-agents, and execute complex workflows — deployed as containerized Kubernetes applications.

Full Agent Observability

Every agent run generates a complete execution trace — LLM calls, tool invocations, sub-agent steps, memory operations, errors. Visual trace viewer, session replay, anomaly alerts, performance dashboards.

Continuous Improvement Loop

Collect production traces → convert to training datasets → fine-tune Specialized Language Models (SLMs) from agent trajectories → evaluate → promote to production. Agents improve automatically over time.

High-Performance Training & Serving

Full fine-tuning, LoRA/QLoRA, RL alignment (GRPO, DPO, PPO), multi-GPU/multi-node. GPU-accelerated inference with vLLM, KV-cache offloading, tensor parallelism, LoRA adapter stacking.

Data Hub — Git-style Artifact Registry

Central versioned registry for models, datasets, agent definitions, skills, and tools. Git-style branches, commits, tags, PRs, diffs. Import/export from HuggingFace and ModelScope. Single source of truth across the entire platform.

Join the community

Get help, share benchmarks, discuss training recipes, and follow the project. The Surogate community is where the interesting conversations happen.

Quickstart

Below is a minimal flow: install the package, create a small YAML config, and start a supervised fine‑tune.

1. Install

Run the following command:

curl -sSL https://surogate.ai/install.sh | bash

This installs the CLI so you can run training with simple commands.

*Requires Ubuntu 24.04 x64 with CUDA 12.8/12.9/13.0

2. Run

Start the SFT job using your config:

surogate sft examples/sft/qwen3-lora-qbnb.yaml

Output

Checkpoints, logs, and artifacts are written under output_dir.

Config example (YAML)

model: Qwen/Qwen3-0.6B
output_dir: ./output

# training
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
sequence_len: 2048
learning_rate: 2e-4

# LoRA / QLoRA
lora: true
lora_rank: 16
# qlora_fp8: true  # optional, hardware-dependent
# qlora_fp4: true  # Blackwell+

datasets:
  - path: "mlabonne/FineTome-100k"
    type: auto

Swap the model

Use any supported base model you want to fine‑tune.

Tune sequence length

Set sequence_len to fit your GPU memory + target task.

Enable QLoRA

Flip qlora_fp8 or qlora_fp4 when your hardware supports it.
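Putting the three tips together, a variant of the quickstart config might look like the sketch below. It only recombines keys from the config example above; the lowered sequence_len and the FP8 switch are illustrative choices, not recommended defaults.

```yaml
# Variant of the quickstart config: swap the base model as needed,
# shorten sequences for tighter VRAM, enable FP8 QLoRA.
model: Qwen/Qwen3-0.6B         # any supported base model
output_dir: ./output

per_device_train_batch_size: 2
gradient_accumulation_steps: 4
sequence_len: 1024             # lower to fit GPU memory, raise for long-context tasks
learning_rate: 2e-4

lora: true
lora_rank: 16
qlora_fp8: true                # assumes FP8-capable hardware

datasets:
  - path: "mlabonne/FineTome-100k"
    type: auto
```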

Hardware targets

Runs on Linux with an NVIDIA GPU, recent drivers, CUDA 12 or 13, NCCL, and cuDNN. GPU support spans multiple architectures.

Supported NVIDIA SMs

From sm80 to sm121 (including Hopper & Blackwell generations).

SM80 • SM86 • SM89 • SM90 • SM100 • SM103 • SM120 • SM121

Why this matters

  • Same workflow from consumer RTX cards to datacenter GPUs.
  • Precision recipes let you trade stability vs throughput in a controlled way.
  • QLoRA and CPU offloading help fit larger runs when VRAM is tight.

Want deeper examples?

Browse docs and curated examples for recipes and model‑specific settings.