XVI Robotics

THE CORE BOTTLENECK

Embodied AI's core bottleneck—no general-purpose brain

Hardware is mature enough, but the industry has no universal brain, so the ecosystem can't take off.
Why? Two hidden structural fractures.

02 · PROBLEM

ISSUE 01 · DATA

01

Data has never really scaled

LLMs unlocked scaling laws with internet-scale text;
embodied data has never reached that scale:
teleoperation is expensive and low-coverage, and there is no scalable data engine.

COST/UNIT · HIGH

COVERAGE · LOW

SCALING LAW · N/A

ISSUE 02 · TWO WORLDS

02

Digital and physical R&D are siloed

Agents and LLMs evolve at warp speed in the digital world;
robots evolve in isolation in the physical world.
But Physical AGI = Digital AI + Physical AI.

DIGITAL · LLM · Agent → PHYSICAL · Robot · WBC

What's missing: one unified brain that bridges the digital and physical worlds.

CORE TEAM · AI-NATIVE COMPANY

Core Team · An Agent-Driven AI-Native Company

Full-stack coverage across LLMs, agents, motion control, and robot hardware: 10 humans + N AI agents, with per-capita output far above traditional teams.

03 · TEAM

FOUNDER & CEO

Flood Sung

宋鸿涌

▸ Former Head of Post-Training / RL at Moonshot AI; deeply involved in the K-series LLMs

▸ Hands-on with RLHF, long-chain reasoning, and agentic-task training; convinced that the Transformer is the brain

▸ Creator of MetaBot, an already battle-tested agent-native org paradigm

▸ Full-stack experience across LLM · Agent · motion control · robot hardware

CORE TEAM · 4 LEADS

04

VP

YH · VP of Technology: former Head of Long-Context Post-Training at Moonshot AI; ByteDance Seed researcher

WBC

FHQ · Head of Humanoid Locomotion: Nanjing University PhD; first author of a Nature Communications paper

NAV

WZC · Humanoid Navigation: Shanghai AI Lab postdoc; core author of InternVLA-N1

MANI

ZZA · Loco-Manipulation: Tsinghua MS; Renforce-Dynamics community lead

ORG MODEL · AGENT-NATIVE

∞

10 + N

HUMANS + AGENTS

Agent-native organization powered by MetaBot: total output equivalent to a ≈50-person team, about 5× the per-capita output of a traditional team.

METABOT · OPEN-SOURCE AI AGENT INFRA

MetaBot · Agent-Native Org Infrastructure

An agent framework reaching from digital to physical — the gateway to Physical AGI.

04 · INFRA

github.com/xvirobotics/metabot

MODULE 01

MetaMemory

Persistent knowledge base: agents share memory docs and HTML pages, so organizational knowledge accrues automatically.

MODULE 02

Skill Hub

Agents upload and share the skills they accumulate — experience becomes reusable across the team.

MODULE 03

Agent Bus

Agents connect across instances for task delegation, real-time messaging, and live collaboration.

MODULE 04

T5T · Top 5 Things

Project-management skill: every agent's project gets a kanban board, so the committee sees everything at a glance.
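To make "task delegation" on the Agent Bus concrete, here is a minimal in-process sketch of one agent delegating a job to another, with both messages logged. It is a hypothetical illustration, not MetaBot's actual API: the class `AgentBus`, its methods `register` and `delegate`, and the agent names are invented for this example, and a real cross-instance bus would sit on a network transport.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    sender: str
    recipient: str
    topic: str
    body: str


class AgentBus:
    """Hypothetical in-process bus (NOT the MetaBot Agent Bus API).

    Agents register a handler; delegate() routes a request to the
    recipient, returns its reply, and logs both messages so the whole
    exchange stays inspectable.
    """

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[Message], str]] = {}
        self.log: list[Message] = []

    def register(self, name: str, handler: Callable[[Message], str]) -> None:
        self._handlers[name] = handler

    def delegate(self, sender: str, recipient: str, topic: str, body: str) -> str:
        if recipient not in self._handlers:
            raise KeyError(f"no agent registered as {recipient!r}")
        request = Message(sender, recipient, topic, body)
        self.log.append(request)
        reply = self._handlers[recipient](request)
        self.log.append(Message(recipient, sender, topic, reply))
        return reply


# Usage: a planner agent hands a simulation job to a (hypothetical) sim agent.
bus = AgentBus()
bus.register("sim", lambda msg: f"done: {msg.body}")
reply = bus.delegate("planner", "sim", "eval", "walk policy v3")
print(reply)  # done: walk policy v3
```

Keeping every request and reply in one log is what lets a supervisor (human or agent) audit delegation chains after the fact.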

WHY THIS IS THE MOAT

Why this is the moat

01

Cognitive moat: only an agent-native founder builds this. Putting "agent-native infrastructure" at the very top of the company's priority list is a decision about how the company thinks, not a technical one. Teams that don't live in this paradigm can't even see the path.

02

The path to a fully self-evolving multi-agent organization. Like a swarm, the org accumulates memory, skills, goals, and projects, and every accumulation becomes the launchpad for the next step. MetaMemory + Skill Hub + Agent Bus + T5T form its skeleton.

03

The org is the testbed, so iteration speed is a generation ahead. Every day we use MetaBot to validate and accelerate our own work, iterating an order of magnitude faster than a typical company.

04

One framework that extends into the physical world: the gateway to Physical AGI. Open source, staking a claim to the agent-for-robotics niche.

10 + N

OUTPUT
Total output ≈ a 50-person team (≈5× per capita)

THREE-LAYER UNIFIED STACK

An Agent-Driven Three-Layer Unified Architecture

05 · ARCHITECTURE

L1

LAYER ONE

MetaBot · Agent Layer

ALWAYS-ON

Top-level orchestration bridging digital and physical worlds

01 Top-level agent orchestration: task planning · multi-step reasoning · error recovery

02 MetaMemory + Skill Hub + Agent Bus + T5T

03 The hub connecting the digital and physical worlds

04 Human supervision + self-evolution loop

L2

LAYER TWO

VLM · Vision-Language Brain

5–10 Hz

Humanoid Foundation Model · perception & decision

01 Video pretraining → post-training → RL, analogous to a Computer Use Agent ↗

02 Core components: DreamVPT + IDM (inverse dynamics model)

03 In-Context RL: learns on the fly and adapts fast to new tasks

L3

LAYER THREE

WBC · Motion Cerebellum

50–500 Hz

Whole-Body Controller · execution layer

01 Controls the full body (29 DOF) plus two dexterous hands (22 DOF each)

02 Trained independently with RL in simulation (Isaac Gym)

03 Vision-aware; adapts across terrains

STACK OVERVIEW

Digital Intelligence ⇋ Physical Intelligence

KEY INSIGHT

L2 and L3 connect through a latent space, so decisions flow into control seamlessly.
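The layer rates imply a hierarchical, multi-rate loop: L2 refreshes a latent command a few times per second, and L3 reads the most recent latent at every control tick. A minimal sketch, assuming 10 Hz for L2 and 500 Hz for L3 (the top of each stated range), an action vector of 29 + 2 × 22 = 73 joint targets, and toy stand-ins for both networks. `vlm_step`, `wbc_step`, and the latent size are hypothetical, not XVI's interfaces.

```python
import math

VLM_HZ = 10                   # assumed L2 rate (deck: 5-10 Hz)
WBC_HZ = 500                  # assumed L3 rate (deck: 50-500 Hz)
ACTION_DIM = 29 + 2 * 22      # 29 body DOF + two 22-DOF hands = 73 targets
LATENT_DIM = 8                # hypothetical latent size


def vlm_step(t: float) -> list[float]:
    """Toy stand-in for L2: emits a latent command (here just a function of time)."""
    return [math.sin(t + i) for i in range(LATENT_DIM)]


def wbc_step(latent: list[float], joints: list[float]) -> list[float]:
    """Toy stand-in for L3: nudges every joint target toward a value read from the latent."""
    z = sum(latent) / LATENT_DIM
    return [0.9 * q + 0.1 * z for q in joints]


def run(seconds: float = 1.0) -> tuple[int, int, list[float]]:
    ticks = int(seconds * WBC_HZ)
    ratio = WBC_HZ // VLM_HZ      # L3 ticks per L2 update (50 here)
    joints = [0.0] * ACTION_DIM
    latent: list[float] = []
    vlm_calls = 0
    for k in range(ticks):
        if k % ratio == 0:        # slow loop: refresh the latent command
            latent = vlm_step(k / WBC_HZ)
            vlm_calls += 1
        joints = wbc_step(latent, joints)  # fast loop: every tick, on the latest latent
    return vlm_calls, ticks, joints


vlm_calls, wbc_calls, joints = run(1.0)
print(vlm_calls, wbc_calls, len(joints))  # 10 500 73
```

On a real robot the two loops would run concurrently (separate threads or processes); the sketch interleaves them on one simulated clock only to show that L3 never waits for L2.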

KEYWORDS · CORE TECH

Core Technology Keywords

From the underlying method up to model-level capability, four keywords define XVI's brain.

06 · CORE TECH

KEYWORD 01

/method

DreamVPT

Synthetic video + visual pretraining. The brain learns physical intuition from massive amounts of "dreamed" (synthetic) video.

Inverting the data pyramid ↗
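The L2 card pairs DreamVPT with an IDM. One reading, assuming IDM means an inverse dynamics model as in OpenAI's Video PreTraining (VPT), is this: fit an IDM on a small action-labeled set (for example, the ~100h real-robot seed), use it to pseudo-label large unlabeled or "dreamed" videos, then behavior-clone a policy on those labels. The toy sketch below runs that pipeline in a 1-D world where a frame is just a position. All function names and data are hypothetical, and lookup tables stand in for neural networks.

```python
from collections import Counter, defaultdict


def train_idm(labeled):
    """Fit a toy inverse dynamics model: (frame_t, frame_t+1) -> action.

    labeled: list of (frame_t, frame_t1, action). Unlike a policy, the IDM
    also sees the *next* frame, which makes inferring the action much easier.
    """
    votes = defaultdict(Counter)
    for f0, f1, action in labeled:
        votes[f1 - f0][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}


def pseudo_label(idm, video):
    """Label an action-free video with IDM-inferred actions (None if unknown)."""
    return [(f0, idm.get(f1 - f0)) for f0, f1 in zip(video, video[1:])]


def behavior_clone(pairs):
    """Fit a toy policy: observation -> most frequent pseudo-labeled action."""
    votes = defaultdict(Counter)
    for obs, action in pairs:
        if action is not None:
            votes[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}


# Small labeled seed set (stand-in for real-robot teleop data).
seed = [(0, 1, "right"), (1, 0, "left"), (2, 2, "stay"), (3, 4, "right")]
idm = train_idm(seed)

# Larger unlabeled "dreamed" video: positions only, no actions.
dreamed = [0, 1, 2, 3, 3]
labels = pseudo_label(idm, dreamed)
policy = behavior_clone(labels)
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'stay'}
```

The design point is that the expensive labeled set stays small, while the action-free video, real or synthetic, is what scales.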

KEYWORD 02

/architecture

Long Context
WholeBody VLA

Long context and unified vision-language-action: one end-to-end policy covering the full body and dexterous hands.

Why this architecture ↗

KEYWORD 03

/learning

In-Context RL

Learn on the fly, with no retraining. The policy improves inside the task context, like GPT-style few-shot learning.

Why this is unavoidable in the endgame ↗
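In-context RL means the weights stay frozen and adaptation happens through the context. Here is a toy illustration on a two-armed bandit: the policy below has no trainable parameters, yet its choices change as (arm, reward) pairs accumulate in its context, and it re-adapts from scratch when the task changes. The hand-written "try each arm once, then pick the best mean" rule is an assumption standing in for a trained sequence model; it is not XVI's method.

```python
import random


def in_context_policy(context: list[tuple[int, float]], n_arms: int) -> int:
    """Frozen policy: no parameters are ever updated.

    Its choice depends only on the (arm, reward) history in its context,
    a hand-written stand-in for what a trained sequence model would infer.
    """
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in context:
        totals[arm] += reward
        counts[arm] += 1
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm                      # explore: try every arm once
    return max(range(n_arms), key=lambda a: totals[a] / counts[a])  # exploit


def run_episode(arm_probs: list[float], steps: int, seed: int = 0) -> list[tuple[int, float]]:
    """Roll out one task; the context starts empty and only grows."""
    rng = random.Random(seed)
    context: list[tuple[int, float]] = []
    for _ in range(steps):
        arm = in_context_policy(context, len(arm_probs))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        context.append((arm, reward))
    return context


# Same frozen policy, two different tasks: it settles on whichever arm pays.
ctx_a = run_episode([0.0, 1.0], steps=6)
ctx_b = run_episode([1.0, 0.0], steps=6)
print([arm for arm, _ in ctx_a])  # [0, 1, 1, 1, 1, 1]
print([arm for arm, _ in ctx_b])  # [0, 1, 0, 0, 0, 0]
```

In learned in-context RL (for example, algorithm distillation), a transformer is trained so that this kind of improvement emerges from its frozen weights. What the toy preserves is the key property: adaptation lives in the context, not in gradient updates.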

KEYWORD 04

/capability

Compositional
Generalization

VLM × WBC compositional generalization: the WBC covers full-body motion, the VLM perceives and understands the world, and their product (any understood goal paired with any motion) covers everything.

Why A × B = everything ↗
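The "A × B" claim is a counting argument. If motion skills (WBC) and grounded goals (VLM) can be combined freely, the number of reachable tasks grows with the product of the two sets, while the components to train grow roughly with their sum. Here is a minimal sketch with hypothetical skill and target names. It assumes every pairing is physically valid, which real tasks will not always satisfy.

```python
from itertools import product

# Hypothetical component sets; names are illustrative only.
wbc_skills = ["walk_to", "reach", "grasp", "carry", "hand_over"]   # L3: how to move
vlm_targets = ["cup", "toolbox", "door_handle", "sample_tray"]      # L2: what to act on


def compose(skill: str, target: str) -> str:
    """Pair a motion skill with a grounded target to form a task."""
    return f"{skill}({target})"


tasks = [compose(s, t) for s, t in product(wbc_skills, vlm_targets)]
components = len(wbc_skills) + len(vlm_targets)
print(components, "components ->", len(tasks), "tasks")  # 9 components -> 20 tasks
```

Adding one more skill adds four tasks here, and adding one more target adds five. That multiplicative leverage is the point of keeping the two layers separately trained.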

GENERAL FOUNDATION · UNIVERSAL HUMANOID FM

Same playbook as the LLMs · Benchmark-Driven

We're building a universal humanoid foundation model, not a vertical solution: the same scaling as LLMs, the same benchmark grind.
We aim to top every public humanoid benchmark: indoor, outdoor, manipulation, navigation, single-step, and long-horizon. General capability, proven by hard evidence.

07 · GENERAL

PUBLIC BENCHMARKS · FULL COVERAGE

DOMAIN 01

Indoor Manipulation

Home · office · lab — grasping, placement, tool use

DOMAIN 02

Outdoor Locomotion

Complex terrain · dynamic environments · long-range autonomous navigation

DOMAIN 03

Bimanual Coordination

Symmetric / asymmetric two-hand tasks · assembly · transport · tool handoff

DOMAIN 04

Long-Horizon Tasks

Multi-step planning · error recovery · tool-chain calls

DOMAIN 05

Human-Robot Collab

Natural language understanding · joint operation · intent inference

DOMAIN 06

Generalization

New objects · new scenes · zero-shot transfer

The general foundation is the chassis; each public benchmark is hard evidence that "we can ship anything." Not a claim: a leaderboard.

TASTE × MOAT · TARGETED BETS

Beyond general · betting on high-value physical-world scenarios

We bet on high-value physical-world scenarios — places humans can't go, won't go, or shouldn't go.
These three directions are not capability boundaries; they are where we focus resources. Exclusive data, exclusive scenarios, exclusive benchmarks: a moat nobody else can replicate.

08 · MARKET

PRIVATE BENCHMARKS · EXCLUSIVE SCENARIOS

MARKET 01

PRIVATE BENCHMARK

Humanoid Astronauts

Space-station inspection · lunar/Mars base construction · scientific payload deployment

COST ↓

1–2 orders of magnitude

UPTIME

24 × 7

Why astronauts ↗

MARKET 02

PRIVATE BENCHMARK

Robot Hardware Engineers

Robots autonomously testing other robots — replacing human hardware engineers

ITERATION

24 × 7

COST ↓

Massive

Robots testing robots ↗

MARKET 03

PRIVATE BENCHMARK

Robot Lab Technicians

Replacing human researchers in physics experiments: robots autonomously design and run the experiments

THROUGHPUT

10×

SAFETY

HIGH

Why this is the biggest bet ↗

ANALOGY · THE TRAILBLAZER PATH

Claude is a general LLM · Anthropic bet on coding · topped SWE-bench · shipped Claude Code.

XVI is the universal humanoid foundation · betting on these three directions · each one a killer app of embodied AI.

General is the foundation · taste is the moat — both required, no conflict.

ROADMAP × BUSINESS MODEL

Model Leadership → Vertical Integration

A two-phase path: Phase 1 focuses obsessively on the model layer to establish authority; Phase 2 launches in-house hardware toward mass-produced humanoids. Move fast first, go heavy later, never both at once.

09 · ROADMAP

PHASE 01 · MODEL-FIRST

2026 – 2027 H1 · win the model layer first

2026 · H1

Tech Validation

DreamVPT + IDM + WBC
~100h real-robot seed data
core PoC running

2026 · H2

Open-Source Release

scale to 1000h data
model open-sourced · arXiv paper
DeepSeek-style playbook

2026 · Q4

Mars Demo

Ulanqab field site
In-Context RL closed loop
first public live demo

2027 · H1

Model SOTA

leading on VLA benchmarks
10,000h data
authority established

MODE · LASER FOCUS

10 people, all-in on the model

No hardware distraction · no proactive OEM partnerships · no commercial KPIs

OPEN SOURCE · COMMUNITY-DRIVEN

Models + papers fully open

DeepSeek playbook · community-first · let the SOTA model do the talking

AUTHORITY · MODEL LEADERSHIP

Top the benchmarks

Establish embodied VLA authority · build leverage for Phase 2 fundraise

PHASE 02 · VERTICAL INTEGRATION

From 2027 H2 · in-house hardware · mass-produced humanoids

2027 · H2

Launch In-House Hardware

funding closes → form humanoid team
supply-chain build-out
in-house roadmap locked

2028

GPT-4 Moment

embodied GPT-4 moment
full-body prototype v1
industry inflection reached

2029

Mass Production

XVI in-house humanoids ship
MARKETS 01–03 via first-party RaaS (Robot-as-a-Service)
data flywheel kicks in

2030

Scale Deployment

humanoids in everyday spaces
high-value first, household last
core position in the value chain

PRIMARY · MAIN BATTLE

XVI first-party humanoid RaaS

MARKET 01-03 delivered on our own humanoids · full-stack end-to-end service

DATA FLYWHEEL

In-house humanoid · 100% data ownership

Scarce high-value data flows back into the brain · model moat compounds

SECONDARY · SECOND CURVE

API licensing to OEMs

Model leadership spills over naturally · doesn't cannibalize first-party humanoid

Build model authority first, then vertically integrate: not Tesla doing both at once, and not Mobileye, which supplies chips and software but never builds the vehicle. We take the third path.

THE RAISE

Funding

A clear capital allocation to fuel the full arc — from PoC to open-source release to ecosystem build-out.

10 · FUNDING

ROUND · ANGEL++

Angel++ · Lead investor welcome

RAISE AMOUNT

$15–20M


RUNWAY

18 – 24 MONTHS

MILESTONE

2026 Q4 DEMO

ALLOCATION


COMPUTE · 40%

GPU cluster leasing to fuel large-scale RL simulation and VLM training

DATA · 30%

Video capture · annotation · synthetic-data generation pipeline

TEAM · 20%

Core hires + long-term equity incentives (ESOP)

HARDWARE · 10%

Robots: procuring humanoid platforms and dexterous hands for real-robot validation
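The allocation percentages translate directly into dollar ranges. Here is a small worked check, assuming the full raise is deployed according to the stated split and spent evenly over the runway. The even spread is an assumption for illustration, not a stated plan.

```python
ALLOCATION = {"compute": 0.40, "data": 0.30, "team": 0.20, "hardware": 0.10}
RAISE_USD = (15_000_000, 20_000_000)   # stated raise range
RUNWAY_MONTHS = (18, 24)               # stated runway range


def split(total: int) -> dict[str, int]:
    """Dollar amount per bucket, rounded to whole dollars."""
    return {bucket: round(total * share) for bucket, share in ALLOCATION.items()}


low, high = (split(t) for t in RAISE_USD)
for bucket in ALLOCATION:
    print(f"{bucket:>8}: ${low[bucket]:,} to ${high[bucket]:,}")

# Even-spread monthly burn (assumption): lowest = smaller raise over the longer
# runway, highest = larger raise over the shorter runway.
min_burn = round(RAISE_USD[0] / RUNWAY_MONTHS[1])
max_burn = round(RAISE_USD[1] / RUNWAY_MONTHS[0])
print(f"monthly burn: ${min_burn:,} to ${max_burn:,}")
```

For a $15M raise, compute alone is $6M; for $20M, it is $8M. Under the even-spread assumption, the stated ranges imply roughly $0.63M to $1.11M per month.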

CONTACT

floodsung@xvirobotics.com · xvirobotics.com

→ LET'S BUILD THE UNIFIED BRAIN

Agent-Native Universal Humanoid Foundation Model
