Willow 5: The Spatial Intelligence Fabric

NVIDIA Jetson | ARM Silicon | MS Azure | NVIDIA Isaac | Quest 3 / Vision Pro

Willow 5 is the first "Plug-and-Play" architecture designed for the modern hardware stack. We solve the transition from 3D visual perception to deterministic machine reasoning across the world's most powerful edge and robotics platforms.

1. The Engineering Workflow

The Willow 5 platform provides an end-to-end cloud oracle for developing and deploying motion intelligence through three primary interfaces:

The Model Creator (Enrollment)

Traditional AI requires massive datasets. Willow 5 utilizes Zero-Shot Enrollment. By ingesting as little as one reference, whether an RGB video or a synthetic 3D animation, the Creator distills the "Kinetic DNA" into a topological signature. When several seed videos of the same action are available (from different camera angles), Confidence-Weighted Consensus Fusion triangulates and averages them into a single "golden" 3D signature.
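The averaging half of consensus fusion can be pictured as a per-keypoint weighted mean. The sketch below is a minimal illustration, not the Creator's actual implementation: it assumes each view has already been lifted into a shared 3D frame (the triangulation step is omitted), and the function name `fuse_keypoints` and the data layout are hypothetical.

```python
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Observation = Tuple[Point3, float]  # ((x, y, z), confidence) -- assumed layout


def fuse_keypoints(views: List[List[Observation]]) -> List[Optional[Point3]]:
    """Fuse per-view keypoints into one signature by confidence-weighted mean.

    views[v][k] is the observation of keypoint k from camera view v.
    All views are assumed to share one coordinate frame.
    """
    n_points = len(views[0])
    fused: List[Optional[Point3]] = []
    for k in range(n_points):
        total_weight = sum(view[k][1] for view in views)
        if total_weight == 0.0:
            fused.append(None)  # no view observed this keypoint
            continue
        fused.append(tuple(
            sum(view[k][0][axis] * view[k][1] for view in views) / total_weight
            for axis in range(3)
        ))
    return fused
```

A view that sees a joint clearly (high confidence) pulls the fused position toward its own estimate, while an occluded view contributes little or nothing.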

The Vision Lab (Action Detection Engine)

A high-performance sandbox where engineers verify model resilience before deployment. It provides real-time visualization of bounding boxes, confidence scores, and overlapping triggers. Engineers can simulate edge environments to confirm that models trigger when, and only when, the target action occurs.

The End-to-End Developer API

The Cloud Oracle is a fully exposed, API-first architecture. B2B integrators and XR developers are not limited to model retrieval; they have programmatic access to the entire platform. Using our RESTful endpoints, engineers can automate Zero-Shot model creation, run large batch-scan video tests, and provision edge devices dynamically. This turns Willow 5 into a complete spatial intelligence backend for any custom application. See the full API docs and the Developer SDKs for more information.

2. Mathematical Engine: Dual Spatial Modes & The 76-Point RDM

Willow 5 abandons raw coordinate tracking (X, Y, Z), which is highly susceptible to perspective distortion and camera distance. Instead, we use Relational Distance Matrix (RDM) mathematics based on a 76-point topological graph.

The 76-Point Topology

During extraction, the engine targets a maximum of 76 spatial keypoints per frame:

  • Head: 11 points (eyes, ears, nose, mouth)
  • Torso: 4 points (shoulders, hips)
  • Arms & Legs: 8 points (elbows, wrists, knees, ankles)
  • Feet: 4 points (heels, toes)
  • Hands: 42 points (21 joints per hand for extreme micro-dexterity)
  • Tracked Object (Node 76): 1 point (The physical object interacting with the human, mapped into the same 3D metric space).

Bifurcated Spatial Engines

Rather than tracking where a joint is in space, the engine calculates the distance between every pair of active joints. For N active joints this yields N * (N - 1) / 2 unique distances (2,850 when all 76 points are active). Depending on the engineering use-case, this math is routed through one of two spatial engines:
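As a rough illustration of the pairwise calculation, the sketch below builds the set of joint-to-joint distances for a single frame. The function names and the dict-based layout are assumptions made for the example, not part of the Willow 5 SDK.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def relational_distances(
    points: Dict[int, Sequence[float]]
) -> Dict[Tuple[int, int], float]:
    """Return the Euclidean distance for every unordered pair of active joints.

    `points` maps a joint index to its (x, y, z) position; inactive joints
    are simply left out of the dict.
    """
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(sorted(points), 2)
    }


def pair_count(n: int) -> int:
    """Number of unique joint pairs: N * (N - 1) / 2."""
    return n * (n - 1) // 2
```

Because only distances between joints are stored, translating or rotating the whole skeleton in front of the camera leaves the result unchanged.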

Universal Models (Scale-Invariant)

Built for structural form and sequence recognition. The RDM distances are normalized against a skeletal scale (e.g., Torso Length). After normalization, a pro athlete’s signature and a child’s signature for the same movement become directly comparable, allowing universal action recognition across diverse populations.
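The effect of normalization can be sketched in a few lines. This is an illustrative example only: `normalize_rdm` is a hypothetical helper, and dividing every distance by torso length is one plausible reading of "normalized against a skeletal scale".

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def normalize_rdm(distances: Dict[Pair, float], torso_length: float) -> Dict[Pair, float]:
    """Divide every pairwise distance by the subject's torso length."""
    if torso_length <= 0:
        raise ValueError("torso_length must be positive")
    return {pair: d / torso_length for pair, d in distances.items()}


# Two subjects holding the same pose, one exactly half the size of the other.
adult = {(0, 1): 0.6, (0, 2): 0.3}    # meters; pair (0, 1) stands in for the torso
child = {(0, 1): 0.3, (0, 2): 0.15}

adult_norm = normalize_rdm(adult, torso_length=0.6)
child_norm = normalize_rdm(child, torso_length=0.3)
```

Both normalized dictionaries express the pose in "torso lengths", so body size drops out of the comparison.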

Physics Models (Metric Space)

Built for exact metrology and robotics. Normalization is bypassed, and the RDM holds absolute physical distances in meters. This mode uses Node 76 to evaluate human-object interactions (e.g., confirming that a worker lifted a box 0.5 meters high, or that an athlete's bat speed reached 35 m/s).

Continuous Subsequence DTW

Local edge SDKs run Continuous Subsequence Dynamic Time Warping (DTW). The runtime compares the live camera feed against the downloaded RDM frame by frame. In Universal Mode, this evaluates the "Form" of the action. In Physics Mode, the DTW threshold acts as a strict Spatial Tolerance Boundary in absolute meters.
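For readers unfamiliar with subsequence DTW, the sketch below shows the textbook form of the algorithm: the template may start matching at any frame of the stream, and the best end frame is the one with the lowest accumulated cost. It works on a buffered stream rather than a live feed, uses plain Euclidean distance between per-frame distance vectors as the frame cost, and is not taken from the edge SDK. The function name and return convention are assumptions.

```python
import math
from typing import Sequence, Tuple

Frame = Sequence[float]  # one frame's flattened distance vector (assumed layout)


def subsequence_dtw(template: Sequence[Frame], stream: Sequence[Frame]) -> Tuple[float, int]:
    """Find the stretch of `stream` that best matches `template`.

    Returns (accumulated_cost, end_index), where end_index is the position in
    `stream` of the last frame of the best match.
    """
    n = len(stream)
    # Row 0 is all zeros: the match is allowed to start at any stream frame.
    prev = [0.0] * (n + 1)
    for t_frame in template:
        curr = [math.inf] * (n + 1)  # column 0 stays inf: template needs stream frames
        for j in range(1, n + 1):
            cost = math.dist(t_frame, stream[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    end = min(range(1, n + 1), key=lambda j: prev[j])
    return prev[end], end - 1
```

A detection would then fire when the returned cost falls below the model's threshold; how the runtime maps the header's DTW_Sensitivity or Spatial_Tolerance onto that cost is not specified here.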

3. Deterministic Control: Bitmasks & Binary Headers

Hyperparameters in Willow 5 are not left up to the client application. They are securely baked directly into the model’s binary header. This guarantees that the physics engine behaves identically across millions of global edge devices.

The Zone Bitmask (Edge Compute Optimization)

To save critical CPU cycles on edge devices, models define exactly which tracking sensors to activate using a 32-bit Zone Bitmask. These are powers of 2 acting as binary switches, not point counts:

  • Head = 1 | Torso = 2 | Arms = 4 | Hands = 8 | Legs = 16 | Feet = 32 | Object = 64

Example: If a model only tracks the Torso and Arms, the bitmask is 6 (2 + 4). The Edge SDK reads this 6 and instantly disables hand and face tracking on the hardware camera, drastically reducing latency and battery drain.
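Decoding the mask on the client side is a matter of testing individual bits. The sketch below uses Python's standard `enum.IntFlag` with the zone values listed above. The `Zone` class and `active_zones` helper are illustrative names, not SDK identifiers.

```python
from enum import IntFlag
from typing import List


class Zone(IntFlag):
    """Zone switches as listed in the bitmask table (powers of 2)."""
    HEAD = 1
    TORSO = 2
    ARMS = 4
    HANDS = 8
    LEGS = 16
    FEET = 32
    OBJECT = 64


def active_zones(bitmask: int) -> List[str]:
    """Return the names of the zones switched on in `bitmask`."""
    return [zone.name for zone in Zone if bitmask & zone]


print(active_zones(6))   # ['TORSO', 'ARMS']
print(active_zones(70))  # ['TORSO', 'ARMS', 'OBJECT']
```

Any zone missing from the returned list is one the tracker can leave switched off.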

The Versioned Little-Endian Headers

Every optimized edge file begins with a strict little-endian C-struct header whose layout depends on the spatial engine used (<IIffff for Universal, <IIfffff for Physics, in Python struct notation):

// V40 Universal Header (24 bytes)
UInt32  Version;             // 40 (Scale-Invariant)
UInt32  Zone_Bitmask;        // e.g., 6 (Torso + Arms)
Float32 Quantization_Scale;  // Compression multiplier
Float32 Overlap_Tolerance;   // NMS suppression limit (e.g., 0.25)
Float32 DTW_Sensitivity;     // Topological slack / threshold
Float32 Tempo_Variance;      // Speed gating limit (e.g., 0.20)

// V41 Physics Header (28 bytes)
UInt32  Version;             // 41 (Metric Space)
UInt32  Zone_Bitmask;        // e.g., 70 (Torso + Arms + Object)
Float32 Calibration_Method;  // 0=Manual, 1=YOLO, 2=Synthetic
Float32 Quantization_Scale;  // Compression multiplier
Float32 Overlap_Tolerance;   // NMS suppression limit
Float32 Spatial_Tolerance;   // Boundary limit in meters (e.g., 0.15 m)
Float32 Tempo_Variance;      // Absolute Velocity Gating (+/- %)
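A reader for these two layouts could look like the sketch below, which uses Python's standard `struct` module. The field order follows the listings above. The function name `parse_header` and the lower-case dictionary keys are assumptions made for the example.

```python
import struct
from typing import Dict, Union

V40_FORMAT = "<IIffff"    # little-endian: 2 x uint32 + 4 x float32 = 24 bytes
V41_FORMAT = "<IIfffff"   # little-endian: 2 x uint32 + 5 x float32 = 28 bytes

V40_FIELDS = ("version", "zone_bitmask", "quantization_scale",
              "overlap_tolerance", "dtw_sensitivity", "tempo_variance")
V41_FIELDS = ("version", "zone_bitmask", "calibration_method",
              "quantization_scale", "overlap_tolerance",
              "spatial_tolerance", "tempo_variance")


def parse_header(blob: bytes) -> Dict[str, Union[int, float]]:
    """Read the version word, then unpack the matching header layout."""
    (version,) = struct.unpack_from("<I", blob, 0)
    if version == 40:
        fmt, fields = V40_FORMAT, V40_FIELDS
    elif version == 41:
        fmt, fields = V41_FORMAT, V41_FIELDS
    else:
        raise ValueError(f"unsupported header version: {version}")
    return dict(zip(fields, struct.unpack_from(fmt, blob, 0)))
```

The quantized payload then starts at offset `struct.calcsize(fmt)`, that is, byte 24 for V40 files and byte 28 for V41 files.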

4. The JIT Export Gateway: Formats & Quantization

A core architectural advantage of Willow 5 is the Just-In-Time (JIT) Export Gateway. The cloud does not store edge-formatted files at rest. It securely stores a .bin file—the uncompressed, master float32 RDM matrix. When a developer queries the API, the Gateway dynamically compiles the requested archetype in milliseconds.

.int8 (Willow Native & Edge Optimized)

Our ultra-compressed format for C++ and Python edge devices. The Gateway calculates the absolute max value of the float32 master, generates a scale, and quantizes the matrix into 8-bit integers. This shrinks the matrix payload by roughly 75% (1 byte per value instead of 4), allowing for fast execution in RAM. The Gateway then prepends the appropriate V40 or V41 binary header and delivers the payload.
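The quantization step described above matches the common symmetric "abs-max" scheme, sketched below. The exact convention the Gateway uses is not stated in this document, so the choices here are assumptions: the scale is max_abs / 127, values are clamped to [-127, 127], and dequantization multiplies by the scale.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric abs-max quantization of float values to the int8 range."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


codes, scale = quantize_int8([0.0, 0.5, -2.0, 2.0])
restored = dequantize_int8(codes, scale)
```

Each restored value differs from its original by at most half a quantization step (scale / 2), which is the precision traded for the smaller payload.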

.ONNX (Universal AI Graph)

The open standard for machine learning interoperability. The float matrix is packaged as a Constant tensor node. Calibration constants (DTW, Overlap, Tempo, Metric Modes) are embedded into the ONNX metadata_props, allowing Python developers to read and configure thresholds in ML pipelines that target runtimes such as TensorRT or Core ML.

.h (Bare Metal C-Array) & JSON

For microcontrollers without an OS, the Gateway translates the float32 data into a static C-array (.h) with `#define` macros for the thresholds. For web applications and data scientists, it exports human-readable JSON or CSV formats containing the raw math.
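To make the bare-metal export concrete, the sketch below renders a float list and a threshold dictionary as C header text. The macro and array naming scheme (`<NAME>_RDM`, `<NAME>_LEN`) is invented for illustration, and the actual Gateway output may be laid out differently. The sketch also assumes `name` is already a valid C identifier.

```python
from typing import Dict, Sequence


def to_c_header(name: str, values: Sequence[float], thresholds: Dict[str, float]) -> str:
    """Render matrix values and thresholds as the text of a C header file."""
    if not values:
        raise ValueError("values must not be empty (C forbids zero-length arrays)")
    prefix = name.upper()
    guard = f"{prefix}_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    # One #define per threshold, written as a C float literal.
    for key, value in thresholds.items():
        lines.append(f"#define {prefix}_{key.upper()} {float(value)!r}f")
    body = ", ".join(f"{float(v)!r}f" for v in values)
    lines += [
        f"#define {prefix}_LEN {len(values)}",
        "",
        f"static const float {prefix}_RDM[{prefix}_LEN] = {{ {body} }};",
        "",
        f"#endif /* {guard} */",
    ]
    return "\n".join(lines) + "\n"


print(to_c_header("squat", [0.5, 1.0], {"dtw_sensitivity": 0.25}))
```

Declaring the array `static const` lets most embedded toolchains place it in flash rather than RAM.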

5. Universal Hardware Compatibility

Willow 5 is a "Plug-and-Play" solution designed to bridge the gap between cloud intelligence and local execution. Our SDKs are optimized for:

  • NVIDIA Jetson (Edge)
  • NVIDIA Isaac Sim
  • Microsoft Azure
  • ARM Cortex Chips
  • Meta Quest 3
  • Apple Vision Pro

By decoupling the vision sensor from the kinetic reasoning, the same `.int8` model can be deployed on a high-performance Azure cloud cluster for large-scale video processing or pushed to an NVIDIA Jetson module on a mobile robot for low-latency, on-device execution.

6. Model Sourcing: Metric Calibration & Synthetic Data

Resilience at the edge depends on the quality of the "Ground Truth." Willow 5 supports diverse ingestion strategies to guarantee absolute mathematical precision:

Dynamic YOLOv26 Tracking

The user selects a standardized reference object (e.g., a Size 7 Basketball). The Cloud Oracle utilizes a 6-Dimensional Kalman Filter to track the object through motion blur, calculating a highly accurate pixels-to-meters scalar to establish a unified 3D metric universe.
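The core of this calibration is a ratio between a known physical size and its measured size in the image. The sketch below shows only that ratio and leaves out detection and Kalman filtering. The function names are hypothetical, and the 0.24 m ball diameter is an approximate figure used purely for illustration.

```python
def meters_per_pixel(reference_size_m: float, measured_size_px: float) -> float:
    """Scale factor derived from a reference object of known physical size."""
    if reference_size_m <= 0 or measured_size_px <= 0:
        raise ValueError("sizes must be positive")
    return reference_size_m / measured_size_px


def to_meters(length_px: float, scale: float) -> float:
    """Convert an image-plane length to meters using the calibrated scale."""
    return length_px * scale


# Illustrative numbers: a ball about 0.24 m wide appears 48 px wide.
scale = meters_per_pixel(reference_size_m=0.24, measured_size_px=48.0)
reach_m = to_meters(300.0, scale)  # a 300 px span at the same depth
```

A single scalar like this is only valid at the depth of the reference object, which is one reason the object must be tracked continuously as it moves.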

Manual Pinhole Triangulation

For environments without tracked objects, the Oracle accepts user-supplied Camera Height, Camera Distance, and Subject Height values. It performs orthographic skew correction to construct absolute Z-axis depth from raw 2D video.

Synthetic Pre-Scaled Import

These models are created from FBX, JSON, or NPY data exported from simulators like NVIDIA Isaac or Blender. Because computer vision algorithms are bypassed entirely, the models carry no pose-estimation noise and serve as the "Source Code" for humanoid robotics where an absolute mechanical standard is required.

Ready to deploy the Willow 5 Standard?

Contact our Engineering Team