Announcing Willow 5.3: The Transition from Motion Recognition to Spatial Metrology
Product Update • April 7, 2026
Today, we are releasing a major expansion to the Willow Dynamics Cloud Oracle and our Edge SDKs. For many applications, identifying what a human is doing (recognizing intent, form, and sequence) provides all the spatial intelligence required. However, as our partners in robotics, occupational ergonomics, and elite sports science push into new frontiers, they need an additional layer of insight: the ability to measure exactly how an action is performed in absolute physical space.
To unlock this advanced functionality, we have expanded our core mathematical engine. We are introducing Physics Models (Metric Space) to operate alongside our flagship Universal Models (Scale-Invariant).
This update preserves the frictionless, scale-invariant action recognition our platform was built on, while providing a powerful new toolset that bridges the gap between computer vision and deterministic physical metrology.
The Core Shift: Two Spatial Engines
To understand this update, it helps to look at how the Willow engine mathematically interprets human movement. We use a Relational Distance Matrix (RDM), which calculates, frame by frame, the distance between every pair of active joints in the body. How we process those distances defines the model.
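For readers new to the concept, here is a minimal sketch of a single-frame RDM, assuming each frame is a list of (x, y, z) joint coordinates. The function name `relational_distance_matrix` is hypothetical and illustrative only; it is not part of the Willow SDK.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def relational_distance_matrix(joints: Sequence[Joint]) -> List[List[float]]:
    """Return the symmetric N x N matrix of Euclidean distances between joints."""
    n = len(joints)
    rdm = [[0.0] * n for _ in range(n)]  # diagonal stays 0: a joint's distance to itself
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(joints[i], joints[j])
            rdm[i][j] = d
            rdm[j][i] = d  # distances are symmetric, so fill both halves
    return rdm


# Three joints in one frame (hypothetical coordinates)
frame = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
rdm = relational_distance_matrix(frame)
print(rdm[0][1])  # 5.0
```

Running this over every frame yields a sequence of matrices; the two model types below differ only in how those distances are scaled.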
1. Universal Models (Scale-Invariant)
This is the system our users are already familiar with. Universal models are designed to recognize Intent, Action, and Form.
In this mode, the engine dynamically calculates the length of the subject's torso and divides every distance in the RDM by that length. This normalizes the data. If you train a Universal Model to recognize a jumping jack, the engine only cares about the geometric proportions of the movement. A 5-foot child and a 6-foot-5 adult performing a jumping jack with identical form will yield a 100% mathematical match.
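The scale invariance can be checked with a short sketch. It assumes, purely for illustration, that the torso is measured between a neck joint at index 0 and a mid-hip joint at index 1; these indices are hypothetical and not the Willow topology.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]

NECK, MID_HIP = 0, 1  # HYPOTHETICAL joint indices used to measure the torso


def rdm(joints: Sequence[Joint]) -> List[List[float]]:
    """Pairwise Euclidean distances between all joints."""
    return [[math.dist(a, b) for b in joints] for a in joints]


def normalized_rdm(joints: Sequence[Joint]) -> List[List[float]]:
    """Divide every pairwise distance by the subject's torso length."""
    torso = math.dist(joints[NECK], joints[MID_HIP])
    return [[d / torso for d in row] for row in rdm(joints)]


# The same pose at two body sizes: the adult is a uniformly scaled copy of the child.
child = [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.4, 1.4, 0.0), (0.2, 0.0, 0.0)]
adult = [(x * 1.3, y * 1.3, z * 1.3) for x, y, z in child]

diff = max(abs(a - b)
           for ra, rb in zip(normalized_rdm(child), normalized_rdm(adult))
           for a, b in zip(ra, rb))
print(diff < 1e-12)  # True: proportions match even though absolute sizes differ
```

The raw matrices differ by the 1.3x body scale, but after division by torso length they agree to within floating-point rounding.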
2. Physics Models (Metric Space)
Physics Models disable skeletal normalization entirely. Instead of analyzing proportions, the RDM calculates absolute real-world distances in meters.
In this mode, the model evaluates strict spatial constraints and physical velocities. If you train a Physics Model on a jumping jack, it is no longer looking for the idea of a jumping jack. It is verifying that a specific user’s hands reached an exact apex of 2.1 meters, and that their hands accelerated at exactly 4.5 m/s².
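As a rough illustration of metric-space measurement, the sketch below assumes hand heights are already expressed in meters and sampled at a fixed frame rate, and estimates acceleration with a second-order central finite difference. The function names and the synthetic trajectory are hypothetical; the SDK's own kinematics pipeline is not described in these notes.

```python
from typing import List, Sequence


def apex_height(heights_m: Sequence[float]) -> float:
    """Highest point reached, in meters."""
    return max(heights_m)


def central_acceleration(positions_m: Sequence[float], fps: float) -> List[float]:
    """Second-order central difference: a[i] = (p[i+1] - 2 p[i] + p[i-1]) / dt^2."""
    dt = 1.0 / fps
    return [(positions_m[i + 1] - 2 * positions_m[i] + positions_m[i - 1]) / dt ** 2
            for i in range(1, len(positions_m) - 1)]


# Hypothetical hand trajectory under constant 4.5 m/s^2 acceleration, sampled at 30 fps
fps = 30.0
heights = [1.2 + 0.5 * 4.5 * (k / fps) ** 2 for k in range(10)]
acc = central_acceleration(heights, fps)
print(round(apex_height(heights), 4), [round(a, 3) for a in acc[:3]])
```

Because no torso normalization is applied, the numbers come out directly in meters and m/s², which is what allows them to be compared against fixed physical limits.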
When deploying a Physics Model to an edge device, our Dynamic Time Warping (DTW) and Tempo settings stop acting as "form evaluators" and become strict Spatiotemporal Physics Gates.
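A minimal sketch of such a gate follows. It assumes both trajectories are in meters and sampled at the same frame rate, interprets the DTW threshold as the worst per-frame deviation along the optimal alignment, and interprets Tempo variance as the ratio of attempt duration to golden-path duration. All three interpretations, and the names `dtw_path` and `physics_gate`, are assumptions for illustration; the defaults of 0.15 m and 10% mirror the example settings discussed in this section.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def dtw_path(a: Sequence[Point], b: Sequence[Point]) -> List[Tuple[int, int]]:
    """Classic O(n*m) dynamic time warping; returns the optimal frame alignment."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Walk back from the end along the cheapest predecessors.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: cost[ij[0]][ij[1]])
    return path[::-1]


def physics_gate(attempt: Sequence[Point], golden: Sequence[Point],
                 dtw_threshold_m: float = 0.15, tempo_tolerance: float = 0.10) -> bool:
    """Pass only if the aligned attempt stays within both the spatial and tempo limits."""
    path = dtw_path(attempt, golden)
    worst_deviation = max(math.dist(attempt[i], golden[j]) for i, j in path)
    duration_ratio = len(attempt) / len(golden)  # below 1.0 means performed faster
    return (worst_deviation <= dtw_threshold_m
            and abs(duration_ratio - 1.0) <= tempo_tolerance)


golden = [(0.0, 0.1 * k, 0.0) for k in range(10)]    # hand rising 10 cm per frame
offset_5cm = [(0.05, 0.1 * k, 0.0) for k in range(10)]
offset_30cm = [(0.30, 0.1 * k, 0.0) for k in range(10)]
too_fast = [(0.0, 0.2 * k, 0.0) for k in range(5)]    # same path in half the frames
print(physics_gate(offset_5cm, golden), physics_gate(offset_30cm, golden),
      physics_gate(too_fast, golden))  # True False False
```

The `too_fast` case shows why the two gates are separate: the path is spatially correct, so only the tempo check rejects it.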
Setting a DTW threshold to 0.15 instructs the SDK to reject any action that deviates from the golden path by more than 15 centimeters. Setting the Tempo variance to ±10% enforces a strict absolute velocity limit, allowing you to instantly flag an ergonomic hazard if a warehouse worker lifts a 50 kg box too rapidly.
Deep Dive: Node 76 and Human-Object Interaction
Historically, pose estimation algorithms have suffered from a critical blind spot: they track the human but ignore the physical environment. To a standard pose-estimation network, a worker lifting a heavy chassis and a worker pantomiming a lift look identical.
Physics Mode introduces Node 76 (zero-based index 75) to the Willow topological graph.
By defining a Dynamic Reference Object during model creation (e.g., a standardized tool or a sports ball), you instruct the Cloud Oracle to run an object tracker alongside the pose estimator. We apply a 6-Dimensional Kalman Filter (tracking position, width, and velocity) to follow the object through severe motion blur. The object's centroid is translated into absolute meters and mapped into the same 3D coordinate space as the human's hips.
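To make the filter concrete, here is a minimal constant-velocity Kalman filter sketch. It assumes a 6-D state of [x, y, w, vx, vy, vw] (pixel position, pixel width, and their rates) with x, y, w measured each frame. Under the further assumption that the three axes have independent noise, the 6-D filter's matrices are block-diagonal, so it is implemented here as three identical 2-state filters. The noise values `q`, `r`, the initial covariance, and all class names are hypothetical tuning choices, not Willow's parameters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one measured quantity (x, y, or width)."""
    q: float = 1.0   # assumed process noise: acceleration variance, (px/s^2)^2
    r: float = 4.0   # assumed measurement noise variance, px^2
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])  # [value, rate]
    P: List[List[float]] = field(
        default_factory=lambda: [[1e6, 0.0], [0.0, 1e6]])  # weak initial prior

    def predict(self, dt: float) -> None:
        # State transition F = [[1, dt], [0, 1]]
        value, rate = self.x
        self.x = [value + rate * dt, rate]
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q, with Q from a white-noise acceleration model
        q = self.q
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4,
             p01 + dt * p11 + q * dt ** 3 / 2],
            [p10 + dt * p11 + q * dt ** 3 / 2,
             p11 + q * dt ** 2],
        ]

    def update(self, z: float) -> None:
        # Measurement H = [1, 0]: the value is observed, the rate is inferred
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                # innovation variance
        k0, k1 = p00 / s, p10 / s       # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


class ObjectTracker6D:
    """6-D state [x, y, w, vx, vy, vw] as three independent axis filters."""

    def __init__(self) -> None:
        self.axes = [AxisKF(), AxisKF(), AxisKF()]

    def step(self, measurement: Tuple[float, float, float], dt: float) -> Tuple[float, ...]:
        for kf, z in zip(self.axes, measurement):
            kf.predict(dt)
            kf.update(z)
        return (tuple(kf.x[0] for kf in self.axes)
                + tuple(kf.x[1] for kf in self.axes))


# Synthetic, noise-free detections: a 50 px wide ball moving right at 60 px/s
tracker = ObjectTracker6D()
dt = 1 / 30
for k in range(1, 91):
    estimate = tracker.step((100 + 60 * k * dt, 240.0, 50.0), dt)
print([round(v, 2) for v in estimate])  # [x, y, w, vx, vy, vw]
```

The velocity terms are what let a tracker coast through frames where blur makes the detection unreliable; a production tracker would also gate outlier detections, which this sketch omits.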
The resulting .int8 signature mathematically fuses the human and the object. Your edge SDK can now prove, with mathematical certainty, the exact distance between the human's hands and the tool they are operating.
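The sketch below shows how a hand-to-object distance could be read from one fused frame. It assumes a frame is stored as 76 nodes of signed int8 x, y, z triplets, converted to meters by multiplying by a quantization scale; the byte layout and the wrist index `RIGHT_WRIST` are hypothetical, and only the object index 75 comes from these notes.

```python
import math
import struct
from typing import List, Tuple

NUM_NODES = 76
OBJECT_NODE = 75   # Node 76 at zero-based index 75
RIGHT_WRIST = 16   # HYPOTHETICAL wrist index, for illustration only


def dequantize_frame(raw: bytes, scale_m: float) -> List[Tuple[float, float, float]]:
    """Decode one frame of signed int8 x, y, z triplets into meters (assumed layout)."""
    expected = NUM_NODES * 3
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    values = struct.unpack(f"<{expected}b", raw)  # 'b' = signed 8-bit integer
    return [(values[3 * i] * scale_m,
             values[3 * i + 1] * scale_m,
             values[3 * i + 2] * scale_m) for i in range(NUM_NODES)]


def hand_to_object_m(raw: bytes, scale_m: float, hand: int = RIGHT_WRIST) -> float:
    """Euclidean distance in meters between a hand node and the tracked object."""
    nodes = dequantize_frame(raw, scale_m)
    return math.dist(nodes[hand], nodes[OBJECT_NODE])


# Synthetic frame: wrist at (10, 20, 0) and object at (10, 20, 40) in int8 units
ints = [0] * (NUM_NODES * 3)
ints[3 * RIGHT_WRIST:3 * RIGHT_WRIST + 3] = [10, 20, 0]
ints[3 * OBJECT_NODE:3 * OBJECT_NODE + 3] = [10, 20, 40]
frame = struct.pack(f"<{len(ints)}b", *ints)
print(hand_to_object_m(frame, scale_m=0.01))  # about 0.4 m with a 1 cm quantization step
```

Because the object shares the human's coordinate space, the distance is a plain Euclidean norm; the quantization step bounds the resolution of that measurement.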
Metric Calibration & Cross-Validation
Deriving absolute 3D metric depth from 2D monocular video requires rigorous calibration. The Oracle now supports three ingestion strategies:
- Dynamic Tracking: The 6D Kalman Filter tracks the pixel width of a known object in the frame, from which the engine derives a dynamic pixels-to-meters scalar.
- Manual Triangulation (Pinhole): For videos without trackable objects, engineers can explicitly define the Camera Height, Camera Distance, and Subject Height. The engine performs orthographic skew correction to triangulate depth.
- Synthetic Import: Data scientists can upload pre-scaled .fbx, .json, or .npy files from simulators such as NVIDIA Isaac, bypassing computer vision extraction entirely for mathematically perfect ground truth.
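Two of the relations behind these strategies can be sketched with the standard pinhole camera model. The first function mirrors the dynamic scalar (known width divided by pixel width); the second recovers depth as Z = f · W / w, which assumes the focal length in pixels is known. That focal length is an assumption of this sketch, and neither function reproduces the Oracle's orthographic skew correction.

```python
def meters_per_pixel(known_width_m: float, width_px: float) -> float:
    """Dynamic pixels-to-meters scalar from an object of known physical width."""
    if width_px <= 0:
        raise ValueError("pixel width must be positive")
    return known_width_m / width_px


def pinhole_depth_m(focal_px: float, known_width_m: float, width_px: float) -> float:
    """Pinhole model: Z = f * W / w (f in pixels, W in meters, w in pixels)."""
    if width_px <= 0:
        raise ValueError("pixel width must be positive")
    return focal_px * known_width_m / width_px


# A 0.22 m ball (hypothetical) spans 44 px; the focal length is assumed to be 1000 px.
scale = meters_per_pixel(0.22, 44.0)          # about 0.005 m per pixel
depth = pinhole_depth_m(1000.0, 0.22, 44.0)   # about 5.0 m from the camera
print(scale, depth)
```

The two results are linked: at the object's depth, meters per pixel equals Z / f, so a known object and a known focal length give both scale and depth.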
To ensure transparency, the interface now performs 3D Anatomical Unrolling. The engine calculates the 3D length of each of the subject's bones, unrolls them to cancel out crouching and 2D foreshortening, and displays the subject's calculated physical height in the UI. If the engine reports a known 6-foot subject as 6 feet tall, engineers have direct confirmation that the metric scaling is accurate before exporting the model.
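A minimal sketch of the unrolling idea: summing 3D segment lengths along a head-to-heel chain gives the same total whether the subject stands or crouches, because bending changes joint angles but not bone lengths. The joint names and chain are hypothetical, and a real stature estimate would need corrections (for example, the top of the head above the head joint) that this sketch omits.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]

# HYPOTHETICAL head-to-heel chain; not the actual Willow topology.
CHAIN = ("head", "neck", "mid_hip", "knee", "ankle", "heel")


def unrolled_height_m(joints: Dict[str, Point], chain: Sequence[str] = CHAIN) -> float:
    """Sum of 3D segment lengths along the chain, independent of joint angles."""
    return sum(math.dist(joints[a], joints[b]) for a, b in zip(chain, chain[1:]))


standing = {"head": (0.0, 1.75, 0.0), "neck": (0.0, 1.50, 0.0),
            "mid_hip": (0.0, 0.95, 0.0), "knee": (0.0, 0.50, 0.0),
            "ankle": (0.0, 0.08, 0.0), "heel": (0.0, 0.0, 0.0)}
# Same bone lengths with knees and hips flexed: the vertical extent shrinks.
crouched = {"head": (-0.018, 1.576, 0.0), "neck": (-0.018, 1.326, 0.0),
            "mid_hip": (-0.018, 0.776, 0.0), "knee": (0.252, 0.416, 0.0),
            "ankle": (0.0, 0.08, 0.0), "heel": (0.0, 0.0, 0.0)}
print(round(unrolled_height_m(standing), 3), round(unrolled_height_m(crouched), 3))
```

Both poses unroll to the same 1.75 m even though the crouched head sits about 17 cm lower, which is what makes the displayed height a useful check on the metric scale.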
Research Mode (DSP Bypass)
Commercial computer vision relies heavily on Digital Signal Processing (DSP). Filters like Savitzky-Golay and Momentum Kalman tracking are used to smooth out camera jitter and interpolate missing frames, resulting in beautiful, fluid visual outputs.
However, for deep tech applications, this smoothing is destructive data tampering. Low-pass filters erase high-frequency micro-vibrations and peak impulse forces.
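The effect is easy to demonstrate. This sketch applies the standard 5-point Savitzky-Golay smoothing kernel (coefficients -3, 12, 17, 12, -3 over 35, a quadratic fit) to a single-frame impulse such as an impact spike. It is a generic illustration of low-pass smoothing, not the Oracle's actual filter configuration.

```python
from typing import List, Sequence

# Standard 5-point Savitzky-Golay smoothing coefficients (quadratic/cubic fit).
SG5 = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(signal: Sequence[float]) -> List[float]:
    """Apply the 5-point smoother; the two samples at each edge are left unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * x for c, x in zip(SG5, window)) / SG5_NORM
    return out


# A flat signal with a single-frame impulse (e.g., an impact spike)
raw = [0.0] * 11
raw[5] = 1.0
smoothed = savgol5(raw)
print(max(raw), round(max(smoothed), 3))  # 1.0 0.486
```

The peak drops to 17/35 of its true value and small negative side lobes appear two frames on either side, which is exactly the kind of distortion that matters when the peak impulse is the quantity being measured.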
For our academic and robotics partners, we have introduced Research Mode. Enabling this toggle permanently disables the filters, smoothing, and interpolation. The resulting model exports contain the raw, unadulterated ground-truth reality of the camera sensor, allowing researchers to apply their own custom signal processing downstream.
Upgrading Your Edge SDKs (V5.2.8)
To support this architecture, the C++ and Python Edge SDKs have been updated to V5.2.8.
Models exported in Physics Mode are now compiled with a new 28-Byte V41 Binary Header. This expanded Little-Endian struct carries the Calibration Method, Quantization Scale, Spatial Tolerance, and Zone Bitmask directly to the edge. The SDKs have been hardened with strict memory safeguards that route the 76-point topologies without out-of-bounds array access, guaranteeing deterministic execution in under 1 millisecond on ARM silicon.
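The exact V41 field order is not published in these notes, so the layout below is hypothetical: it only illustrates how a fixed 28-byte little-endian header carrying the four named fields could be parsed with Python's `struct` module, plus a bounds check in the spirit of the memory safeguards described above. The magic bytes, `version`, `flags`, and `node_count` fields are all assumptions.

```python
import struct
from typing import NamedTuple

# HYPOTHETICAL layout: '<' = little-endian with no padding between fields.
# magic(4s) version(H) calibration(B) flags(B) quant_scale(f) tolerance(f)
# zone_bitmask(I) node_count(H) reserved(6x)
V41_FORMAT = "<4sHBBffIH6x"
V41_SIZE = struct.calcsize(V41_FORMAT)
assert V41_SIZE == 28  # sanity check that the sketch matches the stated size

MAX_NODES = 76  # largest topology described in these notes


class V41Header(NamedTuple):
    magic: bytes
    version: int
    calibration_method: int
    flags: int
    quant_scale: float
    spatial_tolerance_m: float
    zone_bitmask: int
    node_count: int


def parse_v41_header(blob: bytes) -> V41Header:
    """Parse the first 28 bytes of an exported model and bounds-check the topology."""
    if len(blob) < V41_SIZE:
        raise ValueError(f"need {V41_SIZE} bytes, got {len(blob)}")
    header = V41Header._make(struct.unpack_from(V41_FORMAT, blob, 0))
    if header.node_count > MAX_NODES:
        raise ValueError(f"node_count {header.node_count} exceeds {MAX_NODES}")
    return header


blob = struct.pack(V41_FORMAT, b"WLLW", 41, 1, 0, 0.0078125, 0.15, 0b1011, 76)
h = parse_v41_header(blob)
print(h.version, h.node_count, h.zone_bitmask, round(h.spatial_tolerance_m, 4))
```

Rejecting an out-of-range node count before any array is indexed is one simple way to keep a fixed-size topology buffer safe; note that `0.15` round-trips through a 32-bit float only approximately.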
The V5.2.8 SDKs are fully backwards compatible with your existing V40 Universal models. To upgrade your fleet, pull the latest releases from our GitHub repositories, or review the updated API Documentation.