Controllers: Theory & Variants

Introduction to Path Integral Control

Model Predictive Path Integral (MPPI) control is a sampling-based algorithm rooted in the duality between stochastic optimal control and statistical mechanics. Unlike gradient-based methods (e.g., iLQR), MPPI does not require the dynamics or cost function to be differentiable, which makes it suitable for non-smooth dynamics and discontinuous costs.

The information-theoretic basis

Consider the controlled stochastic system: \[ dx = f(x,t)\,dt + G(x,t)\,u\,dt + G(x,t)\,\Sigma^{1/2}\,d\omega \] where \(d\omega\) is Brownian motion. The objective is to minimise: \[ J(u) = \mathbb{E}\!\left[\phi(x_T) + \int_{t_0}^{T}\!\left(q(x,t) + \tfrac{1}{2}u^\top R\,u\right)dt\right] \]
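To simulate a system of this form, the SDE is usually discretised in time. The following is a minimal Euler-Maruyama sketch for a scalar state, not the library's integrator: the function name `euler_maruyama_step`, the callables `f` and `g`, and the toy system are hypothetical, and `sigma` stands for a scalar \(\Sigma\).

```python
import math
import random


def euler_maruyama_step(x, u, t, f, g, sigma, dt, rng):
    """One Euler-Maruyama step of dx = f dt + G u dt + G sqrt(Sigma) dw (scalar case)."""
    # Brownian increment over dt: zero mean, variance dt.
    dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
    drift = (f(x, t) + g(x, t) * u) * dt
    diffusion = g(x, t) * math.sqrt(sigma) * dw
    return x + drift + diffusion


# Toy system, assumed for illustration only: f(x, t) = -x and G(x, t) = 1.
f = lambda x, t: -x
g = lambda x, t: 1.0

rng = random.Random(0)
# With sigma = 0 the step reduces to a deterministic Euler step.
x_det = euler_maruyama_step(1.0, 0.5, 0.0, f, g, sigma=0.0, dt=0.1, rng=rng)
# With sigma > 0 the same step is perturbed by the Brownian increment.
x_noisy = euler_maruyama_step(1.0, 0.5, 0.0, f, g, sigma=0.04, dt=0.1, rng=rng)
```

In this discretisation the drift scales with \(\Delta t\) while the noise term scales with \(\sqrt{\Delta t}\).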

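Applying an exponential transformation to the value function linearises the Hamilton–Jacobi–Bellman equation, and the Feynman–Kac formula then expresses its solution as an expectation over sampled paths. Equivalently, the optimal control minimises the KL divergence between the controlled path distribution and the optimal one. The resulting importance-sampling update is: \[ u^*(t) = u_{\mathrm{nom}}(t) + \frac{\sum_{k=1}^K w_k\,\epsilon_k(t)}{\sum_{k=1}^K w_k} \] with Boltzmann weights: \[ w_k = \exp\!\left(-\tfrac{1}{\lambda}\bigl(S(\tau_k) - \rho\bigr)\right) \] Here \(\rho\) is a baseline cost, commonly \(\rho = \min_k S(\tau_k)\), subtracted for numerical stability; it cancels in the normalised update.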

Symbol	Meaning
\(K\)	Number of parallel rollouts (`num_samples`)
\(T\)	Horizon length (`horizon`)
\(\lambda\)	Temperature (`lambda_`)
\(S(\tau_k)\)	Total cost of rollout \(k\)
\(\epsilon_k\)	Sampled noise perturbation
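The weight formula above can be sketched in a few lines of plain Python. This is an illustration, not the library's CUDA implementation: the function name `boltzmann_weights` is hypothetical, and taking \(\rho\) as the minimum rollout cost is an assumption (a common choice that keeps the exponentials from underflowing or overflowing).

```python
import math


def boltzmann_weights(costs, lambda_):
    """Normalised weights w_k proportional to exp(-(S_k - rho) / lambda_)."""
    rho = min(costs)  # assumed baseline; it cancels after normalisation
    unnormalised = [math.exp(-(s - rho) / lambda_) for s in costs]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


# Made-up rollout costs: the cheapest rollout receives the largest weight.
costs = [1000.0, 1001.0, 1005.0]
weights = boltzmann_weights(costs, lambda_=1.0)

# Shifting every cost by the same constant leaves the weights unchanged.
shifted = boltzmann_weights([s - 1000.0 for s in costs], lambda_=1.0)
```

A smaller `lambda_` concentrates the weight on the cheapest rollouts, while a larger one moves the weights towards a uniform average.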

1. Standard MPPI

Header: include/mppi/controllers/mppi.cuh Python class: QuadrotorMPPI

The standard formulation samples i.i.d. Gaussian noise in control space: \[ \epsilon_k(t) \sim \mathcal{N}(0,\,\Sigma) \]

Algorithm

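Sample \(K\) noise sequences \(\epsilon_k \sim \mathcal{N}(0, \Sigma)\).
Roll out each sequence from the current state: \(x_{t+1} = f(x_t,\,u_{\mathrm{nom},t} + \epsilon_{k,t})\).
Evaluate the rollout cost \(S(\tau_k) = \sum_t c(x_t, u_t) + \phi(x_T)\), with \(u_t = u_{\mathrm{nom},t} + \epsilon_{k,t}\).
Reweight with \(w_k = \exp(-(S(\tau_k) - \rho)/\lambda)\) and normalise so that \(\sum_k w_k = 1\); this is computed on-GPU via CUB.
Update \(u_{\mathrm{nom}} \leftarrow u_{\mathrm{nom}} + \sum_k w_k\,\epsilon_k\) using the normalised weights.
Shift the nominal sequence one step forward.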
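The six steps above can be sketched for a scalar system in plain Python. This is a CPU illustration under stated assumptions, not the library's GPU controller: `mppi_step`, the toy single-integrator dynamics, the cost functions, and the choice to repeat the last control when shifting are all hypothetical.

```python
import math
import random


def mppi_step(x0, u_nom, dynamics, stage_cost, terminal_cost,
              num_samples=256, sigma=1.0, lambda_=1.0, rng=None):
    """One MPPI iteration; returns (control to apply, shifted nominal sequence)."""
    rng = rng or random.Random()
    horizon = len(u_nom)
    # 1. Sample K white-noise sequences in control space.
    eps = [[rng.gauss(0.0, sigma) for _ in range(horizon)]
           for _ in range(num_samples)]
    # 2-3. Roll out each perturbed sequence and accumulate its cost.
    costs = []
    for e in eps:
        x, total = x0, 0.0
        for t in range(horizon):
            u = u_nom[t] + e[t]
            total += stage_cost(x, u)
            x = dynamics(x, u)
        costs.append(total + terminal_cost(x))
    # 4. Boltzmann weights with the minimum cost as baseline (assumed choice).
    rho = min(costs)
    w = [math.exp(-(s - rho) / lambda_) for s in costs]
    w_sum = sum(w)
    # 5. Weighted-average update of the nominal sequence.
    u_new = [u_nom[t] + sum(w[k] * eps[k][t] for k in range(num_samples)) / w_sum
             for t in range(horizon)]
    # 6. Shift one step forward; repeating the last control is an assumption.
    return u_new[0], u_new[1:] + [u_new[-1]]


# Toy problem (assumed): drive a single integrator from x = 0 to x = 1.
dt = 0.1
dynamics = lambda x, u: x + u * dt
stage_cost = lambda x, u: (x - 1.0) ** 2
terminal_cost = lambda x: 10.0 * (x - 1.0) ** 2

rng = random.Random(0)
x, u_nom = 0.0, [0.0] * 20
for _ in range(50):
    u0, u_nom = mppi_step(x, u_nom, dynamics, stage_cost, terminal_cost, rng=rng)
    x = dynamics(x, u0)
```

The closed loop applies only the first control of each updated sequence and reuses the shifted remainder as a warm start for the next iteration.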

Characteristics

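Simplest variant and the compute baseline for the comparisons on this page; white noise gives maximum high-frequency exploration.
Control sequences can be jagged when \(\lambda\) is small, because the weights concentrate on a few rollouts and their white noise is not averaged out.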

2. Smooth MPPI (S-MPPI)

Header: include/mppi/controllers/smppi.cuh Python class: QuadrotorSMPPI

Status: Python bindings are available. The exploration mechanism has a known scaling issue — integrate_actions_kernel multiplies noise by \(\Delta t\), giving effective variance \(\sigma^2 \Delta t^2 \approx 0\) for small timesteps. Tracking diverges until this is fixed. See issue #30.

S-MPPI lifts optimisation into the velocity (rate-of-change) of the control signal, enforcing \(C^1\) continuity.

Theory

Define an auxiliary control-velocity \(v\) such that \(\dot{u} = v\). In discrete time: \[ u_{t+1} = u_t + v_t\,\Delta t \]

Noise is sampled in velocity space: \[ \delta v_k \sim \mathcal{N}(0,\,\Sigma_v) \]

and integrated to produce smooth perturbations: \[ \epsilon_{k,t} = \sum_{i=0}^{t} \delta v_{k,i}\,\Delta t \]
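The integration above is a cumulative sum, which can be sketched as follows for a scalar control. The function name `integrate_velocity_noise` and the parameter values are hypothetical. For i.i.d. scalar samples with variance \(\sigma_v^2\), the formula gives \(\mathrm{Var}[\epsilon_{k,t}] = (t+1)\,\sigma_v^2\,\Delta t^2\), which shows how strongly the \(\Delta t\) factor shrinks the perturbations when the timestep is small.

```python
import random
from itertools import accumulate


def integrate_velocity_noise(delta_v, dt):
    """eps_t = sum_{i=0}^{t} delta_v_i * dt (inclusive cumulative sum)."""
    return list(accumulate(dv * dt for dv in delta_v))


rng = random.Random(0)
dt, sigma_v, horizon = 0.02, 1.0, 50  # illustrative values only
delta_v = [rng.gauss(0.0, sigma_v) for _ in range(horizon)]
eps = integrate_velocity_noise(delta_v, dt)

# Consecutive perturbations differ by one velocity sample times dt,
# so the largest jump is bounded by max|delta_v| * dt.
max_step = max(abs(b - a) for a, b in zip(eps, eps[1:]))
```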

An additional smoothness penalty can be added to the cost, summed over consecutive pairs of the \(T\) controls \(u_0, \dots, u_{T-1}\): \[ J_{\mathrm{smooth}} = J_{\mathrm{base}} + \gamma \sum_{t=0}^{T-2}\|u_{t+1} - u_t\|^2 \]
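As a small illustration of the penalty term, the sketch below sums squared differences between consecutive control vectors. The function name `smoothness_penalty` and the example sequences are hypothetical.

```python
def smoothness_penalty(controls, gamma):
    """gamma * sum_t ||u_{t+1} - u_t||^2 over consecutive control vectors."""
    total = 0.0
    for u_now, u_next in zip(controls, controls[1:]):
        total += sum((b - a) ** 2 for a, b in zip(u_now, u_next))
    return gamma * total


# Two made-up one-dimensional control sequences with the same endpoints.
jagged = [[0.0], [1.0], [0.0], [1.0]]
smooth = [[0.0], [0.33], [0.67], [1.0]]

jagged_penalty = smoothness_penalty(jagged, gamma=0.5)
smooth_penalty = smoothness_penalty(smooth, gamma=0.5)
```

The jagged sequence is penalised more heavily than the smooth one, so a larger \(\gamma\) biases the weights towards rollouts with small control increments.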

Characteristics

\(C^1\)-continuous controls; reduced actuator wear.
Slower reaction to sudden obstacles due to control inertia.
Noise attenuation bug pending fix (issue #30).

3. Kernel MPPI (K-MPPI)

Header: include/mppi/controllers/kmppi.cuh Python class: QuadrotorKMPPI

K-MPPI samples colored noise from a Gaussian Process defined by an RBF kernel, reducing the effective search dimensionality from \(T \cdot n_u\) to \(M \cdot n_u\) where \(M \ll T\).

Theory

Perturbations \(\epsilon(t)\) lie in an RKHS defined by: \[ k(t_i, t_j) = \sigma_f^2 \exp\!\left(-\frac{(t_i - t_j)^2}{2\ell^2}\right) \]

Rather than sampling the full GP, we use \(M\) support points (knots):

Sample latent noise \(\theta \in \mathbb{R}^{M \times n_u}\) with i.i.d. entries \(\theta_{ij} \sim \mathcal{N}(0, 1)\).
Compute the interpolation matrix \(W = K_{TM}\,K_{MM}^{-1}\) (done once at init).
Expand: \(\epsilon = W\theta \in \mathbb{R}^{T \times n_u}\).

The weight update applies directly to \(\theta\), keeping optimisation in the low-dimensional latent space.
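The three steps above can be sketched in plain Python for a single control dimension (\(n_u = 1\)). This is an illustration, not the code in kmppi.cuh: all function names are hypothetical, the length scale is set equal to the knot spacing as an assumption (the page does not state the exact rule), and a small diagonal jitter is added for numerical stability.

```python
import math
import random


def rbf(a, b, length_scale, sigma_f=1.0):
    """RBF kernel k(a, b) = sigma_f^2 * exp(-(a - b)^2 / (2 l^2))."""
    return sigma_f ** 2 * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interpolation_matrix(times, knots, length_scale, jitter=1e-9):
    """W = K_TM K_MM^{-1}, returned as a T x M nested list."""
    m = len(knots)
    k_mm = [[rbf(a, b, length_scale) + (jitter if i == j else 0.0)
             for j, b in enumerate(knots)] for i, a in enumerate(knots)]
    k_mt = [[rbf(a, t, length_scale) for t in times] for a in knots]
    x = solve(k_mm, k_mt)  # K_MM^{-1} K_MT, shape M x T
    # K_MM is symmetric, so W is the transpose of x.
    return [[x[j][i] for j in range(m)] for i in range(len(times))]


times = [float(t) for t in range(21)]   # T = 21 time steps
knots = [0.0, 5.0, 10.0, 15.0, 20.0]    # M = 5 support points
W = interpolation_matrix(times, knots, length_scale=5.0)  # done once at init

rng = random.Random(0)
theta = [rng.gauss(0.0, 1.0) for _ in knots]            # latent noise
eps = [sum(w * th for w, th in zip(row, theta)) for row in W]  # eps = W theta
```

With this construction the expanded perturbation passes through the latent values at the knot times, and the kernel fills in the steps between them smoothly.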

Characteristics

Guaranteed smooth trajectories — bandwidth \(\ell\) is derived automatically from support-point spacing.
~6× faster per step than standard MPPI for the same \(K\) (fewer noise DOFs).
Higher RMSE than MPPI on lemniscate tracking (0.79 m vs 0.60 m, roughly 30% higher, 30 s trial).
Only user-tunable knob: num_support_pts \(M\) (more points → finer-grained control, slower).

4. B-Spline MPPI

Header: include/mppi/controllers/bspline_mppi.cuh

Parameterises the control trajectory as a cubic B-spline, guaranteeing \(C^2\) continuity (continuous acceleration). Noise is sampled at the B-spline control points.

Smoother than S-MPPI (one higher continuity order: \(C^2\) vs \(C^1\)).
Python bindings not yet exposed.
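To illustrate how noise at the control points turns into a smooth perturbation, here is a standalone sketch of uniform cubic B-spline evaluation for a scalar control. It does not call the library (no Python binding exists yet) and is not taken from bspline_mppi.cuh: the function names, the uniform knot layout, and the number of steps per segment are assumptions.

```python
import random


def cubic_bspline_basis(s):
    """Uniform cubic B-spline blending weights for local parameter s in [0, 1]."""
    return (
        (1 - s) ** 3 / 6.0,
        (3 * s ** 3 - 6 * s ** 2 + 4) / 6.0,
        (-3 * s ** 3 + 3 * s ** 2 + 3 * s + 1) / 6.0,
        s ** 3 / 6.0,
    )


def expand_control_points(ctrl_pts, steps_per_segment):
    """Evaluate the spline on a uniform grid; needs at least 4 control points."""
    out = []
    for seg in range(len(ctrl_pts) - 3):
        window = ctrl_pts[seg:seg + 4]  # each segment blends 4 control points
        for j in range(steps_per_segment):
            s = j / steps_per_segment
            out.append(sum(b * p for b, p in zip(cubic_bspline_basis(s), window)))
    return out


rng = random.Random(0)
ctrl_noise = [rng.gauss(0.0, 1.0) for _ in range(7)]        # 7 control points
eps = expand_control_points(ctrl_noise, steps_per_segment=5)  # 4 segments * 5 steps
```

The four blending weights sum to one for every `s`, so a constant set of control points reproduces that constant exactly.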

5. Informative MPPI (I-MPPI)

Header: include/mppi/controllers/i_mppi.cuh Python classes: QuadrotorIMPPI, DI3IMPPI

See the dedicated I-MPPI page for theory and usage.

Summary comparison

Feature	MPPI	S-MPPI	K-MPPI	B-Spline MPPI	I-MPPI
Control variable	\(u\)	\(\dot{u}\) (control velocity)	\(\theta\) (latent)	B-spline control points	\(u\) + information cost
Noise type	White	Brownian	GP/RBF	B-spline GP	White
Smoothness	Low	\(C^1\)	\(C^\infty\) (RBF)	\(C^2\)	Low
Compute per step (vs. MPPI)	baseline	~baseline	~6× faster	n/a	~baseline
Python exposed	✓	✓ (see issue #30)	✓	not yet	✓
Quadrotor lemniscate RMSE (30 s trial)	0.60 m	n/a	0.79 m	n/a	n/a