Controllers: Theory & Variants

Introduction to Path Integral Control

Model Predictive Path Integral (MPPI) control is a sampling-based algorithm rooted in the duality between stochastic optimal control and statistical mechanics. Unlike gradient-based methods (e.g., iLQR), MPPI does not require the dynamics or the cost function to be differentiable, which makes it suitable for systems with non-smooth or discontinuous dynamics and costs.

The information-theoretic basis

Consider the controlled stochastic system: \[ dx = f(x,t)\,dt + G(x,t)\,u\,dt + G(x,t)\,\Sigma^{1/2}\,d\omega \] where \(d\omega\) is Brownian motion. The objective is to minimise: \[ J(u) = \mathbb{E}\!\left[\phi(x_T) + \int_{t_0}^{T}\!\left(q(x,t) + \tfrac{1}{2}u^\top R\,u\right)dt\right] \]
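To make the noise model concrete, here is a minimal Euler–Maruyama sketch of one discretised step of the system above. It is an illustration only: it assumes a scalar state, time-invariant `f` and `g`, and a scalar noise scale `sigma` standing in for \(\Sigma^{1/2}\). The function name `euler_maruyama_step` is hypothetical and not part of the library.

```python
import math
import random


def euler_maruyama_step(x, u, f, g, sigma, dt, rng):
    """One Euler-Maruyama step of dx = f(x) dt + g(x) u dt + g(x) sigma dw.

    Scalar, time-invariant simplification of the system in the text.
    """
    dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, variance dt
    gx = g(x)
    return x + (f(x) + gx * u) * dt + gx * sigma * dw


# Hypothetical example system: stable linear drift with unit input gain.
rng = random.Random(0)
x = 1.0
trajectory = [x]
for _ in range(100):
    x = euler_maruyama_step(x, u=0.0, f=lambda s: -s, g=lambda s: 1.0,
                            sigma=0.2, dt=0.01, rng=rng)
    trajectory.append(x)
```

Because control and noise enter through the same matrix \(G\), a noise sample can be read as a perturbation of the control input, which is what the sampling update relies on.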

Via the Feynman–Kac formula, the problem can be recast as minimising the KL divergence between the controlled path distribution and the optimal one. The resulting importance-sampling update is: \[ u^*(t) = u_{\mathrm{nom}}(t) + \frac{\sum_{k=1}^K w_k\,\epsilon_k(t)}{\sum_{k=1}^K w_k} \] with Boltzmann weights: \[ w_k = \exp\!\left(-\tfrac{1}{\lambda}\bigl(S(\tau_k) - \rho\bigr)\right) \] Here \(\rho\) is a baseline cost, commonly \(\min_k S(\tau_k)\), subtracted for numerical stability. It cancels in the normalised ratio, so it does not change \(u^*\).

| Symbol | Meaning |
|---|---|
| \(K\) | Number of parallel rollouts (num_samples) |
| \(T\) | Horizon length (horizon) |
| \(\lambda\) | Temperature (lambda_) |
| \(S(\tau_k)\) | Total cost of rollout \(k\) |
| \(\epsilon_k\) | Sampled noise perturbation |
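The weighting rule can be sketched in a few lines of plain Python. This sketch assumes the baseline \(\rho\) is the minimum rollout cost, which is a common choice rather than something the text above specifies. The helper names `boltzmann_weights` and `weighted_noise` are hypothetical.

```python
import math


def boltzmann_weights(costs, lambda_):
    """Normalised weights w_k / sum_j w_j for rollout costs S(tau_k)."""
    rho = min(costs)  # assumed baseline; it cancels after normalisation
    unnorm = [math.exp(-(s - rho) / lambda_) for s in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def weighted_noise(weights, eps):
    """Weighted average of scalar noise sequences; eps[k][t] is rollout k at step t."""
    horizon = len(eps[0])
    return [sum(w * e[t] for w, e in zip(weights, eps)) for t in range(horizon)]


costs = [1000.0, 1001.0, 1005.0]                 # S(tau_k) for K = 3 rollouts
eps = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]     # noise, horizon of 2 steps
weights = boltzmann_weights(costs, lambda_=1.0)
delta_u = weighted_noise(weights, eps)           # correction added to u_nom
```

Subtracting the baseline keeps the largest exponent at zero, so at least one weight equals one before normalisation and the sum cannot underflow to zero even when the raw costs are large.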

1. Standard MPPI

Header: include/mppi/controllers/mppi.cuh
Python class: QuadrotorMPPI

The standard formulation samples i.i.d. Gaussian noise in control space: \[ \epsilon_k(t) \sim \mathcal{N}(0,\,\Sigma) \]

Algorithm

  1. Sample \(K\) noise sequences \(\epsilon_k \sim \mathcal{N}(0, \Sigma)\).
  2. Roll out \(x_{k,t+1} = f(x_{k,t},\,u_{\mathrm{nom},t} + \epsilon_{k,t})\) for each sample \(k\).
  3. Evaluate the rollout cost \(S(\tau_k) = \sum_t c(x_{k,t}, u_{k,t}) + \phi(x_{k,T})\), where \(u_{k,t} = u_{\mathrm{nom},t} + \epsilon_{k,t}\).
  4. Reweight \(w_k \propto \exp(-S(\tau_k) / \lambda)\), normalised so that \(\sum_k w_k = 1\); computed on-GPU via CUB.
  5. Update \(u_{\mathrm{nom}} \leftarrow u_{\mathrm{nom}} + \sum_k w_k\,\epsilon_k\).
  6. Shift the nominal sequence one step forward for the next control cycle.
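The six steps can be sketched on the CPU in plain Python. This is a didactic sketch under stated assumptions, not the library's CUDA implementation: the state and control are scalars, the baseline is the minimum cost, and the shifted sequence is padded by repeating its last entry. `mppi_step` and the toy integrator below are hypothetical names chosen for illustration.

```python
import math
import random


def mppi_step(x0, u_nom, dynamics, stage_cost, terminal_cost,
              num_samples=256, sigma=1.0, lambda_=1.0, rng=random):
    """One MPPI iteration; returns (control to apply, shifted nominal sequence)."""
    horizon = len(u_nom)
    # 1. Sample K noise sequences.
    eps = [[rng.gauss(0.0, sigma) for _ in range(horizon)]
           for _ in range(num_samples)]
    # 2-3. Roll out each perturbed sequence and accumulate its cost.
    costs = []
    for k in range(num_samples):
        x, total = x0, 0.0
        for t in range(horizon):
            u = u_nom[t] + eps[k][t]
            total += stage_cost(x, u)
            x = dynamics(x, u)
        costs.append(total + terminal_cost(x))
    # 4. Normalised Boltzmann weights (minimum cost used as baseline).
    rho = min(costs)
    w = [math.exp(-(s - rho) / lambda_) for s in costs]
    z = sum(w)
    w = [wk / z for wk in w]
    # 5. Weighted update of the nominal sequence.
    u_new = [u_nom[t] + sum(w[k] * eps[k][t] for k in range(num_samples))
             for t in range(horizon)]
    # 6. Shift one step forward, repeating the last entry as padding.
    return u_new[0], u_new[1:] + [u_new[-1]]


# Toy problem (assumption): scalar integrator driven towards x = 1.
DT = 0.1
dynamics = lambda x, u: x + u * DT
stage_cost = lambda x, u: (x - 1.0) ** 2 + 0.01 * u * u
terminal_cost = lambda x: 10.0 * (x - 1.0) ** 2

rng = random.Random(0)
x, u_nom = 0.0, [0.0] * 20
for _ in range(100):
    u0, u_nom = mppi_step(x, u_nom, dynamics, stage_cost, terminal_cost, rng=rng)
    x = dynamics(x, u0)
```

The rollouts are independent of each other, which is why the GPU version can run all \(K\) of them in parallel.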

Characteristics

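  • Simplest formulation, with no smoothing step; white noise gives the most high-frequency exploration.
  • Control sequences can be jagged when \(\lambda\) is small, because the weights concentrate on a few rollouts and the update inherits their white noise.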

2. Smooth MPPI (S-MPPI)

Header: include/mppi/controllers/smppi.cuh
Python class: QuadrotorSMPPI

Status: Python bindings are available. The exploration mechanism has a known scaling issue: integrate_actions_kernel multiplies the sampled noise by \(\Delta t\), so each step contributes an effective variance of \(\sigma^2 \Delta t^2\), which is close to zero for small timesteps. Tracking diverges until this is fixed. See issue #30.

S-MPPI lifts optimisation into the velocity (rate-of-change) of the control signal, enforcing \(C^1\) continuity.

Theory

Define an auxiliary control-velocity \(v\) such that \(\dot{u} = v\). In discrete time: \[ u_{t+1} = u_t + v_t\,\Delta t \]

Noise is sampled in velocity space: \[ \delta v_k \sim \mathcal{N}(0,\,\Sigma_v) \]

and integrated to produce smooth perturbations: \[ \epsilon_{k,t} = \sum_{i=0}^{t} \delta v_{k,i}\,\Delta t \]
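The integration step is a cumulative sum, sketched below for a single scalar control channel. The function name `integrate_velocity_noise` and the parameter values are assumptions made for illustration; the sketch follows the inclusive sum written above.

```python
import random
from itertools import accumulate


def integrate_velocity_noise(dv, dt):
    """Return eps_t = sum_{i <= t} dv_i * dt for one scalar noise sequence."""
    return list(accumulate(v * dt for v in dv))


rng = random.Random(0)
dt, sigma_v, horizon = 0.01, 1.0, 50
dv = [rng.gauss(0.0, sigma_v) for _ in range(horizon)]
eps = integrate_velocity_noise(dv, dt)

# Step-to-step changes of eps are bounded by max|dv| * dt, which is the
# source of the smoothness.
max_jump = max(abs(b - a) for a, b in zip(eps, eps[1:]))
```

For independent samples the standard deviation of \(\epsilon_{k,t}\) is \(\sigma_v\,\Delta t\,\sqrt{t+1}\), so with a small \(\Delta t\) the perturbations stay small unless \(\Sigma_v\) is scaled up accordingly.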

An additional smoothness penalty can be added to the cost: \[ J_{\mathrm{smooth}} = J_{\mathrm{base}} + \gamma \sum_{t=0}^{T-1}\|u_{t+1} - u_t\|^2 \]
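A minimal sketch of the penalty for one rollout follows. It assumes the control sequence is stored as a list of per-step vectors and sums over every consecutive pair that exists in that list. `smoothness_cost` is a hypothetical helper name.

```python
def smoothness_cost(base_cost, controls, gamma):
    """Add gamma * sum_t ||u_{t+1} - u_t||^2 to a rollout's base cost."""
    penalty = 0.0
    for u_prev, u_next in zip(controls, controls[1:]):
        penalty += sum((b - a) ** 2 for a, b in zip(u_prev, u_next))
    return base_cost + gamma * penalty


# Example: three steps of a two-dimensional control.
controls = [[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]]
total = smoothness_cost(base_cost=1.0, controls=controls, gamma=0.5)
```

Larger values of \(\gamma\) push the weights towards rollouts whose controls change slowly, at the expense of the base cost.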

Characteristics

  • \(C^1\)-continuous controls; reduced actuator wear.
  • Slower reaction to sudden obstacles due to control inertia.
  • Noise attenuation bug pending fix (issue #30).

3. Kernel MPPI (K-MPPI)

Header: include/mppi/controllers/kmppi.cuh
Python class: QuadrotorKMPPI

K-MPPI samples colored noise from a Gaussian Process defined by an RBF kernel, reducing the effective search dimensionality from \(T \cdot n_u\) to \(M \cdot n_u\) where \(M \ll T\).

Theory

Perturbations \(\epsilon(t)\) lie in an RKHS defined by: \[ k(t_i, t_j) = \sigma_f^2 \exp\!\left(-\frac{(t_i - t_j)^2}{2\ell^2}\right) \]

Rather than sampling the full GP, we use \(M\) support points (knots):

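  1. Sample latent noise \(\theta \in \mathbb{R}^{M \times n_u}\) with i.i.d. entries \(\theta_{ij} \sim \mathcal{N}(0, 1)\).
  2. Compute the interpolation matrix \(W = K_{TM}\,K_{MM}^{-1}\) once at init, where \(K_{MM}\) is the kernel matrix between the support points and \(K_{TM}\) is the kernel matrix between the \(T\) time steps and the support points.
  3. Expand: \(\epsilon = W\theta \in \mathbb{R}^{T \times n_u}\).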

The weight update applies directly to \(\theta\), keeping optimisation in the low-dimensional latent space.
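The interpolation can be sketched in plain Python as follows. Several details are assumptions made for illustration and are not taken from the library: the support points are uniformly spaced, the bandwidth \(\ell\) is set equal to their spacing, a small diagonal `jitter` is added to \(K_{MM}\) for numerical safety, and a small Gauss-Jordan solver replaces whatever the GPU code uses. All function names are hypothetical.

```python
import math
import random


def rbf(ti, tj, sigma_f, ell):
    """RBF kernel k(t_i, t_j) from the text."""
    return sigma_f ** 2 * math.exp(-((ti - tj) ** 2) / (2.0 * ell ** 2))


def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interpolation_matrix(times, knots, sigma_f=1.0, ell=None, jitter=1e-9):
    """W = K_TM K_MM^{-1}, returned as a T x M nested list."""
    if ell is None:
        ell = knots[1] - knots[0]  # assumption: bandwidth equals knot spacing
    k_mm = [[rbf(a, b, sigma_f, ell) + (jitter if i == j else 0.0)
             for j, b in enumerate(knots)] for i, a in enumerate(knots)]
    k_mt = [[rbf(a, t, sigma_f, ell) for t in times] for a in knots]
    x = solve(k_mm, k_mt)  # M x T; K_MM is symmetric, so W is the transpose
    return [[x[m][t] for m in range(len(knots))] for t in range(len(times))]


def expand(w, theta):
    """eps = W theta, mapping M x n_u latent noise to T x n_u perturbations."""
    n_u = len(theta[0])
    return [[sum(w_row[m] * theta[m][j] for m in range(len(theta)))
             for j in range(n_u)] for w_row in w]


rng = random.Random(0)
times = [float(t) for t in range(21)]        # T = 21 time steps
knots = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]    # M = 6 support points
W = interpolation_matrix(times, knots)       # computed once at init
theta = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in knots]
eps = expand(W, theta)
```

At a time step that coincides with a support point, the corresponding row of \(W\) is close to a unit vector, so \(\epsilon\) passes through the sampled \(\theta\) values there and varies smoothly in between.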

Characteristics

  • Guaranteed smooth trajectories — bandwidth \(\ell\) is derived automatically from support-point spacing.
  • ~6× faster per step than standard MPPI for the same \(K\) (fewer noise DOFs).
  • Slightly higher RMSE than MPPI on lemniscate tracking (0.79 m vs 0.60 m, 30 s trial).
  • Only user-tunable knob: num_support_pts \(M\) (more points → finer-grained control, slower).

4. B-Spline MPPI

Header: include/mppi/controllers/bspline_mppi.cuh

Parameterises the control trajectory as a cubic B-spline, guaranteeing \(C^2\) continuity (continuous acceleration). Noise is sampled at the B-spline control points.

  • Smoother than S-MPPI (one higher continuity order: \(C^2\) versus \(C^1\)).
  • Python bindings not yet exposed.
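As an illustration of the parameterisation, the sketch below evaluates a uniform cubic B-spline from a list of scalar control points and shows where the noise would enter. It assumes a uniform knot vector and at least four control points; the header may use a different knot layout. The function names are hypothetical.

```python
import random


def cubic_bspline_basis(s):
    """Uniform cubic B-spline blending weights for local parameter s in [0, 1]."""
    return (
        (1.0 - s) ** 3 / 6.0,
        (3.0 * s ** 3 - 6.0 * s ** 2 + 4.0) / 6.0,
        (-3.0 * s ** 3 + 3.0 * s ** 2 + 3.0 * s + 1.0) / 6.0,
        s ** 3 / 6.0,
    )


def evaluate_bspline(ctrl_pts, horizon):
    """Sample a scalar control sequence of length `horizon` from control points."""
    segments = len(ctrl_pts) - 3  # each segment blends four control points
    out = []
    for t in range(horizon):
        pos = segments * t / (horizon - 1) if horizon > 1 else 0.0
        i = min(int(pos), segments - 1)  # segment index
        basis = cubic_bspline_basis(pos - i)
        out.append(sum(b * ctrl_pts[i + j] for j, b in enumerate(basis)))
    return out


rng = random.Random(0)
nominal_pts = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
# Noise is added to the control points, not to the individual time steps.
noisy_pts = [p + rng.gauss(0.0, 0.1) for p in nominal_pts]
u_sequence = evaluate_bspline(noisy_pts, horizon=30)
```

Because the blending weights are cubic polynomials that match up to the second derivative at segment boundaries, any perturbation of the control points yields a \(C^2\) control trajectory.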

5. Informative MPPI (I-MPPI)

Header: include/mppi/controllers/i_mppi.cuh
Python classes: QuadrotorIMPPI, DI3IMPPI

See the dedicated I-MPPI page for theory and usage.


Summary comparison

| Feature | MPPI | S-MPPI | K-MPPI | B-Spline MPPI | I-MPPI |
|---|---|---|---|---|---|
| Control variable | \(u\) | \(\dot{u}\) (vel.) | \(\theta\) (latent) | B-spline control points | \(u\) + info cost |
| Noise type | White | Brownian | GP/RBF | B-spline GP | White |
| Smoothness | Low | \(C^1\) | \(C^\infty\) (RBF) | \(C^2\) | Low |
| Compute/step | baseline | ~baseline | ~6× faster | ~baseline | |
| Python exposed | ✓ | ✓ (#30) | ✓ | not yet | ✓ |
| Quadrotor RMSE (30 s) | 0.60 m | | 0.79 m | | |