Analytic parameter derivatives#
Gradient-based optimisers (Levenberg-Marquardt, L-BFGS) and HMC
samplers need the gradient of the model w.r.t. the parameters at every
iteration. Finite-differencing the orbit works but is expensive and
loses accuracy; automatic differentiation through JIT-compiled numba
code is not supported in general. MeepMeep takes a third route:
hand-derived analytic gradients shipped as sibling routines next to
each evaluator. Each _d call costs only a few times what the
value-only call costs, the gradient is exact up to floating-point
error, and the result drops straight into a fitter or sampler.
In concrete terms, every quantity that the Taylor backend evaluates is
also exposed in a _d-suffixed variant that returns the value
alongside its analytic partial derivatives w.r.t. the seven orbital
parameters
where \(t_c\) is the transit-centre time (time of inferior
conjunction). The leading axis of every dc tensor follows this
ordering. Note the sign: the orbit position depends on the elapsed
time \(t_\mathrm{obs} - t_c\), so the \(t_c\) partial is the
negative of the partial w.r.t. the expansion point/expansion-time argument te
passed to the solver, \(\partial / \partial t_c = -\,\partial / \partial t_k\). This page documents how those derivatives are
computed, the explicit formulas at each stage, and the practical
regime in which they are accurate — useful when you are verifying the
math, extending the backend with a new observable, or debugging a
gradient mismatch.
The two-layer chain#
The gradient computation splits into two layers. The boundary between them is exactly where the analytic difficulty sits: everything that depends on Kepler’s equation lives in Layer A and is computed once per expansion point; the per-evaluator math in Layer B is just polynomial manipulation and one-line chain rules.
Layer A — derivative coefficients. The solvers
solve2d_d()andsolve3d_d()produce the Taylor coefficient matrixcof shape(D, 5)and a parameter-derivative tensordcof shape(7, D, 5). The elementdc[k, d, n]is \(\partial c[d, n] / \partial \theta_k\). All non-trivial calculus lives here: Kepler’s equation, the orbital-plane state, the rotation into the sky frame.Layer B — evaluator propagation. Every
_devaluator (positions, distances, velocities, RVs, phase-curve outputs) takescanddcand reduces them to the final quantity together with its 7-vector gradient (the seventh entry being the longitude of the ascending node). The reductions are either trivial polynomial evaluations or simple chain-rule applications.
The two layers are documented separately below.
Layer A: derivative coefficients#
The walk-through below mirrors
solve2d_d() step by
step; the 3D solver
solve3d_d() follows the
same structure with one extra row in the final rotation matrix.
The parameter indexing throughout is
k |
Parameter |
Comment |
|---|---|---|
0 |
\(t_c\) |
Transit-centre time [days], inferior conjunction |
1 |
\(p\) |
Orbital period |
2 |
\(a\) |
Scaled semi-major axis |
3 |
\(i\) |
Inclination |
4 |
\(e\) |
Eccentricity |
5 |
\(w\) |
Argument of periastron |
6 |
\(\Omega\) |
Longitude of the ascending node |
The first six parameters drive Kepler’s equation and the orbital-plane state; their analytic partials are built with the length-6 working arrays described below. The seventh, \(\Omega\), is a constant rotation of the sky-plane \((x, y)\) about the line of sight and does not enter the Kepler solve. It is applied as a post-processing rotation of the assembled coefficients: the six Kepler-parameter rows are rotated by \(R(\Omega)\), and the new \(\Omega\) row is \(R'(\Omega)\) applied to the unrotated position coefficients (the line-of-sight \(z\) row of the \(\Omega\) derivative is zero).
Step 1 — auxiliary partials#
Several auxiliary quantities depend on only one parameter each, so their gradient vectors are sparse:
These feed every later step. The code stores each as a length-6 array with one non-zero entry; the surrounding loops therefore mostly carry zeros until eccentricity and orientation enter the chain.
Step 2 — mean anomaly at transit#
The mean anomaly at the moment of inferior conjunction,
\(M_\text{tr}(e, w)\), is non-trivial in \(e\) and \(w\).
The helper
mean_anomaly_at_transit_with_derivatives()
returns the value and the two partials in closed form (derived by
implicit differentiation of an arctangent expression for the
eccentric anomaly at transit). Schematically,
with \(\partial M_\text{tr} / \partial e\) and
\(\partial M_\text{tr} / \partial w\) formed by differentiating
this composite. The solver stores the result in doffset[4] and
doffset[5].
Step 3 — mean anomaly and its gradient#
The mean anomaly at the expansion-point time \(t_k\) (the solver’s te
argument, measured relative to the transit centre, with
\(t_k = t_\mathrm{obs} - t_c\) at evaluation) is
so, differentiating w.r.t. the parameter vector (slot 0 is the transit-centre time \(t_c\), with \(\partial M/\partial t_c = -\,\partial M/\partial t_k\)),
with the other two entries (a, i) zero.
Step 4 — eccentric anomaly via implicit differentiation#
Kepler’s equation
is solved numerically by
ea_from_ma() (Newton-
Raphson). Once \(E\) is known, differentiating both sides with
respect to a generic parameter \(\theta_k\) gives
so
Here \(\delta_{k=4}\) is the Kronecker delta selecting the eccentricity slot. The \((1 - e \cos E)^{-1}\) factor is the same Jacobian that appears in the Newton-Raphson iteration of the forward solve, so it is already at hand.
The partials of \(\sin E\) and \(\cos E\) then follow trivially by the chain rule.
Step 5 — orbital-plane position and velocity#
In the orbital plane, with \(\xi\) along the line to periastron and \(\eta\) perpendicular to it,
Each of these is a product/quotient of factors whose partials are
either already in scope (steps 1, 3, 4) or zero. The solver expands
each derivative as a sum of product-rule terms and stores the result
in dr[k], dxi[k], deta[k], dv_xi[k], dv_eta[k].
Step 6 — higher-order Taylor terms#
The 5th-order Taylor expansion needs position, velocity, acceleration, jerk, and snap at the expansion point. Rather than differentiating \(r(t)\) and the angles directly, the solver expresses each higher derivative in terms of \((\xi, \eta)\), \((v_\xi, v_\eta)\), and the Newtonian gravitational acceleration.
Define the auxiliary radial scalar
so the acceleration in the orbital plane is \((a_\xi, a_\eta) = u\,(\xi, \eta)\). Two further radial scalars are needed:
with \(v^2 = v_\xi^2 + v_\eta^2\). From these, the jerk and snap in the orbital plane are
Each of \(u, \dot u, \ddot u\) and the resulting jerk and snap vectors is differentiated by the product rule. The recurring building blocks are \(\partial r^{-n} / \partial \theta_k = -n\, r^{-n-1}\, \partial r / \partial \theta_k\), which the solver computes once per inverse power and reuses.
Step 7 — rotation into the sky frame#
The 2D sky-plane projection is a constant rotation depending on \((i, w)\):
The 3D solver adds a third row, \((\sin w\, \sin i,\; \cos w\, \sin i)\), yielding the line-of-sight component. The non-zero entries of the \(\partial m_{rc} / \partial \theta_k\) arrays are immediate from step 1.
For each Taylor order \(n \in \{0, 1, 2, 3, 4\}\) and each spatial dimension \(d\), the stored coefficient is
where \(q^{(n)}\) is the \(n\)-th orbital-plane vector (\((\xi, \eta)\) for \(n=0\), \((v_\xi, v_\eta)\) for \(n=1\), acceleration for \(n=2\), and so on). The factorial pre-scaling means the coefficient is the actual Taylor coefficient, not the raw derivative; consequently the evaluator does no factorial divisions.
The derivative coefficient follows from the product rule on the rotation:
Only \(R\) depends on \((i, w)\); only \(q^{(n)}\) carries
the dependence on \((t_c, p, a, e)\). The output dcf is the
tensor whose entries are exactly the right-hand side of this boxed
identity.
Layer B: evaluator propagation#
Once c and dc are in hand, every quantity the backend can
evaluate is a closed-form function of them and the centered time
t. The propagation rules below are all that the _d evaluators
contain.
Position#
The Horner polynomial in c already gives the position. The
gradient is the corresponding Horner polynomial in dc[k, d, :]
for each parameter \(k\):
Implemented in
pos_cd(),
pos_cd(),
and their direct counterparts
pos_d() and
pos_d() (which epoch-fold
t first; the discrete epoch shift contributes no derivative).
Projected distance#
For \(d = \sqrt{p_x^2 + p_y^2}\), differentiating \(d^2\) gives
The same reduction is applied in 2D and 3D
(sep_cd(),
sep_cd()); both
treat \(d\) as the projected distance. The expression is
regular for \(d > 0\) and ill-defined at exactly zero projected
separation; transit modelling stays well clear of this geometric
singularity.
Z-coordinate#
The line-of-sight coordinate \(z\) is just the third row of the
position polynomial, so its gradient is the polynomial in
dc[k, 2, :]. See
zpos_cd().
Line-of-sight velocity#
The velocity polynomial is the term-by-term derivative of the position polynomial, with the factorial pre-scaling exactly cancelling the \(n\) in front of \(t^{n-1}\):
The gradient is the same polynomial pattern on dc. See
zvel_cd().
Radial velocity#
The radial velocity carries an additional parameter dependence through the conversion between the internal velocity (in \(R_\star / \text{day}\)) and the observed RV in \(\text{m s}^{-1}\):
Pulling the scalar \(s = K / n_z\) out front,
with closed-form non-zero entries
The \((t_c, w)\) partials of \(s\) vanish. Implemented in
rv_cd() and
rv_d().
Phase angle and 3D separation#
The whole-orbit dispatchers in
orbit3dd compose further chain
rules on top of the position gradient. The two recurring patterns are
the 3D separation
and the cosine of the phase angle \(\cos \alpha = -z / r\):
These appear in
star_planet_distance_od()
and
cos_alpha_od(). The
single-expansion-point Lambertian phase-curve evaluator
(lambert_phase_curve_cd(),
which the whole-orbit
lambert_phase_curve_od()
delegates to) forms the flux \((k/r)^2 A_g\, f(\cos\alpha)\), combining the
Lambert kernel
\(f(\cos\alpha) = (\sin\alpha + (\pi - \alpha)\cos\alpha)/\pi\)
(with its closed-form derivative \(\mathrm d f / \mathrm d \cos\alpha\))
and the inverse-square illumination \(1/r^2\) set by the instantaneous
star-planet distance. Its gradient therefore chains through both
\(\cos\alpha\) and \(r\), using the two derivatives above.
Multi-expansion-point propagation#
The orbit-spanning solver
solve3d_orbit_d()
applies
solve3d_d() once per
expansion point and stacks the results into arrays of shape (N, 3, 5) for
coeffs and (N, 7, 3, 5) for dcoeffs. The periodic-image
contract on the last expansion point is honoured for both arrays — they share
their final slot with the first.
Each multi-expansion-point _d dispatcher then performs the same single-expansion-point
chain rule documented above after a ep_table-driven expansion point lookup:
Epoch-fold the absolute time into a single period.
Look up the expansion-point index
ixfrom the time-to-expansion-point table.Subtract the expansion point phase to get the centered time.
Call the matching centered single-expansion-point
_devaluator withcoeffs[ix]anddcoeffs[ix].
No additional algebra is needed at the dispatcher level for positions, velocities, or distances; the chain rule for higher-level outputs (phase angle, Lambert curve, RV, light travel time) is applied identically to the single-expansion-point case but using the dispatched value and gradient.
Numerical regime and pitfalls#
Validity window. Each expansion point’s Taylor expansion is accurate within a region around the expansion point whose size depends on the orbit; near periastron of an eccentric orbit the window is narrowest. The gradient is accurate inside the same region — its truncation error has the same order as the value’s.
Projected-distance singularity. The chain rule for \(\partial d / \partial \theta\) diverges as \(d \to 0\). This is a geometric, not numerical, singularity (the direction of ascent is undefined when the projected separation vanishes). It is outside the transit-modelling regime.
Eccentricity edge cases. The implicit-differentiation form of \(\partial E / \partial \theta\) remains finite for all \(e \in [0, 1)\): the denominator \(1 - e \cos E\) is bounded below by \(1 - e > 0\). The small-eccentricity branch used in
eccentricity_vector()is a numerical convenience for the orientation calculation and does not enter the derivative chain documented here.Floating-point precision. All
_droutines run undernumba.njit()withfastmath=True. In practice this yields gradients agreeing with finite-difference checks to roughly \(10^{-9}\) relative error for typical transit parameters — the same envelope the value-only evaluators inhabit.Slot-0 convention. Slot 0 is the partial with respect to the transit-centre time \(t_c\). Because the orbit depends on the elapsed time \(t_\mathrm{obs} - t_c\), this equals the negative of the partial w.r.t. the solver’s expansion point/expansion-time argument
te: \(\partial / \partial t_c = -\,\partial / \partial t_k\). The sign is applied once at the source (dma[0] = -2\pi/pinsolve2d_d/solve3d_d) and propagates linearly through every evaluator, so all_d/_odoutputs report \(\partial / \partial t_c\) consistently.
Transit-centre vs periastron parametrisation#
The solver’s native gradient basis is the transit-centre parametrisation: slot 0 is \(\partial / \partial t_c\) and the eccentricity, argument-of- periastron, and period derivatives are taken holding \(t_c\) fixed.
When an Orbit is bound with the time of periastron
passage (set_pars(tp=...)), the gradients are instead returned in the
periastron parametrisation \((t_p, p, a, i, e, w, \lambda)\), with the
shape derivatives taken holding \(t_p\) fixed. The basis therefore follows
whichever timing parameter you supply: tc keeps the transit-centre basis,
tp switches to the periastron basis.
The two are related by the exact, parameter-dependent offset
so the periastron-basis gradient is the transit-centre gradient with multiples of the timing row added to the \(p\), \(e\), and \(w\) rows:
The \(a\), \(i\), and \(\lambda\) rows are unchanged, and slot 0 is
numerically identical in both bases (it equals \(\partial/\partial t_c =
\partial/\partial t_p\) when the other parameters are held fixed). This change of
basis is applied once to the per-expansion-point coefficient derivatives by
tc_to_tp_gradient(), so it propagates consistently to
every derivative-returning quantity (radial velocity, position, separation,
phase curves, …). The relevant mean-anomaly-at-transit terms come from
mean_anomaly_at_transit_with_derivatives().