Numerical Methods
This page describes the numerical algorithms used in QGYBJ+.jl, including the 2D pencil decomposition strategy for parallel execution.
Spatial Discretization
Horizontal: Pseudo-Spectral Method
The model uses a pseudo-spectral approach in the horizontal:
| Operation | Space | Method |
|---|---|---|
| Linear derivatives | Spectral | Multiply by ik_x, ik_y |
| Nonlinear products | Physical | Pointwise multiplication |
| Transform | Both | FFT / IFFT |
Advantages
- Spectral accuracy for smooth fields
- Efficient O(N log N) via FFT
- No numerical diffusion from derivatives
Dealiasing
Nonlinear products create aliasing errors. We use the radial 2/3 rule:
\[k_{max} = \frac{\min(N_x, N_y)}{3}\]
Modes with $k_x^2 + k_y^2 > k_{max}^2$ are set to zero after each nonlinear term. The radial cutoff ensures isotropic treatment of modes.
# Apply dealiasing mask (radial cutoff)
mask = dealias_mask(grid) # Returns 2D array of 0s and 1s
@. field_k *= maskHyperdiffusion Helper Functions
For dimensional simulations, use the helper functions to compute appropriate hyperdiffusion coefficients:
# Compute 4th order hyperdiffusion for 10-step e-folding at grid scale
hd = compute_hyperdiff_params(
nx=128, ny=128, Lx=70e3, Ly=70e3, dt=10.0,
order=4, efold_steps=10
)
# Returns: (ν=..., ilap=2, order=4)
# ν is in m⁴/s for 4th order (biharmonic ∇⁴)The coefficient is computed such that the grid-scale mode decays by a factor of $e$ in the specified number of time steps.
Vertical: Finite Differences
The vertical direction uses second-order centered differences:
\[\frac{\partial f}{\partial z}\bigg|_k \approx \frac{f_{k+1} - f_{k-1}}{2\Delta z}\]
\[\frac{\partial^2 f}{\partial z^2}\bigg|_k \approx \frac{f_{k+1} - 2f_k + f_{k-1}}{\Delta z^2}\]
For variable coefficients (stratification):
\[\frac{\partial}{\partial z}\left(a(z)\frac{\partial f}{\partial z}\right) \approx \frac{a_{k+1/2}(f_{k+1} - f_k) - a_{k-1/2}(f_k - f_{k-1})}{\Delta z^2}\]
Time Integration
Leapfrog with Robert-Asselin Filter and Integrating Factors
The primary time stepping scheme is leapfrog with Robert-Asselin filtering and integrating factors for hyperdiffusion.
Forward Euler (First Step)
The first step uses forward Euler to bootstrap the leapfrog scheme:
\[q^{n+1} = \left[ q^n - \Delta t \cdot J(\psi, q)^n + \Delta t \cdot D_q^n \right] \cdot e^{-\lambda \Delta t}\]
For the wave envelope (in real/imaginary form):
\[B_R^{n+1} = \left[ B_R^n - \Delta t \cdot J(\psi, B_R) - \Delta t \cdot \frac{N^2 k_h^2}{2 f_0} A_I + \Delta t \cdot \frac{1}{2} r_{BI} \right] \cdot e^{-\lambda_w \Delta t}\]
\[B_I^{n+1} = \left[ B_I^n - \Delta t \cdot J(\psi, B_I) + \Delta t \cdot \frac{N^2 k_h^2}{2 f_0} A_R - \Delta t \cdot \frac{1}{2} r_{BR} \right] \cdot e^{-\lambda_w \Delta t}\]
where $N^2$ is the buoyancy frequency squared and $f_0$ is the Coriolis parameter.
Leapfrog (Subsequent Steps)
Subsequent steps use centered leapfrog with integrating factors:
\[q^{n+1} = q^{n-1} \cdot e^{-2\lambda \Delta t} - 2\Delta t \cdot J(\psi, q)^n \cdot e^{-\lambda \Delta t} + 2\Delta t \cdot D_q^n \cdot e^{-2\lambda \Delta t}\]
\[B_R^{n+1} = B_R^{n-1} \cdot e^{-2\lambda_w \Delta t} - 2\Delta t \cdot \left[ J(\psi, B_R) + \frac{N^2 k_h^2}{2 f_0} A_I - \frac{1}{2} r_{BI} \right]^n \cdot e^{-\lambda_w \Delta t}\]
\[B_I^{n+1} = B_I^{n-1} \cdot e^{-2\lambda_w \Delta t} - 2\Delta t \cdot \left[ J(\psi, B_I) - \frac{N^2 k_h^2}{2 f_0} A_R + \frac{1}{2} r_{BR} \right]^n \cdot e^{-\lambda_w \Delta t}\]
Robert-Asselin Filter
The Robert-Asselin filter damps the computational mode that can grow with leapfrog:
\[\tilde{q}^n = q^n + \gamma \left( q^{n-1} - 2q^n + q^{n+1} \right)\]
where $\gamma \approx 0.001$ (typically very small to minimize physical mode damping).
# First step: Forward Euler
first_projection_step!(state, grid, params, plans; a=a_vec, dealias_mask=mask)
# Subsequent steps: Leapfrog with Robert-Asselin
leapfrog_step!(state_np1, state_n, state_nm1, grid, params, plans;
a=a_vec, dealias_mask=mask)Integrating Factor Method
The integrating factor $\lambda$ handles hyperdiffusion exactly:
\[\lambda = \nu_{h1} \left( |k_x|^{2 \cdot ilap1} + |k_y|^{2 \cdot ilap1} \right) + \nu_{h2} \left( |k_x|^{2 \cdot ilap2} + |k_y|^{2 \cdot ilap2} \right)\]
where (using Unicode parameters):
- $\nu_{h1}$ (
νₕ₁),ilap1: First hyperdiffusion operator (default: biharmonic with ilap1=2) - $\nu_{h2}$ (
νₕ₂),ilap2: Second hyperdiffusion operator (default: hyper-6 with ilap2=6)
The wave field has its own integrating factor $\lambda_w$ with potentially different coefficients.
Advantages of integrating factors:
- Hyperdiffusion is treated exactly (no stability restriction)
- Allows much larger time steps than explicit diffusion treatment
- Second-order accuracy preserved for advective terms
Second-Order IMEX-CNAB with Strang Splitting
For applications where the dispersion CFL constraint (dt ≤ 2f/N² ≈ 2s) is limiting, we provide a second-order IMEX-CNAB scheme that treats dispersion implicitly and uses Strang splitting for refraction.
The Challenge: Refraction Instability
The YBJ+ equation includes a refraction term $-(i/2)\zeta B$ which, when discretized with forward Euler, is unconditionally unstable:
\[|1 - i \Delta t \zeta/2| = \sqrt{1 + (\Delta t \zeta/2)^2} > 1\]
This amplifies energy regardless of time step size.
Solution: Strang Splitting + IMEX-CNAB
We use Strang splitting for refraction (second-order) combined with Adams-Bashforth 2 for advection (second-order):
Stage 1 - First Half-Refraction (Strang):
\[B^* = B^n \times \exp(-i \frac{\Delta t}{2} \frac{\zeta}{2})\]
Stage 2 - IMEX-CNAB for Advection + Dispersion:
\[B^{**} - \frac{\Delta t}{2} i \alpha_{\text{disp}} k_h^2 A^{n+1} = B^* + \frac{\Delta t}{2} i \alpha_{\text{disp}} k_h^2 A^* + \frac{3\Delta t}{2} N^n - \frac{\Delta t}{2} N^{n-1}\]
where $N^n = -J(\psi^n, B^n)$ is the advection tendency at time $n$, and Adams-Bashforth 2 extrapolation $\frac{3}{2}N^n - \frac{1}{2}N^{n-1}$ provides second-order accuracy.
Stage 3 - Second Half-Refraction (Strang):
\[B^{n+1} = B^{**} \times \exp(-i \frac{\Delta t}{2} \frac{\zeta}{2})\]
Since $|\exp(-i \theta)| = 1$ for real $\theta$, refraction is exactly energy-preserving.
Mean-Flow Update (q): The PV equation is advanced with Adams–Bashforth 2 on $-J(\psi,q) + \text{diffusion}$, using the integrating factor for hyperdiffusion. To keep the coupled system second-order, the second refraction half-step uses $\psi^{n+1}$ predicted from the updated $q$ (and $q^w$ when wave feedback is enabled).
Critical: Consistent A*
After applying refraction to get $B^*$, we must compute $A^* = (L^+)^{-1} B^*$ (not use $A^n$). Using $A^n$ with $B^*$ breaks the consistency required by IMEX-CN, causing instability.
Modified Elliptic Problem
Substituting $B = L^+ A$ into the IMEX-CN equation:
\[(L^+ - \beta) A^{n+1} = \text{RHS}\]
where $\beta = (\Delta t/2) \cdot i \cdot \alpha_{\text{disp}} \cdot k_h^2$.
This is a tridiagonal system for each $(k_x, k_y)$ mode, solved with the Thomas algorithm.
Temporal Accuracy
| Component | Method | Order |
|---|---|---|
| Refraction | Strang splitting | 2nd |
| Dispersion | Crank-Nicolson | 2nd |
| Advection | Adams-Bashforth 2 | 2nd |
| Overall | IMEX-CNAB | 2nd |
Note: The first time step uses forward Euler for advection (AB2 bootstrap), so the very first step is first-order.
Stability Summary
| Term | Treatment | Stability |
|---|---|---|
| Refraction | Exact integrating factor | Unconditionally stable |
| Dispersion | Implicit Crank-Nicolson | Unconditionally stable |
| Advection | Explicit Adams-Bashforth 2 | CFL: $\Delta t < \Delta x / U_{\max}$ |
For typical oceanographic parameters (U ≈ 0.3 m/s, dx ≈ 300m), this allows dt ≈ 20s vs dt ≈ 2s for explicit leapfrog—a 10x speedup.
Elliptic Inversions
Tridiagonal Systems
Both QG (q -> psi) and YBJ+ (B -> A) inversions lead to tridiagonal systems at each horizontal wavenumber (kx, ky):
\[a_k x_{k-1} + b_k x_k + c_k x_{k+1} = d_k\]
Thomas Algorithm
We solve these in O(N) operations using the Thomas algorithm:
Forward sweep:
for k = 2:nz
w = a[k] / b[k-1]
b[k] = b[k] - w * c[k-1]
d[k] = d[k] - w * d[k-1]
endBack substitution:
x[nz] = d[nz] / b[nz]
for k = nz-1:-1:1
x[k] = (d[k] - c[k] * x[k+1]) / b[k]
endKey Inversions
| Function | Solves | Physical Meaning |
|---|---|---|
invert_q_to_psi! | ∇²ψ + (f²/N²)∂²ψ/∂z² = q | PV to streamfunction |
invert_L⁺A_to_A! | L⁺A = input | Wave envelope to amplitude |
invert_helmholtz! | ∇²ϕ - λϕ = f | General Helmholtz |
FFT Implementation
Serial Mode: FFTW
We use FFTW with measured plans for optimal performance:
# Create optimized plans
plans = plan_transforms!(grid)
# Forward FFT (physical -> spectral)
fft_forward!(dst, src, plans)
# Backward FFT (spectral -> physical)
fft_backward!(dst, src, plans)Plan creation is expensive (~seconds) but execution is fast.
Parallel Mode: PencilFFTs
For MPI parallel execution, we use PencilFFTs which handles distributed FFTs:
# Create parallel FFT plans
plans = QGYBJplus.plan_mpi_transforms(grid, mpi_config)
# Same interface as serial
fft_forward!(dst, src, plans)
fft_backward!(dst, src, plans)PencilFFTs automatically handles the transposes needed for distributed FFTs.
2D Pencil Decomposition
The Challenge
The model requires two types of operations:
- Horizontal FFTs: Need consecutive x and y data
- Vertical solves: Need all z data at each (x,y) point
With 2D decomposition, no single configuration has all data local.
Solution: Dual Pencil Configurations
QGYBJ+.jl uses two pencil configurations:
| Configuration | Local Dimension | Distributed Dimensions | Use |
|---|---|---|---|
| xy-pencil | x | y, z | Horizontal FFTs |
| z-pencil | z | x, y | Vertical operations |
xy-pencil z-pencil
(x local) (z local)
┌─────────────────┐ ┌─────────────────┐
│ x: FULL │ │ x: distributed │
│ y: distributed │ <-----> │ y: distributed │
│ z: distributed │ transpose │ z: FULL │
└─────────────────┘ └─────────────────┘Transpose Operations
Functions requiring vertical operations follow this pattern:
function some_vertical_operation!(S, G; workspace=nothing)
# Check if 2D decomposition is active
need_transpose = G.decomp !== nothing && hasfield(typeof(G.decomp), :pencil_z)
if need_transpose
# 1. Transpose from xy-pencil to z-pencil
transpose_to_z_pencil!(workspace.field_z, S.field, G)
# 2. Perform vertical operation (z now fully local)
_vertical_operation_on_z_pencil!(workspace.result_z, workspace.field_z, ...)
# 3. Transpose result back to xy-pencil
transpose_to_xy_pencil!(S.result, workspace.result_z, G)
else
# Serial mode: direct vertical operation
_vertical_operation_direct!(S, G, ...)
end
endFunctions Using This Pattern
| Function | What it does | Needs z local? |
|---|---|---|
invert_q_to_psi! | PV inversion | Yes (tridiagonal) |
invert_L⁺A_to_A! | Wave amplitude recovery | Yes (tridiagonal) |
invert_helmholtz! | General Helmholtz | Yes (tridiagonal) |
compute_vertical_velocity! | Omega equation | Yes (tridiagonal) |
compute_ybj_vertical_velocity! | YBJ w formula | Yes (vertical derivative) |
dissipation_q_nv! | Numerical dissipation | Yes (vertical terms) |
sumL⁺A! | Sum L⁺A over depth | Yes (vertical sum) |
compute_sigma | YBJ sigma term | Yes (vertical operations) |
compute_A! | Compute A from L⁺A | Yes (vertical operations) |
omega_eqn_rhs! | RHS of omega equation | Yes (vertical derivatives) |
Workspace Arrays
To avoid repeated allocation, pre-allocate z-pencil workspace:
# Initialize once
workspace = QGYBJplus.init_mpi_workspace(grid, mpi_config)
# Contents:
# workspace.q_z, workspace.psi_z, workspace.B_z,
# workspace.A_z, workspace.C_z, workspace.work_z
# Pass to functions
invert_q_to_psi!(state, grid; a=a_vec, workspace=workspace)Jacobian/Advection Computation
Divergence Form (convol_waqg)
For advection terms like $J(\psi, q)$, QGYBJ+.jl uses the divergence form:
\[J(\psi, q) = \frac{\partial(uq)}{\partial x} + \frac{\partial(vq)}{\partial y} = ik_x \widehat{uq} + ik_y \widehat{vq}\]
where $u = -\partial\psi/\partial y$ and $v = \partial\psi/\partial x$ are the geostrophic velocities.
Algorithm (convol_waqg)
Precompute velocities in real space: $u_r$, $v_r$
Transform field to real space:
fft_backward!(qr, qk, plans)Compute products in real space:
uterm = u_r .* qr vterm = v_r .* qrTransform back and compute divergence:
fft_forward!(uterm_k, uterm, plans) fft_forward!(vterm_k, vterm, plans) J_k = im * kx .* uterm_k + im * ky .* vterm_kApply dealiasing:
J_k[.!dealias_mask] .= 0Normalize (for unnormalized FFT):
J_k ./= (nx * ny)
Conservation Properties
The pseudo-spectral advection conserves:
- Circulation: $\int J(\psi, q) \, dA = 0$
- Energy (to machine precision in inviscid limit)
- Enstrophy (to machine precision in inviscid limit)
Stability Constraints
CFL Condition
For advection terms:
\[\Delta t < \frac{\Delta x}{\max|u|} \approx \frac{L/N}{U}\]
Diffusion Stability
With integrating factors, there is no diffusion stability limit.
Without integrating factors, explicit diffusion requires:
\[\Delta t < \frac{\Delta x^{2p}}{2\nu}\]
For hyperdiffusion (p=4), this is very restrictive.
Recommended Time Steps
| Resolution | Typical dt |
|---|---|
| 64^3 | 0.001 - 0.01 |
| 128^3 | 0.0005 - 0.005 |
| 256^3 | 0.0002 - 0.002 |
| 512^3 | 0.0001 - 0.001 |
Memory Layout
Array Ordering
Julia uses column-major ordering (Fortran-style):
# Fast index first for cache efficiency
for k = 1:nz
for j = 1:ny
for i = 1:nx
field[k, i, j] = ...
end
end
endComplex Arrays
Spectral fields are stored as Array{ComplexF64, 3}:
# Spectral field dimensions
psi_k = zeros(ComplexF64, nx, ny, nz)PencilArrays (Parallel)
In parallel mode, arrays are PencilArray{T,3}:
# Access underlying data
data = parent(arr)
# Local dimensions
nx_local, ny_local, nz_local = size(data)Accuracy Verification
Order of Accuracy
| Component | Spatial Order | Temporal Order |
|---|---|---|
| Horizontal derivatives | Spectral | - |
| Vertical derivatives | 2nd | - |
| Elliptic solvers | 2nd (vertical) | - |
| Time stepping (Leapfrog) | - | 2nd |
| Integrating factors | - | Exact |
Conservation Tests
Run with inviscid settings to verify:
- Energy conservation (< 10^-10 relative change)
- Enstrophy conservation (< 10^-10 relative change)
# Check energy conservation
KE_initial = flow_kinetic_energy(state.u, state.v)
# ... run simulation ...
KE_final = flow_kinetic_energy(state.u, state.v)
println("Relative change: ", abs(KE_final - KE_initial) / KE_initial)Performance Optimization
Key Optimizations
- Pre-allocated work arrays: No allocations in time loop
- FFTW planning: Measured plans for optimal performance
- Loop fusion:
@.macro for element-wise operations - In-place operations: Minimize memory allocation
- Workspace reuse: Pre-allocated z-pencil arrays for transposes
Profiling
using Profile
# Profile time stepping
@profile for _ in 1:100
leapfrog_step!(state_np1, state_n, state_nm1, grid, params, plans;
a=a_vec, dealias_mask=mask, workspace=workspace)
end
Profile.print()Typical hotspots:
- FFT transforms (~40-50%)
- Tridiagonal solves (~20-30%)
- Transpose operations (~10-20% in parallel)
- Array operations (~10-20%)
Parallel Scaling
| Processes | Expected Speedup | Limiting Factor |
|---|---|---|
| 1-16 | Near linear | - |
| 16-64 | Good | Transpose overhead |
| 64-256 | Moderate | Communication |
| 256+ | Diminishing | Problem size dependent |
References
- Canuto, C., et al. (2006). Spectral Methods: Fundamentals in Single Domains. Springer.
- Durran, D. R. (2010). Numerical Methods for Fluid Dynamics. Springer.
- PencilArrays.jl documentation: https://jipolanco.github.io/PencilArrays.jl/
- PencilFFTs.jl documentation: https://jipolanco.github.io/PencilFFTs.jl/