MPI Parallelization

Run QGYBJ+.jl on distributed-memory systems using a 2D pencil decomposition.

When to Use

Recommended for grids of 256³ points or larger, or when the problem does not fit in the memory of a single node. For smaller problems, use threading instead: julia -t auto.
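
As a rough guide to when memory becomes the constraint, the sketch below estimates the footprint of full-size 3D fields. It assumes complex double-precision storage, and the count of 20 arrays is a hypothetical placeholder, not a number taken from QGYBJ+.jl.

```julia
# Bytes needed for one nx × ny × nz field of element type T.
# ComplexF64 (16 bytes per point) is an assumption about the storage type.
field_bytes(nx, ny, nz; T=ComplexF64) = nx * ny * nz * sizeof(T)

# Hypothetical number of full-size 3D arrays alive at once (state + workspace).
nfields = 20

for n in (128, 256, 512)
    total_gib = nfields * field_bytes(n, n, n) / 2^30
    println("$(n)^3 grid: ", total_gib, " GiB for $nfields fields")
end
```

Each doubling of the resolution multiplies the footprint by eight, which is why the cost of staying on one node rises quickly beyond 256³.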

Quick Start

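# parallel_run.jl
using MPI, PencilArrays, PencilFFTs, QGYBJplus

# Start MPI and collect the communicator information used below
MPI.Init()
mpi_config = QGYBJplus.setup_mpi_environment()

# Domain size and grid resolution
params = default_params(Lx=1000e3, Ly=1000e3, Lz=5000.0, nx=256, ny=256, nz=128)

# Distributed grid, FFT plans, state, and pre-allocated workspace
grid = QGYBJplus.init_mpi_grid(params, mpi_config)
plans = QGYBJplus.plan_mpi_transforms(grid, mpi_config)
state = QGYBJplus.init_mpi_state(grid, plans, mpi_config)
workspace = QGYBJplus.init_mpi_workspace(grid, mpi_config)

# Coefficient vector passed to the solvers as `a`
a_vec = a_ell_ut(params, grid)

for step in 1:1000
    # Invert q to obtain psi, then advance one time step
    invert_q_to_psi!(state, grid; a=a_vec, workspace=workspace)
    # Note: this minimal example passes the same state object for all three state arguments
    leapfrog_step!(state, state, state, grid, params, plans; a=a_vec, workspace=workspace)
end

MPI.Finalize()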

Run with:

mpiexec -n 16 julia --project parallel_run.jl

Requirements

The MPI packages (MPI.jl, PencilArrays.jl, PencilFFTs.jl) are dependencies of QGYBJ+.jl and are installed automatically with it.

A system MPI library is also required:

  • macOS: brew install open-mpi
  • Ubuntu: apt install libopenmpi-dev

Scaling

Processes  Topology  Grid Size
4          2×2       128³
16         4×4       256³
64         8×8       512³

Use powers of 2 for the process count, as in the table above, for optimal performance.
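
The topologies in the table are all square. A minimal sketch of how a process count could be factored into a near-square px × py grid is shown here; the helper `near_square_topology` is hypothetical and is not part of QGYBJ+.jl or PencilArrays.jl, which may choose the topology differently.

```julia
# Factor nprocs into (px, py) with px ≤ py and px as close to sqrt(nprocs) as possible.
# Hypothetical helper for illustration only.
function near_square_topology(nprocs::Integer)
    px = isqrt(nprocs)
    while nprocs % px != 0
        px -= 1
    end
    return (px, nprocs ÷ px)
end

for nprocs in (4, 16, 64, 32)
    println(nprocs, " processes -> ", near_square_topology(nprocs))
end
```

Process counts that are even powers of 2 (4, 16, 64) factor into exactly square grids, while a count such as 32 gives an uneven 4×8 split.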

Key Concepts

2D Pencil Decomposition: The domain is split across a px × py process grid. The z-dimension stays local to each process, which keeps vertical solves efficient.
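
To make the decomposition concrete, the following sketch computes which index ranges one process would own. It assumes x and y are the split dimensions, as the description suggests, and that leftover points go to the first blocks. Both function names are hypothetical, and the actual layout is determined by PencilArrays.jl.

```julia
# Index range of block `i` (1-based) when `n` points are split into `p` blocks.
# Giving remainder points to the first blocks is an assumed balancing rule.
function block_range(n::Integer, p::Integer, i::Integer)
    base, extra = divrem(n, p)
    start = (i - 1) * base + min(i - 1, extra) + 1
    len = base + (i <= extra ? 1 : 0)
    return start:(start + len - 1)
end

# Ranges owned by the process at position (ix, iy) of a px × py grid; z is never split.
function local_ranges((nx, ny, nz), (px, py), (ix, iy))
    return (block_range(nx, px, ix), block_range(ny, py, iy), 1:nz)
end

ranges = local_ranges((256, 256, 128), (4, 4), (2, 3))
println("local x, y, z ranges: ", ranges)
```

Every process holds the full 1:nz range, so a column-wise vertical operation can be applied to each local (x, y) point independently.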

Workspace: Pre-allocate once to avoid repeated allocation:

workspace = QGYBJplus.init_mpi_workspace(grid, mpi_config)

State Copies: Use copy_state(S) not deepcopy(S) to preserve pencil topology.

Key Functions

Function                  Purpose
setup_mpi_environment()   Initialize MPI config
init_mpi_grid()           Create distributed grid
plan_mpi_transforms()     Create PencilFFT plans
init_mpi_state()          Create distributed state
init_mpi_workspace()      Allocate workspace
copy_state()              Copy state (preserves topology)
mpi_reduce_sum()          Sum across processes

Global Reductions

local_ke = flow_kinetic_energy(state.u, state.v)
global_ke = QGYBJplus.mpi_reduce_sum(local_ke, mpi_config)
if mpi_config.is_root
    println("Total KE: $global_ke")
end
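
For readers unfamiliar with reductions, here is the same pattern written directly against MPI.jl. It is only a sketch: whether mpi_reduce_sum is implemented with MPI.Allreduce is an assumption, and local_value is a stand-in for a real diagnostic.

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Stand-in for a locally computed diagnostic (hypothetical value).
local_value = Float64(rank + 1)

# Collective call: every rank must reach this line, otherwise the job hangs.
global_value = MPI.Allreduce(local_value, +, comm)

# Only the printing is restricted to one rank, never the collective itself.
if rank == 0
    println("Sum over ", nranks, " ranks: ", global_value)
end

MPI.Finalize()
```

Placing the reduction inside the root-only branch would leave the other ranks without a matching call, which is the usual cause of the deadlock listed under Troubleshooting.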

Job Scripts

SLURM

#!/bin/bash
#SBATCH --nodes=4 --ntasks-per-node=16
mpiexec -n 64 julia --project script.jl

Troubleshooting

Problem                    Solution
Pencil topology mismatch   Use copy_state(S) not deepcopy(S)
Deadlock                   All ranks must call collective operations
Segfaults                  Use size(parent(arr)) for array dimensions

See Troubleshooting for more details.
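
The segfault entry can be illustrated with a Base-only analogy. A wrapper array may report its dimensions in a different order than the storage it wraps, so loop bounds for the raw storage have to come from the parent. PermutedDimsArray is used here purely as a stand-in; that distributed arrays in QGYBJ+.jl behave the same way is an assumption.

```julia
# Storage laid out as 4 × 3 × 2, presented through a wrapper as 2 × 4 × 3.
storage = zeros(4, 3, 2)
wrapped = PermutedDimsArray(storage, (3, 1, 2))

logical_dims = size(wrapped)          # dimensions as seen through the wrapper
memory_dims = size(parent(wrapped))   # dimensions of the underlying storage

# When indexing the raw storage, take the bounds from the parent array.
raw = parent(wrapped)
for k in axes(raw, 3), j in axes(raw, 2), i in axes(raw, 1)
    raw[i, j, k] = i + 10j + 100k
end

println("wrapper size: ", logical_dims, ", parent size: ", memory_dims)
```

Using the wrapper's size as bounds on the raw storage, especially inside an @inbounds loop, would read or write out of range.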