Parallel Particle Algorithm
This page provides detailed technical documentation of the parallel particle advection algorithm in QGYBJ+.jl.
Overview
The parallel particle algorithm enables efficient Lagrangian particle tracking on distributed-memory systems using MPI. Key features:
- 2D domain decomposition in x and y (px × py topology)
- Halo exchange in x/y (plus corners) for cross-boundary interpolation
- Automatic particle migration between MPI ranks
- Non-blocking MPI communication for efficiency
- Load balancing as particles redistribute
Domain Decomposition
2D Tile Decomposition
The physical domain [0, Lx] × [0, Ly] × [-Lz, 0] is partitioned into tiles across x and y according to the MPI process grid (px × py). Each rank owns a contiguous tile in x/y and the full z-range.
Example for a 2×2 topology:
┌────────────────────────────┬────────────────────────────┐
│ Rank (0,1)                 │ Rank (1,1)                 │
│ x∈[0, Lx/2), y∈[Ly/2, Ly)  │ x∈[Lx/2, Lx), y∈[Ly/2, Ly) │
├────────────────────────────┼────────────────────────────┤
│ Rank (0,0)                 │ Rank (1,0)                 │
│ x∈[0, Lx/2), y∈[0, Ly/2)   │ x∈[Lx/2, Lx), y∈[0, Ly/2)  │
└────────────────────────────┴────────────────────────────┘

- Each rank owns contiguous x and y ranges
- Full z dimension on each rank
- Particles belong to the rank containing their (x, y) position
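For an even split, the owning rank follows directly from a particle's coordinates. A minimal sketch, assuming an exactly even division and the row-major rank ordering rank = rank_y·px + rank_x used on this page (uneven splits are handled by find_target_rank below):

# Sketch: owning rank of a point (x, y) for an even px × py split
function owner_rank_even(x, y, Lx, Ly, px, py)
    rank_x = min(floor(Int, x / (Lx / px)), px - 1)
    rank_y = min(floor(Int, y / (Ly / py)), py - 1)
    return rank_y * px + rank_x
end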
Local Domain Calculation
function compute_local_domain(grid::Grid, rank::Int, nprocs::Int; topology=nothing)
    # Determine process grid (px × py)
    px, py = if topology !== nothing
        topology
    elseif grid.decomp !== nothing && hasfield(typeof(grid.decomp), :topology)
        grid.decomp.topology
    else
        (nprocs, 1)
    end

    rank_x = rank % px
    rank_y = rank ÷ px

    nx_local = compute_local_size(grid.nx, px, rank_x)
    ny_local = compute_local_size(grid.ny, py, rank_y)
    x_start  = compute_start_index(grid.nx, px, rank_x)
    y_start  = compute_start_index(grid.ny, py, rank_y)

    dx = grid.Lx / grid.nx
    dy = grid.Ly / grid.ny

    x_start_phys = x_start * dx
    x_end_phys   = (x_start + nx_local) * dx
    y_start_phys = y_start * dy
    y_end_phys   = (y_start + ny_local) * dy

    return (
        x_start  = x_start_phys,
        x_end    = x_end_phys,
        y_start  = y_start_phys,
        y_end    = y_end_phys,
        z_start  = -grid.Lz,
        z_end    = 0.0,
        nx_local = nx_local,
        ny_local = ny_local,
        px       = px,
        py       = py,
        rank_x   = rank_x,
        rank_y   = rank_y
    )
end

Example: 256×256 grid points, topology (2,2) → local 128×128 tiles per rank.
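The helpers compute_local_size and compute_start_index used above implement a block distribution. One possible sketch (the package's actual implementation may differ in details) gives the first (n mod p) ranks one extra point:

# Sketch of a block distribution consistent with the uneven-division handling above
function compute_local_size(n, p, r)
    base, rem = divrem(n, p)
    return base + (r < rem ? 1 : 0)
end

function compute_start_index(n, p, r)   # 0-based start index
    base, rem = divrem(n, p)
    return r * base + min(r, rem)
end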
Halo Exchange
Purpose
When a particle is near a domain boundary, velocity interpolation requires data from the neighboring rank. Halo regions (ghost cells) store copies of neighbor data.
Halo Width Requirements
The halo width is automatically determined by the interpolation method:
| Interpolation | Stencil | Halo Width | Accuracy |
|---|---|---|---|
| TRILINEAR | 2×2×2 | 1 cell | O(h²) |
| TRICUBIC | 4×4×4 | 2 cells | O(h⁴) |
| QUINTIC | 6×6×6 | 3 cells | O(h⁶) |
| ADAPTIVE | varies | 3 cells | varies |
The required_halo_width() function computes this automatically:
using QGYBJplus: required_halo_width, TRILINEAR, TRICUBIC, QUINTIC
required_halo_width(TRILINEAR) # → 1
required_halo_width(TRICUBIC) # → 2
required_halo_width(QUINTIC)    # → 3

Extended Array Structure
┌─────────────────────────────────────────────────────────────┐
│ EXTENDED ARRAY LAYOUT (local + halos) │
│ │
│ y (north) │
│ ▲ │
│ │ ┌───────────────────────────────┐ │
│ │ │ top halo │ │
│ │ ├───────────┬───────────┬───────┤ │
│ │ │ left halo │ local │ right │ │
│ │ │ │ data │ halo │ │
│ │ ├───────────┴───────────┴───────┤ │
│ │ │ bottom halo │ │
│ │ └───────────────────────────────┘ │
│ │
│ hw = halo_width (depends on interpolation method) │
│ Extended size: (nz_local, nx_local + 2*hw, ny_local + 2*hw)│
└─────────────────────────────────────────────────────────────┘

Communication Pattern (Periodic Boundaries)
For periodic domains, neighbor relationships wrap in x and y. In 2D, each rank has up to 8 neighbors:
NW(7) --- N(4) --- NE(8)
  |        |        |
 W(1) --- local --- E(2)
  |        |        |
SW(5) --- S(3) --- SE(6)

With periodic_x/periodic_y enabled, edge neighbors wrap around (e.g., the west neighbor of rank_x = 0 is rank_x = px - 1). For py == 1, only the W/E neighbors are used.
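A neighbor's rank can be derived from its offset on the process grid, wrapping in each direction that is periodic. A simplified sketch (the helper name is illustrative, not the package's API):

# Sketch: rank of the neighbor at offset (dx_off, dy_off) on the px × py grid
function neighbor_rank(rank_x, rank_y, dx_off, dy_off, px, py;
                       periodic_x=true, periodic_y=true)
    nx, ny = rank_x + dx_off, rank_y + dy_off
    if periodic_x
        nx = mod(nx, px)
    elseif nx < 0 || nx >= px
        return -1                 # no neighbor on a non-periodic edge
    end
    if periodic_y
        ny = mod(ny, py)
    elseif ny < 0 || ny >= py
        return -1
    end
    return ny * px + nx
end

# West neighbor of rank_x = 0 wraps around: neighbor_rank(0, 0, -1, 0, 4, 2) == 3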
Implementation
function exchange_velocity_halos!(halo_info, u_field, v_field, w_field)
    # 1. Copy local data to the center of the extended arrays
    copy_local_to_extended!(halo_info, u_field, v_field, w_field)

    # 2. Pack boundary data into send buffers (W/E/S/N and corners)
    #    For py == 1, only W/E buffers are used.

    # 3. Post non-blocking receives from all neighbors (up to 8)

    # 4. Send to neighbors (non-blocking)

    # 5. Wait for receives and unpack into halo regions
    MPI.Waitall(recv_reqs)

    # 6. Wait for sends to complete
    MPI.Waitall(send_reqs)
end

Buffer Layout
Each buffer contains packed velocity components:
Buffer layout: [u₁, v₁, w₁, u₂, v₂, w₂, ..., uₙ, vₙ, wₙ]
where n depends on the buffer type:
- X-direction buffers: halo_width × ny_local × nz_local
- Y-direction buffers: nx_local × halo_width × nz_local
- Corner buffers: halo_width × halo_width × nz_local

Particle Migration
When Migration Occurs
After advection, particles may have moved outside their owning rank's domain:
┌────────────────────────────────────────────────────────────────────────┐
│ PARTICLE CROSSING BOUNDARY │
│ │
│ Before advection: │
│ ┌───────────────────────┬───────────────────────┐ │
│ │ RANK 0 │ RANK 1 │ │
│ │ • │ │ │
│ │ ↗ │ │ │
│ │ Particle moving │ │ │
│ │ toward boundary │ │ │
│ └───────────────────────┴───────────────────────┘ │
│ │
│ After advection: │
│ ┌───────────────────────┬───────────────────────┐ │
│ │ RANK 0 │ RANK 1 │ │
│ │ │ • │ │
│ │ │ ↑ Particle now in │ │
│ │ │ Rank 1's domain │ │
│ └───────────────────────┴───────────────────────┘ │
│ │
│ → Particle must be migrated from Rank 0 to Rank 1 │
└────────────────────────────────────────────────────────────────────────┘

The same logic applies across y-boundaries when py > 1.
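Whether a particle has left its tile is a simple range check against the local_domain tuple returned by compute_local_domain. A minimal sketch (find_target_rank below performs the full rank lookup):

# Sketch: does a position still lie inside this rank's tile?
function in_local_domain(x, y, dom)
    return dom.x_start <= x < dom.x_end &&
           dom.y_start <= y < dom.y_end
end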
Migration Algorithm
function migrate_particles!(tracker)
    particles = tracker.particles
    local_domain = tracker.local_domain

    # 1. Clear send buffers
    for buf in tracker.send_buffers
        empty!(buf)
    end

    # 2. Identify particles to keep vs migrate
    keep_indices = Int[]
    for i in 1:particles.np
        x = particles.x[i]
        y = particles.y[i]
        target_rank = find_target_rank(x, y, tracker)

        if target_rank == tracker.rank
            # Particle stays local
            push!(keep_indices, i)
        else
            # Package particle for migration: [x, y, z, u, v, w]
            particle_data = [
                particles.x[i], particles.y[i], particles.z[i],
                particles.u[i], particles.v[i], particles.w[i]
            ]
            append!(tracker.send_buffers[target_rank + 1], particle_data)
        end
    end

    # 3. Remove migrated particles from local arrays
    particles.x = particles.x[keep_indices]
    particles.y = particles.y[keep_indices]
    # ... same for z, u, v, w
    particles.np = length(keep_indices)

    # 4. Exchange particle data via MPI
    exchange_particles!(tracker)

    # 5. Add received particles to local collection
    add_received_particles!(tracker)
end
function find_target_rank(x, y, tracker)
    # Handle periodic boundaries
    x_periodic = tracker.config.periodic_x ? mod(x, tracker.Lx) : x
    y_periodic = tracker.config.periodic_y ? mod(y, tracker.Ly) : y

    # Convert to grid indices (0-based)
    ix = floor(Int, x_periodic / tracker.dx)
    iy = floor(Int, y_periodic / tracker.dy)
    ix = min(ix, tracker.nx - 1)
    iy = min(iy, tracker.ny - 1)

    # Determine process grid (px × py)
    px, py = tracker.local_domain === nothing ?
        compute_process_grid(tracker.nprocs) :
        (tracker.local_domain.px, tracker.local_domain.py)

    # Map indices to rank coordinates (handles uneven division)
    rank_x = _rank_for_index(ix, tracker.nx, px)
    rank_y = _rank_for_index(iy, tracker.ny, py)

    return rank_y * px + rank_x
end

All-to-All Communication
function exchange_particles!(tracker)
    comm = tracker.comm
    nprocs = tracker.nprocs

    # 1. Exchange particle counts (how many to send to each rank)
    send_counts = [length(tracker.send_buffers[i]) ÷ 6 for i in 1:nprocs]
    recv_counts = MPI.Alltoall(MPI.UBuffer(send_counts, 1), comm)

    # 2. Point-to-point sends/receives for particle data
    for other_rank in 0:nprocs-1
        if other_rank == tracker.rank
            continue
        end

        # Send to other_rank
        if !isempty(tracker.send_buffers[other_rank + 1])
            MPI.Send(tracker.send_buffers[other_rank + 1], comm; dest=other_rank, tag=0)
        end

        # Receive from other_rank
        if recv_counts[other_rank + 1] > 0
            n_values = recv_counts[other_rank + 1] * 6   # 6 values per particle
            recv_data = Vector{T}(undef, n_values)       # T: tracker's element type (e.g., Float64)
            MPI.Recv!(recv_data, comm; source=other_rank, tag=0)
            tracker.recv_buffers[other_rank + 1] = recv_data
        end
    end
end

Complete Parallel Timestep
┌──────────────────────────────────────────────────────────────────────────────┐
│ PARALLEL PARTICLE ADVECTION TIMESTEP │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: UPDATE VELOCITY FIELDS │ │
│ │ │ │
│ │ • Compute geostrophic velocities (distributed FFT): │ │
│ │ û = -i·kᵧ·ψ̂, v̂ = i·kₓ·ψ̂ │ │
│ │ │ │
│ │ • Solve omega equation for w (tridiagonal in z): │ │
│ │ ∇²w + (f²/N²)∂²w/∂z² = (2f/N²)·J(ψ_z, ∇²ψ) │ │
│ │ │ │
│ │ • Add wave velocity + Stokes drift (full Jacobian): │ │
│ │ u += Re(LA) + Im(J₀)/f₀, v += Im(LA) - Re(J₀)/f₀ │ │
│ │ w += -2·Im(K₀)/f₀ (Wagner & Young 2016, eq. 3.16a-3.20) │ │
│ │ │ │
│ │ • Exchange velocity halos (MPI non-blocking) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: ADVECT MEAN POSITIONS (GLM framework) │ │
│ │ │ │
│ │ For each local particle: │ │
│ │ 1. Interpolate QG velocity at mean position (X, Y, Z) │ │
│ │ • Use extended arrays (halos) if near boundary │ │
│ │ • Trilinear/Tricubic/Quintic interpolation │ │
│ │ │ │
│ │ 2. Time integration (Euler): │ │
│ │ • X_{n+1} = X_n + dt·u_QG │ │
│ │ │ │
│ │ 3. Reconstruct wave displacement (diagnostic): │ │
│ │ • ξ = Re{(LA/(-if)) × e^{-ift}} │ │
│ │ • Physical position: x = X + ξ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: MIGRATE PARTICLES │ │
│ │ │ │
│ │ 1. Identify particles outside local domain │ │
│ │ 2. Pack outgoing particles into send buffers │ │
│ │ 3. MPI.Alltoall - exchange particle counts │ │
│ │ 4. MPI.Send/Recv - transfer particle data │ │
│ │ 5. Unpack incoming particles into local arrays │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: APPLY BOUNDARY CONDITIONS │ │
│ │ │ │
│ │ Horizontal (periodic): x = mod(x, Lx), y = mod(y, Ly) │ │
│ │ │ │
│ │ Vertical (reflective): │ │
│ │ if z > 0: z = -z, w = -w │ │
│ │ if z < -Lz: z = -2·Lz - z, w = -w │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ STEP 5: SAVE TRAJECTORIES (if save_interval reached) │ │
│ │ │ │
│ │ Option A: Each rank saves local particles independently │ │
│ │ Option B: Gather all particles to rank 0, unified output │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘

Velocity Data Flow: QG + Wave
This section explains how particles obtain the combined QG flow and wave Stokes drift velocities in parallel mode.
Overview
Particles are advected by the total velocity field:
u_total = u_QG + u_wave + u_Stokes
v_total = v_QG + v_wave + v_Stokes
w_total = w_QG + w_Stokes

where:
- QG velocities: Geostrophic flow from streamfunction ψ, vertical from omega equation
- Wave velocity: Backrotated wave orbital velocity Re(LA), Im(LA) from YBJ+ equation
- Stokes drift: Wave-induced drift from full Jacobian (Wagner & Young 2016)
Data Flow Diagram
┌──────────────────────────────────────────────────────────────────────────────────┐
│ VELOCITY DATA FLOW FOR PARTICLE ADVECTION │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: advect_particles!(tracker, state, grid, dt) │ │
│ │ ↓ │ │
│ │ calls update_velocity_fields!(tracker, state, grid) │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: compute_total_velocities!(state, grid; plans=tracker.plans, ...) │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ compute_velocities!(S, G) → QG velocities │ │ │
│ │ │ │ │ │
│ │ │ HORIZONTAL (geostrophic): │ │ │
│ │ │ û = -i·kᵧ·ψ̂ → u_QG = -∂ψ/∂y │ │ │
│ │ │ v̂ = i·kₓ·ψ̂ → v_QG = +∂ψ/∂x │ │ │
│ │ │ │ │ │
│ │ │ VERTICAL (omega equation): │ │ │
│ │ │ ∇²w + (f²/N²)∂²w/∂z² = (2f/N²)·J(ψ_z, ∇²ψ) │ │ │
│ │ │ → Tridiagonal solve per (kₓ,kᵧ) → w_QG │ │ │
│ │ │ │ │ │
│ │ │ State.u = u_QG, State.v = v_QG, State.w = w_QG │ │ │
│ │ └───────────────────────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ compute_wave_velocities!(S, G) → Wave velocity + Stokes drift │ │ │
│ │ │ │ │ │
│ │ │ WAVE VELOCITY (Asselin & Young 2019, eq 1.2): │ │ │
│ │ │ LA = B + (k_h²/4)·A in spectral space (YBJ+ relation) │ │ │
│ │ │ u_wave = Re(LA), v_wave = Im(LA) │ │ │
│ │ │ │ │ │
│ │ │ HORIZONTAL STOKES DRIFT (Wagner & Young 2016, eq 3.16a, 3.18): │ │ │
│ │ │ J₀ = (LA)*·∂_{s*}(LA) - (f₀²/N²)·(∂_{s*}A_z*)·∂_z(LA) │ │ │
│ │ │ u_S = Im(J₀)/f₀, v_S = -Re(J₀)/f₀ │ │ │
│ │ │ │ │ │
│ │ │ VERTICAL STOKES DRIFT (Wagner & Young 2016, eq 3.19-3.20): │ │ │
│ │ │ w_S = -2·Im(K₀)/f₀ where K₀ = M*_z·M_{ss*} - M*_{s*}·M_{sz} │ │ │
│ │ │ │ │ │
│ │ │ State.u += u_wave + u_S → u_QG + u_wave + u_S (TOTAL) │ │ │
│ │ │ State.v += v_wave + v_S → v_QG + v_wave + v_S (TOTAL) │ │ │
│ │ │ State.w += w_S → w_QG + w_S (TOTAL) │ │ │
│ │ └───────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: Copy to tracker (LOCAL data only in parallel) │ │
│ │ │ │
│ │ u_data = parent(state.u) # LOCAL portion of distributed array │ │
│ │ v_data = parent(state.v) │ │
│ │ w_data = parent(state.w) │ │
│ │ │ │
│ │ tracker.u_field .= u_data # Copy TOTAL velocities to tracker │ │
│ │ tracker.v_field .= v_data │ │
│ │ tracker.w_field .= w_data │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: Halo Exchange (PARALLEL MODE ONLY) │ │
│ │ │ │
│ │ exchange_velocity_halos!(halo_info, u_field, v_field, w_field) │ │
│ │ │ │
│ │ Rank 0 Rank 1 Rank 2 │ │
│ │ [u,v,w] → [u,v,w] → [u,v,w] │ │
│ │ ↓ ↓ ↓ │ │
│ │ [H|local|H] [H|local|H] [H|local|H] │ │
│ │ └──send──→──recv──┘──send──→──recv──┘ │ │
│ │ │ │
│ │ Extended arrays now contain: │ │
│ │ • Left halo: neighbor's RIGHT edge velocities │ │
│ │ • Local: own velocity data (QG + wave) │ │
│ │ • Right halo: neighbor's LEFT edge velocities │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 5: Particle Interpolation │ │
│ │ │ │
│ │ For each particle at (x, y, z): │ │
│ │ (u, v, w) = interpolate_velocity_at_position(x, y, z, tracker) │ │
│ │ │ │
│ │ • Uses extended arrays (local + halos) │ │
│ │ • TRILINEAR / TRICUBIC / QUINTIC interpolation │ │
│ │ • Returns TOTAL velocity (QG + wave) at particle position │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────┘

Key Implementation Details
1. State Contains TOTAL Velocities
After compute_total_velocities!, the State arrays contain the sum of QG and wave contributions:
# In compute_total_velocities!:
compute_velocities!(S, G; ...) # Sets S.u, S.v, S.w to QG values
compute_wave_velocities!(S, G; ...)  # ADDS wave velocity + Stokes drift to S.u, S.v, S.w

2. Parallel Data Distribution (PencilArrays)
In MPI parallel mode:
- `state.u`, `state.v`, `state.w` are distributed arrays (PencilArrays)
- Each rank owns a tile: [x_start, x_end] × [y_start, y_end] × [-Lz, 0]
- `parent(state.u)` extracts only the local portion
# In update_velocity_fields!:
u_data = parent(state.u) # Shape: (nz, nx_local, ny_local) - LOCAL data only
tracker.u_field .= u_data    # Copy to tracker's workspace

3. Wave Data Source
The wave amplitude A and its derivatives come from the simulation state:
| Field | Location | Description |
|---|---|---|
| State.A | Spectral | Wave amplitude Â(kₓ, kᵧ, z) |
| State.C | Spectral | A_z = ∂A/∂z (set by `invertL⁺AtoA!`) |
The wave velocity and Stokes drift computation (Wagner & Young 2016, eq. 3.16a-3.20):
- Computes LA = L⁺A + (k_h²/4)A in spectral space (wave velocity amplitude)
- Computes ∂_{s*}(LA), ∂_{s}(A_z), ∂_z(LA) in spectral space
- For vertical Stokes: computes A_{zz}, Δ_H(A_z), A_{zs}, A_{zzs}
- Transforms all fields to physical space
- Computes wave velocity: u_wave = Re(LA), v_wave = Im(LA)
- Computes horizontal Stokes via full Jacobian J₀: u_S = Im(J₀)/f₀, v_S = -Re(J₀)/f₀
- Computes vertical Stokes via K₀: w_S = -2·Im(K₀)/f₀
- Adds all contributions to existing QG velocities
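Once LA, J₀, and K₀ are available in physical space, the final assembly is purely element-wise. A minimal sketch (array names here are illustrative, not the package's internal variables):

# Sketch: add wave velocity and Stokes drift to fields that already hold u_QG, v_QG, w_QG.
# LA, J0, K0 are complex-valued physical-space arrays; f0 is the Coriolis parameter.
@. u += real(LA) + imag(J0) / f0    # u_wave + u_S
@. v += imag(LA) - real(J0) / f0    # v_wave + v_S
@. w += -2 * imag(K0) / f0          # w_S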
4. Halo Exchange Enables Cross-Boundary Interpolation
After halo exchange, each rank's extended arrays contain:
Extended Array: halos in x and y around the local tile
(left/right + bottom/top + corners)

This allows particles near domain boundaries to interpolate velocities using neighbor data without additional MPI communication during advection.
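Concretely, interpolation stencils index the extended arrays by shifting a global grid index by the tile's start index plus the halo offset. A sketch (variable names assumed, following the (nz_local, nx_local + 2·hw, ny_local + 2·hw) layout above):

# Sketch: map a global 0-based horizontal index (ix_global, iy_global) into the extended array
ix_ext = ix_global - x_start + hw + 1   # +1 for Julia's 1-based indexing
iy_ext = iy_global - y_start + hw + 1
u_val  = halo_info.u_extended[iz, ix_ext, iy_ext]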
Code Path Summary
advect_particles!(tracker, state, grid, dt)
├─→ update_velocity_fields!(tracker, state, grid)
│   ├─→ compute_total_velocities!(state, grid)
│   │   ├─→ compute_velocities!()              # QG: u, v from ψ; w from omega eqn
│   │   └─→ compute_wave_velocities!()         # Wave velocity LA = L⁺A + (k_h²/4)A
│   ├─→ tracker.u_field .= parent(state.u)     # Copy LOCAL total velocity (QG + wave)
│   └─→ exchange_velocity_halos!()             # Fill halos from neighbors
└─→ advect_euler!()                            # GLM mean position advection
    ├─→ interpolate_velocity_at_position()     # Uses extended arrays
    └─→ reconstruct_wave_displacement()        # ξ = Re{(LA/(-if)) × e^{-ift}}

CFL Stability for Particle Advection
Halo Constraint
For Euler time integration with the GLM framework, particles should not move beyond the halo region in a single timestep. If a particle moves beyond the halo region, velocity interpolation becomes inaccurate (values are clamped to boundary).
Stability requirement:
max_velocity × dt < halo_width × dx

where:
- max_velocity: maximum expected flow velocity
- dt: timestep
- halo_width: ghost cell count (1, 2, or 3 depending on interpolation)
- dx: grid spacing
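Rearranged, the constraint gives a maximum safe timestep; a small helper sketch (hypothetical, not part of the package API):

# Sketch: largest dt that keeps a particle inside the halo region in one step
particle_dt_max(max_velocity, dx, halo_width) = halo_width * dx / max_velocity

particle_dt_max(0.1, 2π / 256, 2)   # ≈ 0.49, matching the example further below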
Validation Function
Use validate_particle_cfl to check if your timestep is appropriate:
using QGYBJplus: validate_particle_cfl

# Estimate maximum velocity in your simulation
max_velocity = 0.1  # m/s

# Check if timestep is safe
if !validate_particle_cfl(tracker, max_velocity, dt)
    @warn "Timestep may exceed halo region; consider reducing dt"
end

Recommendations
- Use higher-order interpolation: TRICUBIC (hw=2) or QUINTIC (hw=3) provide larger halo regions
- Reduce timestep: If dt is too large for the flow velocities
- Use Euler integration: Only evaluates velocity at current position (no intermediate stages)
Example CFL calculation:
# Grid: 256 points, Lx = 2π, using TRICUBIC (hw = 2)
dx = 2π / 256                                # ≈ 0.0245
halo_width = 2
safe_displacement = halo_width * dx          # ≈ 0.049

# With max velocity = 0.1 m/s, the maximum safe dt is:
max_velocity = 0.1
dt_max = safe_displacement / max_velocity    # ≈ 0.49

Scalability Analysis
Communication Costs
| Operation | Data Volume | Frequency |
|---|---|---|
| Halo exchange | O((nx_local + ny_local) × nz × halo_width × 3) | Every timestep |
| Migration Alltoall | O(nprocs) integers | Every timestep |
| Migration Send/Recv | O(Np_crossing × 6) floats | Every timestep |
Scaling Characteristics
Strong Scaling (fixed problem size, varying ranks):
- Velocity computation: ~O(N/P) per rank
- Particle advection: ~O(Np/P) per rank
- Communication: O(boundary_size) per rank; its relative cost grows as more ranks are added
Weak Scaling (fixed load per rank):
- Ideal scaling if particles are uniformly distributed
- Load imbalance if particles cluster in few ranks
Load Balancing Considerations
Particle load can become imbalanced if:
- Flow concentrates particles (e.g., in eddies)
- Initial distribution is non-uniform
- Some ranks have many boundary crossings
Mitigation strategies:
- Use sufficiently many particles for statistical averaging
- Initialize particles uniformly
- Monitor particle count per rank
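One way to monitor imbalance is to compare the largest per-rank particle count with the mean; a sketch using MPI reductions (field names follow the ParticleTracker layout shown below):

# Sketch: quantify particle load imbalance across ranks
np_local = tracker.particles.np
np_max   = MPI.Allreduce(np_local, MPI.MAX, tracker.comm)
np_total = MPI.Allreduce(np_local, +, tracker.comm)
imbalance = np_max / (np_total / tracker.nprocs)   # 1.0 = perfectly balanced
tracker.rank == 0 && println("Load imbalance factor: ", imbalance)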
Data Structures
HaloInfo
mutable struct HaloInfo{T}
    halo_width::Int

    # Extended arrays with halos (x and y)
    u_extended::Array{T,3}   # Size: (nz_local, nx_local + 2*hw, ny_local + 2*hw)
    v_extended::Array{T,3}
    w_extended::Array{T,3}

    # Local data position in extended arrays
    local_start::NTuple{3,Int}   # (1, hw+1, hw+1)
    local_end::NTuple{3,Int}     # (nz_local, hw+nx_local, hw+ny_local)

    # Neighbor ranks (W, E, S, N, SW, SE, NW, NE)
    neighbors::Vector{Int}

    # Communication buffers (x/y edges + corners)
    send_west::Vector{T}
    send_east::Vector{T}
    recv_west::Vector{T}
    recv_east::Vector{T}
    send_south::Vector{T}
    send_north::Vector{T}
    recv_south::Vector{T}
    recv_north::Vector{T}
    send_sw::Vector{T}
    send_se::Vector{T}
    send_nw::Vector{T}
    send_ne::Vector{T}
    recv_sw::Vector{T}
    recv_se::Vector{T}
    recv_nw::Vector{T}
    recv_ne::Vector{T}

    # MPI + topology info
    comm::Any
    rank::Int
    nprocs::Int
    nx_global::Int
    ny_global::Int
    nz_global::Int
    nx_local::Int
    ny_local::Int
    nz_local::Int
    px::Int
    py::Int
    rank_x::Int
    rank_y::Int
    periodic_x::Bool
    periodic_y::Bool
    is_2d_decomposition::Bool
end

ParticleTracker (Parallel Fields)
mutable struct ParticleTracker{T}
    # ... common fields ...

    # MPI configuration
    comm::Any                    # MPI.COMM_WORLD
    rank::Int                    # This process's rank
    nprocs::Int                  # Total processes
    is_parallel::Bool            # true if nprocs > 1

    # Domain decomposition
    local_domain::NamedTuple     # (x_start, x_end, y_start, y_end, nx_local, ny_local, px, py, rank_x, rank_y, ...)

    # Halo exchange
    halo_info::HaloInfo{T}       # Extended arrays + buffers

    # Migration buffers (one per rank)
    send_buffers::Vector{Vector{T}}   # Length: nprocs
    recv_buffers::Vector{Vector{T}}   # Length: nprocs

    # I/O configuration
    is_io_rank::Bool             # true if rank == 0
    gather_for_io::Bool          # Gather particles to rank 0 for output
end

Usage Example
using MPI
using QGYBJplus

# Initialize MPI
MPI.Init()

# Set up parallel configuration
mpi_config = setup_mpi_environment()

# Create distributed grid and state
par = default_params(Lx=2π, Ly=2π, Lz=1.0, nx=64, ny=64, nz=32)
grid = init_mpi_grid(par, mpi_config)
plans = plan_mpi_transforms(grid, mpi_config)
state = init_mpi_state(grid, plans, mpi_config)

# Create particle configuration
particle_config = create_particle_config(
    x_min = 0.0, x_max = 2π,
    y_min = 0.0, y_max = 2π,
    z_level = -0.5,              # mid-depth for Lz = 1.0 (domain is [-Lz, 0])
    nx_particles = 100,
    ny_particles = 100,
    interpolation_method = TRILINEAR
)

# Create particle tracker with parallel support
tracker = ParticleTracker(particle_config, grid, mpi_config)
initialize_particles!(tracker, particle_config)

# Main simulation loop (advance the fluid state separately)
nsteps = 1000
for step in 1:nsteps
    current_time = step * par.dt
    advect_particles!(tracker, state, grid, par.dt, current_time)
end

# Save trajectories (rank 0 gathers and writes, or each rank writes independently)
if mpi_config.is_root || !tracker.gather_for_io
    write_particle_trajectories("particles_rank$(mpi_config.rank).nc", tracker)
end

MPI.Finalize()

Debugging Tips
Check Particle Distribution
# Print particle count per rank
println("Rank $(tracker.rank) has $(tracker.particles.np) particles")
MPI.Barrier(tracker.comm)

Verify Halo Exchange
# After halo exchange, inspect extended array values (layout: z, x, y)
if tracker.rank == 0
    hw = halo_info.halo_width
    println("Left halo (should be neighbor data): ", halo_info.u_extended[1, 1:hw, hw+1])
    println("Local data start: ", halo_info.u_extended[1, hw+1, hw+1])
end

Monitor Migration
# Track particles crossing boundaries
n_migrated = sum(length.(tracker.send_buffers)) ÷ 6
println("Rank $(tracker.rank) migrating $n_migrated particles")

See Also
- Particle Advection - General particle documentation
- MPI Parallelization - Overall parallel architecture
- API Reference - Function documentation