NCI

ACCESS-OM2 Performance Analysis

Marshall Ward
National Computational Infrastructure

Model configuration

Resolution   3600 × 2700 × 75 (KDS)
Timestep     300 s (288 steps / day)
CPUs         MOM  4358 (80 × 75 layout, masked)
             CICE 1200 (40 × 30)
             MATM    1
Runtime      10 days (2880 steps)

Main loop runtimes

Component     Runtime (s)   Per step (s)
ACCESS-OM2       2023.2          0.70
MOM              1980.6          0.69
CICE             1673.3          0.58
MATM              <76.5             -
(~3% profiler overhead)

MOM timestep distribution

MOM ocean core scaling

MOM ocean core distribution

MOM SBC "scaling"

zero_net_water_coupler = .true.
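
For context, this flag is set in the MOM input namelists. A minimal sketch of the entry (the group name ocean_sbc_nml is assumed here rather than taken from the slide; check the configuration's input.nml):

    &ocean_sbc_nml
        zero_net_water_coupler = .true.
    /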

CICE load imbalance

Roundrobin distribution

CPUs               CPU hours   Saving
1200 (unmasked)        674.4        -
 960 (masked)          548.6    18.7%
(548.6 / 674.4 ≈ 0.813, i.e. an 18.7% CPU-hour saving from masking)
Major contributions by Nic and Russ

Land mask efficiency

Block size   Ocean CPUs        Land ratio
180 × 180      266 of  300       11.3%
 90 × 180      514 of  600       14.3%
 90 ×  90      960 of 1200       20.0%
 45 ×  90     1849 of 2400       23.0%
 45 ×  45     3515 of 4800       26.8%
  1 ×   1     6.1M of 9.7M       37.4%
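
These counts can be reproduced directly from the model's land-sea mask. A minimal Fortran sketch under assumed inputs (the mask array wet and how it is read are not part of the configuration shown here) that counts blocks containing at least one ocean point:

    program count_ocean_blocks
        implicit none
        integer, parameter :: nx = 3600, ny = 2700   ! horizontal grid, as above
        integer, parameter :: bx = 90, by = 90       ! block size under test
        logical, allocatable :: wet(:,:)             ! .true. where a cell is ocean
        integer :: i, j, n_ocean, n_total

        allocate(wet(nx, ny))
        wet = .false.   ! placeholder; the real mask would be read from the grid file here

        n_total = (nx / bx) * (ny / by)
        n_ocean = 0
        do j = 1, ny, by
            do i = 1, nx, bx
                ! a block is kept (assigned a CPU) only if any of its cells are ocean
                if (any(wet(i:i+bx-1, j:j+by-1))) n_ocean = n_ocean + 1
            end do
        end do

        print '(a,i0,a,i0,a,f5.1,a)', 'ocean blocks: ', n_ocean, ' of ', n_total, &
            ' (land-only: ', 100.0 * (n_total - n_ocean) / n_total, '%)'
    end program count_ocean_blocks

Smaller blocks follow the coastline more closely, so the land-only fraction rises towards the grid's true land fraction of 37.4% at 1 × 1.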

CICE runtime distribution

CICE core scaling

Stability report

Mid-run hangs: "Too many retries..."
  • OpenMPI 1.10.x

    -mca pml yalla

  • OpenMPI 2.x

    Requires rewrite of mpp_global_field
    (nearly complete)

Stability issues

Initialisation hangs

  • OASIS: MPI_Comm_split
  • MOM: MPI_Comm_group

The investigation continues...
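
A minimal Fortran sketch of the communicator setup pattern named above (not the OASIS or MOM source; rank counts and names are illustrative). MPI_Comm_split and MPI_Comm_create are collective, so a single rank that never reaches the call leaves every other rank blocked:

    program comm_setup_sketch
        use mpi
        implicit none
        integer :: ierr, rank, color, model_comm, world_group, new_comm

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

        ! OASIS-style: split the world communicator into per-model communicators
        ! (collective over MPI_COMM_WORLD)
        color = merge(1, 0, rank < 4358)   ! e.g. ocean vs. remaining ranks (illustrative)
        call MPI_Comm_split(MPI_COMM_WORLD, color, rank, model_comm, ierr)

        ! MOM/FMS-style: extract the world group (a local call), then build a
        ! communicator from it (collective)
        call MPI_Comm_group(MPI_COMM_WORLD, world_group, ierr)
        call MPI_Comm_create(MPI_COMM_WORLD, world_group, new_comm, ierr)

        call MPI_Finalize(ierr)
    end program comm_setup_sketch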

Summary

  • Both submodels can be scaled up ~2-4x
  • No noticeable coupler overhead
  • MOM scaling limited by SBC, "2d physics"
  • CICE land masking: ~5-10% efficiency
  • Communicator stability is still an issue

We can already scale up: we need more cores!

TODO

  • MOM's SBC performance bottleneck
  • Ice load balancing
  • Investigate Intel MPI, MVAPICH
  • Perhaps parallel I/O?