April 2025
Features
Billing:
Billed items now include Container runtime (including per attached GPU), data transfer to and from buckets, and data storage (including Datasets and Artifacts).
The Billing history page now summarizes expenditure by product type for the last day, the last month, and in total.
Self Healing:
Burst experiments that result in strong_fail will attempt to re-launch, up to a maximum of 3 retries.
Experiments that implement checkpointing using the AtomicDirectory checkpointer from cycling_utils will have the latest checkpoint available to resume from on re-launch.
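The re-launch behaviour described above can be pictured roughly as the loop below. This is only an illustrative sketch, not the platform's implementation: `run_with_self_healing`, the `launch` callable, and the `"completed"` status string are hypothetical names introduced here, and the sketch deliberately does not call the real `AtomicDirectory` API. Only the `strong_fail` status and the limit of 3 retries come from the release note.

```python
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3  # limit stated in the release note

# Hypothetical launcher signature: takes the checkpoint to resume from (or None)
# and returns (final_status, latest_checkpoint_saved_during_that_run).
Launcher = Callable[[Optional[str]], Tuple[str, Optional[str]]]


def run_with_self_healing(launch: Launcher) -> Tuple[str, int]:
    """Run once, then re-launch on strong_fail up to MAX_RETRIES times.

    Returns the final status and the number of retries that were used.
    """
    checkpoint: Optional[str] = None
    status, latest = launch(checkpoint)
    retries = 0
    while status == "strong_fail" and retries < MAX_RETRIES:
        retries += 1
        # Resume from the newest checkpoint if the failed run saved one;
        # otherwise keep whatever we had (possibly nothing).
        checkpoint = latest or checkpoint
        status, latest = launch(checkpoint)
    return status, retries
```

In this sketch an experiment that never saves a checkpoint is still re-launched, but it restarts from scratch each time, which is why checkpointing matters for making the retries useful.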
Fixes
Artifacts:
Checkpoints already shipped to the Artifact store are not shipped again.
Checkpoint artifacts are sorted by recency with the latest at the top of the page.
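As a rough illustration of the two Artifacts fixes, the sketch below skips checkpoints that have already been shipped and orders the remainder with the latest first. The `Checkpoint` record, its fields, and both helper functions are hypothetical and exist only for this example; the Artifact store's actual data model is not described in these notes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Checkpoint:
    name: str          # hypothetical unique identifier
    created_at: float  # hypothetical creation time (seconds since epoch)


def select_unshipped(local: Iterable[Checkpoint], shipped_names: Set[str]) -> List[Checkpoint]:
    """Keep only checkpoints whose names are not already in the store."""
    return [c for c in local if c.name not in shipped_names]


def sort_latest_first(checkpoints: Iterable[Checkpoint]) -> List[Checkpoint]:
    """Order checkpoints by recency, newest at the top."""
    return sorted(checkpoints, key=lambda c: c.created_at, reverse=True)
```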