Strong Docs

Resuming Experiments


Last updated 3 months ago

Why You Might Want to do This:

  • Your job failed, but it saved checkpoint progress that you want to reuse.

  • Your job is still running or has succeeded, and you want to launch a new job from one of its checkpoints.

Steps to Resume

A → Retrieving your latest checkpoint

  1. Click the green "Outputs" button for the experiment you'd like to resume from.

  2. Find the line labeled "checkpoints".

  3. Click the "Sync to Workstation" button on the far right side of that line.

  4. Wait for the notification in the bottom right to change to "Syncing Complete!".

B → Add to ISC file and relaunch!

  • Resume Training:

    • Retrieve the artifact ID for the checkpoint from the UI.

    • Pass the ID into the ISC file using the argument: input_artifact_id_list.

Example ISC file:

isc_project_id = "sameAsBefore"
# ... existing details (replace this line with the rest of your ISC file)
input_artifact_id_list = [ "ID_OF_CHECKPOINT_ARTIFACT_IN_UI" ]
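
If you relaunch from checkpoints often, the edit above can be scripted. The following is only a minimal sketch: it assumes the ISC file is a flat list of `key = value` lines with no section headers and that `input_artifact_id_list` fits on a single line. The helper name `with_input_artifact` is hypothetical and is not part of the ISC tooling.

```python
import re

# Matches an existing single-line input_artifact_id_list assignment.
_ARTIFACT_LINE = re.compile(r"^[ \t]*input_artifact_id_list[ \t]*=.*$", re.MULTILINE)


def with_input_artifact(isc_text: str, artifact_id: str) -> str:
    """Return the ISC file text with input_artifact_id_list set to [artifact_id].

    Hypothetical helper: replaces an existing single-line assignment if one
    is present, otherwise appends a new line at the end of the file.
    """
    new_line = f'input_artifact_id_list = [ "{artifact_id}" ]'
    if _ARTIFACT_LINE.search(isc_text):
        # A function replacement avoids backslash interpretation in the ID.
        return _ARTIFACT_LINE.sub(lambda _match: new_line, isc_text, count=1)
    return isc_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    original = 'isc_project_id = "sameAsBefore"\n'
    print(with_input_artifact(original, "ID_OF_CHECKPOINT_ARTIFACT_IN_UI"))
```

The function works on text rather than on a file path, so you can review the result before writing it back to your ISC file.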

Relaunch the experiment, and it should resume from the checkpoint! 🎉