Resuming Experiments
Follow these steps in either of two cases:
Your job failed but saved checkpoint progress that you want to reuse.
Your job is still running or has succeeded, and you want to launch a new job from one of its checkpoints.
Click the green "Outputs" for the experiment you'd like to resume from.
Find the line which says "checkpoints".
Click the "Sync to Workstation" button on the far right side of that line.
Wait for the notification in the bottom right to change to "Syncing Complete!".
Resume Training:
Retrieve the artifact ID for the checkpoint from the UI.
Pass the ID into the ISC file using the argument input_artifact_id_list.
Example ISC file:
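A minimal sketch of what such a file might look like, assuming the ISC file uses TOML syntax and that input_artifact_id_list takes a list of artifact ID strings. Only input_artifact_id_list is named in this guide; every other key and every value below is a placeholder or an assumption, so keep whatever your existing ISC file already contains and add only the one line.

```toml
# Hypothetical ISC file. All keys except input_artifact_id_list are
# assumptions for illustration; keep the ones from your own ISC file.
isc_project_id = "<your-project-id>"
experiment_name = "resume-from-checkpoint"
gpus = 8
compute_mode = "cycle"

# Artifact ID of the checkpoint, copied from the UI (placeholder value).
input_artifact_id_list = ["<checkpoint-artifact-id>"]

command = "<your training launch command>"
```

The angle-bracket values are placeholders to replace with your own project ID, checkpoint artifact ID, and launch command.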
Relaunch the experiment and it should resume from the checkpoint! 🎉