More Examples & Demos

The following examples further demonstrate how to implement interruptibility in distributed training scripts using checkpointing, atomic saving, and stateful samplers.

These examples are under active development toward three milestones: [1] interruptibility in distributed training, [2] verified completion of a full training run, and [3] reproduction of benchmark performance published by others (where applicable). Each example below is annotated in the Status column with its degree of completion. Examples annotated with [0] are "coming soon".
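As a rough illustration of the three ingredients named above (checkpointing, atomic saving, and a stateful sampler), the sketch below uses only the Python standard library. It is not taken from the linked repositories: `atomic_save`, `load_checkpoint`, and `ResumableSampler` are hypothetical names chosen for this example, and `pickle` stands in for whatever serializer a real training script would use.

```python
import os
import pickle
import random
import tempfile


def atomic_save(state, path):
    """Write `state` to `path` so a reader never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory (same filesystem),
    # so the final rename can replace the target in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new checkpoint into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Read back a checkpoint written by `atomic_save`."""
    with open(path, "rb") as f:
        return pickle.load(f)


class ResumableSampler:
    """Hypothetical sampler: shuffles per epoch and remembers its position."""

    def __init__(self, dataset_size, seed=0):
        self.dataset_size = dataset_size
        self.seed = seed
        self.epoch = 0
        self.progress = 0  # indices already handed out in the current epoch

    def _order(self):
        # Seeding with (seed + epoch) makes the shuffle reproducible on resume.
        rng = random.Random(self.seed + self.epoch)
        order = list(range(self.dataset_size))
        rng.shuffle(order)
        return order

    def __iter__(self):
        order = self._order()
        while self.progress < self.dataset_size:
            index = order[self.progress]
            self.progress += 1
            yield index
        # Epoch finished: move on and start counting from zero again.
        self.epoch += 1
        self.progress = 0

    def state_dict(self):
        return {"epoch": self.epoch, "progress": self.progress, "seed": self.seed}

    def load_state_dict(self, state):
        self.epoch = state["epoch"]
        self.progress = state["progress"]
        self.seed = state["seed"]
```

In a training loop under these assumptions, the script would call `atomic_save({"sampler": sampler.state_dict(), ...}, path)` after processing a batch, and on restart call `load_checkpoint(path)` followed by `sampler.load_state_dict(...)`, so that the resumed run continues from the next unseen index rather than restarting the epoch.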

Hello World

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| Fashion MNIST | Hello World | CNN | [3] | isc-demos/fashion_mnist |
| ImageNet | Image classification | ResNet50 | [3] | isc-demos/imagenet-resnet50 |
| DeepSeek | Large Language Models | DeepSeek-R1 Distillation | [2] | isc-demos/deepseek |
| Chess Hackathon | Regression | Various | [3] | chess-hackathon |
| CIFAR100 | Image classification | ResNet50 | [2] | isc-demos/cifar100-resnet50 |

pytorch-image-models (timm)

(from https://github.com/huggingface/pytorch-image-models)

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| resnet50 | Image classification | ResNet50 | [2] | isc-demos/pytorch-image-models |
| resnet152 | Image classification | ResNet152 | [2] | isc-demos/pytorch-image-models |
| efficientnet_b0 | Image classification | EfficientNet B0 | [2] | isc-demos/pytorch-image-models |
| efficientnet_b7 | Image classification | EfficientNet B7 | [2] | isc-demos/pytorch-image-models |
| efficientnetv2_s | Image classification | EfficientNetV2 S | [2] | isc-demos/pytorch-image-models |
| efficientnetv2_xl | Image classification | EfficientNetV2 XL | [2] | isc-demos/pytorch-image-models |
| vit_base_patch16_224 | Image classification | VIT Base Patch16 224 | [2] | isc-demos/pytorch-image-models |
| vit_large_patch16_224 | Image classification | VIT Large Patch16 224 | [2] | isc-demos/pytorch-image-models |

Torchvision segmentation

(from https://github.com/pytorch/vision/tree/main/references/segmentation)

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| fcn_resnet101 | Image segmentation | ResNet101 | [2] | isc-demos/tv-segmentation |
| deeplabv3_mobilenet_v3_large | Image segmentation | MobileNetV3 Large | [2] | isc-demos/tv-segmentation |

Torchvision detection

(from https://github.com/pytorch/vision/tree/main/references/detection)

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| maskrcnn_resnet101_fpn | Object detection | Mask RCNN (ResNet101 FPN) | [2] | isc-demos/tv-detection |
| retinanet_resnet101_fpn | Object detection | RetinaNet (ResNet101 FPN) | [2] | isc-demos/tv-detection |

Detectron2

(from https://github.com/facebookresearch/detectron2)

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| detectron2 | TBC | Detectron2 | [2] | isc-demos/detectron2 |
| detectron2_densepose | TBC | Detectron2 | [2] | isc-demos/detectron2/projects/densepose |

Large Language Models (LLM)

| Title | Description | Model | Status | Link |
| --- | --- | --- | --- | --- |
| Llama2 | LoRA | Llama2 | [0] | isc-demos/llama2 |
| Mistral | TBC | Mistral | [0] | isc-demos/mistral |

