slipa routes your fine-tuning job to the lowest-priced available spot GPU across
RunPod, Lambda Labs, Vast.ai, and more. Checkpoints survive eviction. You get back
the weights.
slipa can also swap instances mid-training when a cheaper one opens up.
How it works
One command. The cheapest available GPU. Checkpoints that survive eviction.
You don't pick a provider, a region, or a SKU. You pick a model, a dataset, and a
budget. slipa does the rest — including recovering from spot evictions without
losing training progress.
Scoring weights each quote by dataset size, tokens/sec for your model and method,
empirical eviction rate over the last 7 days, and cold-start latency. A $0.79/hr
quote with 3× the eviction rate is not cheaper once restarts and repeated work are
counted, and slipa knows that.
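To make that comparison concrete, here is a minimal sketch of an eviction-aware score. The Quote fields, the expected_cost formula, the 1200-second redo estimate, and the $0.89/hr comparison quote are all assumptions for illustration, not slipa's actual scoring function.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    provider: str
    price_per_hr: float      # USD per hour
    tokens_per_sec: float    # throughput for this model and method
    evictions_per_hr: float  # empirical rate over the trailing window
    cold_start_sec: float    # time from request to first training step


def expected_cost(q: Quote, dataset_tokens: float,
                  redo_sec_per_eviction: float = 1200.0) -> float:
    """Expected USD to finish the job, counting time lost to evictions.

    Simplification: evictions during restart overhead are ignored.
    """
    compute_hr = dataset_tokens / q.tokens_per_sec / 3600.0
    expected_evictions = q.evictions_per_hr * compute_hr
    # One initial cold start, plus a cold start and redone work per eviction.
    overhead_sec = q.cold_start_sec + expected_evictions * (
        q.cold_start_sec + redo_sec_per_eviction
    )
    return q.price_per_hr * (compute_hr + overhead_sec / 3600.0)


# Hypothetical quotes: the cheaper one is evicted 3x as often.
steady = Quote("provider-a", 0.89, 2000.0, 0.2, 300.0)
flaky = Quote("provider-b", 0.79, 2000.0, 0.6, 300.0)
best = min([steady, flaky], key=lambda q: expected_cost(q, dataset_tokens=72e6))
```

With these made-up numbers the $0.89/hr quote wins, because the lower hourly price is outweighed by the hours spent restarting and repeating work.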
▲
Manifest-verified resumption
Every checkpoint writes its manifest last, after all of its files. Resume walks
newest-to-oldest and skips any checkpoint whose file sizes don't match its
manifest. Upload failures trip an abort gate instead of silently corrupting state.
Eviction signals trigger a synchronous adapter-only save inside the
SIGTERM→SIGKILL window.
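The manifest-then-verify idea can be sketched in a few lines. This is an illustration under stated assumptions, not slipa's code: the manifest.json file name, the JSON layout, and zero-padded directory names such as step-000200 (so that name order equals age order) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

MANIFEST = "manifest.json"  # hypothetical file name


def write_manifest(ckpt_dir: Path) -> None:
    """Record each file's size. Call only after all checkpoint files exist."""
    sizes = {
        p.name: p.stat().st_size
        for p in ckpt_dir.iterdir()
        if p.is_file() and not p.name.startswith(MANIFEST)
    }
    tmp = ckpt_dir / (MANIFEST + ".tmp")
    tmp.write_text(json.dumps(sizes))
    tmp.replace(ckpt_dir / MANIFEST)  # rename, so the manifest is never half-written


def is_valid(ckpt_dir: Path) -> bool:
    """True only if the manifest exists and every listed size matches."""
    manifest = ckpt_dir / MANIFEST
    if not manifest.is_file():
        return False  # the save was interrupted before the manifest landed
    try:
        sizes = json.loads(manifest.read_text())
    except json.JSONDecodeError:
        return False
    if not isinstance(sizes, dict):
        return False
    return all(
        (ckpt_dir / name).is_file() and (ckpt_dir / name).stat().st_size == size
        for name, size in sizes.items()
    )


def latest_valid_checkpoint(root: Path) -> Optional[Path]:
    """Walk checkpoints newest to oldest and return the first that verifies."""
    candidates = sorted(
        (d for d in root.iterdir() if d.is_dir()),
        key=lambda d: d.name,
        reverse=True,
    )
    for ckpt in candidates:
        if is_valid(ckpt):
            return ckpt
    return None
```

Because the manifest is written last, a checkpoint cut off mid-save has no manifest and is skipped, and a truncated file fails the size comparison.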
↻
Mid-training rebid
Every 5 minutes the workflow polls the market for a replacement. When the best
alternative beats the current instance by ≥25% on remaining cost, slipa migrates
mid-job — from the latest verified checkpoint, on a new provider, without losing
progress.
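As a rough sketch of the rebid rule described above: compare the cost of finishing on the current instance with the cost of finishing on the alternative, and migrate only when the saving reaches 25%. The function names, the switch-overhead term, and the example prices are assumptions for illustration.

```python
REBID_THRESHOLD = 0.25  # migrate only if the alternative saves at least 25%


def remaining_cost(price_per_hr: float, tokens_left: float,
                   tokens_per_sec: float, switch_overhead_sec: float = 0.0) -> float:
    """USD to finish the job, including any one-off migration overhead."""
    hours = (tokens_left / tokens_per_sec + switch_overhead_sec) / 3600.0
    return price_per_hr * hours


def should_migrate(current_cost: float, alternative_cost: float,
                   threshold: float = REBID_THRESHOLD) -> bool:
    """True when the relative saving on remaining cost meets the threshold."""
    if current_cost <= 0:
        return False
    return (current_cost - alternative_cost) / current_cost >= threshold


# Hypothetical numbers: 36M tokens left at 2000 tokens/sec is 5 hours of work.
stay = remaining_cost(1.00, 36e6, 2000.0)
move_a = remaining_cost(0.70, 36e6, 2000.0, switch_overhead_sec=600.0)
move_b = remaining_cost(0.80, 36e6, 2000.0, switch_overhead_sec=600.0)
decision_a = should_migrate(stay, move_a)  # saving is about 28%
decision_b = should_migrate(stay, move_b)  # saving is about 17%
```

Charging the migration overhead to the alternative keeps a marginally cheaper instance from triggering a move that costs more than it saves.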
Supported providers
Five spot markets, one bidder.
Adapters for RunPod, Lambda Labs, Vast.ai, TensorDock, and Paperspace. Quotes are
polled every 60 seconds and cached in Redis under a distributed lock, so a thousand
jobs don't stampede the provider APIs. Add your own by dropping a file in
providers/adapters/.
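The caching pattern can be illustrated without Redis. The sketch below is an in-process stand-in: a threading.Lock plays the role of the distributed lock and a dict plays the role of the cache. The QuoteCache class and the fetch callback are hypothetical names, not part of slipa's adapter interface.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class QuoteCache:
    """In-process stand-in for a Redis cache guarded by a distributed lock."""

    def __init__(self, fetch: Callable[[str], List[dict]],
                 ttl_sec: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_sec
        self._clock = clock
        self._lock = threading.Lock()  # one lock for all providers, for brevity
        self._entries: Dict[str, Tuple[float, List[dict]]] = {}

    def _fresh(self, provider: str) -> Optional[List[dict]]:
        entry = self._entries.get(provider)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]
        return None

    def get(self, provider: str) -> List[dict]:
        quotes = self._fresh(provider)
        if quotes is not None:
            return quotes
        with self._lock:
            # Re-check: another caller may have refreshed while we waited.
            quotes = self._fresh(provider)
            if quotes is None:
                quotes = self._fetch(provider)
                self._entries[provider] = (self._clock(), quotes)
            return quotes
```

However many jobs ask for the same provider within one 60-second window, only the first caller reaches the provider API; the rest read the cached quotes.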
Beta access
Free during preview. $50/day soft cap per user.
Leave your email and we'll send an API key when a beta slot opens. No credit card,
no commitment.