flash train. Flash runs the job on managed infrastructure, supervises it, and
streams checkpoints and logs back to you. This guide covers the config and the
run lifecycle; see the configuration reference for
every field.
Pick a base model
Flash trains a LoRA adapter on top of a supported base model. List the catalog of base model ids and their parameter sizes (see Supported models for algorithms, reasoning, and pricing):
Choose SFT or GRPO
sft: supervised fine-tuning, for when you already have the answers. The model imitates the prompt/answer pairs in your environment’s dataset.
grpo: reinforcement learning, for when there’s no fixed answer to copy. Your environment’s reward scores each completion.
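To make the difference concrete, here is a minimal Python sketch of the kind of signal a reward provides in GRPO-style training. It is not Flash's implementation: `exact_match_reward` and `group_relative_advantages` are hypothetical names, and the exact-match rule is only one possible reward. The sketch assumes the usual GRPO idea that several completions are sampled for the same prompt and each reward is normalized against the rest of its group.

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the trimmed completion equals the reference, else 0.0."""
    return 1.0 if completion.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Completions that beat the group mean get a positive advantage and the
    rest get a negative one. eps avoids division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one prompt whose reference answer is "42".
    completions = ["42", "41", " 42 ", "forty-two"]
    rewards = [exact_match_reward(c, "42") for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

With sft there is no reward function at all: the prompt/answer pairs in the dataset are the training signal.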
Anatomy of a config
Infrastructure is managed
You choose the model, algorithm, environment, and [train] settings. Flash
chooses the training resources, retry path, checkpointing, and artifact storage
for the run.
Validate before you submit
--dry-run parses and validates the config locally and prints the resolved job
spec:
Submit the run
flash train follows the logs until the run finishes. Useful flags:
| Flag | Effect |
|---|---|
| --background | Submit and return immediately instead of following logs |
| --dry-run | Validate locally without submitting |
| --cost | Print the pre-flight USD cost and exit (no submit) |
| --set key=value | Override a config value (repeatable) |
| --config other.toml | Deep-merge additional TOML for config composition (repeatable) |
Cost and billing
Flash checks your org’s prepaid balance against the pre-flight estimate before a run is submitted, then bills a successful run at the quoted Flash cost. A run cancelled after training starts is repriced to the steps it reached, and setup time is reported but not billed. Preview the estimate before you submit:
Monitor a run
Ctrl-C during flash train just detaches you. The run keeps going on Freesolo.
After training
When a run reaches done, serve it with
flash deploy and talk to it with flash chat.