flash train --cost prints a pre-flight estimate before you submit a run.
The submit-time quote is the amount Flash checks against your prepaid org
balance. Successful runs are billed at the quoted Flash cost. Cancelled runs
are repriced to the training progress they actually reached.
What affects training cost
The main levers are the ones you control in the config:
- Base model. Smaller models are cheaper and faster for smoke tests.
- Algorithm. SFT usually costs less than GRPO, which samples and scores model completions before each update.
- Steps or epochs. More SFT epochs, more GRPO steps, or a higher max_steps cap increases cost.
- Sequence length. Larger max_length and max_tokens increase work per example. Keep them large enough for your prompt and answer, but do not oversize them by default.
- Batch size and dataset size. For SFT, cost scales with the number of examples trained over and the batch/epoch settings.
- GRPO group size. group_size controls how many completions are sampled per prompt. Larger groups give a stronger advantage estimate but cost more.
- Reward latency. If your GRPO reward calls an external model or service, slow grading can increase wall-clock time.
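
To make the GRPO levers concrete, here is a rough sketch of how sampled-token volume grows with steps, group_size, and max_tokens. It is an illustrative upper bound on sampling work, not Flash's pricing formula, and the prompts_per_step parameter is a hypothetical name introduced for the example.

```python
def grpo_sampled_tokens(steps: int, prompts_per_step: int,
                        group_size: int, max_tokens: int) -> int:
    """Upper bound on completion tokens sampled over a GRPO run.

    ASSUMPTION: every completion uses the full max_tokens budget.
    This is a proxy for sampling work, not a Flash cost quote.
    """
    return steps * prompts_per_step * group_size * max_tokens


baseline = grpo_sampled_tokens(steps=100, prompts_per_step=8,
                               group_size=4, max_tokens=512)
larger_groups = grpo_sampled_tokens(steps=100, prompts_per_step=8,
                                    group_size=8, max_tokens=512)

print(baseline)                  # 1638400
print(larger_groups / baseline)  # 2.0
```

Doubling group_size doubles the sampling bound in this sketch, which is why a small group is a sensible starting point for a smoke test. Use flash train --cost for the actual estimate.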
Reading the preview
A preview includes the model, algorithm, setup estimate, training estimate, billable training time, and total. If you change [train] settings, run --cost again.
Charges and cancellations
- A run that completes successfully is billed at the submitted quote.
- A run cancelled before training starts is not charged for training.
- A run cancelled after training starts is repriced to the steps it reached.
- Setup time is reported separately and is not billed as training time.
flash status <run-id> shows the current or final cost record.
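
The charging rules above can be sketched as a small function. This is a minimal illustration, not Flash's billing code: the function name and parameters are hypothetical, and it assumes a cancelled run is repriced linearly by the fraction of steps reached, which the rules above do not specify. It covers the training charge only.

```python
def billed_training_cost(training_quote: float, training_started: bool,
                         completed: bool, steps_reached: int = 0,
                         total_steps: int = 1) -> float:
    """Hypothetical model of the training charge for one run."""
    if completed:
        # Successful runs are billed at the submitted quote.
        return training_quote
    if not training_started:
        # Cancelled before training starts: no training charge.
        return 0.0
    # ASSUMPTION: linear proration by steps reached; the real
    # repricing rule may differ.
    fraction = min(max(steps_reached / total_steps, 0.0), 1.0)
    return training_quote * fraction


print(billed_training_cost(10.0, True, True))            # 10.0
print(billed_training_cost(10.0, False, False))          # 0.0
print(billed_training_cost(10.0, True, False, 25, 100))  # 2.5
```

Treat the output as a mental model only; flash status <run-id> remains the source of truth for what a run was actually charged.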
Serving billing
Serving is billed per token after deployment. Prompt, completion, and cached prompt token rates are listed in Supported models. Prefix caching is automatic; the cached prompt-token rate applies to a reused prefix. See Billing for how it works. Tear down deployments you are done using.
Lowering cost
- Start with a smaller base model while validating the environment and reward.
- Run short smoke tests before scaling steps or epochs.
- For SFT, keep a held-out split and avoid extra epochs once held-out quality stops improving.
- For GRPO, lower group_size and max_tokens until the reward/data wiring is proven.
- For thinking runs, raise max_tokens only enough for reasoning plus the final answer.
- Use flash checkpoints <run-id> and deploy a good intermediate checkpoint instead of assuming the final step is always best.
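
As a sketch of the held-out and checkpoint advice above, the snippet below picks the checkpoint with the lowest held-out loss rather than the last one. The record shape (step, heldout_loss) and the numbers are made up for illustration; they are not the output format of flash checkpoints, and you would fill them in from your own evaluation.

```python
def best_checkpoint(checkpoints):
    """Return the checkpoint with the lowest held-out loss.

    Ties go to the earlier step, since it cost less to reach.
    """
    return min(checkpoints, key=lambda c: (c["heldout_loss"], c["step"]))


# Hypothetical evaluation results, one record per saved checkpoint.
checkpoints = [
    {"step": 100, "heldout_loss": 1.42},
    {"step": 200, "heldout_loss": 1.18},
    {"step": 300, "heldout_loss": 1.21},
    {"step": 400, "heldout_loss": 1.30},
]

print(best_checkpoint(checkpoints)["step"])  # 200
```

In this made-up run, held-out loss stops improving after step 200, so the later steps added cost without improving quality.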