flash train --cost prints a pre-flight estimate before you submit a run:
flash train config.toml --cost
Use it before any non-trivial run. The estimate runs locally and is deterministic for a given config and pricing catalog: it validates the config, estimates the training loop, prints the quoted USD cost, and exits without submitting. For an uncapped SFT run it also loads the environment to count its training examples.
The submit-time quote is the amount Flash checks against your prepaid org balance. Successful runs are billed at the quoted Flash cost. Cancelled runs are repriced to the training progress they actually reached.

What affects training cost

The main levers are the ones you control in the config:
  • Base model. Smaller models are cheaper and faster for smoke tests.
  • Algorithm. SFT usually costs less than GRPO, which samples and scores model completions before each update.
  • Steps or epochs. More SFT epochs, more GRPO steps, or a higher max_steps cap increases cost.
  • Sequence length. Larger max_length and max_tokens increase work per example. Keep them large enough for your prompt and answer, but do not oversize them by default.
  • Batch size and dataset size. For SFT, cost scales with the number of examples trained over and the batch/epoch settings.
  • GRPO group size. group_size controls how many completions are sampled per prompt. Larger groups give a stronger advantage estimate but cost more.
  • Reward latency. If your GRPO reward calls an external model or service, slow grading can increase wall-clock time.
Setup and cold start time are shown for observability, but the training charge is based on the billable training loop.
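To make the scaling in the list above concrete, here is a rough Python sketch of how the step and sampling counts grow. It is not Flash's pricing formula. It assumes that SFT takes one optimizer step per batch and that a GRPO step samples group_size completions for each prompt in the step. The function names and the prompts_per_step parameter are hypothetical and exist only for this illustration.

```python
import math

def sft_steps(num_examples, epochs, batch_size, max_steps=None):
    """Approximate SFT optimizer steps (assumption: one step per batch)."""
    steps = epochs * math.ceil(num_examples / batch_size)
    # A max_steps cap, if present, bounds the run.
    return steps if max_steps is None else min(steps, max_steps)

def grpo_completions(steps, prompts_per_step, group_size):
    """Total completions sampled over a GRPO run.

    prompts_per_step is a hypothetical parameter for this sketch.
    """
    return steps * prompts_per_step * group_size

print(sft_steps(1000, epochs=3, batch_size=32))                # 96
print(sft_steps(1000, epochs=3, batch_size=32, max_steps=50))  # 50
print(grpo_completions(100, prompts_per_step=8, group_size=4)) # 3200
print(grpo_completions(100, prompts_per_step=8, group_size=8)) # 6400
```

Under these assumptions, doubling group_size doubles the number of completions that must be sampled and scored, which is why it is one of the first settings to lower for a smoke test.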

Reading the preview

A preview lists the model, algorithm, setup estimate, per-step time, training estimate, wall-clock time, billable training time, and total:
Run        : Qwen/Qwen3.5-0.8B  [GRPO, 100 steps]
Setup      : 9.5 min (not billed)
Per step   : 11.06 s
Train      : 18.4 min
Wall clock : 0.47 h
Billable   : 0.31 h (training only)
TOTAL      : $0.21
Treat the preview as the quote for the config you submit. If you edit the environment, dataset, model, algorithm, or [train] settings, run --cost again.
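The figures in the sample preview are internally consistent, and a short Python check shows how the lines relate to each other. The hourly rate at the end is only implied by the rounded sample numbers. It is not a published price.

```python
# Figures copied from the sample preview above.
steps = 100
per_step_s = 11.06
setup_min = 9.5
total_usd = 0.21

train_min = steps * per_step_s / 60          # training loop only
wall_clock_h = (setup_min + train_min) / 60  # setup plus training
billable_h = train_min / 60                  # setup is excluded

# Derived from rounded preview figures, so treat it as approximate.
implied_rate = total_usd / billable_h

print(f"Train      : {train_min:.1f} min")    # 18.4 min
print(f"Wall clock : {wall_clock_h:.2f} h")   # 0.47 h
print(f"Billable   : {billable_h:.2f} h")     # 0.31 h
print(f"Implied    : ~${implied_rate:.2f}/h") # ~$0.68/h
```

Wall clock includes the 9.5 minutes of setup, while Billable covers only the training loop. That is why the two differ.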

Charges and cancellations

  • A run that completes successfully is billed at the submitted quote.
  • A run cancelled before training starts is not charged for training.
  • A run cancelled after training starts is repriced to the steps it reached.
  • Setup time is reported separately and is not billed as training time.
  • flash status <run-id> shows the current or final cost record.
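The rules above can be sketched as a small Python function. The linear proration by steps is an assumption made for this illustration. The source says only that a cancelled run is repriced to the steps it reached, so the real repricing may differ. The function name and parameters are hypothetical.

```python
def estimated_charge(quote_usd, steps_reached, total_steps, completed=False):
    """Rough charge for a run (assumption: linear proration by steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if completed:
        # A successful run is billed at the submitted quote.
        return quote_usd
    if steps_reached <= 0:
        # Cancelled before training started: no training charge.
        return 0.0
    fraction = min(steps_reached, total_steps) / total_steps
    return round(quote_usd * fraction, 2)

print(estimated_charge(0.21, 100, 100, completed=True))  # 0.21
print(estimated_charge(0.21, 0, 100))                    # 0.0
print(estimated_charge(0.21, 40, 100))                   # 0.08
```

Use flash status <run-id> for the actual cost record. This sketch is only for budgeting.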

Serving billing

Serving is billed per token after deployment. Prompt, completion, and cached prompt token rates are listed in Supported models. Prefix caching is automatic; the cached prompt-token rate applies to a reused prefix. See Billing for how it works. Tear down deployments you are done using:
flash undeploy <run-id>
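To estimate a serving bill from the three token rates, a minimal Python sketch follows. It makes several assumptions: rates are quoted per million tokens, cached tokens are a subset of the prompt tokens, and cached tokens are billed at the cached rate instead of the prompt rate. The rates in the example are placeholders, not Flash prices. Take real values from Supported models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenRates:
    """USD per million tokens (assumed unit; placeholder values below)."""
    prompt: float
    cached_prompt: float
    completion: float

def request_cost(rates, prompt_tokens, cached_tokens, completion_tokens):
    """Estimated USD cost of one request under the stated assumptions."""
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rates.prompt
        + cached_tokens * rates.cached_prompt
        + completion_tokens * rates.completion
    )
    return total / 1_000_000

# Placeholder rates for illustration only.
rates = TokenRates(prompt=0.10, cached_prompt=0.02, completion=0.40)
cold = request_cost(rates, prompt_tokens=2000, cached_tokens=0, completion_tokens=300)
warm = request_cost(rates, prompt_tokens=2000, cached_tokens=1500, completion_tokens=300)
print(f"cold: ${cold:.6f}  warm: ${warm:.6f}")
```

With these placeholder rates, the request that reuses a 1,500-token prefix costs less than the cold request. This is the effect the cached prompt-token rate is meant to capture.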

Lowering cost

  • Start with a smaller base model while validating the environment and reward.
  • Run short smoke tests before scaling steps or epochs.
  • For SFT, keep a held-out split and avoid extra epochs once held-out quality stops improving.
  • For GRPO, lower group_size and max_tokens until the reward/data wiring is proven.
  • For thinking runs, raise max_tokens only enough for reasoning plus the final answer.
  • Use flash checkpoints <run-id> and deploy a good intermediate checkpoint instead of assuming the final step is always best.

Why a later quote can change

The quote can change when you change the config, publish different environment contents, update dataset size, or submit after catalog/pricing updates. The authoritative number is the quote returned for the run you actually submit.