# Cost and billing

> How to preview Flash training cost, what affects it, and how charges are applied.

`flash train --cost` prints a pre-flight estimate before you submit a run:

```bash
flash train config.toml --cost
```

Use it before any non-trivial run. The estimate runs locally and is
deterministic for a given config and catalog: it validates the config,
estimates the training loop, prints the quoted USD cost, and exits without
submitting. For an SFT run with no `max_steps` cap, it also loads the
environment to count its training examples.

<Note>
  The submit-time quote is the amount Flash checks against your prepaid org
  balance. Successful runs are billed at the quoted Flash cost. Cancelled runs
  are repriced to the training progress they actually reached.
</Note>

## What affects training cost

The main levers are the ones you control in the config:

* **[Base model](/reference/models).** Smaller models are cheaper and faster for smoke tests.
* **[Algorithm](/guides/training#choose-sft-or-grpo).** SFT usually costs less than GRPO, which samples and scores
  model completions before each update.
* **Steps or epochs.** More SFT `epochs`, more GRPO `steps`, or a higher
  `max_steps` cap increases cost.
* **Sequence length.** Larger `max_length` and `max_tokens` increase work per
  example. Keep them large enough for your prompt and answer, but do not
  oversize them by default.
* **Batch size and dataset size.** For SFT, cost scales with the number of
  examples trained over and the batch/epoch settings.
* **GRPO group size.** `group_size` controls how many completions are sampled per
  prompt. Larger groups give a stronger advantage estimate but cost more.
* **Reward latency.** If your GRPO reward calls an external model or service,
  slow grading can increase wall-clock time.

Setup and cold-start times are shown for observability, but the training charge
is based only on the billable training loop.
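To get a feel for how these levers interact, the sketch below computes two rough scaling quantities: the number of SFT optimizer steps and an upper bound on the tokens a GRPO run can sample. It is not Flash's pricing formula. The function names, the `batch_size` and `prompts_per_step` parameters, and the one-step-per-batch convention are assumptions made for illustration; only `epochs`, `max_steps`, `steps`, `group_size`, and `max_tokens` come from this page.

```python
import math

def sft_steps(num_examples, epochs, batch_size, max_steps=None):
    """Optimizer steps for an SFT run, assuming one step per batch (hypothetical)."""
    steps = math.ceil(num_examples / batch_size) * epochs
    # A max_steps cap, when set, bounds the run regardless of dataset size.
    return steps if max_steps is None else min(steps, max_steps)

def grpo_max_sampled_tokens(steps, prompts_per_step, group_size, max_tokens):
    """Upper bound on sampled tokens: assumes every completion reaches max_tokens."""
    return steps * prompts_per_step * group_size * max_tokens

# Illustrative numbers only; they are not taken from any real run.
print(sft_steps(1000, epochs=3, batch_size=32))                # 96
print(sft_steps(1000, epochs=3, batch_size=32, max_steps=50))  # 50
print(grpo_max_sampled_tokens(100, 8, group_size=4, max_tokens=512))  # 1638400
print(grpo_max_sampled_tokens(100, 8, group_size=8, max_tokens=512))  # 3276800
```

Under these assumptions, doubling `group_size` doubles the sampling bound, which is why it is worth keeping small until the reward wiring is proven. Use `flash train config.toml --cost` for the actual quote.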

## Reading the preview

A preview lists the model, algorithm, setup estimate, per-step time, training
estimate, wall-clock time, billable training time, and total:

```text
Run        : Qwen/Qwen3.5-0.8B  [GRPO, 100 steps]
Setup      : 9.5 min (not billed)
Per step   : 11.06 s
Train      : 18.4 min
Wall clock : 0.47 h
Billable   : 0.31 h (training only)
TOTAL      : $0.21
```

Treat the preview as the quote for the config you submit. If you edit the
environment, dataset, model, algorithm, or `[train]` settings, run `--cost`
again.
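The time figures in the sample preview above follow from one another, which can help when sanity-checking a quote. The sketch below reproduces them from the step count, per-step time, and setup estimate. The variable names are illustrative, and the dollar total is left out because this page does not state an hourly rate.

```python
# Values copied from the sample preview above.
steps = 100
per_step_s = 11.06
setup_min = 9.5

train_min = steps * per_step_s / 60          # training loop duration
wall_clock_h = (setup_min + train_min) / 60  # setup plus training
billable_h = train_min / 60                  # setup is not billed

print(f"Train      : {train_min:.1f} min")   # 18.4 min
print(f"Wall clock : {wall_clock_h:.2f} h")  # 0.47 h
print(f"Billable   : {billable_h:.2f} h")    # 0.31 h
```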

## Charges and cancellations

* A run that completes successfully is billed at the submitted quote.
* A run cancelled before training starts is not charged for training.
* A run cancelled after training starts is repriced to the steps it reached.
* Setup time is reported separately and is not billed as training time.
* `flash status <run-id>` shows the current or final cost record.
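As a rough mental model for the cancellation rules above, the sketch below assumes the repriced charge scales linearly with the steps reached. This page does not confirm linear proration, and `prorated_charge` is a hypothetical helper rather than part of Flash. `flash status <run-id>` remains the source of truth for the actual cost record.

```python
def prorated_charge(quote_usd, steps_reached, total_steps):
    """Estimate a cancelled run's charge, ASSUMING linear proration by steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so a run is never charged for negative or extra steps.
    reached = min(max(steps_reached, 0), total_steps)
    return quote_usd * reached / total_steps

# Using the $0.21 quote for 100 steps from the sample preview:
print(f"${prorated_charge(0.21, 0, 100):.3f}")    # cancelled before training starts
print(f"${prorated_charge(0.21, 40, 100):.3f}")   # cancelled after 40 steps
print(f"${prorated_charge(0.21, 100, 100):.3f}")  # completed run, full quote
```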

## Serving billing

Serving is billed per token after deployment. Prompt, completion, and cached
prompt token rates are listed in [Supported models](/reference/models#serving-prices).

Prefix caching is automatic; the cached prompt-token rate applies to a reused
prefix. See [Billing](/guides/deploy-and-chat#billing) for how it works.
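The three token rates combine roughly as sketched below. The rates are placeholders, not real Flash prices, and the per-million-token unit is an assumption. The function name and the choice to bill cached prefix tokens at the cached rate instead of the prompt rate are also illustrative. Substitute the values from the serving price table for a real estimate.

```python
# Placeholder USD rates per one million tokens. These are NOT real prices.
EXAMPLE_RATES = {"prompt": 1.00, "cached_prompt": 0.25, "completion": 2.00}

def serving_cost_usd(prompt_tokens, cached_prompt_tokens, completion_tokens,
                     rates=EXAMPLE_RATES):
    """Estimate one request's cost, assuming cached tokens replace prompt-rate billing."""
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached_prompt_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_prompt_tokens
    total = (uncached * rates["prompt"]
             + cached_prompt_tokens * rates["cached_prompt"]
             + completion_tokens * rates["completion"])
    return total / 1_000_000

# A 10,000-token prompt with an 8,000-token reused prefix and a 500-token reply.
print(f"${serving_cost_usd(10_000, 8_000, 500):.4f}")  # with prefix reuse
print(f"${serving_cost_usd(10_000, 0, 500):.4f}")      # same request, no reuse
```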

Tear down deployments you are done using:

```bash
flash undeploy <run-id>
```

## Lowering cost

* Start with a smaller base model while validating the [environment](/guides/environments) and reward.
* Run short smoke tests before scaling steps or epochs.
* For SFT, keep a held-out split and avoid extra epochs once held-out quality
  stops improving.
* For GRPO, lower `group_size` and `max_tokens` until the reward/data wiring is
  proven.
* For thinking runs, raise `max_tokens` only enough for reasoning plus the final
  answer.
* Use `flash checkpoints <run-id>` and deploy a good intermediate checkpoint
  instead of assuming the final step is always best.

## Why a later quote can change

The quote can change when you change the config, publish different environment
contents, update dataset size, or submit after catalog/pricing updates. The
authoritative number is the quote returned for the run you actually submit.