> ## Documentation Index
> Fetch the complete documentation index at: https://freesolo.co/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Training

> Write a config, submit a managed run, and follow it to completion.

A training run is described by a single TOML config and submitted with
`flash train`. Flash runs the job on managed infrastructure, supervises it, and
streams checkpoints and logs back to you. This guide covers the config and the
run lifecycle; see the [configuration reference](/reference/configuration) for
every field.

## Pick a base model

Flash trains a [LoRA adapter](/how-flash-works) on top of a supported base
model. List the catalog of base model ids and their parameter sizes (see
[Supported models](/reference/models) for algorithms, reasoning, and pricing):

```bash theme={null}
flash models
```

Set your choice at the top of the config:

```toml theme={null}
model = "Qwen/Qwen3.5-4B"
```

## Choose SFT or GRPO

<CodeGroup>
  ```toml SFT theme={null}
  algorithm = "sft"
  ```

  ```toml GRPO (RL) theme={null}
  algorithm = "grpo"
  ```
</CodeGroup>

* **`sft`**: supervised fine-tuning, for when you already have the answers. The
  model imitates the prompt/answer pairs in your environment's [dataset](/guides/datasets).
* **`grpo`**: reinforcement learning, for when there's no fixed answer to copy.
  Your environment's reward scores each completion.

Both are driven by the same [environment](/guides/environments). See
[how Flash works](/how-flash-works) for the difference.

## Anatomy of a config

```toml theme={null}
model = "Qwen/Qwen3.5-4B"
algorithm = "sft"
# thinking = true        # opt into reasoning mode (thinking-capable models only)

[environment]
id = "your-org/your-env"

[train]
epochs = 3                           # SFT is epoch-driven; GRPO is step-driven (set steps = N instead)
max_examples = 1000                  # SFT: rows to train on (set to your dataset size)
lora_rank = 32
lora_alpha = 64
# learning_rate = 1e-4
# batch_size = 8
# group_size = 8                     # GRPO: completions sampled per prompt

[wandb]
# project = "my-project"             # optional Weights & Biases logging
```

## Infrastructure is managed

You choose the model, algorithm, environment, and `[train]` settings. Flash
chooses the training resources, retry path, checkpointing, and artifact storage
for the run.

## Validate before you submit

`--dry-run` parses and validates the config locally and prints the resolved job
spec:

```bash theme={null}
flash train config.toml --dry-run
```

## Submit the run

```bash theme={null}
flash train config.toml
```

By default `flash train` follows the logs until the run finishes. Useful flags:

| Flag                  | Effect                                                         |
| --------------------- | -------------------------------------------------------------- |
| `--background`        | Submit and return immediately instead of following logs        |
| `--dry-run`           | Validate locally without submitting                            |
| `--cost`              | Print the pre-flight USD cost and exit (no submit)             |
| `--set key=value`     | Override a config value (repeatable)                           |
| `--config other.toml` | Deep-merge additional TOML for config composition (repeatable) |

```bash theme={null}
# override values at submit time
flash train config.toml --set train.steps=300 --set train.lora_rank=16
```

## Cost and billing

Flash checks your org's prepaid balance against the pre-flight estimate before a
run is submitted, then bills a successful run at the quoted Flash cost; a run
cancelled after training starts is repriced to the steps it reached, and setup
time is reported but not billed. Preview the estimate before you submit:

```bash theme={null}
flash train config.toml --cost
```

See [Cost and billing](/reference/cost-model) for what affects cost, how to read
the preview output, and how cancellations are repriced.

## Monitor a run

`Ctrl-C` during `flash train` just detaches you. The run keeps going on Freesolo.

```bash theme={null}
flash runs                # all your runs: state, cost, model
flash status <run-id>     # status JSON, including the cost record
flash status <run-id> --follow  # poll status until completion, without replaying logs
flash log <run-id>              # print the full log snapshot
flash log <run-id> --follow     # stream logs until completion
flash cancel <run-id>     # cancel a run
```

## After training

When a run reaches `done`, serve it with
[`flash deploy`](/guides/deploy-and-chat) and talk to it with `flash chat`.