Skip to main content
Most flash errors print one clean line; add the global --debug flag before the subcommand (e.g. flash --debug train config.toml) for the full traceback.

Installation & CLI

The CLI installs a single flash command, but its install location has to be on your PATH.
  • If you installed with uv tool install freesolo-flash, make sure uv’s tool bin directory is on your PATH (run uv tool update-shell, then restart your shell).
  • Confirm the install with flash version.
The CLI is published to PyPI as freesolo-flash, the bare flash name belongs to an unrelated project. Reinstall the right one:
uv tool install freesolo-flash

Authentication

Every command authenticates with a Freesolo API key, verified against Freesolo at login.
  • Create a key in your dashboard at freesolo.co.
  • Log in once: flash login --api-key <your-key> (or set FREESOLO_API_KEY instead of passing --api-key).
  • Confirm who the stored key resolves to: flash whoami.
By default the CLI talks to https://api.freesolo.co. To target a different deployment, set --freesolo-url (or FREESOLO_BASE_URL) at login.

Environments

Make sure the folder you push contains an environment.py file with a load_environment() function that returns a Freesolo environment, then:
flash env push --name math math
It prints the published id (your-org/math) to put in your config’s [environment] id. If you pass --name namespace/name, the namespace must match your Freesolo org; otherwise pass a bare name such as math.
Managed training workers have the Freesolo SDK available when they import a published environment. Your local Python environment does not get that SDK automatically from the flash CLI. Install it locally when you run or test environment.py directly, or when you use a command such as flash train --cost that loads the environment to count examples. --dry-run validates only the config and does not import the environment:
uv pip install freesolo
To pull a published env into your project for local work, use flash env pull your-org/your-env.
flash train --dry-run validates the config, but it does not prove every remote environment dependency is installed. If flash log <run-id> shows your environment.py failed while importing a task library, add that package to [environment].pip, publish the environment, and submit again:
[environment]
id = "your-org/your-env"
pip = ["math-verify>=0.8.0"]
Only list packages your environment imports. Flash does not install worker dependencies from a pyproject.toml, requirements.txt, or lockfile bundled with the environment; keep managed-run dependencies in [environment].pip.
[environment].pip is for task dependencies imported by your environment. Do not pin Flash’s managed training stack there, such as torch, trl, vllm, peft, or bitsandbytes, unless your environment directly imports that package. Extra pins can conflict with the worker’s tested training recipe. Remove the pin and resubmit.
Pull the specific file you need instead of the whole environment:
flash env pull your-org/your-env environment.py -o environment.py
flash env pull your-org/your-env dataset/eval.jsonl -o eval.jsonl
Keep published environments focused on source, small sidecars, and datasets needed by the run. Do not publish virtualenvs, local caches, model weights, or generated artifacts.
Use flash env pull to inspect the exact packaged file:
flash env pull your-org/your-env dataset/train.jsonl -o train.jsonl
For clean A/B experiments, publish changed datasets under a fresh env name so old runs, new runs, and local files are easy to tell apart.
If your environment module shares a name with an installed Python package, it can shadow or be shadowed by that package. Keep helper module names distinct from installed packages.
The [environment] id must be a published Freesolo environment id, produced by flash env push, for example your-org/your-env. A local file path is not a valid id, so publish it first or reference an existing published id. Use flash env pull your-org/your-env only when you want a local copy to edit or inspect.

Configuration

model must be one of the ids in the curated catalog. List the valid ids:
flash models
Managed runs train catalog models only. See Supported models.
Flash rejects unknown config sections and [train] keys at parse time. Check the key against the configuration reference and validate locally:
flash train config.toml --dry-run
algorithm must be sft (the default) or grpo. Fix the value and re-validate with --dry-run.

Run fit and resource use

Expected. GRPO samples multiple completions, scores them, and updates from that group of attempts. For the same model, it usually needs more room and costs more than SFT. If a GRPO run is too expensive or too large, use a smaller model, reduce group_size, reduce max_tokens, reduce max_length, or start with SFT.
The usual causes are long context, long generated completions, a large GRPO group_size, or a larger base model than the task needs. Reduce max_length, max_tokens, or group_size, or switch to a smaller model. If you recently enabled thinking = true, remember that reasoning and the final answer share the same token budget.

Training runs

Flash checks your prepaid org balance against the pre-flight estimate before submit, then bills successful runs at the quoted Flash cost. Cancelled runs are repriced to the training steps they reached, and setup time is reported separately without being billed. Preview the estimate before you submit:
flash train config.toml --cost
Add funds or reduce the run estimate before submitting again. Smaller base models, fewer GRPO steps, lower group_size, shorter max_tokens, and SFT smoke tests are the fastest ways to lower the pre-flight estimate.
The task is too hard for the model at its current ability: if no rollout ever scores, there’s nothing for GRPO to reinforce. Try a stronger/larger base model, make the task easier to get started, or double-check your reward actually returns a positive reward for good answers.
Usually the reward is not discriminative: it scores almost everything the same. Make the reward function separate better answers from worse ones so GRPO has a spread of scores to learn from. See Environments.
When thinking = true, the model emits a reasoning trace before its answer, and that trace counts against the same max_tokens budget as the answer. A max_tokens tuned for a non-thinking run is usually too small once reasoning is added: the reasoning eats the budget and the actual answer is truncated or never emitted. If your reward parses the answer (e.g. extracts a JSON object), it then sees nothing and scores ~0 across the board — even though the model is “working”.Fixes:
  • Raise max_tokens so the reasoning and the answer both fit (e.g. a task that needs ~200 answer tokens may need max_tokens = 2048 with thinking on), and make sure max_length is large enough to hold the prompt plus that budget.
  • Optionally set thinking_length_penalty_coef to nudge the model toward shorter reasoning so the answer reliably lands inside the budget.
  • Score the answer text by default. In thinking mode response_text remains string-compatible answer text and also exposes response_text.completion, response_text.thinking, and response_text.raw for rewards that intentionally inspect reasoning.
The same trap applies to any reasoning model you call as an LLM judge from a reward: give the judge call enough max_tokens or it returns empty content and the judge silently scores 0.
In Qwen3.5 thinking mode, the chat template treats prior and next assistant turns differently: it strips literal <think> blocks from non-final assistant history, then pre-opens <think>\n in the next generation prompt. A naive multi-turn SFT transcript that puts <think>...</think> in every assistant turn can therefore train on a tag layout that inference will never render. The symptom is doubled, missing, or misplaced thinking tags, or an adapter that behaves differently in training-style evals than it does when served.Fixes:
  • For message-shaped multi-turn SFT targets, keep intermediate assistant turns as the actual code, tool, or action text only.
  • Put <think>...</think> plus the final answer only in the final assistant target.
  • Do not add a second opener for the template’s pre-opened <think>\n. Flash’s completion-only SFT masking uses the longest shared token prefix, so that pre-opened tag is treated as prompt text.
No. Ctrl-C during flash train just detaches you; the run keeps going on Freesolo. Re-follow it any time, and cancel explicitly if you mean to:
flash runs              # state and cost of your runs
flash log <run-id> -f       # re-follow the logs
flash status <run-id>       # status and cost JSON
flash cancel <run-id>       # stop it
Expected. Cancellation waits for the managed worker to stop and clean up before confirming, which can take several minutes. The CLI waits this teardown out; the run is marked cancelled when it completes.
Runs are supervised by Flash: a stall watchdog plus bounded auto-retry that resumes from the last streamed checkpoint when possible. If the run ultimately succeeds, the charge remains the quoted Flash cost. Watch logs with flash log <run-id> -f or poll status with flash status <run-id> -f. If the same shape repeatedly fails before useful metrics, reduce max_length, max_tokens, or group_size.

Serving

A run cancelled or preempted before finalizing has no final adapter, so a plain flash deploy <run-id> cannot serve it. The error lists the run’s saved checkpoint steps and the exact command to use — deploy one of them instead:
flash checkpoints <run-id>
flash deploy <run-id>/step-<N>
Deploy registers and warms the adapter on Freesolo’s managed serving service. Large models can take a few minutes before the endpoint is ready. See Deploy & chat.
Serving is billed per token for requests. Prompt, completion, and cached prompt token rates are listed in Supported models. flash undeploy <run-id> deregisters the adapter.
Deployments are OpenAI-compatible. Use the endpoint from flash deployments as the base_url and the <run-id> as the model. The OpenAI SDK requires an api_key, and the serving endpoint requires it — pass your Freesolo API key (the same key flash login uses); serving authorizes every request against the org that owns the adapter, so a placeholder key is rejected (401/403). See Use it from your own code.

Getting help

Still stuck? Add --debug to surface the full traceback, then reach out through freesolo.co/contact with the run id (flash runs) and the failing command.