Datasets live inside the environment. A Flash config points at one published environment id, and the environment’s load_environment() function returns the dataset, prompt builder, and reward logic that Flash uses for SFT, GRPO, and local validation.
[environment]
id = "your-org/your-env"

Task records

Author dataset rows with these keys:
  • input: Prompt text for the model.
  • output: Target answer or gold completion. For SFT this can be a scalar answer, { "messages": [...] }, or a bare list of chat messages.
  • metadata: Optional dict preserved on example.metadata and available to scoring.
Which column the model learns from depends on the algorithm. SFT trains directly on output, the gold answer. GRPO (RL) uses only input: the model generates its own answers from the prompt and learns from your reward, so output is optional there and is read only if your score_response uses it as a reference.
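Because GRPO only sees output through your scorer, a row without output is fine unless your reward compares against it. Below is a minimal sketch of such a reference-based reward, written as a plain function you could call from score_response. The names exact_match_reward and normalize are hypothetical, and score_response's real signature is covered in Environments.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Trim, lowercase, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def exact_match_reward(completion: str, reference: Optional[str]) -> float:
    """Return 1.0 if the completion matches the gold output after normalization, else 0.0."""
    if reference is None:
        # The row was authored without output, so there is nothing to compare against.
        raise ValueError("reference-based reward needs an output on every row")
    return 1.0 if normalize(completion) == normalize(reference) else 0.0


print(exact_match_reward("  8 ", "8"))  # 1.0
print(exact_match_reward("9", "8"))     # 0.0
```

Raising on a missing reference, instead of returning 0.0, surfaces unlabeled rows early rather than quietly training against a constant reward.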
load_task_examples(...) accepts a local file path or an iterable of records.
  • File formats: .jsonl, .json, .csv, .txt, or .bson.
  • Field mapping: input -> example.input, output -> example.output, metadata -> example.metadata.
  • Original row: the untouched record stays available as example.record.
dataset/train.jsonl
{"input":"What is 2 + 2?","output":"4"}
{"input":"What is 3 + 5?","output":"8"}
Each row must use the key input and, optionally, output; alternate key names for the prompt or target are not accepted. Records are canonicalized to exactly input/output/metadata.
When Flash builds your training records, it keeps only input/output/metadata and silently drops every other top-level key before the row reaches a training worker. Anything your scorer needs beyond the gold output string (a puzzle’s initial_board, the oracle_ids a retrieval must return, unit tests to check code against, a grading rubric) must live under metadata; otherwise it is dropped with no runtime warning.
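If your raw data carries scorer-only fields at the top level, fold them into metadata before packaging. Here is a minimal preprocessing sketch using only the standard library; the function names are hypothetical, and it assumes metadata, when present, is already a JSON object.

```python
import json
from pathlib import Path
from typing import Any, Dict

CANONICAL_KEYS = {"input", "output", "metadata"}


def fold_extras_into_metadata(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row whose non-canonical top-level keys live under metadata."""
    if "input" not in row:
        raise ValueError(f"row has no 'input' key: {sorted(row)}")
    metadata = dict(row.get("metadata") or {})
    for key, value in row.items():
        if key in CANONICAL_KEYS:
            continue
        if key in metadata:
            raise ValueError(f"key {key!r} exists both top-level and in metadata")
        metadata[key] = value  # would otherwise be silently dropped by Flash
    fixed: Dict[str, Any] = {"input": row["input"]}
    if "output" in row:
        fixed["output"] = row["output"]
    if metadata:
        fixed["metadata"] = metadata
    return fixed


def rewrite_jsonl(src: Path, dst: Path) -> None:
    """Rewrite a JSONL file so every scorer field survives the key filtering."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(fold_extras_into_metadata(json.loads(line))) + "\n")


row = {"input": "Solve the puzzle.", "output": "done", "initial_board": [[0, 1], [1, 0]]}
print(fold_extras_into_metadata(row))
# {'input': 'Solve the puzzle.', 'output': 'done', 'metadata': {'initial_board': [[0, 1], [1, 0]]}}
```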

Message-shaped SFT targets

For SFT, output is the gold completion appended after the environment’s prompt messages. A scalar output becomes one assistant message. If you need to teach a multi-turn trajectory or native tool calling, set output to {"messages": [...]} or to a bare list of chat messages. Flash preserves those assistant, tool-call, tool-result, and reply messages when it builds the SFT example.
dataset/train.jsonl
{"input":"Refund my last order.","output":{"messages":[{"role":"assistant","content":null,"tool_calls":[{"id":"call_refund","type":"function","function":{"name":"refund_order","arguments":"{\"order\":\"last\"}"}}]},{"role":"tool","tool_call_id":"call_refund","content":"{\"ok\":true}"},{"role":"assistant","content":"Your last order has been refunded."}]}}
Use Environment.sft_completion(example) if your environment needs to synthesize or transform the gold completion before SFT.
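Hand-escaping tool-call arguments inside JSONL is error-prone, so it can help to generate these rows. Below is a sketch using only the json module. tool_call_row is a hypothetical helper, and it assumes the exact message shape of the example row above, where tool arguments and tool results are JSON-encoded strings.

```python
import json

COMPACT = (",", ":")  # no spaces, matching the hand-written JSONL above


def tool_call_row(prompt, tool_name, args, tool_result, reply, call_id="call_1"):
    """Build a JSONL record whose output is a {"messages": [...]} SFT target."""
    messages = [
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",
                    # Tool arguments are a JSON-encoded string, not a nested object.
                    "function": {"name": tool_name, "arguments": json.dumps(args, separators=COMPACT)},
                }
            ],
        },
        # The tool result is stored as a string and linked back by call id.
        {"role": "tool", "tool_call_id": call_id, "content": json.dumps(tool_result, separators=COMPACT)},
        {"role": "assistant", "content": reply},
    ]
    return {"input": prompt, "output": {"messages": messages}}


row = tool_call_row(
    "Refund my last order.",
    "refund_order",
    {"order": "last"},
    {"ok": True},
    "Your last order has been refunded.",
    call_id="call_refund",
)
print(json.dumps(row, separators=COMPACT))
```

With these arguments the printed line matches the example row above exactly; write one such line per trajectory to dataset/train.jsonl.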

Validate thinking-model SFT targets

SFT on a thinking model (thinking = true) expects each gold completion to literally contain a <think>...</think> block. Catch missing blocks locally before submitting a run:
from freesolo.datasets import load_dataset, warn_missing_think_tags

dataset = load_dataset("dataset/train.jsonl")
missing = warn_missing_think_tags(dataset.examples)  # UserWarning + list of offending ids
It returns the ids of offending examples and emits a UserWarning naming the first few. Unlabeled records (no output) are skipped.
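If you would rather check raw JSONL without importing freesolo (for example, in a lightweight pre-commit hook), a standard-library sketch could look like the one below. It is not the library's implementation. The regex, the choice to inspect only assistant message content for message-shaped targets, and the helper names are all assumptions.

```python
import json
import re
from typing import Any, List

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def completion_text(output: Any) -> str:
    """Flatten a scalar, {"messages": [...]}, or bare message-list output to text."""
    if isinstance(output, dict) and "messages" in output:
        output = output["messages"]
    if isinstance(output, list):
        # Only assistant turns count; tool-call turns may have content=None.
        return "\n".join(str(m.get("content") or "") for m in output if m.get("role") == "assistant")
    return str(output)


def rows_missing_think(lines: List[str]) -> List[int]:
    """Return 0-based line indices of labeled rows whose target lacks <think>...</think>."""
    missing = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("output") is None:  # unlabeled rows are skipped
            continue
        if not THINK_BLOCK.search(completion_text(row["output"])):
            missing.append(i)
    return missing


sample = [
    '{"input":"What is 2 + 2?","output":"<think>2 plus 2 is 4.</think>4"}',
    '{"input":"What is 3 + 5?","output":"8"}',
]
print(rows_missing_think(sample))  # [1]
```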

Load sidecars

Read packaged files relative to __file__. That works locally and when the environment runs on a worker.
environment.py
from pathlib import Path
from freesolo.datasets.records import load_task_examples
from freesolo.environments import EnvironmentSingleTurn

ROOT = Path(__file__).parent


class MathEnv(EnvironmentSingleTurn):
    def __init__(self, *, split: str = "train") -> None:
        # read a packaged dataset file relative to environment.py
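        self.dataset = load_task_examples(ROOT / "dataset" / f"{split}.jsonl")


def load_environment(**kwargs) -> MathEnv:
    # Entry point Flash calls; [environment.params] values arrive as keyword arguments.
    return MathEnv(**kwargs)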
The rest of the env class (build_prompt_messages, score_response) is covered in Environments. Then select the split from your Flash config:
[environment]
id = "your-org/math"

[environment.params]
split = "eval"
[environment.params] values are passed to your load_environment(**kwargs). Flash also honors split itself: for an environment packaged with dataset files, split = "eval" selects dataset/eval.jsonl (or .json) as the dataset Flash trains on, for both SFT targets and GRPO problem selection. If the environment packages a default train split but the requested split file does not exist, the run fails at load time instead of silently falling back to train.jsonl. An explicit dataset_path param takes precedence over split.
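The precedence rules above can be summarized as a small resolver. This is a sketch of the documented behavior, not Flash's loader. The function name is hypothetical, preferring .jsonl over .json when both exist is an assumption, and dataset_path is returned unchanged because how Flash interprets relative paths is not specified here.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_dataset_file(env_dir: Path, split: str = "train",
                         dataset_path: Optional[str] = None) -> Path:
    """Pick the dataset file: explicit dataset_path, else dataset/<split>.jsonl or .json."""
    if dataset_path is not None:
        return Path(dataset_path)  # explicit path wins over split
    for suffix in (".jsonl", ".json"):
        candidate = env_dir / "dataset" / f"{split}{suffix}"
        if candidate.is_file():
            return candidate
    # No silent fallback to train.jsonl: a missing split is a load-time error.
    raise FileNotFoundError(f"no dataset/{split}.jsonl or dataset/{split}.json under {env_dir}")


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp)
    (env_dir / "dataset").mkdir()
    (env_dir / "dataset" / "train.jsonl").write_bytes(b'{"input":"What is 2 + 2?","output":"4"}\n')
    default_pick = resolve_dataset_file(env_dir).name
    try:
        resolve_dataset_file(env_dir, split="eval")
        eval_error = None
    except FileNotFoundError as err:
        eval_error = str(err)
    print(default_pick, "|", eval_error)
```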

What gets uploaded

For a local environment directory, flash env push includes:
  • environment.py, always at the artifact root.
  • Sibling Python helper files.
  • Sidecar directory named dataset.
  • Common sibling data files such as .jsonl, .json, .csv, .txt, .md, .parquet, .tsv, .yaml, and .yml.
Workspace metadata, cache directories, virtualenvs, and version-control files are skipped. Keep the artifact small: environment uploads are capped at 64 MB compressed and 256 MB uncompressed. For large corpora, keep the data in an external store and pass the identifier or URL through [environment.params].
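To stay under the caps, you can estimate the artifact size before running flash env push. Below is a rough standard-library sketch. The file-selection rules only approximate the list above. The gzip-compressed tar format and treating MB as 10^6 bytes are assumptions, since the actual packaging format is not documented here.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import List, Tuple

# Caps from the docs; MB is assumed to mean 10**6 bytes to err on the safe side.
MAX_COMPRESSED = 64 * 10**6
MAX_UNCOMPRESSED = 256 * 10**6
DATA_SUFFIXES = {".jsonl", ".json", ".csv", ".txt", ".md", ".parquet", ".tsv", ".yaml", ".yml"}
SKIP_DIRS = {"__pycache__"}


def approximate_artifact(env_dir: Path) -> List[Path]:
    """Approximate the files `flash env push` would include (not the real rules)."""
    files: List[Path] = []
    for path in sorted(env_dir.iterdir()):
        if path.name.startswith("."):
            continue  # hidden files, .git, .venv, and similar
        if path.is_file() and (path.suffix == ".py" or path.suffix in DATA_SUFFIXES):
            files.append(path)  # environment.py, helper modules, sibling data files
        elif path.is_dir() and path.name == "dataset":
            for p in sorted(path.rglob("*")):
                parts = p.relative_to(path).parts
                if p.is_file() and not any(x.startswith(".") or x in SKIP_DIRS for x in parts):
                    files.append(p)
    return files


def artifact_sizes(env_dir: Path) -> Tuple[int, int]:
    """Return (uncompressed bytes, gzip-compressed tar bytes) for the approximation."""
    files = approximate_artifact(env_dir)
    raw = sum(p.stat().st_size for p in files)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(env_dir).as_posix())
    return raw, len(buf.getvalue())


def within_caps(env_dir: Path) -> bool:
    raw, packed = artifact_sizes(env_dir)
    return raw <= MAX_UNCOMPRESSED and packed <= MAX_COMPRESSED


with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp)
    (env / "environment.py").write_bytes(b"# environment code\n")
    (env / "dataset").mkdir()
    (env / "dataset" / "train.jsonl").write_bytes(b'{"input":"What is 2 + 2?","output":"4"}\n')
    (env / ".venv").mkdir()
    (env / ".venv" / "site.py").write_bytes(b"x = 1\n")  # skipped: virtualenv
    picked = [p.relative_to(env).as_posix() for p in approximate_artifact(env)]
    raw_bytes, packed_bytes = artifact_sizes(env)
    ok = within_caps(env)
    print(picked, raw_bytes, packed_bytes, ok)
```

If the estimate comes close to either cap, move the bulk data to an external store as described above rather than relying on this approximation.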