load_environment() function returns the
dataset, prompt builder, and reward logic that Flash uses for SFT, GRPO, and
local validation.
Task records
Author dataset rows withinput and output:
| Dataset key | Description |
|---|---|
input | Prompt text for the model. |
output | Target answer or gold completion. For SFT this can be a scalar answer, { "messages": [...] }, or a bare list of chat messages. |
metadata | Optional dict preserved on example.metadata and available to scoring. |
Which column the model learns from depends on the algorithm. SFT trains
directly on
output, the gold answer. GRPO (RL) uses only input: the
model generates its own answers from the prompt and learns from your reward,
so output is optional there and is read only if your score_response uses
it as a reference.load_task_examples(...) accepts a local file path or an iterable of records.
- File formats:
.jsonl,.json,.csv,.txt, or.bson. - Field mapping:
input->example.input,output->example.output,metadata->example.metadata. - Original row: the untouched record stays available as
example.record.
dataset/train.jsonl
input plus an optional output; alternate prompt or target
key names are not accepted. Records are canonicalized to exactly
input/output/metadata.
Message-shaped SFT targets
For SFT,output is the gold completion appended after the environment’s
prompt messages. A scalar output becomes one assistant message. If you need to
teach a multi-turn trajectory or native tool calling, set output to
{"messages": [...]} or to a bare list of chat messages. Flash preserves those
assistant, tool-call, tool-result, and reply messages when it builds the SFT
example.
dataset/train.jsonl
Environment.sft_completion(example) if your environment needs to
synthesize or transform the gold completion before SFT.
Validate thinking-model SFT targets
SFT on a thinking model (thinking = true) expects each gold completion to
literally contain a <think>...</think> block. Catch missing blocks locally
before submitting a run:
UserWarning naming the
first few. Unlabeled records (no output) are skipped.
Load sidecars
Read packaged files relative to__file__. That works locally and when the
environment runs on a worker.
environment.py
build_prompt_messages, score_response) is covered
in Environments.
Then select the split from your Flash config:
[environment.params] values are passed to your load_environment(**kwargs).
split is also honored by Flash itself: for an environment packaged with
dataset files, split = "eval" selects dataset/eval.jsonl (or .json) as
the dataset Flash trains on — SFT targets and GRPO problem selection alike.
If the environment packages a default train split but the requested split
file does not exist, the run fails at load time instead of silently falling
back to train.jsonl. An explicit dataset_path param takes precedence over
split.What gets uploaded
For a local environment directory,flash env push includes:
environment.py, always at the artifact root.- Sibling Python helper files.
- Sidecar directory named
dataset. - Common sibling data files such as
.jsonl,.json,.csv,.txt,.md,.parquet,.tsv,.yaml, and.yml.
[environment.params].