Concept
What is a Behavior Package?
A Behavior Package is a versioned bundle that captures everything about how an LLM should behave. Instead of copying prompts between repos and wiring up evals from scratch each time, you install a Behavior Package the same way you would an npm or PyPI dependency.
A package can be made up of any combination of these typed specs:
PromptSpec— system prompts, templates, examplesFineTuneSpec— training recipes and configsEvalSpec— eval suites and benchmarksSafetySpec— safety policies and guardrailsLicenseSpec— usage terms and attributionInstallSpec— runtime setup and dependenciesEvidencePack— provenance and audit artifacts
Specs
What is PromptSpec?
A PromptSpec defines the system prompt, message templates, and example dialogues a model uses to produce consistent output. It is the smallest reusable unit on WrapperHub.
A PromptSpec includes:
- System prompt with placeholders for runtime variables
- Few-shot examples used at inference time
- Compatible models and provider-specific overrides
- Optional formatting and tool-call schemas
Specs
What is FineTuneSpec?
A FineTuneSpec is a portable training recipe. It encodes the dataset reference, hyperparameters, target base model(s), and the loss/eval signals used during training.
Because the recipe is portable, the same package can run on multiple providers (OpenAI fine-tuning API, Anthropic, your own cluster) and produce reproducible results.
Specs
What is EvalSpec?
An EvalSpec is a runnable bundle of evaluation cases. It captures inputs, expected behaviors, judging rubrics (heuristic or model-as-judge), and the reproducibility seed.
EvalSpec is the foundation of Green-Badge QA — a package only earns a green badge when its evals pass against the listed compatible models.
Quality
What is Green-Badge QA?
Green-Badge QA is the quality signal shown next to every package. A green badge means a package has:
- Passed every required eval at the published version
- Cleared a prompt-injection and safety scan
- A signed manifest and matching artifact hashes
- A declared, machine-readable license
If any of those checks fails, the badge degrades to review or failing, and consumers see the difference at install time.
Reference
Example manifest
Every Behavior Package ships with a top-level manifest.yaml. The minimal form looks like:
name: customer-support-classifier version: 0.1.0 type: fine_tune_recipe base_model: gpt-4.1-mini evals: - label_accuracy - format_following - pii_leakage safety: package_signing: required artifact_hashes: required license: private-commercial
See the customer-support-classifier package for a working example.