Documentation

Behavior Packages, end to end.

Everything you need to know to author, install, and govern a Behavior Package on Joseki WrapperHub.

Concept

What is a Behavior Package?

A Behavior Package is a versioned bundle that captures everything about how an LLM should behave. Instead of copying prompts between repos and wiring up evals from scratch each time, you install a Behavior Package the same way you would an npm or PyPI dependency.

A package can be made up of any combination of these typed specs:

  • PromptSpec — system prompts, templates, examples
  • FineTuneSpec — training recipes and configs
  • EvalSpec — eval suites and benchmarks
  • SafetySpec — safety policies and guardrails
  • LicenseSpec — usage terms and attribution
  • InstallSpec — runtime setup and dependencies
  • EvidencePack — provenance and audit artifacts

Specs

What is PromptSpec?

A PromptSpec defines the system prompt, message templates, and example dialogues a model uses to produce consistent output. It is the smallest reusable unit on WrapperHub.

A PromptSpec includes:

  • System prompt with placeholders for runtime variables
  • Few-shot examples used at inference time
  • Compatible models and provider-specific overrides
  • Optional formatting and tool-call schemas

Specs

What is FineTuneSpec?

A FineTuneSpec is a portable training recipe. It encodes the dataset reference, hyperparameters, target base model(s), and the loss/eval signals used during training.

Because the recipe is portable, the same package can run on multiple providers (OpenAI fine-tuning API, Anthropic, your own cluster) and produce reproducible results.

Specs

What is EvalSpec?

An EvalSpec is a runnable bundle of evaluation cases. It captures inputs, expected behaviors, judging rubrics (heuristic or model-as-judge), and the reproducibility seed.

EvalSpec is the foundation of Green-Badge QA — a package only earns a green badge when its evals pass against the listed compatible models.

Quality

What is Green-Badge QA?

Green-Badge QA is the quality signal shown next to every package. A green badge means a package has:

  • Passed every required eval at the published version
  • Cleared a prompt-injection and safety scan
  • A signed manifest and matching artifact hashes
  • A declared, machine-readable license

If any of those checks fails, the badge degrades to review or failing, and consumers see the difference at install time.

Reference

Example manifest

Every Behavior Package ships with a top-level manifest.yaml. The minimal form looks like:

manifest.yamlyaml
name: customer-support-classifier
version: 0.1.0
type: fine_tune_recipe
base_model: gpt-4.1-mini
evals:
  - label_accuracy
  - format_following
  - pii_leakage
safety:
  package_signing: required
  artifact_hashes: required
license: private-commercial

See the customer-support-classifier package for a working example.