Joseki WrapperHub

Concept

What is a Behavior Package?

A Behavior Package is a versioned bundle that captures everything about how an LLM should behave. Instead of copying prompts between repos and wiring up evals from scratch each time, you install a Behavior Package the same way you would an npm or PyPI dependency.

A package can be made up of any combination of these typed specs:

PromptSpec — system prompts, templates, examples
FineTuneSpec — training recipes and configs
EvalSpec — eval suites and benchmarks
SafetySpec — safety policies and guardrails
LicenseSpec — usage terms and attribution
InstallSpec — runtime setup and dependencies
EvidencePack — provenance and audit artifacts

Specs

What is PromptSpec?

A PromptSpec defines the system prompt, message templates, and example dialogues a model uses to produce consistent output. It is the smallest reusable unit on WrapperHub.

A PromptSpec includes:

System prompt with placeholders for runtime variables
Few-shot examples used at inference time
Compatible models and provider-specific overrides
Optional formatting and tool-call schemas

Specs

What is FineTuneSpec?

A FineTuneSpec is a portable training recipe. It encodes the dataset reference, hyperparameters, target base model(s), and the loss/eval signals used during training.

Because the recipe is portable, the same package can run on multiple providers (OpenAI fine-tuning API, Anthropic, your own cluster) and produce reproducible results.

Specs

What is EvalSpec?

An EvalSpec is a runnable bundle of evaluation cases. It captures inputs, expected behaviors, judging rubrics (heuristic or model-as-judge), and the reproducibility seed.

EvalSpec is the foundation of Green-Badge QA — a package only earns a green badge when its evals pass against the listed compatible models.

Quality

What is Green-Badge QA?

Green-Badge QA is the quality signal shown next to every package. A green badge means a package has:

Passed every required eval at the published version
Cleared a prompt-injection and safety scan
A signed manifest and matching artifact hashes
A declared, machine-readable license

If any of those checks fails, the badge degrades to review or failing, and consumers see the difference at install time.

Reference

Example manifest

Every Behavior Package ships with a top-level manifest.yaml. The minimal form looks like:

manifest.yamlyaml

name: customer-support-classifier
version: 0.1.0
type: fine_tune_recipe
base_model: gpt-4.1-mini
evals:
  - label_accuracy
  - format_following
  - pii_leakage
safety:
  package_signing: required
  artifact_hashes: required
license: private-commercial

See the customer-support-classifier package for a working example.

Behavior Packages, end to end.

What is a Behavior Package?

What is PromptSpec?

What is FineTuneSpec?

What is EvalSpec?

What is Green-Badge QA?

Example manifest