Concept
What is a Behavior Package?
A Behavior Package is a versioned bundle that captures everything about how an LLM should behave. Instead of copying prompts between repos and wiring up evals from scratch each time, you install a Behavior Package the same way you would an npm or PyPI dependency.
A package can be made up of any combination of these typed specs:
PromptSpec— system prompts, templates, examplesFineTuneSpec— training recipes and configsEvalSpec— eval suites and benchmarksSafetySpec— safety policies and guardrailsLicenseSpec— usage terms and attributionInstallSpec— runtime setup and dependenciesEvidencePack— provenance and audit artifacts
Specs
What is PromptSpec?
A PromptSpec defines the system prompt, message templates, and example dialogues a model uses to produce consistent output. It is the smallest reusable unit on WrapperHub.
A PromptSpec includes:
- System prompt with placeholders for runtime variables
- Few-shot examples used at inference time
- Compatible models and provider-specific overrides
- Optional formatting and tool-call schemas
Specs
What is FineTuneSpec?
A FineTuneSpec is a portable training recipe. It encodes the dataset reference, hyperparameters, target base model(s), and the loss/eval signals used during training.
Because the recipe is portable, the same package can run on multiple providers (OpenAI fine-tuning API, Anthropic, your own cluster) and produce reproducible results.
Specs
What is EvalSpec?
An EvalSpec is a runnable bundle of evaluation cases. It captures inputs, expected behaviors, judging rubrics (heuristic or model-as-judge), and the reproducibility seed.
Eval results help consumers understand how a package performs against the listed compatible models before they adopt it in production.
Trust & data
Data packs: hosted vs self-hosted
A data pack is the dataset, schema, eval examples, citations, and version metadata bundled with a Behavior Package so others can reproduce the same workflow—not just the prompt text.
Joseki supports two complementary modes:
- Hosted on Joseki — you upload a file per wrapper version. We store a
sha256digest, size, declared license, optional provenance JSON, optional eval-set reference, and a malware scan status (today: synchronous heuristics; production path is async vendor scanning). Downloads respectvisibility:inheritfollows the wrapper's public/private flag;publicorprivateoverride who may fetch bytes. Version rows are immutable, so the pack is pinned to that release. - External (self-hosted) — for enterprises, regulated teams, or large blobs, you keep custody in
s3,r2,gcs,github, a plainurl, or aprivate_api. The registry stores the URI, optional checksum,access_policyhint (signed URL, OAuth, API key, or user-managed), license string, and a logicalversionfor the data artifact. Consumers resolve access under their own credentials—Joseki never becomes the default custodian of sensitive data unless you opt into hosted mode.
Runs can optionally carry data_pack_sha256 (must match the hosted blob when one exists) plus a reproducibility_context JSON object (eval case ids, input fingerprints, schema versions) so audits and liveness reports line up with exact inputs.
API: list versions GET /wrappers/{id}/versions, trust view GET /wrappers/{id}/versions/{vid}/data-pack, set external manifest PUT .../data-pack/manifest, upload hosted bytes POST .../data-pack/hosted (multipart), download GET .../data-pack/download.
data_pack:
mode: external
source_type: s3
uri: s3://my-org-bucket/prompt-packs/tax-2026/v3/dataset.parquet
checksum: sha256:8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc09aa8ad
checksum_algorithm: sha256
access_policy: signed_url
license: CC-BY-4.0
version: "2026.03.1"
provenance:
schema_uri: https://example.org/schemas/tax-v1.json
citations:
- https://www.irs.gov/publications/p17
eval_set_uri: https://example.org/evals/tax-pack-v3.jsonlReference
Example manifest
Every Behavior Package ships with a top-level manifest.yaml. The minimal form looks like:
name: customer-support-classifier version: 0.1.0 type: fine_tune_recipe base_model: gpt-4.1-mini evals: - label_accuracy - format_following - pii_leakage safety: package_signing: required artifact_hashes: required license: private-commercial
See the customer-support-classifier package for a working example.
Not every package is a fine-tune. A prompt_template package, for instance, exposes runtime variables and a battery of evals that prove the generated artifact actually works. A toy "build me Pac-Man" package looks like:
name: pacman-builder version: 0.2.1 type: prompt_template description: >- Generates a runnable, single-file Pac-Man clone in the requested language. Includes maze layout, ghost AI behaviors, scoring, and a level-completion check. base_model: - gpt-4.1 - claude-4.6-sonnet inputs: - target_language # python | javascript | typescript - canvas_size # e.g. "28x31" (classic arcade dimensions) - difficulty # easy | classic | nightmare - ghost_ai # chase | scatter | mixed outputs: - source_files # map of filename -> file body - run_instructions # how to launch the game locally evals: - render_smoke # generated code starts without runtime errors - ghost_ai_correctness # ghosts respect their assigned mode - score_logic # dot/pellet/ghost points match arcade rules - level_completion # clearing all dots ends the level safety: package_signing: required artifact_hashes: required no_external_calls: required # generated game must run fully offline license: apache-2.0
The shape of the manifest is the same — only the type, the declared inputs/outputs, and the eval suite change to match what this package actually produces.