PH-LLM: native LLM inference program

Status: Planning (Wave 0 scaffold landed)
Vision: world-studio-vision.md
RFC: specs/lillm-rfc.md
Package: packages/li-llm (import llm)

Overview

PH-LLM delivers native Li LLM inference for World Studio agentic workflows. It complements PH-ML (classical deep learning) and li-studio-ai, which handles orchestration rather than inference.

The program is native-first: the CPU decode path lands in Wave 1, and GPU support via @gpu + LKIR follows in Wave 2. There is no PyTorch runtime in v1; weights come in via HF safetensors or GGUF export.
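To make the "no PyTorch runtime" point concrete, the following is a minimal Python sketch of the safetensors container layout: an 8-byte little-endian header length, a JSON header with per-tensor `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. It is not the li-llm loader; the function names and the `tok_embed` tensor name are hypothetical, and the sketch handles F32 only.

```python
import json
import struct


def write_safetensors(tensors):
    """Serialize {name: (shape, values)} as F32 tensors in the safetensors layout."""
    header = {}
    payload = b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack("<%df" % len(values), *values)  # little-endian f32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: u64 header length, JSON header, raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_safetensors(blob):
    """Parse the header and return {name: (shape, values)} for F32 tensors."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        if info["dtype"] != "F32":
            raise ValueError("sketch handles F32 only, got " + info["dtype"])
        begin, end = info["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per f32 element
        values = list(struct.unpack_from("<%df" % count, data, begin))
        tensors[name] = (info["shape"], values)
    return tensors


# Round-trip a hypothetical 2x2 tensor without any ML framework.
blob = write_safetensors({"tok_embed": ((2, 2), [1.0, 2.0, 0.5, -3.0])})
loaded = read_safetensors(blob)
```

Because the header is plain JSON and the payload is a flat byte buffer, a native loader can read or memory-map tensors directly, which is what makes a framework-free import path plausible.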

Work packages

WP	Deliverable	Depends	Gate	Status
WP-LLM-01	BPE/byte-level tokenizer + vocab load	std strings	smoke roundtrip	stub
WP-LLM-02	Safetensors + GGUF loader (f32/f16; no MoE v1)	WP-LLM-01	load Llama-3.2-1B-class fixture	stub
WP-LLM-03	Transformer graph (`llm.attn`, `llm.ffn`, RoPE)	`li-ml` matmul	forward vs reference logits ULP	stub
WP-LLM-04	KV-cache + decode loop (`llm_generate`)	WP-LLM-03	greedy decode smoke	stub
WP-LLM-05	CPU perf via `li-ml`/`linalg` matmul	PH-ML WP-ML-04	tier-3 bench row	planned
WP-LLM-06	GPU matmul/attention via `@gpu` + LKIR	PH-HW WP-HW-06	tier-3 GPU column	planned
WP-LLM-07	HF → safetensors export doc + `lillm import` CLI	WP-LLM-02	one documented model path	planned
WP-LLM-08	Trusted backend (Ollama/OpenAI-compatible) — Wave 2	`li-httpd`	audit + `[trusted]` fence	optional
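The WP-LLM-04 gate ("greedy decode smoke") can be illustrated with a toy Python sketch of a KV-cached greedy decode loop. Everything here is an assumption for illustration: the weights, the single-head, single-layer attention, and the function names are hypothetical and do not describe the `llm_generate` signature. The sketch compares a cached loop against one that recomputes all keys and values at every step.

```python
import math

# Hypothetical toy weights: 4-token vocabulary, 2-dimensional hidden state.
EMBED = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.5]]
W_Q = [[0.9, 0.1], [-0.2, 1.1]]
W_K = [[1.0, 0.3], [0.4, -0.8]]
W_V = [[0.5, -1.0], [1.2, 0.6]]


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached positions."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def pick_greedy(hidden):
    logits = matvec(EMBED, hidden)  # tied output embedding (assumption)
    return max(range(len(logits)), key=logits.__getitem__)


def generate_cached(prompt, max_new_tokens):
    if not prompt:
        raise ValueError("prompt must not be empty")
    tokens = list(prompt)
    # Prefill: project every prompt token to K and V exactly once.
    k_cache = [matvec(W_K, EMBED[t]) for t in tokens]
    v_cache = [matvec(W_V, EMBED[t]) for t in tokens]
    for _ in range(max_new_tokens):
        query = matvec(W_Q, EMBED[tokens[-1]])
        nxt = pick_greedy(attend(query, k_cache, v_cache))
        tokens.append(nxt)
        # Decode step: only the new token's K and V are computed.
        k_cache.append(matvec(W_K, EMBED[nxt]))
        v_cache.append(matvec(W_V, EMBED[nxt]))
    return tokens


def generate_uncached(prompt, max_new_tokens):
    """Reference loop that recomputes K and V for the whole sequence each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        keys = [matvec(W_K, EMBED[t]) for t in tokens]
        values = [matvec(W_V, EMBED[t]) for t in tokens]
        query = matvec(W_Q, EMBED[tokens[-1]])
        tokens.append(pick_greedy(attend(query, keys, values)))
    return tokens
```

A smoke test in this spirit would assert that `generate_cached(p, n) == generate_uncached(p, n)` for a few prompts: the cache changes the per-token cost from reprojecting the whole sequence to projecting one token, but should not change the decoded output.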

Not in v1

Fine-tuning, autograd, MoE, multimodal.

Studio integration

packages/li-studio-ai: studio_ai_complete routes to llm_generate when local weights are loaded
Fallback: Cursor SDK cloud when no local weights are available
See studio-cursor-sdk-rfc.md
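The routing described above can be sketched in Python as a local-first dispatch. The names `studio_ai_complete` and `llm_generate` come from this document, but the signatures, the callable-based wiring, and the `cloud_complete` parameter are assumptions made for illustration; the real interface is defined by the RFCs.

```python
def studio_ai_complete(prompt, llm_generate=None, cloud_complete=None):
    """Return (backend, completion), preferring the local model when available.

    llm_generate:   callable(prompt) -> str, set only when local weights are
                    loaded (hypothetical wiring).
    cloud_complete: callable(prompt) -> str, stand-in for the cloud fallback
                    (hypothetical wiring).
    """
    if llm_generate is not None:
        return ("local", llm_generate(prompt))
    if cloud_complete is not None:
        return ("cloud", cloud_complete(prompt))
    raise RuntimeError("no inference backend available")


# Stub backends standing in for the real local and cloud paths.
def fake_local(prompt):
    return "local:" + prompt


def fake_cloud(prompt):
    return "cloud:" + prompt
```

Returning the backend label alongside the completion is one way to let callers log or display which path served a request; whether Studio needs that is a design choice this sketch does not settle.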

Milestones (from battle plan)

Milestone	When	Evidence
M3	Week 16	lillm loads + greedy-decodes 1B model on CPU (tier-3 correctness row)
M4	Week 20	Agent apply_patch loop in Studio (WP-AG-04)
M6	Month 12	lillm GPU decode + sim_rl training visible

Tracker linkage

Add a PH-LLM row to PH-world-studio-program.md once the Wave 1 CPU matmul (PH-ML Wave 1) has landed.