YAML in.
Fine-tuned model out.
ForgeLM is the config-driven LLM fine-tuning toolkit for teams that ship into regulated environments.
Six post-training paradigms behind one declarative interface.
SFT: supervised fine-tuning on instruction pairs
DPO: Direct Preference Optimization on chosen/rejected pairs
SimPO: reference-free preference learning with a lower memory footprint
KTO: Kahneman-Tversky Optimization on binary feedback signals
ORPO: Odds Ratio Preference Optimization, SFT and preference alignment in a single pass
GRPO: Group Relative Policy Optimization for reasoning-focused RL
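Each paradigm consumes a different supervision signal. For orientation, a minimal sketch of the per-record shapes as Python dicts; the field names are illustrative assumptions, not ForgeLM's documented schema:

# Field names are illustrative assumptions, not ForgeLM's documented schema.
sft_record = {"prompt": "Summarize clause 4.2", "completion": "Clause 4.2 requires..."}

preference_record = {                      # DPO and SimPO consume chosen/rejected pairs;
    "prompt": "Summarize clause 4.2",      # ORPO uses the same pairs in its single pass
    "chosen": "Clause 4.2 requires...",
    "rejected": "Clause 4.2 covers...",
}

kto_record = {"prompt": "Summarize clause 4.2", "completion": "...", "label": True}  # binary signal

grpo_record = {"prompt": "Prove that 2 + 2 = 4."}  # rewards come from a reward function, not labels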
Three things separate ForgeLM from notebook-first frameworks.
Your run is a YAML file under version control.
Every run emits an Annex IV technical-documentation artifact.
When a benchmark score drops below the configured floor, ForgeLM discards the trained artifacts and exits non-zero so CI gates fail loudly.
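That exit code is all a CI gate needs. A minimal pipeline-step sketch, assuming only the documented non-zero-on-failure contract and the config path from the example below:

import subprocess
import sys

# ForgeLM exits non-zero when a benchmark floor is breached (exit 3 in the
# example config below), so propagating the return code fails the CI job.
result = subprocess.run(["forgelm", "--config", "configs/policy-bot.yaml"])
sys.exit(result.returncode)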
From data ingestion through evaluation to deployment artifacts.
PDF, DOCX, EPUB, TXT, Markdown → SFT-ready JSONL.
PII, secrets, leakage, language detection, near-duplicates.
4-bit NF4 quantization with PEFT adapters.
Llama Guard with confidence-weighted scoring.
Plug-in lm-evaluation-harness tasks.
DeepSpeed ZeRO presets, FSDP, multi-GPU configs.
Pre-flight memory estimator; a back-of-envelope sketch follows this list.
Streaming REPL with /reset, /save, /temperature, /system commands. Per-response Llama Guard routing is planned for an upcoming Pro CLI release.
GGUF export with 6 quant levels.
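For intuition on what the pre-flight estimator has to account for, here is a back-of-envelope sketch for a 7B model under 4-bit NF4 with LoRA adapters. The constants are rough rules of thumb and the default dimensions are assumptions (Qwen2.5-7B-like), not ForgeLM's actual estimator:

def rough_vram_gb(params_b=7.0, lora_r=16, hidden=3584, layers=28, batch=2, seq=4096):
    """Back-of-envelope VRAM estimate in GB; rules of thumb, not ForgeLM's estimator."""
    weights = params_b * 0.5                        # NF4 packs ~0.5 bytes per parameter
    lora_params = layers * 4 * 2 * lora_r * hidden  # two r x hidden matrices on 4 attention projections
    adapters = lora_params * 2 / 1e9                # bf16 adapter weights
    optimizer = lora_params * 8 / 1e9               # Adam moments in fp32, adapters only
    activations = batch * seq * hidden * layers * 2 / 1e9  # bf16, no gradient checkpointing
    return weights + adapters + optimizer + activations

print(round(rough_vram_gb(), 1))  # ~5.3 GB, before CUDA context and runtime overhead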
Use from forgelm import … to embed ForgeTrainer, audit_dataset, verify_audit_log, and verify_annex_iv_artifact calls inside your own Python pipelines.
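A minimal embedding sketch; only the importable names above are documented here, so the signatures, paths, and return conventions below are assumptions:

from forgelm import ForgeTrainer, audit_dataset, verify_annex_iv_artifact

# Signatures and paths are assumptions; only the names are documented above.
audit_dataset("data/policies.jsonl")                # PII / secrets / leakage / dedup pass
trainer = ForgeTrainer(config="configs/policy-bot.yaml")
trainer.fit()
verify_annex_iv_artifact("checkpoints/policy-bot")  # check the emitted evidence bundle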
Audit-trail, change-management, data-lineage, and supply-chain evidence the deployer's auditor asks for. The software is aligned with those requirements, not certified against them.
GDPR Article 17 erasure with forgelm purge and Article 15 access with forgelm reverse-pii, both wired into the audit log.
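Because both commands write to the audit log, an integrity check can close the loop. A sketch using the verify_audit_log name from the Python API above; the log path and return convention are assumptions:

from forgelm import verify_audit_log

# Path and return convention are assumptions; only the function name is documented.
if not verify_audit_log("checkpoints/policy-bot/audit.log"):
    raise SystemExit("audit log failed integrity verification")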
An SFT → DPO run with safety eval, benchmark gate, and Annex IV export.
model:
  name_or_path: "Qwen/Qwen2.5-7B-Instruct"
  load_in_4bit: true
  max_length: 4096
  lora:
    r: 16
    alpha: 32
    method: "dora"  # lora / DoRA / PiSSA / rsLoRA

data:
  dataset_name_or_path: "data/policies.jsonl"

training:
  trainer_type: "sft"  # sft / dpo / simpo / kto / orpo / grpo
  num_train_epochs: 3
  per_device_train_batch_size: 2
  learning_rate: 2.0e-5
  output_dir: "./checkpoints/policy-bot"

evaluation:
  benchmark:
    enabled: true
    tasks: ["hellaswag", "arc_easy", "truthfulqa"]
    min_score: 0.65  # aggregate floor; run fails (exit 3) below this
  safety:
    enabled: true
    classifier: "meta-llama/Llama-Guard-3-8B"
    severity_thresholds: { critical: 0.0, high: 0.01 }
    require_human_approval: true  # Article 14 oversight gate

compliance:
  provider_name: "Acme Inc."
  risk_classification: "limited-risk"

webhook:
  url: "${SLACK_WEBHOOK}"
$ forgelm --config configs/policy-bot.yaml --dry-run    # validate
$ forgelm --config configs/policy-bot.yaml --fit-check  # VRAM check
$ forgelm --config configs/policy-bot.yaml              # run
Drop your regulatory corpus into forgelm ingest.
SFT on past tickets, DPO on agent feedback.
$ forgelm quickstart customer-support

Multi-dataset mix, SFT + ORPO, GGUF export.
$ forgelm quickstart code-assistant

Built-in shaping rewards or your own reward function.
$ forgelm quickstart grpo-math

Pick a template, point it at your data, watch the audit report go green.