Risk management
A continuous, iterative process: identify foreseeable risks, evaluate, mitigate.
ForgeLM was built for teams that have to defend their training pipeline to a regulator.
Each card maps a Regulation (EU) 2024/1689 article to the ForgeLM module that fulfils it.
Quality, relevance, representativeness criteria. Bias detection, PII redaction. Optional Presidio ML-NER for unstructured PII; optional Croissant 1.0 dataset card emitted alongside the audit.
All nine required Annex IV sections (§1-9), auto-populated from the run.
Append-only event log over training start/end, eval gates, auto-revert decisions, human-approval gates, GDPR Article 17 erasure, and Article 15 access-request queries.
Auto-generated model card (capabilities, intended use, training data summary) plus deployer instructions, written on every successful run.
Setting the optional evaluation.require_human_approval: true moves the trained checkpoint into a staging directory; forgelm approve / reject and forgelm approvals drive the dual-controlled sign-off, and every decision is stamped into the audit log.
Benchmark gates enforce a per-task floor; if a post-training score drops below its floor, the run fails (exit 3).
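The gate described above can be sketched in a few lines. This is an illustration only, not ForgeLM's implementation: the function names and the score/floor dictionaries are hypothetical, and the only detail taken from the text is the exit code 3.

```python
import sys

EXIT_GATE_FAILED = 3  # exit code named in the text above


def failed_gates(scores: dict, floors: dict) -> list:
    """Return the tasks whose post-training score is below its floor.

    Assumption: a task that has a floor but no reported score counts as failed.
    """
    failures = []
    for task, floor in floors.items():
        score = scores.get(task)
        if score is None or score < floor:
            failures.append(task)
    return failures


def enforce_gates(scores: dict, floors: dict) -> None:
    """Exit with the gate-failure code if any task misses its floor."""
    failures = failed_gates(scores, floors)
    if failures:
        print(f"benchmark gate failed: {', '.join(failures)}", file=sys.stderr)
        sys.exit(EXIT_GATE_FAILED)
```

Treating a missing score as a failure is a design choice in this sketch: a gate that passes silently when a benchmark did not run would defeat its purpose.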
QMS SOPs in docs/qms/ cover model training, data management, incident response, change management, plus access control / encryption-at-rest / risk treatment / Statement of Applicability.
--pii-mask redacts emails, phones, credit cards (Luhn-validated), IBAN, TR national ID, DE Personalausweis, FR SSN, US SSN before chunks land in JSONL.
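To show what "Luhn-validated" means in practice, here is a minimal sketch of checksum-gated card redaction. The regular expression, the `[CREDIT_CARD]` placeholder, and the function names are assumptions made for illustration; ForgeLM's actual patterns and placeholder format may differ.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str, placeholder: str = "[CREDIT_CARD]") -> str:
    """Replace only those digit runs that pass the Luhn check."""
    def _sub(match):
        candidate = match.group(0)
        return placeholder if luhn_valid(candidate) else candidate

    return CARD_RE.sub(_sub, text)
```

Gating on the checksum keeps order numbers, timestamps, and other long digit runs from being redacted as if they were cards.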
forgelm reverse-pii scans masked JSONL corpora for plaintext residuals or salted SHA-256 digests of a data-subject identifier and reports every row of every split where that subject appears, supporting the access-request side (Article 15) of the GDPR's data-subject rights.
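The lookup can be pictured as follows. This is a sketch under stated assumptions, not the forgelm reverse-pii code: it assumes the digest is SHA-256 over `salt + identifier` encoded as UTF-8, that each JSONL file is one split, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def salted_digest(identifier: str, salt: str) -> str:
    """Assumed scheme: hex SHA-256 over salt followed by the identifier."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()


def find_subject(paths, identifier: str, salt: str) -> list:
    """Return (file name, 1-based line number) for every row that contains
    the plaintext identifier or its salted digest."""
    needles = (identifier, salted_digest(identifier, salt))
    hits = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue
                # Re-serialise without ASCII escaping so non-ASCII text still matches.
                row_text = json.dumps(json.loads(line), ensure_ascii=False)
                if any(needle in row_text for needle in needles):
                    hits.append((Path(path).name, lineno))
    return hits
```

Searching for both forms matters: the digest finds rows that were masked correctly, while the plaintext match surfaces rows where masking missed the identifier.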
forgelm purge deletes a data-subject's rows from training corpora and stamps data.erasure_requested + data.erasure_completed events into the append-only audit log so post-deletion runs are forensically linked to the erasure event.
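A rough sketch of the erasure flow, assuming a single JSONL corpus and a plaintext identifier match. Only the two event names come from the text; the event fields, the helper names, and the rewrite strategy are hypothetical, and the sketch returns the events rather than writing them to ForgeLM's audit log.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def _event(name: str, **fields) -> dict:
    """Build an audit event with a UTC timestamp (field layout is an assumption)."""
    return {"event": name, "timestamp": datetime.now(timezone.utc).isoformat(), **fields}


def purge_subject(path: str, identifier: str) -> list:
    """Drop every row mentioning `identifier` and return the two audit events."""
    file_name = os.path.basename(path)
    events = [_event("data.erasure_requested", file=file_name)]
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as out, open(path, encoding="utf-8") as src:
        for line in src:
            if line.strip() and identifier in json.dumps(json.loads(line), ensure_ascii=False):
                removed += 1
                continue
            out.write(line)
    os.replace(tmp_path, path)  # swap the cleaned file into place
    events.append(_event("data.erasure_completed", file=file_name, rows_removed=removed))
    return events
```

Note that the events in this sketch record the file and the row count but not the identifier itself, so the audit log does not reintroduce the data that was just erased.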
Software cannot be ISO 27001 or SOC 2 certified — only organisations can. ForgeLM is aligned with the controls an auditor asks the deployer about: it produces the audit-trail, change-management, data-lineage, and supply-chain evidence on every run, ready to drop into your audit binder.
Append-only audit_log.jsonl per run with HMAC + SHA-256 hash chain + genesis manifest sidecar. forgelm verify-audit validates the chain end-to-end. HMAC integrity is active when FORGELM_AUDIT_SECRET is KMS-managed and rotated per output-dir lifecycle.
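For readers unfamiliar with hash-chained logs, the idea can be sketched as below. This is not ForgeLM's log schema: the entry fields (`event`, `prev_hash`, `hash`, `hmac`), the all-zero genesis value, and the function names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's previous hash


def _canonical(body: dict) -> bytes:
    """Serialise deterministically so the same entry always hashes the same."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_event(log: list, event: dict, secret: bytes) -> dict:
    """Append an entry that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    payload = _canonical(body)
    entry = dict(
        body,
        hash=hashlib.sha256(payload).hexdigest(),
        hmac=hmac.new(secret, payload, hashlib.sha256).hexdigest(),
    )
    log.append(entry)
    return entry


def verify_chain(log: list, secret: bytes) -> bool:
    """Recompute every link; any edit, removal, or reorder breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        payload = _canonical({"event": entry["event"], "prev_hash": entry["prev_hash"]})
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(entry["hmac"], expected):
            return False
        prev_hash = entry["hash"]
    return True
```

The plain SHA-256 chain alone only detects accidental or careless edits, because anyone who can rewrite the file can recompute the hashes. The HMAC ties each entry to a secret the editor does not hold, which is why the handling of that secret matters.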
Article 14 staging gate (forgelm approve / reject) plus human_approval.required/granted/rejected audit events plus forgelm verify-audit chain integrity — every model promotion is dual-controlled and forensically attributed.
data_provenance.json (SHA-256 fingerprint + size + mtime + HF Hub revision pin) and data_governance_report.json (collection method, annotation process, known biases, personal-data flag, DPIA completion) cover the lineage questions an auditor asks first.
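A minimal sketch of the file-level part of such a fingerprint (SHA-256, size, mtime) follows. The function name and the dictionary keys are hypothetical and do not reflect the actual data_provenance.json schema; the HF Hub revision pin is out of scope here.

```python
import hashlib
import os


def fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in chunks so large corpora never load fully into memory."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    stat = os.stat(path)
    return {
        "path": path,
        "sha256": sha.hexdigest(),
        "size": stat.st_size,    # bytes
        "mtime": stat.st_mtime,  # seconds since the epoch
    }
```

The hash is the part that identifies the content; size and mtime are cheap hints for spotting a change before rehashing, and on their own they do not prove a file is unchanged.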
CycloneDX 1.5 SBOM emitted per release tag for every (OS × Python-version) cell of the publish matrix. Wave 4 added a nightly pip-audit run and bandit in CI, covering dependency-vulnerability scanning and static security analysis of the build itself.
For the full audit cookbook, including 8 audit-floor questions and a 93-control coverage map, see docs/guides/iso_soc2_deployer_guide.md.
After every successful (or failed) run, this is what lands in your output directory.
checkpoints/policy-bot/                  # training.output_dir from your YAML
├── audit_log.jsonl                      # Article 12 append-only event log
├── audit_log.jsonl.manifest.json        # SHA-256 sidecar over the audit log
├── compliance/                          # Annex IV bundle (Articles 9-17)
│   ├── annex_iv_metadata.json           # Article 11 technical documentation (flat JSON)
│   ├── compliance_report.json           # Full Article 9-17 manifest; superset of the per-artefact files
│   ├── training_manifest.yaml           # Human-readable summary (model, data, params, final metrics)
│   ├── data_provenance.json             # Data lineage extracted from the manifest (Article 10)
│   ├── risk_assessment.json             # Article 9 risk-management output (when risk_assessment block is set)
│   ├── data_audit_report.json           # Article 10 data governance: quality/PII/bias scan (forgelm audit)
│   └── data_governance_report.json      # Article 10 governance summary (sources, licenses)
└── final_model/                         # PEFT adapter or merged weights
    ├── README.md                        # Article 13 model card (HuggingFace-compatible, auto-generated)
    ├── deployer_instructions.md         # Article 13 downstream-deployer guidance
    └── model_integrity.json             # SHA-256 over weights/adapter for Article 15
Compliance is a sociotechnical discipline; software can prepare evidence but cannot certify a system on your behalf.
The compliance module, audit log schema, and Annex IV generator are all in forgelm/compliance.py.