Name: ForgeLM
Author: ForgeLM

Article-by-article mapping

Where each obligation lives in the code

Each card maps a Regulation (EU) 2024/1689 article to the ForgeLM module that fulfils it.

Article 9

Risk management

A continuous, iterative process: identify foreseeable risks, evaluate, mitigate.

forgelm/safety.py · forgelm/compliance.py

Article 10

Data & data governance

Quality, relevance, representativeness criteria. Bias detection, PII redaction. Optional Presidio ML-NER for unstructured PII; optional Croissant 1.0 dataset card emitted alongside the audit.

forgelm/data_audit/ · forgelm/ingestion.py

Article 11

Technical documentation

Annex IV's nine required sections (§1-9), auto-populated from the run.

compliance/annex_iv_metadata.json

Article 12

Record-keeping (logs)

Append-only event log over training start/end, eval gates, auto-revert decisions, human-approval gates, GDPR Article 17 erasure, and Article 15 access-request queries.

forgelm/compliance.py · audit_log.jsonl

Article 13

Transparency

Auto-generated model card (capabilities, intended use, training data summary) plus deployer instructions, written every successful run.

final_model/README.md · final_model/deployer_instructions.md

Article 14

Human oversight

Setting the optional evaluation.require_human_approval: true moves the trained checkpoint into a staging directory. forgelm approve / reject and forgelm approvals then drive the dual-controlled sign-off, and every decision is stamped into the audit log.

forgelm approve · forgelm reject · forgelm approvals

Article 15

Accuracy & robustness

Benchmark gates enforce a per-task floor: if the post-training score drops below the floor, the run fails (exit code 3).

forgelm/benchmark.py
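
The floor check can be sketched in a few lines. This is an illustration only: the function names, the dictionary shapes, and the task names are hypothetical and are not taken from forgelm/benchmark.py. The only detail that comes from the card above is exit code 3.

```python
from typing import Dict, List

EXIT_BENCHMARK_FAILED = 3  # exit code named in the card above


def check_floors(scores: Dict[str, float], floors: Dict[str, float]) -> List[str]:
    """Return the tasks whose post-training score is below its floor.

    Assumption: a task that has a floor but no reported score counts as failed.
    """
    failed = []
    for task, floor in floors.items():
        score = scores.get(task)
        if score is None or score < floor:
            failed.append(task)
    return failed


def gate_exit_code(scores: Dict[str, float], floors: Dict[str, float]) -> int:
    """Return 0 when every floor is met, 3 otherwise."""
    return EXIT_BENCHMARK_FAILED if check_floors(scores, floors) else 0


# Hypothetical task names and numbers, for illustration only.
floors = {"task_a": 0.60, "task_b": 0.25}
scores = {"task_a": 0.64, "task_b": 0.21}
print(check_floors(scores, floors))    # ['task_b']
print(gate_exit_code(scores, floors))  # 3
```

Treating a missing score as a failure is a design choice of this sketch: a gate that silently passes when a benchmark did not run would defeat its purpose.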

Articles 16-17

Conformity & QMS

QMS SOPs in docs/qms/ cover model training, data management, incident response, change management, plus access control / encryption-at-rest / risk treatment / Statement of Applicability.

docs/qms/

GDPR

PII redaction at ingest

--pii-mask redacts emails, phones, credit cards (Luhn-validated), IBAN, TR national ID, DE Personalausweis, FR SSN, US SSN before chunks land in JSONL.

forgelm/data_audit/ · forgelm/data_audit/_pii_regex.py
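
To show what "Luhn-validated" means in practice, here is a minimal sketch of regex masking with a checksum filter. The patterns, the placeholder tokens ([EMAIL], [CREDIT_CARD]), and the function names are assumptions made for this example; they are not the contents of _pii_regex.py, and only two of the listed PII types are covered.

```python
import re

# Simplified patterns for illustration; real-world detectors are stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def mask_pii(text: str) -> str:
    """Mask emails, and mask digit runs only when they pass the Luhn check."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )


sample = "Mail jane@example.com, card 4111 1111 1111 1111, ref 4111 1111 1111 1112."
print(mask_pii(sample))
# Mail [EMAIL], card [CREDIT_CARD], ref 4111 1111 1111 1112.
```

The second number differs from a well-known test card number by one digit, so it fails the checksum and is left alone. That is the point of the validation step: long order or reference numbers are less likely to be redacted by mistake.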

GDPR Art. 15

Right of access

forgelm reverse-pii scans masked JSONL corpora for plaintext residuals or salted SHA-256 digests of a data-subject identifier and reports every row of every split where that subject appears, satisfying the access-request side of the GDPR's data-subject rights.

forgelm reverse-pii
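
The lookup described above can be sketched as follows. The digest scheme (SHA-256 over salt followed by the identifier, hex-encoded), the function names, and the row layout are assumptions for illustration; the actual forgelm reverse-pii implementation may differ on each of these points.

```python
import hashlib
import json
from typing import Iterable, List, Tuple


def salted_digest(identifier: str, salt: bytes) -> str:
    # Assumed scheme: SHA-256 over salt || identifier, hex-encoded.
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()


def find_subject(
    lines: Iterable[str], identifier: str, salt: bytes
) -> List[Tuple[int, str]]:
    """Return (line_number, kind) for every JSONL row that mentions the subject."""
    digest = salted_digest(identifier, salt)
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        # Re-serialise the row so the search covers every field and
        # is not thrown off by \u escapes in the raw line.
        haystack = json.dumps(json.loads(line), ensure_ascii=False)
        if identifier in haystack:
            hits.append((lineno, "plaintext"))
        elif digest in haystack:
            hits.append((lineno, "digest"))
    return hits


salt = b"example-salt"  # hypothetical value
rows = [
    json.dumps({"text": "contact jane@example.com"}),
    json.dumps({"text": "nothing here"}),
    json.dumps({"text": "user " + salted_digest("jane@example.com", salt)}),
]
print(find_subject(rows, "jane@example.com", salt))
# [(1, 'plaintext'), (3, 'digest')]
```

To cover "every row of every split", the same function would be run once per split file and the hits collected under the file name.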

GDPR Art. 17

Right to erasure

forgelm purge deletes a data-subject's rows from training corpora and stamps data.erasure_requested + data.erasure_completed events into the append-only audit log so post-deletion runs are forensically linked to the erasure event.

forgelm purge
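
A minimal sketch of the erase-and-log pattern, assuming in-memory rows and a list standing in for the audit log. The two event names come from the card above; the function name, the `ts` and `rows_removed` fields, and the substring match are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import List


def purge_rows(rows: List[dict], subject_id: str, audit_log: List[dict]) -> List[dict]:
    """Drop every row that mentions subject_id and record both erasure events."""

    def stamp(event: str, **fields) -> None:
        audit_log.append(
            {"event": event, "ts": datetime.now(timezone.utc).isoformat(), **fields}
        )

    stamp("data.erasure_requested")
    kept = [
        row for row in rows
        if subject_id not in json.dumps(row, ensure_ascii=False)
    ]
    stamp("data.erasure_completed", rows_removed=len(rows) - len(kept))
    return kept


log: List[dict] = []
corpus = [{"text": "jane@example.com asked"}, {"text": "unrelated"}]
print(purge_rows(corpus, "jane@example.com", log))  # [{'text': 'unrelated'}]
print([entry["event"] for entry in log])
# ['data.erasure_requested', 'data.erasure_completed']
```

The sketch deliberately keeps the raw identifier out of the log entries, so the record of an erasure does not itself reintroduce the erased data.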

ISO 27001 · SOC 2 Type II

Aligned, not certified

Software cannot be ISO 27001 or SOC 2 certified — only organisations can. ForgeLM is aligned with the controls an auditor asks the deployer about: it produces the audit-trail, change-management, data-lineage, and supply-chain evidence on every run, ready to drop into your audit binder.

Pillar 1

Audit trail

Append-only audit_log.jsonl per run with HMAC + SHA-256 hash chain + genesis manifest sidecar. forgelm verify-audit validates the chain end-to-end. HMAC integrity is active when FORGELM_AUDIT_SECRET is KMS-managed and rotated per output-dir lifecycle.

audit_log.jsonl · forgelm verify-audit
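
For readers new to hash chains, here is a minimal sketch of the idea: each entry stores the hash of the previous one, so editing any earlier entry breaks every later link. The entry layout, the all-zero genesis value, and the function names are assumptions for this example and do not describe the actual audit_log.jsonl schema or what forgelm verify-audit checks.

```python
import hashlib
import hmac
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder for the first entry's prev_hash


def _canonical(obj: dict) -> bytes:
    # Stable serialisation so the same entry always hashes the same way.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_event(log: List[dict], event: dict, secret: Optional[bytes] = None) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    entry = dict(body, hash=hashlib.sha256(_canonical(body)).hexdigest())
    if secret is not None:
        # Keyed digest: cannot be recomputed by someone without the secret.
        entry["hmac"] = hmac.new(secret, _canonical(body), hashlib.sha256).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: List[dict], secret: Optional[bytes] = None) -> bool:
    prev_hash = GENESIS
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(_canonical(body)).hexdigest() != entry["hash"]:
            return False
        if secret is not None:
            expected = hmac.new(secret, _canonical(body), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, entry.get("hmac", "")):
                return False
        prev_hash = entry["hash"]
    return True


log: List[dict] = []
append_event(log, {"type": "human_approval.required"}, secret=b"demo-secret")
append_event(log, {"type": "human_approval.granted"}, secret=b"demo-secret")
print(verify_chain(log, secret=b"demo-secret"))  # True
log[0]["event"]["type"] = "human_approval.rejected"  # tamper with history
print(verify_chain(log, secret=b"demo-secret"))  # False
```

The plain SHA-256 chain detects accidental or naive edits. The HMAC adds protection against someone who rewrites the whole chain, provided the secret is kept out of their reach.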

Pillar 2

Change control

The Article 14 staging gate (forgelm approve / reject), the human_approval.required/granted/rejected audit events, and forgelm verify-audit chain integrity work together so that every model promotion is dual-controlled and forensically attributed.

forgelm approve · forgelm verify-audit

Pillar 3

Data lineage

data_provenance.json (SHA-256 fingerprint + size + mtime + HF Hub revision pin) and data_governance_report.json (collection method, annotation process, known biases, personal-data flag, DPIA completion) cover the lineage questions an auditor asks first.

data_provenance.json · data_governance_report.json
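
The fingerprint fields listed above (SHA-256, size, mtime) can be computed with the standard library alone. This is a sketch under assumptions: the function name and the dictionary keys are hypothetical and do not describe the actual data_provenance.json schema, and the HF Hub revision pin is left out.

```python
import hashlib
import os
import tempfile


def fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    stat = os.stat(path)
    return {
        "path": path,
        "sha256": sha.hexdigest(),
        "size_bytes": stat.st_size,
        "mtime": stat.st_mtime,
    }


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jsonl") as tmp:
    tmp.write(b'{"text": "hello"}\n')
record = fingerprint(tmp.name)
print(record["size_bytes"], record["sha256"][:12])
os.unlink(tmp.name)
```

Of the three fields, only the digest proves the content is unchanged. Size and mtime are cheap hints that help spot an obvious mismatch before rehashing.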

Pillar 4

Supply chain

CycloneDX 1.5 SBOM emitted per release tag for every (OS x Python-version) cell of the publish matrix. Wave 4 added a nightly pip-audit run and bandit in CI, covering dependency-vulnerability scanning and static security analysis of the build itself.

SBOM · pip-audit · bandit

For the full audit cookbook, including 8 audit-floor questions and a 93-control coverage map, see docs/guides/iso_soc2_deployer_guide.md.

What you get on disk

A run's compliance footprint

After every successful (or failed) run, this is what lands in your output directory.

checkpoints/policy-bot/

checkpoints/policy-bot/             # training.output_dir from your YAML
├── audit_log.jsonl                 # Article 12 append-only event log
├── audit_log.jsonl.manifest.json   # SHA-256 sidecar over the audit log
├── compliance/                     # Annex IV bundle (Articles 9-17)
│   ├── annex_iv_metadata.json      # Article 11 technical documentation (flat JSON)
│   ├── compliance_report.json      # Full Article 9-17 manifest — superset of the per-artefact files
│   ├── training_manifest.yaml      # Human-readable summary (model, data, params, final metrics)
│   ├── data_provenance.json        # Data lineage extracted from the manifest (Article 10)
│   ├── risk_assessment.json        # Article 9 risk-management output (when risk_assessment block is set)
│   ├── data_audit_report.json      # Article 10 data governance — quality/PII/bias scan (forgelm audit)
│   └── data_governance_report.json # Article 10 governance summary (sources, licenses)
└── final_model/                    # PEFT adapter or merged weights
    ├── README.md                   # Article 13 model card (HuggingFace-compatible) — auto-generated
    ├── deployer_instructions.md    # Article 13 downstream-deployer guidance
    └── model_integrity.json        # SHA-256 over weights/adapter for Article 15
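
A quick way to confirm a run left the footprint shown above is to check for each file. The sketch below copies the paths from the tree. The function name is hypothetical, and which files count as mandatory is an assumption of this example: risk_assessment.json and data_audit_report.json are left out because the tree marks them as conditional, and a failed run may not produce everything listed.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# Paths copied from the tree above, relative to training.output_dir.
EXPECTED = [
    "audit_log.jsonl",
    "audit_log.jsonl.manifest.json",
    "compliance/annex_iv_metadata.json",
    "compliance/compliance_report.json",
    "compliance/training_manifest.yaml",
    "compliance/data_provenance.json",
    "compliance/data_governance_report.json",
    "final_model/README.md",
    "final_model/deployer_instructions.md",
    "final_model/model_integrity.json",
]


def missing_artifacts(output_dir: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected artefacts that are not present under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


# Demo: a directory that contains only the audit log.
with tempfile.TemporaryDirectory() as out:
    (Path(out) / "audit_log.jsonl").write_text("")
    print(len(missing_artifacts(out)))  # 9
```

In practice you would call missing_artifacts("checkpoints/policy-bot") after a run and treat a non-empty result as a reason to inspect the logs before filing the bundle.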

Honest limits

What ForgeLM does not claim

Compliance is a sociotechnical discipline; software can prepare evidence but cannot certify a system on your behalf.

ForgeLM generates Annex IV-style technical documentation from a run.
The audit log is append-only by convention and SHA-256-anchored.
The PII / secrets regex sets are conservative by design.
Llama Guard's S1-S14 categories are an opinionated taxonomy.

Want a deeper look?

The compliance module, audit log schema, and Annex IV generator are all in forgelm/compliance.py.
