Building a compliant AI system requires tooling across thirteen categories, from pipeline orchestration through to incident management. This reference maps every tool to its compliance function, licence model, and procedural alternative, enabling engineering teams to plan their stack and budget holders to assess commercial implications.
The AI Compliance Tooling Reference maps over fifty tools across thirteen categories to their EU AI Act compliance functions. Pipeline orchestration (Dagster, Airflow, Prefect), experiment tracking (MLflow, Weights & Biases), and data versioning (DVC, LakeFS, Delta Lake) provide the foundation for reproducible, auditable AI development. Data validation and fairness evaluation tools (Great Expectations, Fairlearn, Evidently AI, NannyML) feed the testing and monitoring requirements of Articles 9, 10, and 72. Security scanning (Semgrep, Trivy, Snyk) and policy enforcement (OPA, Conftest) embed compliance into the development pipeline. Production monitoring (Prometheus, Grafana, Datadog) and incident management (PagerDuty, Zendesk) support the post-market monitoring and serious incident reporting obligations. Every tool has an identified open-source or procedural alternative, ensuring compliance is achievable regardless of budget. The reference is designed to be read alongside the Compliance Maturity Model for sequenced adoption planning.
Building a compliant AI system requires tooling across thirteen categories, from pipeline orchestration through to incident management.
No single vendor covers the full landscape, and most organisations assemble a stack from open-source foundations supplemented by commercial services where operational maturity demands it. Every tool referenced in the Practical Implementation Guide maps to a specific compliance function, and each has a procedural alternative for organisations that cannot deploy it.
The tooling categories mirror the AISDP module structure. Pipeline orchestration tools (Dagster, Apache Airflow, Prefect, Kubeflow Pipelines) automate ML workflows with audit-grade lineage. Experiment tracking tools (MLflow, Weights & Biases) create the reproducibility records that reviewers examine. Data versioning tools (DVC, LakeFS, Delta Lake) ensure every training dataset is immutably linked to the model version it produced.
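The lineage guarantee that data versioning tools provide rests on content addressing: hash the data, record the hash next to the model version, and any later change to the data is detectable. A minimal sketch of that idea (illustrative only, not the DVC or LakeFS API; field names are hypothetical):

```python
import hashlib
import json

def dataset_fingerprint(records: list[dict]) -> str:
    """Content-address a dataset: any change to the data changes the hash."""
    canonical = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def lineage_record(model_version: str, records: list[dict]) -> dict:
    """Immutably link a model version to the exact data that produced it."""
    return {"model_version": model_version,
            "dataset_sha256": dataset_fingerprint(records)}

train = [{"id": 1, "label": 0}, {"id": 2, "label": 1}]
entry = lineage_record("model-1.4.0", train)

# Re-running on identical data reproduces the same fingerprint; any edit
# to the data breaks the link, which is exactly what an auditor checks.
assert entry["dataset_sha256"] == dataset_fingerprint(train)
```

Real tools add storage, deduplication, and remote caching on top, but the audit property is this hash link.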
The licence model determines long-term cost, control, and audit independence.
Open-source tools (marked OSS in the reference table) provide full source access, enabling organisations to verify that the tool itself does not introduce compliance risks. Commercial tools offer managed operations and support but create vendor dependencies that must be managed through procurement contracts and exit planning.
For each category, the guide identifies a recommended tool and at least one procedural alternative. The procedural alternative is always available: it is a manual or spreadsheet-based process that achieves the same compliance outcome, albeit with higher ongoing effort. Organisations should treat automated tooling as the target state and procedural alternatives as interim measures during adoption.
Data quality and fairness evaluation are the foundation of trustworthy AI compliance.
Great Expectations provides declarative data quality testing, enabling teams to define expectations as code and integrate them into the CI/CD pipeline. When the tool cannot be deployed, manual data quality checks following the procedural alternative achieve the same validation outcome.
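The "expectations as code" pattern can be sketched in a few lines of pure Python. This is an illustration of the pattern, not the Great Expectations API; the expectation names and result shape are hypothetical:

```python
# Illustrative sketch of declarative data quality checks: each expectation
# is a function returning a structured pass/fail result, and the suite
# result gates the pipeline. NOT the Great Expectations API.

def expect_no_nulls(rows, column):
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"expectation": f"no_nulls:{column}", "success": not failures,
            "failed_rows": failures}

def expect_values_between(rows, column, low, high):
    failures = [i for i, r in enumerate(rows)
                if r[column] is None or not (low <= r[column] <= high)]
    return {"expectation": f"between:{column}", "success": not failures,
            "failed_rows": failures}

def validate(rows, suite):
    """Run every expectation; the aggregate result gates the pipeline step."""
    results = [check(rows) for check in suite]
    return {"success": all(r["success"] for r in results), "results": results}

rows = [{"age": 34}, {"age": 29}, {"age": None}]
suite = [lambda r: expect_no_nulls(r, "age"),
         lambda r: expect_values_between(r, "age", 0, 120)]
report = validate(rows, suite)  # fails: row 2 has a null age
```

The structured result (which rows failed, under which expectation) is what turns a quality check into audit evidence.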
Security scanning and policy enforcement must be embedded in the development pipeline, not bolted on before deployment.
Open Policy Agent (OPA) evaluates declarative policies written in the Rego language, enabling policy-as-code enforcement across the governance pipeline. Conftest extends this to infrastructure-as-code and configuration testing.
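The shape of a deployment admission policy can be sketched as follows. In OPA the same rule would be written in Rego and evaluated against the deployment document; here the logic is shown in Python with hypothetical field names:

```python
# Hypothetical admission policy, sketched in Python for illustration.
# In OPA, each deny reason would be a Rego rule evaluated against the
# deployment input; the CI gate blocks on any non-empty deny set.

def deny_reasons(deployment: dict) -> list[str]:
    reasons = []
    if deployment.get("risk_class") == "high" and not deployment.get("evaluation_signed"):
        reasons.append("high-risk model deployed without a signed evaluation report")
    if not deployment.get("model_version"):
        reasons.append("deployment must reference an immutable model version")
    return reasons

def admit(deployment: dict) -> bool:
    return not deny_reasons(deployment)

ok = admit({"risk_class": "high", "evaluation_signed": True,
            "model_version": "1.4.0"})
```

Expressing the rule as data-in, reasons-out keeps the policy testable independently of the pipeline that enforces it.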
Static analysis with Semgrep supports custom AI compliance rules alongside standard security patterns. Trivy handles container and filesystem vulnerability scanning. Snyk and pip-audit cover dependency vulnerability scanning, with pip-audit serving as the open-source fallback. Secret detection uses detect-secrets or GitGuardian to prevent credential leakage into repositories.
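Secret detection at its simplest is pattern matching over committed text. A minimal sketch of the idea (illustrative only; detect-secrets and GitGuardian combine many detectors with entropy analysis and allow-lists):

```python
import re

# Two illustrative detector patterns; real scanners ship dozens.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{20,}['\"]"),
}

def scan(text: str) -> list[str]:
    """Return the names of detectors that matched, for a pre-commit gate."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

hits = scan('api_key = "abcd1234abcd1234abcd1234"')
```

Running this as a pre-commit hook keeps credentials out of history, which is far cheaper than rotating them after a leak.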
Post-market monitoring under Article 72 requires continuous collection and analysis of production data.
Prometheus collects metrics and manages alerting rules. Grafana provides visualisation and dashboards for the oversight interfaces that operators and compliance teams use daily. Datadog offers a unified commercial alternative covering the full observability stack.
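The core of an alerting rule is a metric, a threshold, and a name to route on. A minimal sketch of that evaluation loop (Prometheus expresses the same logic declaratively as alerting rules over time series; rule names here are hypothetical):

```python
# Minimal threshold alerting sketch: evaluate each rule against the
# latest metric snapshot and fire an alert record for every breach.

def evaluate_alerts(metrics: dict, rules: list[dict]) -> list[dict]:
    fired = []
    for rule in rules:
        value = metrics.get(rule["metric"])
        if value is not None and value > rule["threshold"]:
            fired.append({"alert": rule["name"], "metric": rule["metric"],
                          "value": value, "threshold": rule["threshold"]})
    return fired

rules = [
    {"name": "PredictionDrift", "metric": "psi", "threshold": 0.2},
    {"name": "ErrorRateHigh", "metric": "error_rate", "threshold": 0.05},
]
fired = evaluate_alerts({"psi": 0.31, "error_rate": 0.01}, rules)
```

Keeping rules as data rather than code means the thresholds themselves are versioned and auditable.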
Incident alerting and on-call management use PagerDuty or Opsgenie to route alerts through escalation chains that match the oversight pyramid. The procedural alternative (email and phone alerting) works for small teams but does not scale to multi-system portfolios.
Compliance-grade deployment requires progressive delivery and full auditability.
ArgoCD provides GitOps continuous delivery for Kubernetes, ensuring that every deployment is traceable to a specific commit. Argo Rollouts and Flagger enable canary and blue-green deployment strategies, where a new version receives a small percentage of traffic while automated analysis compares metrics before full rollout.
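The automated analysis step reduces to comparing canary metrics against the baseline and returning a promote-or-rollback verdict. A sketch under assumed thresholds (the metric names and limits are illustrative; Argo Rollouts and Flagger drive the same decision from configured metric providers):

```python
# Canary analysis sketch: compare the canary's error rate and latency to
# the stable baseline; any breach triggers rollback before full rollout.

def canary_verdict(baseline: dict, canary: dict,
                   max_error_delta: float = 0.01,
                   max_latency_ratio: float = 1.2) -> str:
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return "rollback"
    if canary["p95_latency_ms"] > baseline["p95_latency_ms"] * max_latency_ratio:
        return "rollback"
    return "promote"

verdict = canary_verdict(
    baseline={"error_rate": 0.004, "p95_latency_ms": 120},
    canary={"error_rate": 0.005, "p95_latency_ms": 130},
)
```

Because the verdict is computed, not eyeballed, the decision itself becomes part of the deployment audit trail.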
Helm packages Kubernetes deployments. Terraform and Pulumi provision infrastructure as code, creating the audit trail that demonstrates infrastructure state at any point in time. Feature flag management through LaunchDarkly, Unleash, or Flagsmith enables the break-glass capability: instant propagation of a kill switch when a system must be halted.
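The kill-switch pattern works because every request path consults the flag store before serving. A sketch with an in-memory store and a hypothetical flag name (managed platforms such as LaunchDarkly, Unleash, or Flagsmith replace the dict with a propagated remote store):

```python
# Break-glass sketch: serving checks a flag on every call, so flipping
# one flag halts the system without a redeploy. Flag name is hypothetical.

FLAGS = {"model-serving-enabled": True}

class SystemHaltedError(RuntimeError):
    pass

def serve_prediction(features: dict) -> dict:
    if not FLAGS.get("model-serving-enabled", False):
        raise SystemHaltedError("kill switch engaged: model serving halted")
    return {"prediction": 1}  # stand-in for the real model call

result = serve_prediction({"x": 1})

FLAGS["model-serving-enabled"] = False  # operator flips the kill switch
try:
    serve_prediction({"x": 1})
    halted = False
except SystemHaltedError:
    halted = True
```

The check must default to halted when the flag is missing, so a misconfigured store fails safe rather than open.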
Organisational readiness requires tooling beyond the engineering pipeline.
Learning management systems (Moodle, Docebo, TalentLMS) deliver and track the AI literacy training required under Article 4, maintaining the records that demonstrate compliance during inspection.
Low-code platforms (Retool, Appsmith) accelerate development of internal oversight interfaces, enabling rapid construction of the monitoring dashboards and override controls that human oversight requires under Article 14. Custom development is the alternative but carries higher cost and longer delivery timelines.
Start with the compliance functions that carry the highest enforcement risk and work outward.
Risk management, data governance, and version control are prerequisites for every other compliance activity. Pipeline orchestration and CI/CD integration come next, enabling automated evidence generation. Monitoring and incident management complete the operational layer.
For each tool category, assess current maturity against the procedural alternative baseline. If the organisation is already performing the compliance function manually, the adoption question is one of efficiency and reliability, not capability. If the compliance function is not being performed at all, the tool adoption and the process establishment happen in parallel.
Compliance is achievable without commercial tooling: every compliance function has an open-source or manual procedural alternative. Open-source stacks built on MLflow, DVC, Fairlearn, OPA, Prometheus, and Grafana cover the core requirements. The trade-off is higher ongoing manual effort for functions that commercial tools automate.
Start with the highest enforcement risk: version control for code, data, and models (DVC or LakeFS), then CI/CD with model validation gates, then production monitoring. These three establish the evidence generation pipeline that every other compliance activity depends on.
Vector databases (Pinecone, Weaviate, Qdrant, Chroma) store embeddings for RAG systems. Compliance implications include data lineage for retrieval corpora, grounding verification for accuracy under Article 15, and transparency of the retrieval process under Article 13.
Tools across 13 categories: pipeline orchestration, experiment tracking, data versioning, data validation, fairness evaluation, governance platforms, policy enforcement, security scanning, monitoring, deployment, IaC, feature flags, and secret management.
Open-source tools provide source access for audit independence. Commercial tools offer managed operations. Every category has an OSS option and a manual procedural alternative. Infrastructure costs are shared across the portfolio, not per system.
Fairlearn for metrics and mitigation algorithms, Evidently AI for drift and fairness reporting, NannyML for performance estimation without ground truth labels, and Great Expectations for data quality testing.
Prometheus for metrics collection, Grafana for dashboards, Datadog as unified commercial alternative, PagerDuty/Opsgenie for incident alerting, and Zendesk/ServiceNow for deployer communication and SLA tracking.
ArgoCD for GitOps delivery, Argo Rollouts/Flagger for progressive canary deployments, Terraform/Pulumi for infrastructure as code, and LaunchDarkly/Unleash for feature flags enabling break-glass capability.
Budget holders should note that infrastructure costs are largely shared across the AI system portfolio, not duplicated per system. The initial investment is front-loaded; ongoing costs are primarily storage and compute for monitoring data. The Compliance Maturity Model describes the progression from manual processes to embedded automation.
Fairlearn delivers fairness metrics and mitigation algorithms, computing the selection rate ratio (SRR) and other bias measures that the AISDP must document. Evidently AI and NannyML monitor data drift and model performance in production, feeding the post-market monitoring system required under Article 72. NannyML's distinguishing feature is performance estimation without ground truth labels, useful when outcome data arrives with significant delay.
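The selection rate ratio itself is a small computation: the lowest group selection rate divided by the highest (Fairlearn exposes this as the demographic parity ratio over a `MetricFrame`). A pure-Python sketch with toy data:

```python
from collections import defaultdict

# Selection rate ratio sketch: selection rate per protected group,
# then min rate / max rate. A value of 1.0 means parity; common
# practice flags values below a chosen threshold (e.g. 0.8).

def selection_rate_ratio(predictions: list[int], groups: list[str]) -> float:
    by_group = defaultdict(list)
    for pred, group in zip(predictions, groups):
        by_group[group].append(pred)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return min(rates) / max(rates)

preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
srr = selection_rate_ratio(preds, groups)  # group a: 0.75, group b: 0.25
```

Documenting the computed ratio alongside the threshold that was applied is what makes the measure auditable rather than merely observed.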
For governance platforms that unify risk management and compliance reporting, Credo AI and Holistic AI offer commercial solutions. Organisations that cannot justify the investment use spreadsheet templates and manual governance workflows as the procedural alternative.
The procedural alternative for each security tool is manual review, but manual review at scale is unreliable. Organisations should prioritise automating security scanning early, as retroactive security assessments cost more to remediate and delay deployment. Cybersecurity for AI Systems covers the full DevSecOps integration pattern.
Zendesk and ServiceNow handle deployer communication and ticket management for the serious incident reporting process required under Article 73. SLA tracking ensures that reporting timelines are met. The incident management tooling should integrate with the monitoring stack so that threshold breaches automatically create tickets with the relevant context.
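The integration point can be sketched as a function from a threshold breach to a ticket payload. Field names and the severity rule are illustrative; in practice the payload would be posted to the Zendesk or ServiceNow API:

```python
from datetime import datetime, timezone

# Monitoring-to-incident sketch: a breach becomes a ticket carrying the
# context (system, metric, magnitude, timestamp) that the Article 73
# reporting process needs. Severity escalation rule is hypothetical.

def ticket_from_breach(metric: str, value: float, threshold: float,
                       system_id: str) -> dict:
    return {
        "system_id": system_id,
        "summary": f"{metric} breached threshold ({value} > {threshold})",
        "severity": ("investigate-serious-incident"
                     if value > 2 * threshold else "standard"),
        "opened_at": datetime.now(timezone.utc).isoformat(),
    }

ticket = ticket_from_breach("error_rate", 0.12, 0.05, "credit-scoring-v2")
```

Creating the ticket automatically, with the timestamp captured at breach time, is what makes the reporting deadline demonstrably met rather than asserted.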
HashiCorp Vault manages secrets with lease-based access, ensuring credentials rotate automatically and access is revoked when no longer needed. For RAG systems, vector databases (Pinecone, Weaviate, Qdrant, Chroma) store and retrieve embeddings, while LLM frameworks (LangChain, LangGraph) and observability tools (LangSmith) manage the application layer. RAG-Specific Compliance covers the vector database compliance considerations.
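The lineage concern for RAG retrieval is that every retrieved chunk must carry its provenance so an answer can be traced back to the corpus. A minimal sketch with two-dimensional toy embeddings (illustrative only; Pinecone, Weaviate, Qdrant, and Chroma provide the indexed, scalable equivalent):

```python
import math

# Similarity retrieval sketch: rank corpus chunks by cosine similarity
# to the query embedding, keeping each chunk's source reference so the
# retrieval step is transparent and auditable. Corpus is illustrative.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

CORPUS = [
    {"embedding": [1.0, 0.0], "text": "refund policy",
     "source": "policy.md#refunds"},
    {"embedding": [0.0, 1.0], "text": "shipping times",
     "source": "ops.md#shipping"},
]

def retrieve(query_embedding, k=1):
    ranked = sorted(CORPUS,
                    key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return ranked[:k]  # each hit keeps its source for the audit trail

hits = retrieve([0.9, 0.1])
```

Logging the retrieved sources with each generated answer is what supports grounding verification under Article 15 and retrieval transparency under Article 13.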
Architecture documentation tools like Structurizr generate C4 model diagrams from code, keeping system architecture documentation current as the system evolves. The alternative is manual diagramming with draw.io, which works but drifts from reality without disciplined updates. Development Architecture and Explainability covers the architecture documentation requirements in detail.
The guide recommends reading the tooling reference alongside the Compliance Maturity Model to sequence adoption against the five maturity levels. Level 1 (Awareness) organisations focus on governance roles and system classification. Level 2 (Foundation) organisations deploy version control and basic CI/CD. Level 3 (Structured) organisations integrate the full testing and monitoring stack. Levels 4 and 5 refine and optimise.
CTO of Standard Intelligence. Leads platform engineering and contributes to the PIG series technical content.