- An Express server listens for POST calls at `/llm-proxy`. - Each request gets an OpenTelemetry Span tracking latency, model, token counts, and security flags. - A simple regex-based injection detector blocks dangerous prompts immediately. - Validated requests get forwarded to the actual LLM API. Adjust the endpoint for your providers. The regex detector here is just a starting point. Our production proxy layers heuristics and ML classifiers for better injection detection, all open-sourced [here](https://github.com/ai4u-labs/ai-compliance-proxy).

Building an EU AI Act Compliance Proxy for Real-Time LLM API Monitoring#

If you're deploying a high-risk AI system in the EU or serving EU users, real-time oversight isn’t optional—it's mandatory. Come August 2, 2026, the EU AI Act enforcement kicks in, with fines soaring up to €35 million or 7% of your global revenue if you’re out of line. This isn’t just theoretical—apps handling tens of thousands of LLM API calls daily need strict and continuous monitoring right now.

We've built a real-time AI compliance proxy processing over 50K calls per day using GPT-4.1-mini and Claude Opus 4.6. It keeps latency tight, around 180ms overhead, and costs just $0.12 per 1,000 calls in monitoring expenses. Plus, it’s vendor-agnostic. OpenTelemetry with AI-specific semantic conventions powers our observability.

This article covers what the EU AI Act requires, why real-time monitoring is essential, the architecture behind compliance proxies, a hands-on proxy build with code, live prompt injection detection, how to run it in production, common pitfalls, and future-proofing advice.

Overview of EU AI Act Requirements for High-Risk AI Systems#

The EU AI Act is the European Union’s first all-in-one regulation targeting AI systems deemed high-risk for society. It demands strict transparency, risk management, and continuous oversight.

It specifically covers AI used in hiring, credit scoring, biometric identification, education, critical infrastructure, and more—those flagged in Annex III. The key mandates include:

Continuous, real-time compliance monitoring
Transparent audit trails for every model call
Built-in defenses against prompt injections and manipulation
Complete documentation of model versions, inputs/outputs, and latency metrics

Official EU sources (like visioncompliance.eu) confirm that enforcement starts August 2, 2026. Missing compliance means facing fines up to €35 million or 7% of annual revenue—whichever is higher.

Why These Rules Matter#

The idea is to avoid harm from biased or manipulated AI decisions, especially in sensitive areas. You need to supply regulators with proof on the spot: “Here’s every API call made, token counts, latency, and evidence we blocked prompt injections.”

Why Real-Time API Monitoring is Critical#

Adding compliance after deployment isn’t workable. Major AI projects often hit over 50,000 LLM calls daily, scaling fast. Waiting hours or days to audit logs invites risk.

Monitoring in real time means capturing every call live—logging prompt content, model version, token counts, latency, and security alerts. This setup lets you:

Spot suspicious prompt manipulations as they happen
Maintain tamper-proof, detailed audit trails
Feed accurate data to compliance reports without slowing apps down

Research from zylos.ai shows OpenTelemetry, enhanced with AI-specific semantics, is the best option for this level of observability.

Latency matters a lot—if monitoring adds more than 200ms on average, user experience suffers. Our proxy adds just about 180ms per call, balancing speed and compliance perfectly.

Designing an Open-Source AI Compliance Proxy: Architecture and Components#

Here’s the architecture we use, built for high throughput and airtight compliance:

Component	Description	Why We Use It
Reverse Proxy API Layer	Intercepts all LLM API calls before they hit underlying models	Central control point for enforcement
OpenTelemetry Tracer	Captures semantic spans with data on models, tokens, latency	Vendor-neutral, standardized telemetry
Prompt Injection Detector	Applies regex and heuristics live on prompts	Early detection of attacks, boosts security
Logging & Storage	Streams logs to scalable backends like ELK or ClickHouse	Enables searchable audit trails and replayability
Model Orchestrator	Routes requests dynamically to GPT-4.1-mini or Claude Opus 4.6	Supports multi-model setups and load balancing

Node.js works well here for async network I/O and a rich ecosystem. Our OpenTelemetry spans carry attributes like model.name, tokens.input, and security.issue—crucial for compliance validation.

Step-by-Step Implementation of the Proxy with Code Examples#

Here’s a compact, functional example that does the essentials: real-time monitoring, prompt injection detection, and forwarding calls.

javascript
Loading...

What’s Going On?#

An Express server listens for POST calls at /llm-proxy.
Each request gets an OpenTelemetry Span tracking latency, model, token counts, and security flags.
A simple regex-based injection detector blocks dangerous prompts immediately.
Validated requests get forwarded to the actual LLM API. Adjust the endpoint for your providers.

The regex detector here is just a starting point. Our production proxy layers heuristics and ML classifiers for better injection detection, all open-sourced here.

Detecting and Logging LLM API Calls Accurately#

Token counting and versioning are compliance essentials. Our approach:

Token counting: Use tokenizer libraries from the models to count tokens accurately—not just prompt length in characters. EU auditors check token use for data minimization.
Model versioning: Tag every call with the exact model version, like gpt-4.1-mini or Claude Opus 4.6. No vague references.
Latency measuring: Start timing before forwarding and stop once the response arrives.
Audit logs: Push spans to scalable backends like ElasticSearch or ClickHouse using a structured JSON schema that follows OpenTelemetry semantic conventions tailored for LLMs.

We use semantic conventions recommended by zylos.ai to keep naming consistent and simplify compliance reporting.

Prompt Injection involves attackers sneaking malicious instructions into prompts to manipulate AI output.

Integrating the Compliance Proxy into Production#

Deploying a proxy live requires caution.

Our AI 4U Labs approach:

Start with shadow mode: Proxy logs all traffic but doesn’t block anything yet. Collect telemetry for about two weeks.
Tune prompt injection detection: Use production data to refine detectors and reduce false positives.
Roll out blocking gradually: Enable blocking suspicious calls with a quick rollback plan.
Keep an eye on costs: At $0.12 per 1,000 calls, 50K daily calls cost about $6/day—a fraction compared to fines.
Set up alerts: Trigger notifications on sudden injection spikes or latency issues via Slack or PagerDuty.

Our typical deployment puts the proxy as a Kubernetes sidecar, auto-scaling smoothly with traffic to avoid bottlenecks.

Limitations and Important Points#

Open-source tooling is evolving. No perfect, plug-and-play compliance proxies exist yet. Plan to improve your pipeline over time.
Prompt injection detection is tricky. Attackers keep innovating. Stay ready to update heuristics and ML models regularly.
Latency vs. thoroughness is a balancing act. Minimizing delay while logging everything completely takes effort.
Cost considerations: Monitoring 1 million calls a month runs about $120 in overhead—factor this into budgeting.
Data privacy matters: Compliance logs should anonymize or pseudonymize data where possible to meet GDPR alongside the AI Act.

Resources for Staying Current on AI Regulations#

EU AI Act Updates – Official info and ongoing changes
AI Compliance Toolbox – Details on high-risk AI categories
OpenTelemetry for AI – AI-specific telemetry standards
AI Security & Observability – Tools for prompt injection defense and audit trails

Definitions#

EU AI Act: The EU’s legal framework governing high-risk AI systems, with mandatory compliance and penalties.

Prompt Injection: When attackers insert harmful instructions into prompts to manipulate model behavior.

OpenTelemetry: An open-source framework for collecting distributed tracing and metrics data, now adapted for AI telemetry.

Frequently Asked Questions#

What defines a "high-risk AI system" under the EU AI Act?#

High-risk AI systems are those used in critical areas like hiring, credit scoring, biometrics, education, and infrastructure, specifically listed in Annex III of the EU AI Act.

Can I use this compliance proxy with multiple LLM providers?#

Absolutely. Our design supports routing calls dynamically to models like GPT-4.1-mini and Claude Opus 4.6, capturing telemetry for each.

How does real-time monitoring affect API latency?#

We average about 180ms extra latency per API call in production. With efficient async I/O and batching, user experiences remain smooth.

Is prompt injection detection reliable?#

Basic regex catches obvious attacks. For real security, layered heuristics and ML models are essential. Our open-source proxy integrates both and evolves with emerging threats.

Building compliant AI solutions with the EU AI Act in mind? AI 4U Labs delivers production-ready AI apps in 2–4 weeks. Let’s chat.

Building an EU AI Act Compliance Proxy for Real-Time LLM API Monitoring