
Agentic AI Architecture with .NET: Building Autonomous Systems


Agentic AI is no longer a distant idea: it already runs autonomous systems that handle workflows, automate marketing, and orchestrate enterprise processes with almost no human input. What makes these systems work is solid architecture plus tooling that supports long multi-turn reasoning and real-time reactions. At AI 4U Labs, we build them in .NET every day, running GLM-5 with preserved thinking mode to keep context across 10,000+ tokens and using JSON Schema-driven tool integration, with average API latency of 150–300 ms.

Let’s get straight to the point: here’s how to design and launch autonomous AI systems that scale well, stay coherent, and keep costs low.

What is Agentic AI?

Agentic AI refers to AI systems that don’t just reply — they set their own goals, break down complex tasks, and execute workflows without needing step-by-step human guidance. These agents act proactively and think across multiple turns and long horizons.

In practice, agentic AI can power everything from content generation pipelines to dynamic, real-time routing of tasks across APIs, as in automated marketing campaigns.

Why Use .NET for Agentic AI?

.NET and C# fit agentic AI development well because:

Performance & Scalability: .NET smoothly handles heavy asynchronous streaming and event-driven microservices with minimal overhead. For example, our GLM-5 clients run over a million sessions using preserved thinking modes and 10,240-token windows, which demand efficient thread and state management.
Ecosystem & Integration: Seamless Azure cloud deployment, mature HTTP and JSON libraries, and built-in dependency injection make it straightforward to connect external tools and microservices.
Strong Typing & Reliability: Catching schema mismatches before a request reaches the API is crucial in production. Typed request models plus strict JSON Schemas let a .NET service reject malformed arguments before they reach GLM-5’s tool calling.

Because providers bill per token, every resubmitted history is paid for again. Preserving context instead of resending it trims API costs on complex workflows by about 30% in our deployments. With GLM-5’s preserved thinking mode we keep token budgets below 10k per session; at roughly $0.15 per 1,000 tokens (source: docs.z.ai), that caps a session at about $1.50.

Key Terms

Agentic AI: AI that autonomously sets goals, breaks down tasks, and executes them across multi-turn workflows.
Preserved Thinking Mode: GLM-5’s feature that keeps reasoning context alive over 10k+ tokens, letting long conversations continue without losing track or restarting prompts.
Streaming Output: Delivers partial results in real time, cutting perceived latency from seconds down to under 200ms (source: docs.z.ai).

Designing Intelligent Decision Engines

At the center of autonomous AI is the decision engine — it:

Sets high-level goals
Breaks them into manageable sub-tasks
Calls external tools using JSON Schema-defined APIs
Collects streamed results
Updates persistent memory with what happened
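
The steps above can be sketched as a small loop. This is a minimal illustration, not the production engine: `DecisionEngine`, its hard-coded `Plan` method, and the two lambda "tools" are hypothetical stand-ins (a real agent would ask the model for the sub-tasks and call JSON Schema-defined APIs), and streaming is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tools: each takes a sub-task description and returns a result.
var tools = new Dictionary<string, Func<string, string>>
{
    ["research"] = task => $"notes on {task}",
    ["draft"] = task => $"draft of {task}"
};
var memory = new List<string>();

DecisionEngine.Run("launch email", tools, memory);
memory.ForEach(Console.WriteLine);

public static class DecisionEngine
{
    // Stand-in planner: breaks a goal into (tool, sub-task) pairs.
    // In a real agent the model would produce this plan.
    public static IEnumerable<(string Tool, string Task)> Plan(string goal)
    {
        yield return ("research", $"audience for {goal}");
        yield return ("draft", goal);
    }

    // Runs each sub-task through its tool and records the outcome in memory.
    public static void Run(
        string goal,
        IReadOnlyDictionary<string, Func<string, string>> tools,
        List<string> memory)
    {
        foreach ((string tool, string task) in Plan(goal))
        {
            if (!tools.TryGetValue(tool, out var call))
            {
                // Record the failure instead of aborting the whole plan.
                memory.Add($"{tool}: no such tool");
                continue;
            }
            memory.Add($"{tool}: {call(task)}");
        }
    }
}
```

Recording an unknown tool as a memory entry rather than throwing keeps one bad sub-task from ending the run, which matches the failure-isolation goal discussed later.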

The engine relies on GLM-5’s Interleaved Thinking Mode, which lets the agent keep reasoning while tool calls are in flight instead of pausing at each step.

Core Components

Component	Role	Benefit
GLM-5 Model Client	Drives reasoning & triggers tools	Enables autonomous thinking
JSON Schema Tools	Defines strict API interfaces	Avoids runtime errors
Streaming Handlers	Handles partial token streams	Speeds up feedback
Persistent Memory	Saves long-term dialogue context	Keeps conversations coherent
Microservice Event Bus	Dispatches tasks asynchronously	Scales workload smoothly

Sample: Setting Up the Request

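
A request of this kind might be assembled as follows. This is a sketch under stated assumptions: the payload follows the common chat-completions shape, and the model identifier `glm-5`, the `thinking` field, and the `schedule_email` tool are illustrative guesses rather than confirmed GLM-5 API details, so check the provider reference before relying on any field name. The payload is only built and printed, so the example runs offline.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

JsonObject request = AgentRequest.Build("Plan a three-email launch campaign.");
Console.WriteLine(request.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));

public static class AgentRequest
{
    // Hypothetical tool definition; its parameters are described with JSON Schema.
    public static JsonObject ScheduleEmailTool() => new JsonObject
    {
        ["type"] = "function",
        ["function"] = new JsonObject
        {
            ["name"] = "schedule_email",
            ["description"] = "Schedule one campaign email.",
            ["parameters"] = new JsonObject
            {
                ["type"] = "object",
                ["properties"] = new JsonObject
                {
                    ["subject"] = new JsonObject { ["type"] = "string" },
                    ["sendAtUtc"] = new JsonObject { ["type"] = "string" },
                    ["budgetUsd"] = new JsonObject { ["type"] = "number" }
                },
                ["required"] = new JsonArray("subject", "sendAtUtc"),
                ["additionalProperties"] = false
            }
        }
    };

    // Builds the request body for one user goal.
    public static JsonObject Build(string userGoal) => new JsonObject
    {
        ["model"] = "glm-5",                                    // assumed identifier
        ["stream"] = true,                                      // ask for partial output
        ["thinking"] = new JsonObject { ["type"] = "enabled" }, // assumed field name
        ["messages"] = new JsonArray(
            new JsonObject { ["role"] = "system", ["content"] = "You are a planning agent." },
            new JsonObject { ["role"] = "user", ["content"] = userGoal }),
        ["tools"] = new JsonArray(ScheduleEmailTool())
    };
}
```

Building the body with `JsonObject` keeps the tool schema next to the request, so the same schema object can later be reused to check the arguments the model sends back.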

Why JSON Schema?

Skipping strict JSON validation usually means brittle tool calls with malformed inputs causing API failures. This results in retries, wasted tokens, flaky UX, and higher costs. Using JSON Schema in .NET puts guardrails in place to keep calls clean.
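
To make the guardrail concrete, here is a deliberately small check of tool-call arguments against a schema. It is not a full JSON Schema validator: the hypothetical `ToolArgs.Validate` helper only understands top-level `required` and a single primitive `type` per property, and it assumes well-formed JSON (malformed input makes `JsonDocument.Parse` throw). A production service would more likely use a dedicated library such as JsonSchema.Net or NJsonSchema. The raw string literals need C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

string schema = """
{
  "type": "object",
  "properties": {
    "subject": { "type": "string" },
    "budgetUsd": { "type": "number" }
  },
  "required": ["subject"]
}
""";

// Arguments as a model might emit them: "subject" is missing, "budgetUsd" is a string.
foreach (string error in ToolArgs.Validate(schema, """{ "budgetUsd": "lots" }"""))
    Console.WriteLine(error);

public static class ToolArgs
{
    // Returns a list of problems; an empty list means the arguments passed this check.
    public static List<string> Validate(string schemaJson, string argsJson)
    {
        var errors = new List<string>();
        using JsonDocument schema = JsonDocument.Parse(schemaJson);
        using JsonDocument args = JsonDocument.Parse(argsJson);
        JsonElement root = args.RootElement;

        if (root.ValueKind != JsonValueKind.Object)
        {
            errors.Add("arguments must be a JSON object");
            return errors;
        }

        if (schema.RootElement.TryGetProperty("required", out JsonElement required))
            foreach (JsonElement name in required.EnumerateArray())
                if (!root.TryGetProperty(name.GetString()!, out _))
                    errors.Add($"missing required property '{name.GetString()}'");

        if (schema.RootElement.TryGetProperty("properties", out JsonElement props))
            foreach (JsonProperty prop in props.EnumerateObject())
            {
                if (!root.TryGetProperty(prop.Name, out JsonElement value)) continue;
                if (!prop.Value.TryGetProperty("type", out JsonElement type)) continue;
                if (!Matches(value.ValueKind, type.GetString()))
                    errors.Add($"property '{prop.Name}' should be {type.GetString()}");
            }

        return errors;
    }

    private static bool Matches(JsonValueKind kind, string? type) => type switch
    {
        "string" => kind == JsonValueKind.String,
        "number" or "integer" => kind == JsonValueKind.Number,
        "boolean" => kind is JsonValueKind.True or JsonValueKind.False,
        "object" => kind == JsonValueKind.Object,
        "array" => kind == JsonValueKind.Array,
        _ => true // other type names are not checked in this sketch
    };
}
```

Note that this check runs at request time, not at compile time; its value is that a malformed call is rejected locally before it costs an API round trip and a retry.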

Implementing Persistent Memory in Agentic Systems

GLM-5’s 10,240-token context window lets the system keep chat history and intermediate reasoning alive without a hard reset, and preserved thinking mode stitches successive sessions together.

Persistent memory cuts down prompt size since you don’t resend all past context each time, and stops repetitive reasoning, saving about 30% on API tokens in our deployments.

Persistent Memory in C#

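
One way to keep per-session memory inside a token budget is sketched below. Everything here is an assumption made for illustration: `SessionMemory` and `MemoryEntry` are hypothetical types, the store is in-process (a real deployment would back it with Redis or Cosmos DB), and the token count is a crude estimate of about four characters per token rather than the model's tokenizer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

var memory = new SessionMemory(tokenBudget: 10_240);
memory.Append("session-1", "user", "Plan the spring campaign.");
memory.Append("session-1", "assistant", "Draft: three emails over two weeks.");

foreach (MemoryEntry entry in memory.Recall("session-1"))
    Console.WriteLine($"{entry.Role}: {entry.Text}");

public sealed record MemoryEntry(string Role, string Text, int Tokens);

public sealed class SessionMemory
{
    private readonly ConcurrentDictionary<string, List<MemoryEntry>> _sessions = new();
    private readonly int _tokenBudget;

    public SessionMemory(int tokenBudget) => _tokenBudget = tokenBudget;

    // Rough estimate only (about 4 characters per token); not a real tokenizer.
    public static int EstimateTokens(string text) => Math.Max(1, text.Length / 4);

    // Adds an entry, then drops the oldest entries while the session is over budget.
    public void Append(string sessionId, string role, string text)
    {
        List<MemoryEntry> entries = _sessions.GetOrAdd(sessionId, _ => new List<MemoryEntry>());
        lock (entries)
        {
            entries.Add(new MemoryEntry(role, text, EstimateTokens(text)));
            while (entries.Count > 1 && entries.Sum(e => e.Tokens) > _tokenBudget)
                entries.RemoveAt(0);
        }
    }

    // Returns a snapshot of the session, oldest entry first.
    public IReadOnlyList<MemoryEntry> Recall(string sessionId)
    {
        if (!_sessions.TryGetValue(sessionId, out var entries))
            return Array.Empty<MemoryEntry>();
        lock (entries) return entries.ToArray();
    }
}
```

Dropping the oldest entries is the simplest eviction policy; summarizing them before removal would preserve more context at the cost of an extra model call.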

Real-World Benefits

Agents keep themes and facts consistent over hours-long dialogs
Reduced prompt length saves bandwidth and API costs
Users get smooth conversations with instant context recall

Using Event-Driven Microservices for Scalability

Decoupling AI work into microservices that react to events—like task creation, tool call results, and memory updates—avoids blocking calls. This spreads load, increases throughput, and simplifies retries.

Why Microservices?

Spin up or down workers as demand changes
Isolate and retry failures without bringing down the whole system
Smaller, focused services mean easier debugging and upgrades

In production, we use RabbitMQ or Azure Service Bus to route:

Incoming user requests
Tool call triggers
Response handlers
Persistent memory updates

Event Flow

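
The routing idea can be tried in-process before a broker is involved. The sketch below uses `System.Threading.Channels` as a stand-in for RabbitMQ or Azure Service Bus, and the chain `UserRequest`, `ToolCall`, `ToolResult`, `MemoryUpdate` is one plausible ordering of the event types listed above, not a confirmed production flow. `AgentEvent` and `EventWorker` are hypothetical names.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

Channel<AgentEvent> bus = Channel.CreateUnbounded<AgentEvent>();

// Start the worker first; it waits until an event arrives.
Task worker = EventWorker.RunAsync(bus.Reader, bus.Writer, Console.WriteLine);
await bus.Writer.WriteAsync(new AgentEvent("UserRequest", "Plan the spring campaign"));
await worker;

public sealed record AgentEvent(string Kind, string Payload);

public static class EventWorker
{
    // Maps each event to the next stage; null means the flow is finished.
    public static AgentEvent? Next(AgentEvent e) => e.Kind switch
    {
        "UserRequest" => new AgentEvent("ToolCall", e.Payload),
        "ToolCall" => new AgentEvent("ToolResult", $"result for: {e.Payload}"),
        "ToolResult" => new AgentEvent("MemoryUpdate", e.Payload),
        _ => null
    };

    // Reads events, logs them, and publishes the follow-up event back to the bus.
    public static async Task RunAsync(
        ChannelReader<AgentEvent> reader,
        ChannelWriter<AgentEvent> writer,
        Action<string> log)
    {
        await foreach (AgentEvent e in reader.ReadAllAsync())
        {
            log($"{e.Kind}: {e.Payload}");
            AgentEvent? next = Next(e);
            if (next is null)
                writer.Complete(); // no follow-up: close the bus so the loop ends
            else
                await writer.WriteAsync(next);
        }
    }
}
```

With a real broker, each event kind would typically get its own queue and its own worker service, so a slow tool call cannot block request intake.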

Performance Gains

Moving from a monolith to microservices with optimized event queues cut average API response latency from 700 ms to 150–300 ms (source: AI 4U Labs benchmarks).

Building and Integrating the Components

A robust agentic AI system combines:

Model clients running preserved thinking mode with interleaved reasoning
JSON Schema-defined tool endpoints that catch malformed tool calls before they run
Streaming output handlers feeding UI progressively
Persistent memory seamlessly following conversations
Event-driven microservices scaling workload and bolstering resilience

Here’s a straightforward example combining these aspects:

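
A reduced sketch of one agent turn is shown below. It covers only two of the pieces, streaming tokens to the UI as they arrive and assembling the full reply that would then be written to persistent memory; tool calls and the event bus are left out to keep it short. `FakeModel` is a hypothetical stand-in for a streaming model client, and no real GLM-5 call is made.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

string reply = await StreamingAgent.RunTurnAsync(
    FakeModel.StreamAsync("Plan the spring campaign"),
    onToken: Console.Write); // push each token to the "UI" immediately

Console.WriteLine();
Console.WriteLine($"Reply ready for memory ({reply.Length} characters).");

public static class FakeModel
{
    // Stand-in for a streaming client: yields tokens with a short delay.
    public static async IAsyncEnumerable<string> StreamAsync(string prompt)
    {
        foreach (string token in new[] { "Drafting ", "plan ", "for: ", prompt })
        {
            await Task.Delay(20);
            yield return token;
        }
    }
}

public static class StreamingAgent
{
    // Forwards tokens as they arrive and returns the assembled reply.
    public static async Task<string> RunTurnAsync(
        IAsyncEnumerable<string> tokens,
        Action<string> onToken)
    {
        var full = new StringBuilder();
        await foreach (string token in tokens)
        {
            onToken(token);
            full.Append(token);
        }
        return full.ToString();
    }
}
```

Because `RunTurnAsync` depends only on `IAsyncEnumerable<string>`, the fake can be swapped for a real streaming client without touching the handler.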

Deploying and Maintaining Agentic AI Systems

Deploy these on Azure Kubernetes Service or AWS EKS setups that include:

Auto-scaling microservice clusters
Managed RabbitMQ or Azure Service Bus
Redis or Cosmos DB for caching persistent memory
OpenTelemetry or similar for distributed tracing across multi-turn reasoning

Cost Breakdown

GLM-5 API costs hover around $0.15 per 1,000 tokens (docs.z.ai)
Running 500K active sessions typically costs between $2,500 and $3,000 per month on mid-tier Azure clusters
Redis caching for state averages under $200 per month

Using preserved thinking mode slashes token usage by roughly 30%, making these systems affordable at scale.
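
The token arithmetic behind these figures is easy to check. The sketch below takes the $0.15 per 1,000 tokens rate and the roughly 30% saving quoted above as inputs; the 10,000-token session size is an illustrative assumption, and `TokenCost` is a hypothetical helper.

```csharp
using System;

long sessionTokens = 10_000; // illustrative session size
decimal baseline = TokenCost.Usd(sessionTokens);
decimal preserved = TokenCost.Usd(TokenCost.AfterSavings(sessionTokens, 0.30m));

Console.WriteLine($"Baseline: {baseline:F2} USD, with preserved thinking: {preserved:F2} USD");

public static class TokenCost
{
    // Rate quoted in the cost breakdown above.
    public const decimal UsdPer1KTokens = 0.15m;

    // Cost in USD for a given number of tokens.
    public static decimal Usd(long tokens) => tokens / 1000m * UsdPer1KTokens;

    // Token count left after removing the given fraction (0.30m means 30% fewer tokens).
    public static long AfterSavings(long tokens, decimal fraction) =>
        (long)Math.Round(tokens * (1 - fraction));
}
```

For a 10,000-token session this works out to $1.50 without the saving and $1.05 with it, so the per-session difference is small but scales linearly with session count.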

Case Study: Autonomous Marketing Campaign Planner at AI 4U Labs

We crafted a marketing automation agent that supports 50,000 monthly active users with:

GLM-5’s preserved thinking mode to keep campaign context across sessions
Strict JSON Schema tool calls for budget, audience segmentation, and scheduling APIs
Streaming responses that update UI in real time
.NET event-driven microservices powering scale and efficiency

Results: Campaign planning times fell by 40%, API usage costs dropped 28%, and average response latency stayed below 250 ms.

Check out the full writeup: Build Production-Ready AgentScope Workflows with OpenAI Agents.

Summary Table: GLM-5 Features for Autonomous AI

Feature	Description	Benefits
Preserved Thinking	Keeps 10k+ token context for ongoing sessions	Coherent dialogs & cost-efficient
Interleaved Thinking	Reasons while tool calls run	Faster results, lower latency
Streaming Output	First partial tokens arrive in under 200 ms	Smooth, real-time UX
JSON Schema Tool Calling	Enforces strict validation on tool inputs	Avoids runtime errors, easy integration

Frequently Asked Questions

Q: Why does preserved thinking mode beat traditional prompting?

Preserved thinking keeps context alive over 10,240 tokens, so you don’t resend full histories. This improves coherence and cuts API token usage by about 30% (source: docs.z.ai).

Q: Can I build agentic AI without microservices?

You can, but monoliths struggle to handle real-world scale. Event-driven microservices boost resilience, lower latency, and isolate failures—key for supporting millions of users.

Q: What’s the benefit of JSON Schema with tool integration?