FREQUENTLY ASKED QUESTIONS

AI Agent and Artificial Intelligence Solutions

This document collects the questions client CTOs and engineering teams raise when evaluating the OKAXI AI Agent approach. Answers are written from hands-on experience on the retail and contact center engagements currently in production.

AI Agent and Artificial Intelligence Solutions

Back to the FAQ hub

How does an OKAXI AI Agent differ from a classical chatbot?

A classical chatbot only replies in text based on templates or hard rules. The OKAXI AI Agent goes beyond a chatbot in three ways. First, it reasons through multiple steps before taking action. Second, it calls external tools or APIs to fetch real-time data or perform operations. Third, it closes the business workflow loop without needing a human at every step.

Can the Agent call third-party APIs on its own?

Yes. OKAXI builds a tool registry for every Agent. Each tool is a function with an explicit JSON Schema covering input and output. The Agent picks the right tool based on context and current goal, calls the function, reads the result, and decides the next step. The registry covers REST API clients, database queries, file system access, and external SaaS APIs.

How does the Agent make decisions?

The Agent runs an Observe Think Act loop. Observe reads the current workflow state and context window. Think calls the LLM for reasoning about the next step, with the available tool list attached. Act executes a tool or returns the final answer. The loop stops when the goal is reached or when the iteration cap is hit.

How do function-calling and tool-calling work in the OKAXI architecture?

OKAXI uses the function-calling standard shared by OpenAI, Anthropic, and Google. Each function has a JSON schema defining name, description, and parameters. The Agent prompt carries the list of available functions. The LLM returns structured JSON selecting which function to call with concrete arguments. The backend receives the JSON, validates the schema, executes the function, and feeds the result back into the context window for the next turn.

How does the Agent handle complex multi-step workflows?

OKAXI models workflows as a directed acyclic graph (DAG). Each node is a reasoning step or an action. The Agent can jump between nodes based on runtime conditions. Workflow state is persisted to Redis or PostgreSQL so the Agent can resume after an interruption. Long-running workflows remain auditable down to each intermediate decision.

How does OKAXI integrate an Agent into an existing Microservices stack?

OKAXI exposes the Agent through a REST endpoint or a gRPC interface. Existing Microservices publish an event to Kafka when they need the Agent. A dedicated consumer receives the event, calls the Agent runtime, runs the Observe Think Act loop, and publishes the result back to another topic. The core Microservices do not need to know that an LLM is involved.

Can OKAXI integrate AI Agents into a legacy Monolith?

Yes. OKAXI deploys a Sidecar service alongside the Monolith. The Monolith only needs to call the Sidecar REST API when it wants to invoke the Agent. The Sidecar carries all the LLM logic, tool calling, and workflow orchestration. This approach keeps the Monolith codebase untouched and removes the need to refactor core business logic.

What is the role of Kafka in the OKAXI Agent architecture?

Kafka is the event-driven backbone for the Agent infrastructure. It plays four roles. First, it buffers between the producer and the Agent consumer so traffic bursts do not overload the system. Second, it persists events so the Agent can replay after a restart. Third, it fans out the same event to multiple Agents with different responsibilities. Fourth, it decouples business services from the specific Agent that will handle the event.

How does the OKAXI MCP Server work?

The Model Context Protocol (MCP) Server is the bridge between LLM clients and the enterprise data systems. The server exposes resources and tools per the MCP spec. The LLM client sends an MCP request to the server, the server reads data from the database, file system, or internal API, and returns a standard JSON response. MCP lets the Agent read enterprise data without pasting full content into the prompt. The result is lower token cost and tighter access control.

Does AI integration disrupt the core business logic?

No. OKAXI applies the strangler pattern and the sidecar pattern to avoid editing core code. All Agent logic lives in a new integration tier triggered through events or REST endpoints. When the Agent fails or has to be rolled back, the business core keeps running through a circuit breaker and a fallback path. Production deployment uses feature flags to switch Agent features on and off without a core redeploy.

How does OKAXI optimise token usage for the LLM API?

OKAXI applies five tactics. First, minimal prompt templates that drop redundant instructions. Second, context window pruning that keeps only the most recent N turns. Third, summarisation for long-running sessions, replacing detailed history with a compact summary. Fourth, tool description compression that uses schema references rather than pasting full documentation. Fifth, model routing that sends simple tasks to a small model and complex tasks to a larger model.

How does OKAXI Prompt Caching work?

OKAXI uses native prompt caching from Anthropic and OpenAI when available. Common system prompts and tool definitions are marked cacheable. The provider caches the prefix for up to one hour, and each cache hit saves around 90 percent of input token cost. OKAXI also builds a client-side cache layer in Redis for frequent prompts. The two cache layers together cut cost by 60 to 80 percent on repeat workloads.

How do you control LLM API cost month over month?

OKAXI sets a quota per Agent and per client workspace. Each request measures token usage before being sent. When monthly quota reaches 80 percent, the system switches to a smaller model or rejects non-urgent tasks. A real-time dashboard reports cost per Agent, per task type, and per customer. Alerts trigger when cost crosses a configured threshold.

Can you pick the right model size per task?

Yes. OKAXI runs model routing based on task complexity. Simple classification tasks like intent or sentiment run on small models such as Claude Haiku or GPT-4o-mini. Complex reasoning tasks like multi-step planning or code generation run on Claude Sonnet or GPT-4o. Heavy reasoning or long-context tasks run on Claude Opus or o1. The router decides at runtime based on signals attached to the task metadata.

What are the OKAXI AI Agent latency targets?

OKAXI designs Agents at three latency tiers. The real-time tier (front-line chatbot) targets p95 under 3 seconds. The near-real-time tier (ticket routing, classification) targets p95 under 10 seconds. The batch tier (large-volume reporting and summarisation) accepts p95 under 60 seconds. Each tier uses a different model size and caching strategy to hit the corresponding target.

Does internal data leave the enterprise to the LLM provider?

By default OKAXI sends only the necessary fields to the LLM provider through MCP and the data masking configuration. Sensitive fields such as national ID, credit card number, and personal HR data are redacted or tokenised before they enter the prompt. The OpenAI and Anthropic enterprise contracts both include a zero-data-retention clause. For clients with stricter requirements, OKAXI deploys an on-premise LLM instead of calling a cloud provider.

How does OKAXI private RAG protect data?

OKAXI builds a dedicated vector store for each client. Embeddings are generated by a self-hosted model or by a provider with explicit data residency. The vector store is fully isolated between tenants, so embeddings are never shared across customers. The retrieval pipeline runs inside the client VPC. The LLM call only sees the relevant retrieved chunks, never the full corpus.

Does OKAXI support on-premise LLM deployments?

Yes. OKAXI deploys self-hosted LLMs using Llama, Mistral, or Qwen depending on the client hardware. The inference stack runs on vLLM or TGI to reach production throughput. Model fine-tuning happens on the client internal data and the data never leaves the client VPC. On-premise fits clients with data residency requirements or compliance regimes such as HIPAA, GDPR Article 9, or the Vietnam Cybersecurity Law.

How does the OKAXI AI Gateway protect data?

The AI Gateway is a proxy layer between the application and the LLM provider. It serves four functions. First, PII detection and masking on outbound prompts. Second, policy enforcement that blocks prompts violating content policy. Third, rate limiting and quotas per tenant. Fourth, audit logging of every request and response, stored on the client encrypted storage. The Gateway runs inside the client VPC with no exception.

How does OKAXI handle audit logging and compliance?

Each Agent request carries a correlation ID that flows from the Kafka event through every tool call to the final response. Logs include the prompt, the tool call sequence, the final response, latency, and cost. Logs sit on immutable storage with a 7-year retention policy for audit compliance. Clients can query logs through a dashboard or export them to an internal SIEM. OKAXI supports the OpenTelemetry and CloudEvents standard formats.

Back to the FAQ hub