# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI Robot Core is a multi-tenant AI service platform providing intelligent chat, RAG (Retrieval-Augmented Generation) knowledge base, and LLM integration capabilities. The system consists of a Python FastAPI backend (`ai-service`) and a Vue.js admin frontend (`ai-service-admin`).

## Architecture

### Service Components

- **ai-service**: FastAPI backend (port 8080 internal, 8182 external)
  - Multi-tenant isolation via the `X-Tenant-Id` header
  - SSE streaming support via `Accept: text/event-stream`
  - RAG-powered responses with confidence scoring
  - Intent-driven script flows and metadata governance
- **ai-service-admin**: Vue 3 + Element Plus frontend (port 80 internal, 8183 external)
  - Admin UI for knowledge bases, LLM config, and RAG experiments
  - Nginx reverse proxy to the backend at `/api/*`
- **postgres**: PostgreSQL 15 database (port 5432)
  - Chat sessions, messages, and knowledge base metadata
  - Multi-tenant data isolation
- **qdrant**: vector database (port 6333)
  - Document embeddings and vector search
  - Collections prefixed with `kb_`
- **ollama**: local embedding model service (port 11434)
  - Default model: `nomic-embed-text` (768 dimensions)
  - Recommended: `toshk0/nomic-embed-text-v2-moe:Q6_K`

### Key Architectural Patterns

**Multi-Tenancy**: All data is scoped by `tenant_id`. `TenantContextMiddleware` extracts the tenant from the `X-Tenant-Id` header and stores it in request state. Every database query must filter by `tenant_id`.
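A minimal ASGI-style sketch of what such middleware can look like (illustrative only — the real `TenantContextMiddleware` may be implemented differently, e.g. with Starlette's `BaseHTTPMiddleware`):

```python
import asyncio

# Illustrative sketch only; the real middleware lives in the app's stack.
class TenantContextMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # ASGI delivers headers as a list of (name, value) byte pairs.
        headers = dict(scope.get("headers", []))
        tenant = headers.get(b"x-tenant-id", b"").decode() or None
        # Downstream handlers read the tenant from request state.
        scope.setdefault("state", {})["tenant_id"] = tenant
        await self.app(scope, receive, send)

seen = []

async def handler(scope, receive, send):
    seen.append(scope["state"]["tenant_id"])

mw = TenantContextMiddleware(handler)
asyncio.run(mw({"headers": [(b"x-tenant-id", b"acme")]}, None, None))
```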

**Service Layer Organization**:

- `app/services/llm/`: LLM provider adapters (OpenAI, DeepSeek, Ollama)
- `app/services/embedding/`: embedding providers (OpenAI, Ollama, Nomic)
- `app/services/retrieval/`: vector search and indexing
- `app/services/document/`: document parsers (PDF, Word, Excel, text)
- `app/services/flow/`: intent-driven script flow engine
- `app/services/guardrail/`: input/output filtering and safety
- `app/services/monitoring/`: dashboard metrics and logging
- `app/services/mid/`: mid-platform dialogue and session management

**API Structure**:

- `/ai/chat`: main chat endpoint (supports JSON and SSE streaming)
- `/ai/health`: health check
- `/admin/*`: admin endpoints for configuration and management
- `/mid/*`: mid-platform API for dialogue sessions and messages

**Configuration**: Uses pydantic-settings with the `AI_SERVICE_` prefix. All settings in `app/core/config.py` can be overridden via environment variables.
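The naming convention maps a settings field to an upper-cased environment variable with the `AI_SERVICE_` prefix. A dependency-free illustration of that mapping (the real resolution is done by pydantic-settings, not this hypothetical helper):

```python
import os

# Hypothetical helper mirroring pydantic-settings' env_prefix behaviour:
# a field named `llm_model` resolves from AI_SERVICE_LLM_MODEL when set.
def env_override(field: str, default: str, prefix: str = "AI_SERVICE_") -> str:
    return os.environ.get(prefix + field.upper(), default)

os.environ["AI_SERVICE_LLM_MODEL"] = "deepseek-chat"
model = env_override("llm_model", "default-model")
```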

**Database**: SQLModel (SQLAlchemy + Pydantic) with async PostgreSQL. Entities live in `app/models/entities.py`; the session factory is in `app/core/database.py`.

## Development Commands

### Backend (ai-service)

```bash
cd ai-service

# Install dependencies (development)
pip install -e ".[dev]"

# Run development server
uvicorn app.main:app --reload --port 8000

# Run tests
pytest

# Run a specific test file
pytest tests/test_confidence.py -v

# Run tests with coverage
pytest --cov=app --cov-report=html

# Lint and format with ruff
ruff check app/
ruff format app/

# Type check with mypy
mypy app/

# Database migrations
python scripts/migrations/run_migration.py scripts/migrations/005_create_mid_tables.sql
```

### Frontend (ai-service-admin)

```bash
cd ai-service-admin

# Install dependencies
npm install

# Run development server (port 5173)
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview
```

### Docker Compose

```bash
# Start all services
docker compose up -d

# Rebuild and start
docker compose up -d --build

# View logs
docker compose logs -f ai-service
docker compose logs -f ai-service-admin

# Stop all services
docker compose down

# Stop and remove volumes (clears data)
docker compose down -v

# Pull the embedding model inside the Ollama container
docker exec -it ai-ollama ollama pull toshk0/nomic-embed-text-v2-moe:Q6_K
```

## Important Conventions

### Acceptance Criteria Traceability

All code must reference AC (Acceptance Criteria) codes in docstrings and comments:

- Format: `[AC-AISVC-XX]` for ai-service
- Example: `[AC-AISVC-01]` Centralized configuration
- See `spec/contracting.md` for contract maturity levels (L0-L3)

### OpenAPI Contract Management

- Provider API: `openapi.provider.yaml` (APIs this module provides)
- Consumer API: `openapi.deps.yaml` (APIs this module depends on)
- Must declare `info.x-contract-level: L0|L1|L2|L3`
- L2 is required before merging to main
- Run contract checks: `scripts/check-openapi-level.sh`, `scripts/check-openapi-diff.sh`

### Multi-Tenant Data Access

Always filter by `tenant_id` when querying the database:

```python
from sqlmodel import select

from app.core.tenant import get_current_tenant_id
from app.models.entities import ChatSession

tenant_id = get_current_tenant_id()
result = await session.exec(
    select(ChatSession).where(ChatSession.tenant_id == tenant_id)
)
```

### SSE Streaming Pattern

Check the `Accept` header for streaming mode:

```python
from fastapi.responses import JSONResponse

from app.core.sse import create_sse_response

# The Accept header may carry multiple values (e.g. "application/json,
# text/event-stream"), so test for membership rather than strict equality.
if "text/event-stream" in request.headers.get("accept", ""):
    return create_sse_response(event_generator())
else:
    return JSONResponse({"response": "..."})
```
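For reference, a sketch of a generator emitting the `data: {json}\n\n` wire format (field names like `delta` and the `[DONE]` sentinel are illustrative assumptions, not the service's actual schema):

```python
import json

def event_generator(chunks):
    # Each SSE event is a "data: <payload>" line terminated by a blank line.
    for chunk in chunks:
        yield f"data: {json.dumps({'delta': chunk})}\n\n"
    # A sentinel event is a common way to signal completion to the client.
    yield "data: [DONE]\n\n"

events = list(event_generator(["Hel", "lo"]))
```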

### Error Handling

Use the custom exceptions from `app/core/exceptions.py`:

```python
from app.core.exceptions import AIServiceException, ErrorCode

raise AIServiceException(
    code=ErrorCode.KNOWLEDGE_BASE_NOT_FOUND,
    message="Knowledge base not found",
    details={"kb_id": kb_id},
)
```

### Configuration Access

```python
from app.core.config import get_settings

settings = get_settings()  # Cached singleton
llm_model = settings.llm_model
```
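The cached-singleton behaviour typically comes from memoizing the factory; a minimal sketch of the pattern (the real `Settings` class in `app/core/config.py` carries many more fields, and the default shown is a placeholder):

```python
from functools import lru_cache

class Settings:
    # Placeholder field; the real class is a pydantic-settings model.
    llm_model: str = "placeholder-model"

@lru_cache
def get_settings() -> Settings:
    # lru_cache on a zero-argument factory yields one shared instance.
    return Settings()

first, second = get_settings(), get_settings()
```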

### Embedding Configuration

Embedding config is persisted to `config/embedding_config.json` and loaded at startup. Use `app/services/embedding/factory.py` to obtain the configured provider.
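A hedged sketch of the registry-style factory pattern this implies (the class names, `register` decorator, and JSON schema here are assumptions, not the file's actual format):

```python
import json
from pathlib import Path

PROVIDERS = {}

def register(name):
    # Decorator-based registry: each provider class registers under a name.
    def deco(cls):
        PROVIDERS[name] = cls
        return cls
    return deco

@register("ollama")
class OllamaEmbedding:
    def __init__(self, model: str):
        self.model = model

def get_embedding_provider(path: str):
    # Load the persisted JSON config and instantiate the matching provider.
    cfg = json.loads(Path(path).read_text())
    return PROVIDERS[cfg["provider"]](cfg["model"])
```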

## Testing Strategy

- Unit tests live in `tests/test_*.py`
- Use pytest-asyncio for async tests
- Shared fixtures live in `tests/conftest.py`
- Mock external services (LLM, Qdrant) in unit tests
- Integration tests require running services (`docker compose up -d`)
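A minimal sketch of stubbing an async LLM client with `unittest.mock.AsyncMock` so unit tests avoid network calls (the `complete` method and `answer` helper are hypothetical, not the service's real interface):

```python
import asyncio
from unittest.mock import AsyncMock

async def answer(question: str, client) -> str:
    # The code under test only depends on the client's async interface,
    # so any object with an awaitable `complete` will do.
    return await client.complete(question)

llm = AsyncMock()
llm.complete.return_value = "mocked answer"
result = asyncio.run(answer("What is RAG?", llm))
```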

## Session Handoff Protocol

For complex multi-phase tasks, use the Session Handoff Protocol (v2.0):

- Progress docs in `docs/prog/[project]-progress.md`
- Must include: task overview, requirements reference, technical context, next steps
- See `~/.claude/specs/session-handoff-protocol-ai-ref.md` for details
- Stop proactively after 40-50 tool calls or at phase completion

## Common Pitfalls

1. **Tenant isolation**: never query without a `tenant_id` filter
2. **Embedding model changes**: switching models requires rebuilding all knowledge bases (vectors are incompatible)
3. **SSE streaming**: events must be yielded in the correct format: `data: {json}\n\n`
4. **API key**: the backend auto-generates a default key on first startup (check the logs)
5. **Long-running commands**: don't use `--watch` modes for tests/dev servers via the bash tool
6. **Windows paths**: use forward slashes in code, even on Windows (Python handles the conversion)

## Key Files