# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI Robot Core is a multi-tenant AI service platform providing intelligent chat, RAG (Retrieval-Augmented Generation) knowledge base, and LLM integration capabilities. The system consists of a Python FastAPI backend (ai-service) and a Vue.js admin frontend (ai-service-admin).

## Architecture

### Service Components

- **ai-service**: FastAPI backend (port 8080 internal, 8182 external)
  - Multi-tenant isolation via `X-Tenant-Id` header
  - SSE streaming support via `Accept: text/event-stream`
  - RAG-powered responses with confidence scoring
  - Intent-driven script flows and metadata governance
- **ai-service-admin**: Vue 3 + Element Plus frontend (port 80 internal, 8183 external)
  - Admin UI for knowledge base, LLM config, RAG experiments
  - Nginx reverse proxy to backend at `/api/*`
- **postgres**: PostgreSQL 15 database (port 5432)
  - Chat sessions, messages, knowledge base metadata
  - Multi-tenant data isolation
- **qdrant**: Vector database (port 6333)
  - Document embeddings and vector search
  - Collections prefixed with `kb_`
- **ollama**: Local embedding model service (port 11434)
  - Default: `nomic-embed-text` (768 dimensions)
  - Recommended: `toshk0/nomic-embed-text-v2-moe:Q6_K`

### Key Architectural Patterns

**Multi-Tenancy**: All data is scoped by `tenant_id`. The `TenantContextMiddleware` extracts the tenant from the `X-Tenant-Id` header and stores it in request state. Database queries must filter by `tenant_id`.
**Service Layer Organization**:
- `app/services/llm/`: LLM provider adapters (OpenAI, DeepSeek, Ollama)
- `app/services/embedding/`: Embedding providers (OpenAI, Ollama, Nomic)
- `app/services/retrieval/`: Vector search and indexing
- `app/services/document/`: Document parsers (PDF, Word, Excel, Text)
- `app/services/flow/`: Intent-driven script flow engine
- `app/services/guardrail/`: Input/output filtering and safety
- `app/services/monitoring/`: Dashboard metrics and logging
- `app/services/mid/`: Mid-platform dialogue and session management

**API Structure**:
- `/ai/chat`: Main chat endpoint (supports JSON and SSE streaming)
- `/ai/health`: Health check
- `/admin/*`: Admin endpoints for configuration and management
- `/mid/*`: Mid-platform API for dialogue sessions and messages

**Configuration**: Uses `pydantic-settings` with the `AI_SERVICE_` prefix. All settings in `app/core/config.py` can be overridden via environment variables.

**Database**: SQLModel (SQLAlchemy + Pydantic) with async PostgreSQL. Entities in `app/models/entities.py`. Session factory in `app/core/database.py`.
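The env-prefix override and cached-singleton behavior that `pydantic-settings` provides can be sketched with the standard library alone. This is not the project's `app/core/config.py`; the field names and defaults below are illustrative assumptions.

```python
# Stdlib sketch of the AI_SERVICE_ env-override + cached-singleton pattern
# that pydantic-settings implements for the real Settings class.
import os
from dataclasses import dataclass
from functools import lru_cache

ENV_PREFIX = "AI_SERVICE_"  # matches the project's settings prefix


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields; the real ones live in app/core/config.py.
    llm_model: str = "default-model"
    qdrant_url: str = "http://qdrant:6333"

    @classmethod
    def from_env(cls) -> "Settings":
        """Override each field from AI_SERVICE_<FIELD> when set."""
        kwargs = {}
        for name in ("llm_model", "qdrant_url"):
            value = os.environ.get(ENV_PREFIX + name.upper())
            if value is not None:
                kwargs[name] = value
        return cls(**kwargs)


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Cached singleton: the environment is read once, at first access."""
    return Settings.from_env()
```

So exporting `AI_SERVICE_LLM_MODEL=my-model` before startup overrides `settings.llm_model`, and repeated `get_settings()` calls return the same instance, which is why env changes after first access have no effect.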
## Development Commands

### Backend (ai-service)

```bash
cd ai-service

# Install dependencies (development)
pip install -e ".[dev]"

# Run development server
uvicorn app.main:app --reload --port 8000

# Run tests
pytest

# Run specific test
pytest tests/test_confidence.py -v

# Run tests with coverage
pytest --cov=app --cov-report=html

# Lint with ruff
ruff check app/
ruff format app/

# Type check with mypy
mypy app/

# Database migrations
python scripts/migrations/run_migration.py scripts/migrations/005_create_mid_tables.sql
```

### Frontend (ai-service-admin)

```bash
cd ai-service-admin

# Install dependencies
npm install

# Run development server (port 5173)
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview
```

### Docker Compose

```bash
# Start all services
docker compose up -d

# Rebuild and start
docker compose up -d --build

# View logs
docker compose logs -f ai-service
docker compose logs -f ai-service-admin

# Stop all services
docker compose down

# Stop and remove volumes (clears data)
docker compose down -v

# Pull embedding model in Ollama container
docker exec -it ai-ollama ollama pull toshk0/nomic-embed-text-v2-moe:Q6_K
```

## Important Conventions

### Acceptance Criteria Traceability

All code must reference AC (Acceptance Criteria) codes in docstrings and comments:
- Format: `[AC-AISVC-XX]` for ai-service
- Example: `[AC-AISVC-01] Centralized configuration`
- See `spec/contracting.md` for contract maturity levels (L0-L3)

### OpenAPI Contract Management

- Provider API: `openapi.provider.yaml` (APIs this module provides)
- Consumer API: `openapi.deps.yaml` (APIs this module depends on)
- Must declare `info.x-contract-level: L0|L1|L2|L3`
- L2 required before merging to main
- Run contract checks: `scripts/check-openapi-level.sh`, `scripts/check-openapi-diff.sh`

### Multi-Tenant Data Access

Always filter by `tenant_id` when querying the database:

```python
from app.core.tenant import get_current_tenant_id

tenant_id = get_current_tenant_id()
result = await session.exec(
    select(ChatSession).where(ChatSession.tenant_id == tenant_id)
)
```

### SSE Streaming Pattern

Check the `Accept` header for streaming mode:

```python
from app.core.sse import create_sse_response

if request.headers.get("accept") == "text/event-stream":
    return create_sse_response(event_generator())
else:
    return JSONResponse({"response": "..."})
```

### Error Handling

Use custom exceptions from `app/core/exceptions.py`:

```python
from app.core.exceptions import AIServiceException, ErrorCode

raise AIServiceException(
    code=ErrorCode.KNOWLEDGE_BASE_NOT_FOUND,
    message="Knowledge base not found",
    details={"kb_id": kb_id},
)
```

### Configuration Access

```python
from app.core.config import get_settings

settings = get_settings()  # Cached singleton
llm_model = settings.llm_model
```

### Embedding Configuration

Embedding config is persisted to `config/embedding_config.json` and loaded at startup. Use `app/services/embedding/factory.py` to get the configured provider.

## Testing Strategy

- Unit tests in `tests/test_*.py`
- Use `pytest-asyncio` for async tests
- Fixtures in `tests/conftest.py`
- Mock external services (LLM, Qdrant) in tests
- Integration tests require running services (use `docker compose up -d`)

## Session Handoff Protocol

For complex multi-phase tasks, use the Session Handoff Protocol (v2.0):
- Progress docs in `docs/progproject]-progress.md`
- Must include: task overview, requirements reference, technical context, next steps
- See `~/.claude/specs/session-handoff-protocol-ai-ref.md` for details
- Stop proactively after 40-50 tool calls or phase completion

## Common Pitfalls

1. **Tenant Isolation**: Never query without a `tenant_id` filter
2. **Embedding Model Changes**: Switching models requires rebuilding all knowledge bases (vectors are incompatible)
3. **SSE Streaming**: Events must be yielded in the correct format: `data: {json}\n\n`
4. **API Key**: The backend auto-generates a default key on first startup (check the logs)
5. **Long-Running Commands**: Don't use `--watch` modes in tests/dev servers via the bash tool
6. **Windows Paths**: Use forward slashes in code, even on Windows (Python handles the conversion)

## Key Files

- [app/main.py](ai-service/app/main.py): FastAPI app entry point
- [app/core/config.py](ai-service/app/core/config.py): Configuration settings
- [app/models/entities.py](ai-service/app/models/entities.py): Database models
- [app/api/chat.py](ai-service/app/api/chat.py): Main chat endpoint
- [app/services/flow/engine.py](ai-service/app/services/flow/engine.py): Intent-driven flow engine
- [docker-compose.yaml](docker-compose.yaml): Orchestration
- [spec/contracting.md](spec/contracting.md): Contract maturity rules
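The SSE wire format from the pitfalls above (`data: {json}\n\n`) is easy to get subtly wrong. A minimal sketch of a correctly framed event generator follows; the helper names are illustrative, not the project's actual `app/core/sse` functions.

```python
# Sketch of SSE framing: each event is one "data: <json>" line terminated
# by a blank line (i.e. the string ends with "\n\n").
import json
from typing import Any, Iterator


def format_sse_event(payload: dict[str, Any]) -> str:
    """Serialize one payload as a single Server-Sent Event frame."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def event_stream(chunks: list[str]) -> Iterator[str]:
    """Yield one frame per token chunk, then a terminal 'done' frame."""
    for chunk in chunks:
        yield format_sse_event({"delta": chunk})
    yield format_sse_event({"done": True})
```

Joining the frames with `"".join(event_stream(["Hel", "lo"]))` shows the shape the client parser expects: every frame self-terminates with a blank line, so a missing `\n\n` causes the browser to buffer events instead of delivering them.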