Allma Studio

AI Platform

Documentation & Setup Guide

Everything you need to get Allma Studio running on your local machine.

Prerequisites

Docker Desktop (Required): recommended for easy setup
Node.js 18+: for frontend development
Python 3.11+: for backend development
Ollama (Required): local LLM runtime
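
You can confirm each tool is installed and on your PATH with its version flag:

bash
docker --version
node --version      # should report v18 or newer
python --version    # should report 3.11 or newer
ollama --version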

Quick Start (Docker)

1. Clone the repository

bash
git clone https://github.com/VaibhavK289/Allma.git
cd Allma

2. Copy environment file

bash
cp .env.example .env

3. Start all services

bash
docker compose up -d

4. Open in browser

bash
open http://localhost:3000    # macOS; on Windows use: start http://localhost:3000
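
If the page does not load, confirm the containers are running and the backend reports healthy:

bash
docker compose ps                   # all services should show "running"
curl http://localhost:8000/health   # see the /health endpoint in the API Reference below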

Manual Setup (Development)

Install Ollama & Models

bash
# Install Ollama (Windows)
# Download from https://ollama.ai/download

# Pull required models
ollama pull nomic-embed-text    # Required for embeddings
ollama pull deepseek-r1:latest  # Recommended LLM

# Or choose another model:
ollama pull gemma2:9b
ollama pull qwen2.5-coder:7b
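
You can confirm the models downloaded correctly with ollama list, which also shows their size on disk:

bash
ollama list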

Backend Setup

bash
cd allma-backend

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
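
With Uvicorn running, check the API from a second terminal (interactive docs are served at http://localhost:8000/docs):

bash
curl http://localhost:8000/health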

Frontend Setup

bash
cd allma-frontend

# Install dependencies
npm install

# Start development server
npm run dev

Configuration

Environment Variables

Create a .env file in the root directory:

env
# Backend Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest

# Vector Store
VECTOR_STORE_PATH=./data/vectorstore
CHROMA_PERSIST_DIRECTORY=./data/vectorstore

# API Settings
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO

# Frontend Configuration
VITE_API_URL=http://localhost:8000
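
To confirm that OLLAMA_HOST points at a running Ollama instance, query Ollama's local API directly; it lists the models that have been pulled:

bash
curl http://localhost:11434/api/tags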

Available Models

Model                 Size      Best For
deepseek-r1:latest    5.2 GB    Reasoning, Analysis
gemma2:9b             5.4 GB    General Purpose
qwen2.5-coder:7b      4.7 GB    Code Generation
llama3.2              2.0 GB    Fast Responses
nomic-embed-text      274 MB    Embeddings (Required)

Access Points

Frontend (React application): http://localhost:5173
Backend API (FastAPI server): http://localhost:8000
API Docs (Swagger UI): http://localhost:8000/docs
Ollama (LLM runtime): http://localhost:11434
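
A quick reachability check for all of the above from the command line:

bash
for url in http://localhost:5173 http://localhost:8000/health http://localhost:11434; do
  curl -s -o /dev/null -w "%{http_code}  $url\n" "$url"
done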

API Reference

Base URL: http://localhost:8000

GET /health — Check system health
json
{
  "status": "healthy",
  "components": {
    "ollama": { "status": "connected", "model": "deepseek-r1:latest" },
    "vector_store": { "status": "ready", "documents_count": 150 },
    "database": { "status": "connected" }
  },
  "version": "1.0.0"
}

POST /chat/ — Send a chat message

Request Body:

json
{
  "message": "Explain quantum computing",
  "use_rag": false,
  "conversation_id": "optional-uuid",
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}

Response (Streaming):

text
data: {"content": "Quantum", "done": false}
data: {"content": " computing", "done": false}
data: {"content": "", "done": true, "sources": []}

POST /rag/ingest — Upload a document for RAG

Request (multipart/form-data):

bash
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"

Response:

json
{
  "success": true,
  "document_id": "doc_abc123",
  "filename": "document.pdf",
  "chunks_created": 25,
  "processing_time_ms": 1250
}

POST /rag/search — Search documents

Request Body:

json
{
  "query": "What is quantum entanglement?",
  "k": 5,
  "threshold": 0.7
}

Response:

json
{
  "results": [
    {
      "chunk_id": "chunk_001",
      "source": "quantum_physics.pdf",
      "content": "Quantum entanglement is...",
      "score": 0.95
    }
  ],
  "search_time_ms": 45
}
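
The same search as a curl call:

bash
curl -X POST http://localhost:8000/rag/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What is quantum entanglement?", "k": 5, "threshold": 0.7}'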

GET /models/ — List available models
json
{
  "models": [
    {
      "name": "deepseek-r1:latest",
      "size_human": "5.2 GB",
      "details": { "parameter_size": "8B", "quantization": "Q4_K_M" }
    }
  ],
  "current_model": "deepseek-r1:latest"
}

POST /models/switch — Switch the active model
json
// Request
{ "model_name": "gemma2:9b" }

// Response
{
  "success": true,
  "previous_model": "deepseek-r1:latest",
  "current_model": "gemma2:9b"
}
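
For example, switching to gemma2:9b with curl:

bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_name": "gemma2:9b"}'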

Error Codes

Code                    HTTP   Description
VALIDATION_ERROR        400    Invalid request format
NOT_FOUND               404    Resource not found
UNSUPPORTED_FILE_TYPE   400    File type not supported
MODEL_NOT_FOUND         404    Ollama model not installed
OLLAMA_UNAVAILABLE      503    Cannot connect to Ollama
RATE_LIMITED            429    Too many requests

Error Response Format:

json
{
  "detail": {
    "error": "OLLAMA_UNAVAILABLE",
    "message": "Cannot connect to Ollama server at http://localhost:11434",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
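
For example, if Ollama is not running, a chat request returns the OLLAMA_UNAVAILABLE error above; the machine-readable code can be pulled out of the detail object (assuming jq is installed):

bash
curl -s -X POST http://localhost:8000/chat/ \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "use_rag": false}' | jq '.detail.error'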

SDK Examples

Python

python
import httpx

client = httpx.Client(base_url="http://localhost:8000", timeout=120.0)  # generous timeout for slow LLM responses

# Chat (non-streaming)
response = client.post("/chat/", json={
    "message": "Explain AI",
    "use_rag": False,
    "stream": False,  # request a single JSON response rather than a stream
})
print(response.json())

# Ingest document
with open("doc.pdf", "rb") as f:
    response = client.post("/rag/ingest", files={"file": f})
print(response.json())

JavaScript

javascript
// Streaming chat via the documented POST /chat/ endpoint (Node 18+)
const response = await fetch('http://localhost:8000/chat/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello!', use_rag: false, stream: true })
});

// Read the "data: {...}" lines as they arrive
// (a production client should buffer partial lines that span chunks)
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (!line.startsWith('data:')) continue;
    const chunk = JSON.parse(line.slice(5));
    if (chunk.done) {
      console.log('\nSources:', chunk.sources);
    } else {
      process.stdout.write(chunk.content);
    }
  }
}

// Upload a document (browser; fileInput is an <input type="file"> element)
const formData = new FormData();
formData.append('file', fileInput.files[0]);
await fetch('http://localhost:8000/rag/ingest', { method: 'POST', body: formData });

cURL

bash
# Health check
curl http://localhost:8000/health

# Chat
curl -X POST http://localhost:8000/chat/ \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "use_rag": false}'

# Ingest document
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"

# List models
curl http://localhost:8000/models/