# Getting Started

Documentation & Setup Guide

Everything you need to get Allma Studio running on your local machine.
## Prerequisites

- **Docker Desktop** (required): recommended for easy setup
- **Node.js 18+**: for frontend development
- **Python 3.11+**: for backend development
- **Ollama** (required): local LLM runtime

## Quick Start (Docker)
1. Clone the repository

```bash
git clone https://github.com/VaibhavK289/Allma.git
cd Allma
```

2. Copy environment file
```bash
cp .env.example .env
```

3. Start all services
```bash
docker compose up -d
```

4. Open in browser
```bash
open http://localhost:3000
```
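Before opening the UI, you can optionally confirm the stack came up cleanly by polling the backend health endpoint. This is a minimal sketch, assuming the API is exposed on port 8000 (see Access Points) and that the `httpx` package is installed; the `/health` endpoint is documented in the API Reference below.

```python
import time

import httpx

# Poll the backend until /health reports "healthy" (assumes port 8000 is exposed).
for _ in range(30):
    try:
        resp = httpx.get("http://localhost:8000/health", timeout=2.0)
        if resp.status_code == 200 and resp.json().get("status") == "healthy":
            print("Backend is ready, version:", resp.json().get("version"))
            break
    except httpx.HTTPError:
        pass  # containers may still be starting
    time.sleep(2)
else:
    print("Backend did not become healthy within 60 seconds")
```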
## Manual Setup (Development)

### Install Ollama & Models
```bash
# Install Ollama (Windows)
# Download from https://ollama.ai/download

# Pull required models
ollama pull nomic-embed-text      # Required for embeddings
ollama pull deepseek-r1:latest    # Recommended LLM

# Or choose another model:
ollama pull gemma2:9b
ollama pull qwen2.5-coder:7b
```

### Backend Setup
```bash
cd allma-backend

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Frontend Setup
```bash
cd allma-frontend

# Install dependencies
npm install

# Start development server
npm run dev
```
## Configuration

### Environment Variables
Create a .env file in the root directory:
```env
# Backend Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest

# Vector Store
VECTOR_STORE_PATH=./data/vectorstore
CHROMA_PERSIST_DIRECTORY=./data/vectorstore

# API Settings
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO

# Frontend Configuration
VITE_API_URL=http://localhost:8000
```

### Available Models
| Model | Size | Best For |
|---|---|---|
| deepseek-r1:latest | 5.2GB | Reasoning, Analysis |
| gemma2:9b | 5.4GB | General Purpose |
| qwen2.5-coder:7b | 4.7GB | Code Generation |
| llama3.2 | 2.0GB | Fast Responses |
| nomic-embed-text | 274MB | Embeddings (Required) |
## Access Points

| Service | Description | URL |
|---|---|---|
| Frontend | React application | http://localhost:5173 |
| Backend API | FastAPI server | http://localhost:8000 |
| API Docs | Swagger UI | http://localhost:8000/docs |
| Ollama | LLM runtime | http://localhost:11434 |

## API Reference
Base URL: http://localhost:8000
### GET /health

Check system health.

```json
{
  "status": "healthy",
  "components": {
    "ollama": { "status": "connected", "model": "deepseek-r1:latest" },
    "vector_store": { "status": "ready", "documents_count": 150 },
    "database": { "status": "connected" }
  },
  "version": "1.0.0"
}
```
### POST /chat/

Send a chat message.

Request Body:

```json
{
  "message": "Explain quantum computing",
  "use_rag": false,
  "conversation_id": "optional-uuid",
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}
```

Response (Streaming):
```text
data: {"content": "Quantum", "done": false}
data: {"content": " computing", "done": false}
data: {"content": "", "done": true, "sources": []}
```
### POST /rag/ingest

Upload document for RAG.

Request (multipart/form-data):

```bash
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"
```

Response:
```json
{
  "success": true,
  "document_id": "doc_abc123",
  "filename": "document.pdf",
  "chunks_created": 25,
  "processing_time_ms": 1250
}
```
### POST /rag/search

Search documents.

Request Body:

```json
{
  "query": "What is quantum entanglement?",
  "k": 5,
  "threshold": 0.7
}
```

Response:
```json
{
  "results": [
    {
      "chunk_id": "chunk_001",
      "source": "quantum_physics.pdf",
      "content": "Quantum entanglement is...",
      "score": 0.95
    }
  ],
  "search_time_ms": 45
}
```
### GET /models/

List available models.

```json
{
  "models": [
    {
      "name": "deepseek-r1:latest",
      "size_human": "5.2 GB",
      "details": { "parameter_size": "8B", "quantization": "Q4_K_M" }
    }
  ],
  "current_model": "deepseek-r1:latest"
}
```
### POST /models/switch

Switch active model.

Request:

```json
{ "model_name": "gemma2:9b" }
```

Response:

```json
{
  "success": true,
  "previous_model": "deepseek-r1:latest",
  "current_model": "gemma2:9b"
}
```
## Error Codes

| Code | HTTP | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Invalid request format |
| NOT_FOUND | 404 | Resource not found |
| UNSUPPORTED_FILE_TYPE | 400 | File type not supported |
| MODEL_NOT_FOUND | 404 | Ollama model not installed |
| OLLAMA_UNAVAILABLE | 503 | Cannot connect to Ollama |
| RATE_LIMITED | 429 | Too many requests |
Error Response Format:
```json
{
  "detail": {
    "error": "OLLAMA_UNAVAILABLE",
    "message": "Cannot connect to Ollama server at http://localhost:11434",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
```
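Client code can branch on the `error` field inside `detail`. A sketch of one way to handle it, with a simple retry on `OLLAMA_UNAVAILABLE`; the retry policy is only an example, and it assumes that `stream: false` returns a single JSON body:

```python
import time

import httpx

def chat(message: str, retries: int = 3) -> dict:
    """Send a chat message, retrying briefly if Ollama is unavailable."""
    for attempt in range(retries):
        resp = httpx.post(
            "http://localhost:8000/chat/",
            json={"message": message, "use_rag": False, "stream": False},
        )
        if resp.status_code == 200:
            return resp.json()
        detail = resp.json().get("detail", {})
        if detail.get("error") != "OLLAMA_UNAVAILABLE" or attempt == retries - 1:
            raise RuntimeError(f"{detail.get('error')}: {detail.get('message')}")
        time.sleep(2 ** attempt)  # back off before retrying

print(chat("Hello!"))
```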
## SDK Examples

### Python
```python
import httpx

client = httpx.Client(base_url="http://localhost:8000")

# Chat
response = client.post("/chat/", json={
    "message": "Explain AI",
    "use_rag": False
})
print(response.json())

# Ingest document
with open("doc.pdf", "rb") as f:
    response = client.post("/rag/ingest", files={"file": f})
print(response.json())
```

### JavaScript
```javascript
// Streaming chat with EventSource
const eventSource = new EventSource('/api/chat?' + new URLSearchParams({
  message: 'Hello!',
  use_rag: 'false'
}));

let reply = '';
eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    console.log('Reply:', reply);
    console.log('Sources:', data.sources);
    eventSource.close();
  } else {
    reply += data.content;  // accumulate streamed tokens
  }
};

// Upload document
const formData = new FormData();
formData.append('file', fileInput.files[0]);
await fetch('/rag/ingest', { method: 'POST', body: formData });
```

### cURL
```bash
# Health check
curl http://localhost:8000/health

# Chat
curl -X POST http://localhost:8000/chat/ \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "use_rag": false}'

# Ingest document
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"

# List models
curl http://localhost:8000/models/
```