# Getting Started

Documentation & Setup Guide

Everything you need to get Allma Studio running on your local machine.
## Prerequisites

- **Docker Desktop** (required): recommended for easy setup
- **Node.js 18+**: for frontend development
- **Python 3.11+**: for backend development
- **Ollama** (required): local LLM runtime

## Quick Start (Docker)
1. Clone the repository

```bash
git clone https://github.com/VaibhavK289/Allma.git
cd Allma
```

2. Copy environment file
```bash
cp .env.example .env
```

3. Start all services
```bash
docker compose up -d
```

4. Open in browser
```bash
open http://localhost:3000
```
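Before opening the UI, you can optionally confirm the stack came up cleanly by polling the backend health endpoint. This is a minimal sketch, assuming the API is exposed on port 8000 (see Access Points) and that the `httpx` package is installed; the `/health` endpoint is documented in the API Reference below.

```python
import time

import httpx

# Poll the backend until /health reports "healthy" (assumes port 8000 is exposed).
for _ in range(30):
    try:
        resp = httpx.get("http://localhost:8000/health", timeout=2.0)
        if resp.status_code == 200 and resp.json().get("status") == "healthy":
            print("Backend is ready, version:", resp.json().get("version"))
            break
    except httpx.HTTPError:
        pass  # containers may still be starting
    time.sleep(2)
else:
    print("Backend did not become healthy within 60 seconds")
```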
## Manual Setup (Development)

### Install Ollama & Models
```bash
# Install Ollama (Windows)
# Download from https://ollama.ai/download

# Pull required models
ollama pull nomic-embed-text      # Required for embeddings
ollama pull deepseek-r1:latest    # Recommended LLM

# Or choose another model:
ollama pull gemma2:9b
ollama pull qwen2.5-coder:7b
```

### Backend Setup
```bash
cd allma-backend

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Frontend Setup
```bash
cd allma-frontend

# Install dependencies
npm install

# Start development server
npm run dev
```
## Configuration

### Environment Variables
Create a .env file in the root directory:
```env
# Backend Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:latest
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest

# Vector Store
VECTOR_STORE_PATH=./data/vectorstore
CHROMA_PERSIST_DIRECTORY=./data/vectorstore

# API Settings
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO

# Frontend Configuration
VITE_API_URL=http://localhost:8000
```

### Available Models
| Model | Size | Best For |
|---|---|---|
| deepseek-r1:latest | 5.2GB | Reasoning, Analysis |
| gemma2:9b | 5.4GB | General Purpose |
| qwen2.5-coder:7b | 4.7GB | Code Generation |
| llama3.2 | 2.0GB | Fast Responses |
| nomic-embed-text | 274MB | Embeddings (Required) |
## Access Points

| Service | Description | URL |
|---|---|---|
| Frontend | React application | http://localhost:5173 |
| Backend API | FastAPI server | http://localhost:8000 |
| API Docs | Swagger UI | http://localhost:8000/docs |
| Ollama | LLM runtime | http://localhost:11434 |

## API Reference
Base URL: http://localhost:8000
### GET /health

Check system health.

```json
{
  "status": "healthy",
  "components": {
    "ollama": { "status": "connected", "model": "deepseek-r1:latest" },
    "vector_store": { "status": "ready", "documents_count": 150 },
    "database": { "status": "connected" }
  },
  "version": "1.0.0"
}
```
### POST /chat/

Send a chat message.

Request Body:

```json
{
  "message": "Explain quantum computing",
  "use_rag": false,
  "conversation_id": "optional-uuid",
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}
```

Response (Streaming):
```text
data: {"content": "Quantum", "done": false}
data: {"content": " computing", "done": false}
data: {"content": "", "done": true, "sources": []}
```
### POST /rag/ingest

Upload document for RAG.

Request (multipart/form-data):

```bash
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"
```

Response:
```json
{
  "success": true,
  "document_id": "doc_abc123",
  "filename": "document.pdf",
  "chunks_created": 25,
  "processing_time_ms": 1250
}
```
### POST /rag/search

Search documents.

Request Body:

```json
{
  "query": "What is quantum entanglement?",
  "k": 5,
  "threshold": 0.7
}
```

Response:
```json
{
  "results": [
    {
      "chunk_id": "chunk_001",
      "source": "quantum_physics.pdf",
      "content": "Quantum entanglement is...",
      "score": 0.95
    }
  ],
  "search_time_ms": 45
}
```
### GET /models/

List available models.

```json
{
  "models": [
    {
      "name": "deepseek-r1:latest",
      "size_human": "5.2 GB",
      "details": { "parameter_size": "8B", "quantization": "Q4_K_M" }
    }
  ],
  "current_model": "deepseek-r1:latest"
}
```
### POST /models/switch

Switch active model.

Request:

```json
{ "model_name": "gemma2:9b" }
```

Response:

```json
{
  "success": true,
  "previous_model": "deepseek-r1:latest",
  "current_model": "gemma2:9b"
}
```
## Error Codes

| Code | HTTP | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Invalid request format |
| NOT_FOUND | 404 | Resource not found |
| UNSUPPORTED_FILE_TYPE | 400 | File type not supported |
| MODEL_NOT_FOUND | 404 | Ollama model not installed |
| OLLAMA_UNAVAILABLE | 503 | Cannot connect to Ollama |
| RATE_LIMITED | 429 | Too many requests |
Error Response Format:
```json
{
  "detail": {
    "error": "OLLAMA_UNAVAILABLE",
    "message": "Cannot connect to Ollama server at http://localhost:11434",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
```
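Client code can branch on the `error` field inside `detail`. A sketch of one way to handle it, with a simple retry on `OLLAMA_UNAVAILABLE`; the retry policy is only an example, and it assumes that `stream: false` returns a single JSON body:

```python
import time

import httpx

def chat(message: str, retries: int = 3) -> dict:
    """Send a chat message, retrying briefly if Ollama is unavailable."""
    for attempt in range(retries):
        resp = httpx.post(
            "http://localhost:8000/chat/",
            json={"message": message, "use_rag": False, "stream": False},
        )
        if resp.status_code == 200:
            return resp.json()
        detail = resp.json().get("detail", {})
        if detail.get("error") != "OLLAMA_UNAVAILABLE" or attempt == retries - 1:
            raise RuntimeError(f"{detail.get('error')}: {detail.get('message')}")
        time.sleep(2 ** attempt)  # back off before retrying

print(chat("Hello!"))
```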
## SDK Examples

### Python
```python
import httpx

client = httpx.Client(base_url="http://localhost:8000")

# Chat
response = client.post("/chat/", json={
    "message": "Explain AI",
    "use_rag": False
})
print(response.json())

# Ingest document
with open("doc.pdf", "rb") as f:
    response = client.post("/rag/ingest", files={"file": f})
print(response.json())
```

### JavaScript
```javascript
// Streaming chat with EventSource
const eventSource = new EventSource('/api/chat?' + new URLSearchParams({
  message: 'Hello!',
  use_rag: 'false'
}));

let reply = '';
eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    console.log('Reply:', reply);
    console.log('Sources:', data.sources);
    eventSource.close();
  } else {
    reply += data.content;  // accumulate streamed tokens
  }
};

// Upload document
const formData = new FormData();
formData.append('file', fileInput.files[0]);
await fetch('/rag/ingest', { method: 'POST', body: formData });
```

### cURL
```bash
# Health check
curl http://localhost:8000/health

# Chat
curl -X POST http://localhost:8000/chat/ \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "use_rag": false}'

# Ingest document
curl -X POST http://localhost:8000/rag/ingest \
  -F "file=@document.pdf"

# List models
curl http://localhost:8000/models/
```