Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
AI Sovereignty and Local LLM Deployment
- Challenges of cloud LLMs: data retention risks, input training, and foreign jurisdiction implications.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing specifics for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Configuration
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback mechanisms and AVX/AVX2 optimization.
- Docker deployment strategies and persistent volume mapping.
- Multi-GPU setups and VRAM allocation strategies.
Model Management
- Retrieving models from the Ollama registry: using 'ollama pull llama3'.
- Importing GGUF models from HuggingFace and TheBloke.
- Understanding quantization levels: trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Model switching and limits on concurrent model loading.
Custom Modelfiles
- Writing Modelfile syntax: utilizing FROM, PARAMETER, SYSTEM, and TEMPLATE.
- Tuning temperature, top_p, and repeat_penalty.
- Engineering system prompts for role-specific behaviors.
- Creating and publishing custom models to the local registry.
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Handling streaming responses and JSON mode.
- Integration with LangChain, LlamaIndex, and custom applications.
- Implementing authentication and rate limiting via reverse proxy.
Performance Optimization
- Managing context window sizing and KV cache.
- Batch inference and parallel request handling.
- CPU thread allocation and NUMA awareness.
- Monitoring GPU utilization and memory pressure.
Security and Compliance
- Network isolation for model serving endpoints.
- Input filtering and output moderation pipelines.
- Audit logging of prompts and completions.
- Verifying model provenance and hashes.
Requirements
- Intermediate proficiency in Linux and container administration.
- High-level understanding of machine learning concepts and transformer models.
- Familiarity with REST APIs and JSON data formats.
Target Audience
- AI engineers and developers seeking to replace cloud LLM APIs.
- Organizations with stringent data privacy concerns that preclude cloud model usage.
- Government and defense teams necessitating air-gapped language model solutions.
14 Hours