Local AI Inference

On-prem heterogeneous AI inference: Apple Silicon MLX and NVIDIA llama.cpp side by side, observed end to end via OpenTelemetry into Splunk and SigNoz. Full model supply chain with Harbor + KitOps + Nexus + GitOps.

3 posts

21 May 2026 · 30 min read

Heterogeneous AI Inference at Home, Backed by an OCI Model Registry: Mac MLX + NVIDIA llama.cpp Side-by-Side, One OpenTelemetry Stream, Watts and Tokens in Splunk + SigNoz

Two clusters, one OpenTelemetry stream, three backends, and a ~100-line Python adapter that turns the bundled `macmon` binary into a first-class OTLP source for SigNoz + Splunk. Every watt, every token, and every model swap is visible across the whole on-prem AI stack. Plus four real gotchas from the build: macOS Local Network grants, libp2p mDNS across VLANs, SignalFx trial caps, and how `open -a EXO --args` silently drops arguments.

splunk, splunk-observability-cloud, signoz

13 March 2026 · 28 min read

RAGtronic: Building a Production AI Platform in Rust, Multi-Model Orchestration, Zero-Trust Auth, and Making LLMs Speak Creole and Aussie

A deep dive into RAGtronic, a production AI platform I built on Rust/Actix-web with 28,000+ lines of backend code. It orchestrates 91 models across 10 providers through LiteLLM, runs dual-layer content safety with NVIDIA NeMo Guardrails and a compiled-in fallback engine, and enforces zero-trust authentication via Ory Kratos and Oathkeeper. Through some creative prompt engineering injected at the Cloudflare Worker edge layer, it also teaches models to speak Creole and respond with an authentic Aussie personality.

rust, actix-web, rag

13 November 2025 · 7 min read

Persistent Lubuntu Desktop Workspaces for Coder

A deep dive into the dual-disk Coder template: Proxmox + TrueNAS persistence, KasmVNC desktops, Langfuse-observed AI tooling, Wazuh telemetry, and OpenBao-managed secrets.

coder, proxmox, truenas