Local AI Inference
On-prem heterogeneous AI inference: Apple Silicon MLX + NVIDIA llama.cpp side-by-side, observed end-to-end via OpenTelemetry into Splunk and SigNoz. Full model supply chain with Harbor + KitOps + Nexus + GitOps.
Heterogeneous AI Inference at Home: Mac MLX + NVIDIA llama.cpp Side-by-Side, One OpenTelemetry Stream, Watts and Tokens in Splunk + SigNoz
Two clusters, one OpenTelemetry stream, three backends, and a ~100-line Python adapter that turns the bundled `macmon` binary into a first-class OTLP source for SigNoz + Splunk. Every watt, every token, and every model swap is visible across the whole on-prem AI stack. Plus four real gotchas from the build: macOS Local Network grants, libp2p mDNS across VLANs, SignalFx trial caps, and how `open -a EXO --args` silently drops arguments.
RAGtronic: Building a Production AI Platform in Rust, Multi-Model Orchestration, Zero-Trust Auth, and Making LLMs Speak Creole and Aussie
A deep dive into RAGtronic, a production AI platform I built on Rust/Actix-web with 28,000+ lines of backend code. It orchestrates 91 models across 10 providers through LiteLLM, runs dual-layer content safety with NVIDIA NeMo Guardrails and a compiled-in fallback engine, and enforces zero-trust authentication via Ory Kratos and Oathkeeper. With some creative prompt engineering, injected at the Cloudflare Worker edge layer, it also teaches models to speak Creole and respond with an authentic Aussie personality.
Persistent Lubuntu Desktop Workspaces for Coder
Deep dive into the dual-disk Coder template: Proxmox + TrueNAS persistence, KasmVNC desktops, Langfuse-observed AI tooling, Wazuh telemetry, and OpenBao-managed secrets.