Heterogeneous AI Inference at Home, Backed by an OCI Model Registry: Mac MLX + NVIDIA llama.cpp Side-by-Side, One OpenTelemetry Stream, Watts and Tokens in Splunk + SigNoz
Two clusters, one OpenTelemetry stream, three backends, and a ~100-line Python adapter that turns the bundled `macmon` binary into a first-class OTLP source for SigNoz + Splunk. Every watt, every token, and every model swap is visible across the whole on-prem AI stack. Plus four real gotchas from the build: macOS Local Network grants, libp2p mDNS across VLANs, SignalFx trial caps, and how `open -a EXO --args` silently drops arguments.