v0.30.10
Ollama release - check for model support, multi-GPU, and compatibility.
Independent publication // local-first AI // field reports from owned systems
A running publication about local models, private inference, self-hosted agents, weird hardware, research sweeps, and the builders wiring their own AI stack together.
AI is getting physical again. It shows up in terminals, racks, side projects, and ugly little workflows people actually control.
Ollama release - check for model support, multi-GPU, and compatibility.
Agent or tool-use pattern - relevant to local AI workflow automation.
Geohot take - high-signal on open-source AI, tinygrad, and builder philosophy.
Simon Willison signal - surfaces practical patterns before they spread.
A current field note on why the 3x3090 serving path moved through pipeline parallelism, not tensor parallelism, for the tested 32B AWQ model.
A public-safe read on Gemma 4 12B as a local sensory preprocessor: useful for seeing, hearing, and structuring observations without turning into an action system.
A measured note on 300W bursty inference, lower caps for sustained runs, and why power-cap sweet spots are workload-specific.
A field-report read on 14x RTX 3090 agent serving, EXL3, FP8 KV cache, Aphrodite, and why concurrency is becoming the local hardware metric.