Independent publication // local-first AI // field reports from owned systems

Nodehome

A running publication about local models, private inference, self-hosted agents, weird hardware, research sweeps, and the builders wiring their own AI stack together.

AI is getting physical again. It shows up in terminals, racks, side projects, and ugly little workflows people actually control.

Latest

Daily Sweep - Jun 15, 2026

infrawatch // Ollama Releases // Jun 07, 2026

v0.30.5

Ollama release - check for model-support, multi-GPU, and compatibility changes.

infrawatch // vLLM Releases // Jun 15, 2026

v0.23.0

vLLM release - check for tensor parallelism, memory, and throughput changes.

infrawatch // vLLM Releases // Jun 05, 2026

v0.22.1

vLLM release - check for tensor parallelism, memory, and throughput changes.

infrawatch // vLLM Releases // Jun 02, 2026

v0.22.0

vLLM release - check for tensor parallelism, memory, and throughput changes.

infrawatch // Ollama Releases // Jun 12, 2026

v0.30.8

Ollama release - check for model-support, multi-GPU, and compatibility changes.

infrawatch // Ollama Releases // Jun 08, 2026

v0.30.7

Ollama release - check for model-support, multi-GPU, and compatibility changes.

infrawatch // Ollama Releases // Jun 07, 2026

v0.30.6

Ollama release - check for model-support, multi-GPU, and compatibility changes.

hardwarewatch // llama.cpp Commits // Jun 15, 2026

CUDA: only support F32/F16 for GGML_OP_REPEAT (#24533)

Performance-sensitive backend path - could affect local throughput.

Field Reports

builds, experiments, notes

Field Note // Jun 05, 2026

Three RTX 3090s, One 32B Model: A Pipeline-Parallel Canary

A field note on why the three-GPU RTX 3090 serving path for the tested 32B AWQ model went with pipeline parallelism, not tensor parallelism.
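One common constraint that can push a three-GPU box toward pipeline parallelism is offered here as an assumption, not as the note's stated reason. Tensor parallelism shards attention heads across GPUs, so the head count must divide evenly by the tensor-parallel size. Pipeline parallelism only splits the stack of layers. The minimal sketch below checks both for a hypothetical 32B-class shape (64 layers, 40 attention heads). `ModelShape`, `tensor_parallel_ok`, `pipeline_split`, and `layouts_for` are illustrative names, not any serving engine's API, and the divisibility check is simplified.

```python
"""Sketch: which tensor/pipeline layouts are valid on three GPUs?

Hypothetical model shape; the head-divisibility rule is a simplified
version of the usual tensor-parallel constraint.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    # Hypothetical values; read the real ones from your model's config.
    num_layers: int
    num_attention_heads: int


def tensor_parallel_ok(shape: ModelShape, tp: int) -> bool:
    """Tensor parallelism shards attention heads, so they must divide evenly."""
    return shape.num_attention_heads % tp == 0


def pipeline_split(num_layers: int, stages: int) -> list[int]:
    """Split layers into contiguous stages as evenly as possible.

    Simple even split; real engines may place remainder layers differently.
    """
    base, extra = divmod(num_layers, stages)
    # The first `extra` stages each take one additional layer.
    return [base + (1 if i < extra else 0) for i in range(stages)]


def layouts_for(shape: ModelShape, gpus: int) -> list[tuple[int, int]]:
    """Return (tp, pp) pairs with tp * pp == gpus that pass the checks."""
    valid = []
    for tp in range(1, gpus + 1):
        if gpus % tp:
            continue  # tp must divide the GPU count exactly
        pp = gpus // tp
        if tensor_parallel_ok(shape, tp) and pp <= shape.num_layers:
            valid.append((tp, pp))
    return valid


if __name__ == "__main__":
    shape = ModelShape(num_layers=64, num_attention_heads=40)  # hypothetical
    print(layouts_for(shape, gpus=3))           # [(1, 3)]
    print(pipeline_split(shape.num_layers, 3))  # [22, 21, 21]
```

With 40 heads, a tensor-parallel size of 3 fails the check. That leaves a 1x3 pipeline as the only valid three-GPU layout. In vLLM terms, this would correspond to `--tensor-parallel-size 1 --pipeline-parallel-size 3`.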

Research // Jun 05, 2026

Gemma 4 12B And The Sensory Agent Lane

A public-safe read on Gemma 4 12B as a local sensory preprocessor: useful for seeing, hearing, and structuring observations without turning into an action system.
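One way to keep a sensory preprocessor in its lane is to accept its output only as structured observations and reject anything shaped like an action. The sketch below assumes the model has been prompted to emit a JSON list of records. The schema (`kind`, `summary`, `tags`), the blocked keys, and `parse_observations` are all hypothetical, and the model call itself is left out.

```python
"""Sketch: keep a sensory preprocessor's output observation-only.

Hypothetical schema and field names; the model call itself is out of scope.
"""
import json
from dataclasses import dataclass

# Hypothetical observation kinds for seeing, hearing, and structuring text.
ALLOWED_KINDS = {"visual", "audio", "text"}
# Hypothetical keys that would signal the model is trying to act, not observe.
ACTION_KEYS = {"action", "tool_call", "command", "execute"}


@dataclass(frozen=True)
class Observation:
    kind: str
    summary: str
    tags: tuple[str, ...] = ()


def parse_observations(raw: str) -> list[Observation]:
    """Parse a JSON list into Observations, rejecting action-shaped records."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of observations")
    observations = []
    for rec in records:
        if not isinstance(rec, dict):
            raise ValueError("each observation must be a JSON object")
        blocked = rec.keys() & ACTION_KEYS
        if blocked:
            raise ValueError(f"action-shaped fields in observation: {sorted(blocked)}")
        kind = rec.get("kind")
        if not isinstance(kind, str) or kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown observation kind: {kind!r}")
        summary = rec.get("summary")
        if not isinstance(summary, str):
            raise ValueError("summary must be a string")
        tags = rec.get("tags", [])
        if not isinstance(tags, list):
            raise ValueError("tags must be a list")
        observations.append(Observation(kind, summary, tuple(str(t) for t in tags)))
    return observations
```

This sketch rejects the whole batch rather than silently dropping the offending record. That is a design choice: it makes a preprocessor drifting toward action requests visible instead of hiding it.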

Hardware

machines, thermals, economics

Hardware // Jun 05, 2026

Power Caps On Three RTX 3090s: Bursts Versus Sustained Load

A measured note on a 300W cap for bursty inference, lower caps for sustained runs, and why power-cap sweet spots are workload-specific.
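If power-cap sweet spots are workload-specific, the selection step can be made explicit. Measure each cap against one workload, then pick the most energy-efficient cap that still meets that workload's throughput floor. This is a minimal sketch, assuming you have your own measurements of average draw and throughput per cap. The caps themselves can be set with `nvidia-smi -pl`. `CapSample` and `sweet_spot` are hypothetical names, and no measured values from the note are included.

```python
"""Sketch: choose a power cap per workload from your own measurements.

No measured values are included; feed in samples from your own runs.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapSample:
    cap_watts: int          # per-GPU limit, e.g. set with `nvidia-smi -pl`
    avg_draw_watts: float   # measured average draw, summed over the GPUs in the run
    tokens_per_sec: float   # measured aggregate throughput for this workload


def tokens_per_joule(s: CapSample) -> float:
    # A watt is one joule per second, so (tokens/s) / (J/s) = tokens per joule.
    return s.tokens_per_sec / s.avg_draw_watts


def sweet_spot(samples: list[CapSample], min_tokens_per_sec: float) -> Optional[CapSample]:
    """Most energy-efficient cap whose throughput still meets the workload's floor."""
    eligible = [s for s in samples if s.tokens_per_sec >= min_tokens_per_sec]
    if not eligible:
        return None
    return max(eligible, key=tokens_per_joule)
```

You can run the same samples through `sweet_spot` with a high floor, for an interactive, bursty lane, and with a low floor, for a sustained batch run. The two calls can return different caps, which is the workload-specific point in code form.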

Hardware // Jun 05, 2026

Parallel Agent Serving Is A Hardware Shape Now

A field-report read on 14x RTX 3090 agent serving, EXL3, FP8 KV cache, Aphrodite, and why concurrency is becoming the local hardware metric.
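KV-cache precision shows up in a concurrency discussion for a plain arithmetic reason. Each token in flight stores one key and one value vector per KV head per layer. Halving the bytes per element therefore roughly doubles how many sequences fit in the same memory. The sketch below uses the standard sizing formula with a hypothetical model shape and KV budget, not figures from the report. It assumes the worst case, where every sequence reserves its full context. `KVShape` and the helper functions are illustrative names.

```python
"""Sketch: how KV-cache precision changes concurrent sequence capacity.

Standard KV-cache sizing arithmetic; the model shape and memory budget are
hypothetical inputs, not figures from the field report.
"""
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class KVShape:
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_bytes_per_token(shape: KVShape, bytes_per_elem: int) -> int:
    # Factor of 2: one key vector and one value vector per KV head per layer.
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * bytes_per_elem


def max_concurrent_sequences(kv_budget_bytes: int, shape: KVShape,
                             context_tokens: int, bytes_per_elem: int) -> int:
    """Worst case: each sequence reserves KV cache for its full context."""
    per_seq = kv_bytes_per_token(shape, bytes_per_elem) * context_tokens
    return kv_budget_bytes // per_seq


if __name__ == "__main__":
    shape = KVShape(num_layers=64, num_kv_heads=8, head_dim=128)  # hypothetical
    budget = 40 * GIB  # hypothetical KV-cache budget summed across GPUs
    for name, nbytes in (("FP16", 2), ("FP8", 1)):
        print(name, max_concurrent_sequences(budget, shape, 8192, nbytes))
    # prints "FP16 20" then "FP8 40"
```

Engines that allocate KV blocks on demand can usually do better than this worst case. The ratio between precisions still holds.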