Topic Hub

AI Infrastructure and Operations

Independent research on AI infrastructure, LLM operations, AI agents, model serving, GPU telemetry, and reliability for AI systems.

Curated Writing

26 posts in this signal path.

7 min read · AI Infrastructure

AI-Powered Security Operations: What Actually Works in 2026

Security operations and SRE share more DNA than either community usually acknowledges. Both involve monitoring large volumes of signals to detect anomalies, both require rapid triage and investigation when something goes wrong, and both are fighting the same...

Read post →

7 min read · AI Infrastructure

I Modeled a 6x Cloud Cost Reduction with an LLM Agent

Cloud cost optimization is one of those problems that's theoretically easy and practically miserable. Everyone knows the levers: right-size instances, delete unused resources, use Spot where possible, move cold data to cheaper storage tiers.

Read post →