Speaking Archive

Technical talks from the intersection of SRE, data platforms, Kubernetes, and AI operations.

A curated wall of past presentations covering Apache Spark reliability, Prometheus monitoring, OpenShift-native ML workflows, GPU observability, MLFlow, and anomaly detection.

talks.indexed: 7
primary.topics: 7
archive.range: 2017-2019

Presentation Signal Wall

Past sessions, organized like an ops dashboard.

Each talk is a different telemetry stream: distributed data systems, Prometheus, ML platform operations, GPU monitoring, and anomaly detection.

talk.01 | 2017 | Streaming Data

Building Robust Data Pipeline with Apache Spark

Resilient Spark pipelines, data platform reliability, and operational design.

Speakers: Zak Hassan

talk.02 | 2018 | Prometheus

Scalable Monitoring Using Prometheus with Apache Spark

Metrics architecture for Spark workloads and high-cardinality operational signals.

Speakers: Zak Hassan and Diane Feddema

talk.03 | 2019 | ML Platforms

MLFlow Model Lifecycle Manager and Kubernetes Operator

Experiment tracking, Kubernetes operators, and reproducible ML workflows.

Speakers: Zak Hassan and Mani Parkhe

talk.04 | 2019 | AIOps

Log Anomaly Detector Service

Anomaly detection patterns for noisy logs and operational triage.

Speakers: Zak Hassan

talk.05 | 2019 | GPU Observability

Monitoring of GPU Usage with TensorFlow Models Using Prometheus

GPU telemetry, TensorFlow workloads, and Prometheus-based visibility.

Speakers: Zak Hassan and Diane Feddema

talk.06 | 2019 | NLP Observability

Unsupervised NLP for Log Anomaly Detection

Using unsupervised language techniques to surface suspicious log patterns.

Speakers: Zak Hassan and Michael Clifford

talk.07 | 2019 | OpenShift ML

Introduction to Hyperparameter Tuning in Machine Learning with MLFlow

MLFlow operations on Kubernetes-native infrastructure.

Speakers: Zak Hassan and Hema Veeradhi

Compare Notes

Want to discuss SRE, observability, Kubernetes, or AI infrastructure?

These talks are older public artifacts, but their themes still map directly to how I think about platform reliability, telemetry, and infrastructure leverage today.

Connect on LinkedIn