Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.

This presentation by Brian Martin, an expert in performance optimization and distributed systems, provides a comprehensive overview of instrumentation strategies for high-performance systems. The talk highlights critical aspects of collecting performance metrics without compromising system speed and efficiency.

Key Topics Discussed:

Instrumentation Challenges: The balance between too much instrumentation, which can slow down systems, and too little, which can miss critical insights.
Technologies Used:
- eBPF: Extended Berkeley Packet Filter, used for dynamic kernel instrumentation to gather detailed metrics without modifying kernel code.
- Prometheus: Standard tool for consuming telemetry data.
Counter and Histogram Techniques: Discussion on cache-line aware design, atomic operations, and the importance of implementation choices in handling metrics.
Fearless Instrumentation: Techniques to instrument systems extensively without fearing performance degradation.
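
As a rough illustration of the cache-line aware, atomic counter design mentioned above, here is a minimal Rust sketch. It is not taken from the talk or from any particular library: the names `PaddedCounter` and `ShardedCounter`, the caller-supplied shard hint, and the 64-byte cache-line size (common on x86-64, but not universal) are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One counter slot aligned and padded to its own cache line.
/// ASSUMPTION: 64-byte cache lines; other architectures may differ.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

/// Hypothetical counter split into shards so that concurrent writers
/// rarely contend on the same cache line.
pub struct ShardedCounter {
    shards: Vec<PaddedCounter>,
}

impl ShardedCounter {
    /// Creates a counter with `shards` independent slots (must be > 0).
    pub fn new(shards: usize) -> Self {
        assert!(shards > 0, "at least one shard is required");
        Self {
            shards: (0..shards)
                .map(|_| PaddedCounter(AtomicU64::new(0)))
                .collect(),
        }
    }

    /// Hot path: one relaxed atomic add on the shard chosen by the caller,
    /// for example a per-thread or per-core index.
    pub fn add(&self, shard_hint: usize, n: u64) {
        let slot = &self.shards[shard_hint % self.shards.len()];
        slot.0.fetch_add(n, Ordering::Relaxed);
    }

    /// Cold path: sum every shard when the metric is read or exported.
    pub fn value(&self) -> u64 {
        self.shards
            .iter()
            .map(|s| s.0.load(Ordering::Relaxed))
            .sum()
    }
}
```

The trade-off in this sketch is memory for contention: each shard occupies a full cache line, so increments from different threads do not invalidate each other's lines, while reads pay the cost of summing all shards.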

Libraries & Tools:

Rezolus: A systems performance telemetry agent written in Rust that uses eBPF for kernel-level metrics collection.
Metriken: A low-overhead metrics library optimized for performance-critical paths, used in IOP Systems projects.

Key Takeaways:

Effective instrumentation is crucial for gaining insights into system performance and resolving production issues.
Implementation details, especially for metrics like histograms, can significantly affect performance, emphasizing the need for well-considered design choices.
By applying the right techniques and using advanced tools, it is possible to build robust systems that offer detailed observability without impairing performance.
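
To make the point about histogram implementation choices more concrete, the following Rust sketch shows one simple design: a fixed array of atomic buckets, one per power of two. This is an illustration under stated assumptions, not the design presented in the talk; the type `Log2Histogram` and its methods are hypothetical names, and real libraries typically use finer-grained buckets.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical fixed-size histogram with one bucket per power of two.
/// Bucket 0 holds the value 0; bucket i (i >= 1) holds [2^(i-1), 2^i).
pub struct Log2Histogram {
    buckets: [AtomicU64; 65],
}

impl Log2Histogram {
    /// Creates an empty histogram.
    pub fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Maps a value to its bucket using the position of its highest set bit.
    fn index(value: u64) -> usize {
        (64 - value.leading_zeros()) as usize
    }

    /// Hot path: one bit-count instruction plus one relaxed atomic add.
    pub fn record(&self, value: u64) {
        self.buckets[Self::index(value)].fetch_add(1, Ordering::Relaxed);
    }

    /// Returns the inclusive upper bound of the bucket that contains the
    /// requested percentile (0.0 to 100.0), or None if nothing was recorded.
    pub fn percentile_upper_bound(&self, p: f64) -> Option<u64> {
        let counts: Vec<u64> = self
            .buckets
            .iter()
            .map(|b| b.load(Ordering::Relaxed))
            .collect();
        let total: u64 = counts.iter().sum();
        if total == 0 {
            return None;
        }
        // Rank of the sample we are looking for, at least 1.
        let target = ((p / 100.0) * total as f64).ceil().max(1.0) as u64;
        let mut seen: u64 = 0;
        for (i, c) in counts.iter().enumerate() {
            seen += *c;
            if seen >= target {
                // The last bucket's bound would overflow a shift, so cap it.
                return Some(if i == 64 { u64::MAX } else { (1u64 << i) - 1 });
            }
        }
        None
    }
}
```

Recording is cheap and allocation-free, but the reported percentile is only as precise as the bucket width: after recording the values 1 through 100, this sketch reports 63 for the 50th percentile rather than 50. Coarse buckets keep the hot path fast at the cost of accuracy, which is the kind of design choice the takeaway refers to.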

The talk encourages deep instrumentation as a means to maintain visibility into system health and improve reliability and scalability.

This presentation is valuable for anyone involved in system performance tuning or interested in scalable instrumentation practices.

This is the end of the AI-generated content.

Abstract

In high-performance code, a single misplaced counter increment can cost more than the operation it’s measuring. That creates a paradox: instrument too much and you slow the system down; instrument too little and you miss the insights you need to continuously deliver.

This talk focuses on techniques for instrumenting latency-sensitive, high-throughput systems with minimal impact—approaches rooted in C and Rust, but with lessons that may apply more broadly. We’ll examine the true costs of metrics collection, the pitfalls of percentile reporting, and how to extend observability from application code down to the kernel using eBPF. Along the way, we’ll discuss cacheline-aware counter design, the trade-offs in struct and memory layout, and the value of a unified metrics framework for both application and infrastructure insights.

Attendees will gain practical, language-level strategies for building observability into performance-critical systems—without sacrificing the speed their users expect.

Speaker

Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Brian is a software engineer who focuses on performance optimization and distributed systems. He worked at Twitter for 8 years, initially with the Cache Team and later as a member of the newly created Performance Team. After November 2022, Brian joined his teammates from Twitter as a co-founder of IOP Systems and continues to work on improving software and platform performance, efficiency, and reliability.