
The pros and cons of eBPF profiling


What is eBPF?#

At its root, eBPF takes advantage of the kernel's privileged ability to oversee and control the entire system: it lets you run sandboxed programs in a privileged context such as the operating system kernel. To better understand the implications, check out this blog post, which goes into much more detail. For profiling, this typically means running a program that pulls stacktraces for the whole system at a constant rate (e.g., 100Hz).
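
Pyroscope's eBPF integration uses its own eBPF programs under the hood, but the general idea can be sketched with a one-line bpftrace program (assuming bpftrace is installed) that samples user-space stacks across every CPU at 99Hz:

    # Count user-space stacks per process, sampled system-wide at 99Hz
    sudo bpftrace -e 'profile:hz:99 { @[ustack, comm] = count(); }'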

[Diagram: popular eBPF use cases across networking, security, and observability]

As you can see in the diagram, some of the most popular use cases for eBPF relate to networking, security, and, most relevant to this blog post, observability (logs, metrics, traces, and profiles).

Landscape of eBPF profiling#

Over the past few years there has been significant growth in both the profiling space and the eBPF space, and a few notable companies and open source projects are innovating at the intersection of the two. Some examples include Pyroscope, Pixie, and Parca.

The collective growth of these projects is representative of the rapidly growing interest in this space, as all three gained a significant amount of traction over this time period.

It's also worth noting that the growth of profiling is not limited to eBPF: profiling tools are now so prevalent that you can find one for almost any language or runtime. As a result, profiling is increasingly treated as a first-class citizen in observability suites.

For example, OpenTelemetry has kicked off efforts to standardize profiling in order to enable more effective observability. For more information on those efforts, check out the #otel-profiling channel on the CNCF Slack!

Pros and cons of eBPF and non-eBPF profiling#

When it comes to modern continuous profiling, there are two ways of getting profiling data:

  • User-space level: Popular profilers like pprof, async-profiler, rbspy, py-spy, pprof-rs, dotnet-trace, etc. operate at this level (see the example below)
  • Kernel level: eBPF profilers and Linux perf are able to get stacktraces for the whole system from the kernel
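
As a rough illustration of the user-space approach, a Go service can expose its built-in profiler with the standard library's net/http/pprof package (a minimal sketch; the port is arbitrary):

    package main

    import (
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        // CPU, heap, and goroutine profiles are served at
        // http://localhost:6060/debug/pprof/
        http.ListenAndServe("localhost:6060", nil)
    }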

Pyroscope is designed to be language agnostic and supports ingesting profiles originating from either or both of these methods.

However, each approach comes with its own set of pros and cons:

Pros and cons of native-language profiling

Pros

  • Ability to tag application code in a flexible way (e.g., tagging spans, controllers, functions); see the sketch after this list
  • Ability to profile specific parts of code (e.g., Lambda functions, test suites, scripts)
  • Ability to easily profile other types of data (e.g., memory, goroutines)
  • Consistency of access to symbols across all languages
  • Simplicity of using in local development
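
For example, here is roughly what flexible tagging looks like with Pyroscope's Go client (a sketch; the application name, server address, and label values are illustrative):

    package main

    import (
        "context"

        "github.com/pyroscope-io/client/pyroscope"
    )

    func main() {
        // Start continuous profiling; the server address is illustrative.
        pyroscope.Start(pyroscope.Config{
            ApplicationName: "my.golang.app",
            ServerAddress:   "http://pyroscope-server:4040",
        })

        // Code executed inside the wrapper is tagged controller=checkout,
        // so its samples can be filtered and compared in the UI.
        pyroscope.TagWrapper(context.Background(), pyroscope.Labels("controller", "checkout"), func(ctx context.Context) {
            handleCheckout(ctx)
        })
    }

    func handleCheckout(ctx context.Context) {
        // application logic to be profiled
    }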

Cons

  • Complexity of getting a fleet-wide view across large multi-language systems
  • Constraints on ability to auto-tag infrastructure metadata (e.g., Kubernetes)

Pros and cons of eBPF profiling

Pros

  • Ability to easily get fleet-wide, whole-system metrics
  • Ability to auto-tag metadata that's available when profiling the whole system (e.g., Kubernetes pods, namespaces)
  • Simplicity of adding profiling at the infrastructure level (e.g., for multi-language systems)

Cons

  • Requires particular Linux kernel versions
  • Constraints on ability to tag user-level code
  • Constraints on performant ways to retrieve certain profile types (e.g., memory, goroutines)
  • Difficulty of using in local development

Pyroscope's solution: Merge eBPF profiling and native-language profiling#

We believe there are benefits to both eBPF and native-language profiling, and our long-term focus is to integrate the two seamlessly in Pyroscope. The cons of eBPF profiling are the pros of native-language profiling, and vice versa. As a result, the best way to get the most value out of profiling is to combine the two.

Profiling compiled languages (Golang, Java, C++, etc.)#

When profiling compiled languages, like Golang, the eBPF profiler is able to get very similar information to the non-eBPF profiler.

[Flamegraph comparison: frame width represents CPU time per function]
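
To poke at the same process from the user-space side, you can pull a CPU profile from the pprof endpoint shown earlier (the host, port, and duration are illustrative):

    go tool pprof http://localhost:6060/debug/pprof/profile?seconds=10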

Profiling interpreted languages (Ruby, Python, etc.)#

With interpreted languages like Ruby or Python, the runtime's stacktraces are not easily accessible from the kernel. As a result, the eBPF profiler is not able to parse user-space stack traces for interpreted languages. The examples below show how the kernel sees stack traces from a compiled language (Go) versus an interpreted language (Ruby/Python).

[Flamegraph comparison: frame width represents CPU time per function]
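
This gap is exactly what user-space profilers fill: a tool like py-spy (mentioned above) reads the interpreter's memory to reconstruct Python-level frames. For example, to record a flamegraph from a running process (the PID is illustrative):

    py-spy record -o profile.svg --pid 12345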

How to use eBPF for cluster level profiling#

Using Pyroscope's auto-tagging feature in the eBPF integration, you can get a breakdown of CPU usage by Kubernetes metadata. In this case, we can see which namespace is consuming the most CPU resources on our demo instance after adding Pyroscope with two Helm commands:

    # Add Pyroscope eBPF integration to your kubernetes cluster
    helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
    helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf

[Screenshot: CPU usage broken down by Kubernetes namespace]

You can also see the flamegraph representing CPU utilization for the entire cluster:

[Flamegraph: CPU utilization for the entire cluster]

Internally, we use a variety of integrations to get both a high-level overview of what's going on in our cluster and a very detailed view of each runtime that we use:

  • We use our eBPF integration for our Kubernetes cluster
  • We use our Ruby gem, pip package, Go client, and Java client with tags for our k8s services and GitHub Actions test suites
  • We use our otel-profiling integrations (Go, Java) to get span-specific profiles inside our traces
  • We use our lambda extension to profile the code inside Lambda functions

The next evolution: merging kernel and user-space profiling#

With the help of our community, we've charted out several paths to improving our integrations by merging eBPF and user-space profiles within a single integration. One of the most promising approaches is using:

  • Non-eBPF language-specific integrations for more granular control and analytic capabilities (e.g., dynamic tags and labels)
  • eBPF integration for a comprehensive view of the whole cluster


Stay tuned for more progress on these efforts. In the meantime, check out the docs to get started with eBPF or the other integrations!