
· 6 min read

What is eBPF?#

At its root, eBPF takes advantage of the kernel's privileged ability to oversee and control the entire system. With eBPF you can run sandboxed programs in a privileged context such as the operating system kernel. To better understand the implications and learn more, check out this blog post, which goes into much more detail. For profiling, this typically means running a program that pulls stacktraces for the whole system at a constant rate (e.g. 100Hz).
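Conceptually, that sampling loop looks like the sketch below, written as a user-space Python toy purely for illustration; a real eBPF profiler collects stacks inside the kernel, not like this:

```python
import sys
import time
import traceback
from collections import Counter

def sample_stacks(duration_s=0.2, hz=100):
    """Grab every thread's current stack at a fixed rate and count
    identical stacks; this is the core loop of any sampling profiler."""
    counts = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for frame in sys._current_frames().values():
            # Fold the stack into "outer;inner" form, the format
            # flamegraph tooling commonly consumes.
            folded = ";".join(f.name for f in traceback.extract_stack(frame))
            counts[folded] += 1
        time.sleep(1.0 / hz)  # sample roughly `hz` times per second
    return counts

samples = sample_stacks()
for stack, count in samples.most_common(3):
    print(count, stack)
```

The output is a set of folded stacks with sample counts, which is exactly the raw material a flamegraph is built from.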


Some of the most popular use cases for eBPF are related to networking, security, and, most relevant to this blog post, observability (logs, metrics, traces, and profiles).

Landscape of eBPF profiling#

Over the past few years there has been significant growth in both the profiling space and the eBPF space, and a few notable companies and open source projects are innovating at the intersection of the two. Some examples include:

The collective growth is representative of the rapidly growing interest in this space, as projects like Pyroscope, Pixie, and Parca have all gained significant traction over this time period.

It's also worth noting that the growth of profiling is not limited to eBPF; the prevalence of profiling tools has grown to the point where it is now possible to find a tool for almost any language or runtime. As a result, profiling is increasingly considered a first-class citizen in observability suites.

For example, OpenTelemetry has kicked off efforts to standardize profiling in order to enable more effective observability. For more information on those efforts, check out the #otel-profiling channel on the CNCF Slack!

Pros and cons of eBPF and non-eBPF profiling#

When it comes to modern continuous profiling, there are two ways of getting profiling data:

  • User-space level: Popular profilers like pprof, async-profiler, rbspy, py-spy, pprof-rs, dotnet-trace, etc. operate at this level
  • Kernel level: eBPF profilers and Linux perf are able to get stacktraces for the whole system from the kernel

Pyroscope is designed to be language-agnostic and supports ingesting profiles originating from either or both of these methods.

However, each approach comes with its own set of pros and cons:

Pros and cons of native-language profiling

Pros

  • Ability to tag application code in a flexible way (e.g. tagging spans, controllers, functions)
  • Ability to profile specific parts of code (e.g. Lambda functions, test suites, scripts)
  • Simplicity of profiling other types of data (e.g. memory, goroutines)
  • Consistency of access to symbols across all languages
  • Simplicity of using in local development

Cons

  • Complexity of getting a fleet-wide view across large multi-language systems
  • Constraints on the ability to auto-tag infrastructure metadata (e.g. Kubernetes)

Pros and cons of eBPF profiling

Pros

  • Ability to get fleet-wide, whole-system metrics easily
  • Ability to auto-tag metadata that's available when profiling the whole system (e.g. Kubernetes pods, namespaces)
  • Simplicity of adding profiling at the infrastructure level (e.g. for multi-language systems)

Cons

  • Requires particular Linux kernel versions
  • Constraints on being able to tag user-level code
  • Constraints on performant ways to retrieve certain profile types (e.g. memory, goroutines)
  • Difficulty of local development for developers

Pyroscope's solution: Merge eBPF profiling and native-language profiling#

We believe that there are benefits to both eBPF and native-language profiling, and our long-term focus is to integrate them together seamlessly in Pyroscope. The cons of eBPF profiling are the pros of native-language profiling and vice versa. As a result, the best way to get the most value out of profiling is to combine the two.

Profiling compiled languages (Golang, Java, C++, etc.)#

When profiling compiled languages, like Golang, the eBPF profiler is able to get very similar information to the non-eBPF profiler.

Frame width represents CPU time per function

Profiling interpreted languages (Ruby, Python, etc.)#

With interpreted languages like Ruby or Python, stacktraces in their runtimes are not easily accessible from the kernel. As a result, the eBPF profiler is not able to parse user-space stack traces for interpreted languages. You can see how the kernel interprets stack traces of compiled languages (Go) vs. interpreted languages (Ruby/Python) in the examples below.

Frame width represents CPU time per function

How to use eBPF for cluster level profiling#

Using Pyroscope's auto-tagging feature in the eBPF integration, you can get a breakdown of CPU usage by Kubernetes metadata. In this case, we can see which namespace is consuming the most CPU resources for our demo instance after adding Pyroscope with two lines of code:

```shell
# Add Pyroscope eBPF integration to your kubernetes cluster
helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf
```


You can also see the flamegraph representing CPU utilization for the entire cluster.
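To make the per-namespace breakdown concrete, it can be sketched in a few lines of Python; the sample data below is invented for illustration and is not what the integration actually emits:

```python
from collections import Counter

# Hypothetical CPU samples as (tags, folded stack, sample count) tuples,
# shaped like what an eBPF integration might collect per kubernetes pod.
samples = [
    ({"namespace": "ride-sharing"}, "main;handle_request;check_driver", 310),
    ({"namespace": "ride-sharing"}, "main;handle_request;find_route", 140),
    ({"namespace": "monitoring"},   "main;scrape;parse_metrics",       55),
]

# Break down total CPU samples by the auto-attached namespace tag.
by_namespace = Counter()
for tags, _stack, count in samples:
    by_namespace[tags["namespace"]] += count

print(by_namespace.most_common(1))  # the namespace consuming the most CPU
```

Because the namespace tag is attached automatically, this question can be answered without touching any application code.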

Internally, we use a variety of integrations to get both a high-level overview of what's going on in our cluster and a detailed view of each runtime that we use:

  • We use our eBPF integration for our kubernetes cluster
  • We use the ruby gem, pip package, go client, and java client with tags for our k8s services and GitHub Actions test suites
  • We use our otel-profiling integrations (go, java) to get span-specific profiles inside our traces
  • We use our lambda extension to profile the code inside lambda functions

The next evolution: merging kernel and user-space profiling#

With the help of our community we've charted several promising paths to improving our integrations by merging the eBPF and user-space profiles within the integration. One of the most promising approaches combines:

  • Non-eBPF language-specific integrations for more granular control and analytic capabilities (i.e. dynamic tags and labels)
  • eBPF integration for a comprehensive view of the whole cluster


Stay tuned for more progress on these efforts. In the meantime, check out the docs to get started with eBPF or the other integrations!

· 7 min read


We started Pyroscope a few years ago because we had seen first-hand how profiling was a powerful tool for improving performance, but at the time it was also not very user-friendly. Since then, we've been working hard not only to make Pyroscope easier to use, but also to make it easier to get value out of it.

As it stands today, Pyroscope has evolved to support an increasingly wide array of our community's day-to-day workflows by adding a valuable extra dimension to their observability stack:

Application Developers

  • Resolve spikes / increases in CPU usage
  • Locate and fix memory leaks and memory errors
  • Understand call trees of your applications
  • Clean up unused / dead code

SREs

  • Create performance-driven culture in dev cycle
  • Spot performance regressions in codebase proactively
  • Configure monitoring / alerts for infrastructure
  • Optimize test suites and team productivity

Engineering Managers

  • Get real-time analysis of resource utilization
  • Make efficient cost allocations for resources
  • Use insights for better decision making

Why we built a cloud service#

As our community has grown to include this diverse set of companies, users, and use-cases, we've had more people express interest in getting all the value from using Pyroscope, but without some of the costs that come with maintaining and scaling open source software. Some of the other reasons we decided to build a cloud service include:

  • Companies who have less time/resources to dedicate to setting up Pyroscope
  • Companies operating at scale who need an optimized solution that can handle the volume of data that is produced by profiling applications at scale
  • Users who are less technical and want a solution that's easy to use and requires little to no configuration
  • Users who want access to the latest features and bug fixes as soon as they are released (with zero downtime)
  • Users who want additional access to the Pyroscope team's profiling expertise and support (past our community Slack and GitHub)

And from our side, we believe that a cloud product will:

  • Make it easier for more companies to adopt Pyroscope
  • Provide more feedback to help prioritize features on our roadmap
  • Provide more resources to invest in Pyroscope's open source projects
  • Make it easier to offer integrations with other tools in the observability stack (e.g. Grafana, Honeycomb, GitLab, GitHub, etc.)

Plus, we got to solve a lot of really cool challenges along the way!

Introducing Pyroscope Cloud#

Today we are excited to announce the general availability of Pyroscope Cloud, our hosted version of Pyroscope!

Pyroscope Cloud enables you to achieve your observability goals by removing concerns around setup, configuration, and scaling. It's designed to be easy to use and gives you a significant amount of insight into your application's performance with very minimal configuration.

Some notable features for the cloud include:

  • Horizontal scalability
  • Support for high-cardinality profiling data
  • Zero-downtime upgrades
  • Data encryption at rest and in transit
  • Compliance with SOC 2
  • Extra support options beyond public Slack / Github
  • Tracing integrations (Honeycomb and Jaeger)

Pyroscope Cloud's Major Scaling Improvements#

Similar to Pyroscope Open Source Software (OSS), the cloud service is designed to store, query, and analyze profiling data as efficiently as possible. However, certain limitations that fundamentally constrain the scalability of Pyroscope OSS (for now) have been removed in Pyroscope Cloud.

When running Pyroscope OSS at scale, people eventually run into the limitations of the open source storage engine. It is built around BadgerDB, an embeddable key-value database written in Go, and this reliance means the OSS version of Pyroscope scales vertically but not horizontally.

In the cloud, we replace BadgerDB with a distributed key-value store which allows more freedom to scale Pyroscope horizontally. We leverage many of the techniques used by Honeycomb and many Grafana projects (i.e. Loki, Tempo) but with particular adjustments made for the unique requirements of profiling data (stay tuned for future blog post on this).

This means that with Pyroscope Cloud you don't need to worry about limiting the number of applications, the number of profiles, or tag cardinality in order to get the most out of Pyroscope!

Pyroscope Cloud's Major Features#

We've built Pyroscope Cloud with several different use cases in mind:

Continuous profiling for system-wide visibility#

This feature is used for profiling your applications across various environments. Most agents support tags, which let you differentiate between environments (e.g. staging vs production) and other metadata (e.g. pod, namespace, region, version, git commit, PR, etc.). Using the single, comparison, or diff view in combination with these tags lets you easily understand and debug performance issues.

The Tag Explorer view lets you answer questions like:

  • Which tags are consuming the most cpu and memory resources?
  • How did the performance of my application change between versions?
  • What are our most/least frequently used code paths?
  • Which libraries are consuming the most resources?
  • Where are memory leaks originating?
  • etc.
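Under the hood, a comparison or diff between two tagged profiles boils down to subtracting per-stack sample counts; here is a minimal sketch with made-up data:

```python
from collections import Counter

# Hypothetical folded-stack CPU profiles from two versions of the same app.
v1 = Counter({"main;render;serialize": 120, "main;render;layout": 80})
v2 = Counter({"main;render;serialize": 40,  "main;render;layout": 85,
              "main;render;cache_lookup": 10})

# A diff view subtracts per-stack sample counts: positive deltas mean the
# stack got more expensive in v2, negative deltas mean it got cheaper.
diff = {stack: v2[stack] - v1[stack] for stack in set(v1) | set(v2)}
for stack, delta in sorted(diff.items(), key=lambda kv: kv[1]):
    print(f"{delta:+5d}  {stack}")
```

In this invented example the serialization path got dramatically cheaper between versions, while a new cache-lookup path appeared.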

Adhoc profiling for deep-dive debugging#

This feature is for when you want to profile something in a more adhoc manner. Most commonly, people use it either to upload previously recorded profiles or to save a profile for a particular script. Where many used to save profiles to a random folder on their computer, they can now use our adhoc page to store them more efficiently, share them with others, and view them with the same powerful UI that they use for continuous profiling.


Tracing exemplars for transaction-level visibility#

This feature is for when you want to correlate profiling data with tracing data. While traces will often tell you where your application is slow, profiling gives more granular detail into why, and which particular lines of code are responsible for the performance issues. This view gives you a heatmap of span durations. We also have integrations with a few popular tracing tools:

Tracing exemplars

Profile upload API for automated workflows and migrations#

Over time, we've found that major companies in various sectors have built their own internal profiling systems, which often ultimately dump a large collection of profiles into a storage system like S3.

Pyroscope Cloud's API is built to accept many popular formats for profiling data and then store them in a way that is optimized for querying and analysis. This means that you can redirect your existing profiling data to Pyroscope Cloud and then use the same UI that you use for continuous profiling to analyze it.

```shell
# First get API_TOKEN from https://pyroscope.cloud/settings/api-keys
# Ingest profiles in pprof format
curl \
  -H "Authorization: Bearer $API_TOKEN" \
  --data-binary @cpuprofile.pb.gz \
  "https://pyroscope.cloud/ingest?format=pprof&from=1669931604&until=1669931614&name=my-app-name-pprof"
```
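If you'd rather ingest from code than from curl, the same request can be sketched with Python's standard library; the token and payload below are placeholders, not real values:

```python
import urllib.request

API_TOKEN = "example-token"  # get a real one from https://pyroscope.cloud/settings/api-keys
profile = b"..."             # stand-in for the gzipped pprof bytes read from disk

params = "format=pprof&from=1669931604&until=1669931614&name=my-app-name-pprof"
req = urllib.request.Request(
    f"https://pyroscope.cloud/ingest?{params}",
    data=profile,                                     # raw request body, like --data-binary
    headers={"Authorization": f"Bearer {API_TOKEN}"}, # same bearer-token auth as curl -H
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the profile
```

This mirrors the curl invocation field for field, which makes it easy to wrap in an automated migration script.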

How to get started with Pyroscope Cloud#

Migrating from Pyroscope OSS#

To migrate from Pyroscope OSS to Pyroscope Cloud, you can use our remote write feature to send your data to Pyroscope Cloud. This allows you to continue using Pyroscope OSS while you migrate your data.

You can also get started directly with Pyroscope Cloud, by signing up for a free account at pyroscope.cloud.

What's Next for Pyroscope Cloud#

  • CI/CD integrations (GitHub, GitLab, CircleCI, etc.): We've heard from many who use Pyroscope to profile their test suites, and we have plans (link) for a UI specifically geared towards analyzing this data
  • More integrations (Tempo, PagerDuty, etc.)
  • More features (Alerting, etc.)
  • More documentation (Tutorials, etc.)

· 4 min read

Profile AWS Lambda Functions

What is AWS Lambda?#

AWS Lambda is a popular serverless computing service that lets you write code in a variety of languages and run it on AWS. In this case "serverless" means that rather than having to manage a server or set of servers, you can instead run your code on-demand on highly-available machines in the cloud.

Lambda manages your "serverless" infrastructure for you including:

  • Server maintenance
  • Automatic scaling
  • Capacity provisioning
  • and more

AWS Lambda functions are a "black box"#

However, the tradeoff that happens as a result of using AWS Lambda is that because AWS handles so much of the infrastructure and management for you, it ends up being somewhat of a "black box" with regards to:

  • Cost: You have little insight into why your Lambda function costs so much or which functions are responsible
  • Performance: You often run into hard-to-debug latency or memory issues when running your Lambda function
  • Reliability: You have little insight into why your Lambda function fails as often as it does

Depending on the availability of resources, these issues can balloon over time until they become an expensive foundation which is hard to analyze and fix after the fact, once much of your infrastructure relies on these functions.

Continuous Profiling for Lambda: A window into the serverless "black box" problem#

Continuous Profiling is a method of analyzing the performance of an application giving you a breakdown of which lines of code are consuming the most CPU or memory resources and how much of each resource is being consumed. Since, by definition, a Lambda function is a collection of many lines of code which consume resources (and incur costs) on demand, it makes sense that profiling is the perfect tool to use to understand how you can optimize your Lambda functions and allocate resources to them.

While you can already use our various language-specific integrations to profile your Lambda functions, with the naive approach, adding Pyroscope will add extra overhead to the critical path due to how the Lambda Execution Lifecycle works.

However, we've introduced a more optimal solution which gives you insight into the Lambda "black box" without adding extra overhead to the critical path of your Lambda Function: our Pyroscope AWS Lambda Extension.

Pyroscope Lambda extension adds profiling support without impacting critical path performance#


This solution makes use of the extension to delegate profiling-related tasks to an asynchronous path, allowing the critical path to continue to run while the profiling-related activities are performed in the background. You can then use the Pyroscope UI to dive deeper into the various profiles and make the necessary changes to optimize your Lambda function!

How to add Pyroscope's Lambda Extension to your Lambda Function#

Pyroscope's Lambda extension works with our various agents, and documentation on how to integrate with them can be found in the integrations section of our documentation. Once you've added the agent to your code, there are just two steps needed to get up and running with the profiling Lambda extension:

Step 1: Add a new Layer in "function settings"#

Add a new layer using the latest "layer name" from our releases page.


Step 2: Add environment variables to configure where to send the profiling data#

You can send data to either Pyroscope Cloud or any running Pyroscope server. This is configured via environment variables.

Lambda Function profile#

Here's an interactive flamegraph of what you will end up with after you add the extension to your Lambda Function:

While this flamegraph was exported for the purposes of this blog post, in the Pyroscope UI you have additional tools for analyzing profiles, such as:

  • Labels: View function CPU performance or memory over time using FlameQL
  • Time controls: Select and filter for particular time periods of interest
  • Diff view: Compare two profiles and see the differences between them
  • And many more!

· 3 min read

Stop screenshotting Flamegraphs and start embedding them#

Typically a flamegraph is most useful when you're able to click into particular nodes or stack traces to understand the program more deeply. After featuring flamegraphs as a key piece of several blog posts, we found that screenshots of flamegraphs were missing this key functionality compared to interactive flamegraphs.

As a result, we created flamegraph.com to have a place where users can upload, view, and share flamegraphs.

We recently released an update to flamegraph.com that makes it easy to embed flamegraphs in your blog or website. The steps to embed a flamegraph are:

  1. Upload a flamegraph or flamegraph diff to flamegraph.com
  2. Click the "Embed" button
  3. Click the "Copy" button to copy the embed code snippet
  4. Paste the embed code snippet into your blog or website


· 6 min read

Grafana is an open-source observability and monitoring platform used by individuals and organizations to monitor their applications and infrastructures. Grafana leverages the three pillars of observability (metrics, logs, and traces) to deliver insights into how well your systems are doing. Nowadays, observability involves a whole lot more than metrics, logs, and tracing; it also involves profiling.

In this article, I will:

  1. Describe how to leverage continuous profiling in Grafana by installing the Pyroscope flamegraph panel and datasource plugin
  2. Show how to configure the plugins properly
  3. Explain how to set up your first dashboard that includes profiling
  4. Give a sneak peek of an upcoming feature that will let you link profiles to logs, metrics, and traces

If you're new to flamegraphs and would like to learn more about what they are and how to use them, see this blog post.

Introduction#

pillars-of-observability-complete

Grafana provides you with tools to visualize your metrics, view logs, and analyze traces, but it is incomplete without the added benefits of profiling. Continuous profiling is critical when you're looking to debug an existing performance issue in your application. It enables you to monitor your application's performance over time and provides insights into the parts of your application that are consuming the most resources. Continuous profiling is used to locate and fix memory leaks, clean up unused code, and understand the call tree of your application. This results in a more efficient application.

Benefits of using Pyroscope in Grafana#

Unified view for complete observability#

Using Pyroscope in Grafana provides you with complete observability without leaving your Grafana dashboard. Grafana leverages the powerful features of Pyroscope to take complete control of your application’s end-to-end observability and makes things like debugging easy. You can now see your profiles alongside corresponding logs, metrics, and traces to tell the complete story of your application.

Zero migration cost#

It costs nothing to migrate your application profile from Pyroscope’s UI dashboard into Grafana. Simply open Grafana and install both the Pyroscope panel and datasource plugin, and you’re all set!

left-right: flamegraph in Pyroscope, flamegraph in Grafana

· 6 min read

Introduction#

In this article, I will introduce you to flamegraphs, how to use them, how to read them, what makes them unique, and their use cases.

What is a flamegraph#

A flamegraph is a complete visualization of hierarchical data (e.g. stack traces, file system contents, etc.) with a metric, typically resource usage, attached to the data. Flamegraphs were originally invented by Brendan Gregg, who was frustrated by the inability to view, read, and understand stack traces using regular profilers when debugging performance issues. The flamegraph was created to fix this exact problem.

Flamegraphs allow you to view a call stack in a much more visual way. They give you insight into your code performance and allow you to debug efficiently by drilling down to the origin of a bug, thereby increasing the performance of your application.
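To see what that means in practice, the hierarchy a flamegraph renders can be built from folded stack samples in a short sketch (sample data invented for illustration):

```python
# Hypothetical folded stack samples: "outer;inner" paths with sample counts.
samples = {
    "main;parse;read_file": 50,
    "main;parse;tokenize":  30,
    "main;render":          20,
}

def build_tree(folded):
    """Build the tree a flamegraph draws: each node's width (value) is
    its own samples plus all of its descendants' samples."""
    root = {"name": "root", "value": 0, "children": {}}
    for stack, count in folded.items():
        node = root
        node["value"] += count
        for frame in stack.split(";"):
            child = node["children"].setdefault(
                frame, {"name": frame, "value": 0, "children": {}})
            child["value"] += count
            node = child
    return root

tree = build_tree(samples)
print(tree["children"]["main"]["value"])                       # 100: full width of main
print(tree["children"]["main"]["children"]["parse"]["value"])  # 80: parse and its callees
```

Each node's value determines the width of its frame, which is why wide frames immediately point you at expensive call paths.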

· 5 min read

Coming from a background working as a frontend developer at Grafana, I'm no stranger to open source performance monitoring. I was part of a team responsible for the overall user experience of Grafana, and performance was one of the key considerations. Along the way, I learned about a debugging technique known as profiling for monitoring application performance, and I have been in love with it ever since.

chrome browser profiler

What is continuous profiling#

“Profiling” is a dynamic method of analyzing the complexity of a program, such as CPU utilization or the frequency and duration of function calls. With profiling, you can locate exactly which parts of your application are consuming the most resources. “Continuous profiling” is a more powerful version of profiling that adds the dimension of time. By understanding your system's resources over time, you can then locate, debug, and fix issues related to performance.

As a frontend developer, my experience with profiling was limited to the browser. However, in the course of my study, I discovered a new pattern of profiling that seemed exciting: continuous profiling. Similar to how you use the profiler in the dev console to understand frontend performance issues, continuous profiling allows you to profile servers written in various languages 24/7 and understand resource usage at any particular time.

While continuous profiling is new to many, the concept is actually relatively old. In 2010, Google released a paper titled “Google Wide profiling: A continuous profiling infrastructure for data centers” where they make the case for the value of adding continuous profiling to your applications.

Industry traction for continuous profiling#

Since then, many major performance monitoring solutions have joined them in releasing continuous profiling products. The space has become increasingly popular, with companies and VCs more frequently making major investments in continuous profiling to keep up with demand for this type of monitoring.

continuous profiling trends

· 4 min read


Why we added adhoc profiling#

While most profilers are built for more static or adhoc analysis (i.e. profiling a script), Pyroscope's continuous profiling gives you the opportunity to jump around to any point in time. This fluid profiling is beneficial for understanding performance issues in your application. However, as we continued to improve the Pyroscope UI/UX, we identified situations where static profiling is a better fit than running a profiler continuously, such as profiling scripts and attaching to a running process.

Our goal is to make Pyroscope a one-stop-shop for all profiling needs. That means supporting all languages, all flamegraph formats, continuously profiling servers, and, of course, quickly profiling an adhoc script.

Introducing Adhoc profiling#

With that in mind, we are excited to officially release adhoc profiling mode for Pyroscope! With adhoc mode, you get all the convenience and simplicity of profiling a script, as well as Pyroscope's stellar visualization and UI functionality.

· 6 min read

How we improved performance of our Go application#

Recently we released a new feature where users can run Pyroscope in pull mode. It allows you to pull profiling data from applications, and it has various discovery mechanisms so that you can easily integrate with things like Kubernetes and start profiling all of your pods with minimal setup.

For Pyroscope, the difference between push and pull mode is that:

  • Push mode: The application sends a POST request with profiling data to the Pyroscope server and gets back a simple response
  • Pull mode: Pyroscope sends a GET request to targets (identified in a config file) and the targets return profiling data in the response
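The pull direction can be sketched with Python's stdlib HTTP modules; the endpoint path and folded-text payload here are toy stand-ins, not Pyroscope's actual wire format:

```python
import http.server
import threading
import urllib.request

# A toy "target" application exposing its profile over HTTP, the way
# pull-mode targets do; the path and payload are illustrative only.
class ProfileHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"main;work 42\n"  # one folded stack sample
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), ProfileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Pull mode: the profiling server GETs the target's profile endpoint
# on a schedule instead of waiting for the target to push.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/profile") as resp:
    pulled = resp.read()
server.shutdown()
print(pulled)
```

The key design difference is simply who initiates the request: in push mode the application owns the schedule, while in pull mode the server does, which is what makes target discovery (e.g. in Kubernetes) possible.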
