Profiling today looks very different than it did just a few years ago. As people move to more cloud-native workloads,
continuous profiling has evolved into a key piece of many companies' observability suites. At Pyroscope, we've been a huge
part of this evolution thanks to an ever-expanding community that has provided great insight into the use cases where profiling
is most valuable and how we can continue to improve that experience.
As a result, over the past few years we've released several products to help developers improve their applications' performance.
Continuous Profiling: Our most popular product, a tool for continuously profiling your applications across your entire system and then
storing and querying that data efficiently
Adhoc Profiling: For cases where you don't need to profile continuously, or where you'd like to save a snapshot of a profile,
our adhoc tool lets you capture and save specific profiles so you can refer back to them later
Profiling Exemplars: Profiles linked to particular meaningful units such as HTTP requests or trace spans
Now, we're excited to announce the latest addition to the Pyroscope family: CI Profiling.
Continuous Integration and Continuous Delivery (CI/CD) pipelines are critical for modern software development, but they can also be a
source of frustration and inefficiency. Waiting for long test runs, dealing with frequent failures and timeouts, and wasting
resources are all common problems associated with CI/CD pipelines. These issues can be compounded when multiple developers
are working on the same codebase or when teams are working across multiple repositories. That's why we've developed this new feature that can help:
Continuous Profiling with Pyroscope in your CI/CD pipelines.
Go arenas are an experimental feature. The API and implementation are completely unsupported, and the Go team makes no guarantees about compatibility or whether arenas will even continue to exist in any future release.
Go 1.20 introduces an experimental concept of "arenas" for memory management, which can be used to improve the performance of your Go programs. In this blog post, we'll take a look at:
What arenas are
How they work
How to determine whether your programs could benefit from using arenas
How we used arenas to optimize one of our services
Go is a programming language that utilizes garbage collection, meaning that the runtime automatically manages memory allocation and deallocation for the programmer.
This eliminates the need for manual memory management, but it comes with a cost:
The Go runtime must keep track of every object that is allocated, leading to increased performance overhead.
In certain scenarios, such as when an HTTP server processes requests with large protobuf blobs (which contain many small objects), this can result in the Go runtime spending a significant amount of time tracking each of those individual allocations, and then deallocating them.
As a result, this causes significant performance overhead.
Arenas offer a solution to this problem by reducing the overhead associated with many small allocations. In the protobuf blob example, a large chunk of memory (an arena) can be allocated before parsing, enabling all parsed objects to be placed within the arena and tracked as a collective unit.
Once parsing is completed, the entire arena can be freed at once, further reducing the overhead of freeing many small objects.
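To make this concrete, here is a minimal sketch of arena-based allocation using the experimental arena package shipped with Go 1.20 (it only builds with GOEXPERIMENT=arenas set); the Item type is a hypothetical stand-in for the many small objects a protobuf parser would produce:

package main

import (
	"arena" // experimental: requires building with GOEXPERIMENT=arenas
	"fmt"
)

// Item stands in for one of the many small objects produced while
// parsing a large protobuf blob.
type Item struct {
	ID   int64
	Name string
}

func main() {
	// Allocate one large chunk of memory up front.
	a := arena.NewArena()

	// Place many small objects inside the arena instead of the
	// GC-managed heap.
	items := arena.MakeSlice[*Item](a, 0, 1000)
	for i := 0; i < 1000; i++ {
		it := arena.New[Item](a) // allocated inside the arena
		it.ID = int64(i)
		it.Name = "item"
		items = append(items, it)
	}
	fmt.Println("parsed", len(items), "items")

	// Free the whole arena at once; the garbage collector never has to
	// track or sweep the individual objects.
	a.Free()
}

Note that memory handed out by a freed arena must never be used again; that is part of why the feature is experimental and unsupported.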
Sandwich view is a mode of viewing flamegraphs popularized by Jamie Wong in the Speedscope project.
Its function is relatively simple: the typical flamegraph breaks down resource utilization by function,
but it can be difficult to see how much time is spent in the function itself vs how much time is spent in the functions it calls.
Sandwich view solves this problem by splitting a flamegraph into two sections:
callers: the functions that called the function in question (its "parents")
callees: the functions that the function in question called (its "children")
Finding performance issues with standard Flamegraph mode#
A typical use case for flamegraphs is identifying opportunities for optimization. With a standard flamegraph, the most
common workflow is to identify the function node that has the largest width and then look at the functions it calls to see if
there is any low-hanging fruit for optimization.
For example, in the flamegraph below, we can see that rideshare/car.OrderCar is the widest function and
thus a good place to start looking for optimization opportunities.
If you want to try it out, simply go to your Pyroscope UI or upload a flamegraph to flamegraph.com and select the "sandwich" view icon in the new flamegraph toolbar,
then select a function to see its callers and callees. We have many more view modes planned for the future, so stay tuned or let us know what you'd like to see!
At its root, eBPF takes advantage of the kernel’s privileged ability to oversee and control the entire system.
With eBPF you can run sandboxed programs in a privileged context such as the operating system kernel.
To better understand the implications and learn more, check out this blog post,
which goes into much more detail. For profiling, this typically means running a program that pulls stacktraces
for the whole system at a constant rate (e.g. 100Hz).
As you can see in the diagram, some of the most popular use cases for eBPF are related to Networking, Security,
and most relevant to this blog post — observability (logs, metrics, traces, and profiles).
Over the past few years there has been significant growth in the profiling space as well as the eBPF space and
there are a few notable companies and open source projects innovating at the intersection of profiling and eBPF. Some examples include:
The collective growth is representative of the rapidly growing interest in this space as projects like Pyroscope, Pixie, and Parca all gained a significant amount of traction over this time period.
It's also worth noting that the growth of profiling is not limited to eBPF; the prevalence of profiling tools has grown
to the point where it is now possible to find a tool for almost any language or runtime. As a result, profiling
is more frequently being treated as a first-class citizen in observability suites.
For example, OpenTelemetry has kicked off
efforts to standardize profiling
in order to enable more effective observability. For more information on those efforts check out
the #otel-profiling channel on the CNCF slack!
When it comes to modern continuous profiling, there are two ways of getting profiling data:
User-space level: Popular profilers like pprof, async-profiler, rbspy, py-spy, pprof-rs, dotnet-trace, etc. operate at this level (a sketch of this approach follows this list)
Kernel level: eBPF profilers and Linux perf are able to get stacktraces for the whole system from the kernel
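As an illustration of the user-space approach (a minimal sketch based on the Pyroscope Go client; the server address and application name are placeholders), instrumenting a Go service looks roughly like this:

package main

import (
	"log"

	"github.com/pyroscope-io/client/pyroscope"
)

func main() {
	// Start the in-process (user-space) profiler and continuously ship
	// profiles to a Pyroscope server. Address and app name are placeholders.
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "my.golang.app",
		ServerAddress:   "http://pyroscope:4040",
		Logger:          pyroscope.StandardLogger,
		ProfileTypes: []pyroscope.ProfileType{
			pyroscope.ProfileCPU,
			pyroscope.ProfileAllocObjects,
			pyroscope.ProfileInuseSpace,
			pyroscope.ProfileGoroutines,
		},
	})
	if err != nil {
		log.Fatalf("failed to start pyroscope: %v", err)
	}

	// ... run your application ...
}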
Pyroscope is designed to be language agnostic and supports ingesting profiles originating from either or both of these methods.
However, each approach comes with its own set of pros and cons:
Pros and Cons of native-language profiling
Pros
Ability to tag application code in a flexible way (e.g. tagging spans, controllers, functions; see the sketch after this pros and cons list)
Ability to profile specific parts of code (e.g. Lambda functions, test suites, scripts)
Ability to easily profile other types of data (e.g. memory, goroutines)
Consistency of access to symbols across all languages
Simplicity of using in local development
Cons
Complexity of getting a fleet-wide view for large, multi-language systems
Constraints on the ability to auto-tag infrastructure metadata (e.g. Kubernetes)
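As a sketch of the flexible tagging mentioned in the pros above (assuming the Go client's TagWrapper and Labels helpers; the label names and functions here are hypothetical), you can scope profiles to a specific controller like this:

package main

import (
	"context"

	"github.com/pyroscope-io/client/pyroscope"
)

// handleOrder wraps one specific code path so that every sample collected
// inside the closure carries the extra labels.
func handleOrder(ctx context.Context) {
	pyroscope.TagWrapper(ctx, pyroscope.Labels("controller", "orders"), func(ctx context.Context) {
		processOrder(ctx) // hypothetical business logic being profiled
	})
}

func processOrder(ctx context.Context) {
	// ... application code ...
}

func main() {
	// Assumes pyroscope.Start(...) has already been called, as in the
	// earlier sketch.
	handleOrder(context.Background())
}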
Pros and Cons of eBPF profiling
Pros
Ability to get fleet-wide, whole-system metrics easily
Ability to auto-tag metadata that's available when profiling the whole system (e.g. Kubernetes pods, namespaces)
Simplicity of adding profiling at the infrastructure level (e.g. multi-language systems)
Cons
Requires particular Linux kernel versions
Constraints on the ability to tag user-level code
Constraints on performant ways to retrieve certain profile types (e.g. memory, goroutines)
Difficulty of developing locally for developers
Pyroscope's solution: Merge eBPF profiling and native-language profiling#
We believe there are benefits to both eBPF and native-language profiling, and our long-term focus is to integrate them seamlessly in Pyroscope.
The cons of eBPF profiling are the pros of native-language profiling and vice versa. As a result, the best way to get the most value out of profiling
is to combine the two.
Profiling compiled languages (Golang, Java, C++, etc.)#
When profiling compiled languages, like Golang, the eBPF profiler is able to get very similar information to the non-eBPF profiler.
Profiling interpreted languages (Ruby, Python, etc.)#
With interpreted languages like Ruby or Python, stacktraces in their runtimes are not easily accessible from the kernel.
As a result, the eBPF profiler is not able to parse user-space stack traces for interpreted languages. You can see how
the kernel interprets stack traces of compiled languages (Go) vs how the kernel interprets stack traces from interpreted languages (Ruby/Python)
in the examples below.
Using Pyroscope's auto-tagging feature in the eBPF integration, you can get a breakdown of CPU usage by Kubernetes metadata.
In this case, we can see which namespace is consuming the most CPU resources for our demo instance
after adding Pyroscope with two lines of code:
# Add Pyroscope eBPF integration to your kubernetes cluster
helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf
and you can also see the flamegraph representing CPU utilization for the entire cluster:
Internally, we use a variety of integrations to get both a high-level overview of what's going on in our cluster and a very detailed view for each runtime that we use:
We use our eBPF integration for our kubernetes cluster
We use our otel-profiling integrations (go, java) to get span-specific profiles inside our traces (see the sketch after this list)
We use our lambda extension to profile the code inside lambda functions
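For the otel-profiling Go integration mentioned above, wiring span-specific profiles into a service looks roughly like the sketch below (option names follow the otel-profiling-go package as we recall it and may differ across versions; the application name and URL are placeholders):

package main

import (
	otelpyroscope "github.com/pyroscope-io/otel-profiling-go"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Your usual OpenTelemetry tracer provider.
	tp := sdktrace.NewTracerProvider()

	// Wrap it so spans are annotated with profile identifiers, linking
	// traces to the corresponding profiles in Pyroscope.
	otel.SetTracerProvider(otelpyroscope.NewTracerProvider(
		tp,
		otelpyroscope.WithAppName("my.golang.app"),              // placeholder
		otelpyroscope.WithPyroscopeURL("http://pyroscope:4040"), // placeholder
		otelpyroscope.WithRootSpanOnly(true),
	))

	// ... register exporters and create spans as usual ...
}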
The next evolution: merging kernel and user-space profiling#
With the help of our community we've charted out several promising paths to improving our integrations by merging the eBPF
and user-space profiles within the integration.
One of the most promising approaches is using:
Non-eBPF language-specific integrations for more granular control and analytic capabilities (i.e. dynamic tags and labels)
eBPF integration for a comprehensive view of the whole cluster
Stay tuned for more progress on these efforts. In the meantime, check out the docs to get started with
eBPF or the other integrations!
We started Pyroscope a few years ago because we had seen first-hand how powerful a tool profiling is for improving performance, and also how user-unfriendly it was at the time.
Since then, we've been working hard not only to make Pyroscope easier to use, but also to make it easier to get value out of it.
As it stands today, Pyroscope has evolved to support an increasingly wide array of our community's day-to-day workflows by adding a valuable extra dimension to their observability stack:
Application Developers
Resolve spikes / increases in CPU usage
Locate and fix memory leaks and memory errors
Understand call trees of your applications
Clean up unused / dead code
SREs
Create performance-driven culture in dev cycle
Spot performance regressions in codebase proactively
As our community has grown to include this diverse set of companies, users, and use-cases, we've had more people express interest in getting all the value from using Pyroscope, but without some of the costs that come with maintaining and scaling open source software.
Some of the other reasons we decided to build a cloud service include:
Companies who have less time/resources to dedicate to setting up Pyroscope
Companies operating at scale who need an optimized solution that can handle the volume of data that is produced by profiling applications at scale
Users who are less technical and want a solution that's easy to use and requires little to no configuration
Users who want access to the latest features and bug fixes as soon as they are released (with zero downtime)
Users who want additional access to the Pyroscope team's profiling expertise and support (past our community Slack and GitHub)
And from our side, we believe that a cloud product will help by:
Making it easier for more companies to adopt Pyroscope
Providing more feedback to help prioritize features on our roadmap
Providing more resources to invest in Pyroscope's open source projects
Making it easier to offer integrations with other tools in the observability stack (e.g. Grafana, Honeycomb, GitLab, GitHub, etc.)
Plus, we got to solve a lot of really cool challenges along the way!
Today we are excited to announce the general availability of Pyroscope Cloud, our hosted version of Pyroscope!
Pyroscope Cloud enables you to achieve your observability goals by removing concerns around setup, configuration, and scaling.
It's designed to be easy to use and gives you a significant amount of insight into your application's performance with very minimal configuration.
Some notable features for the cloud include:
Horizontal scalability
Support for high-cardinality profiling data
Zero-downtime upgrades
Data encryption at rest and in transit
Compliance with SOC 2
Extra support options beyond public Slack / GitHub
Similar to Pyroscope Open Source Software (OSS), the cloud service is designed to store, query, and analyze profiling data as efficiently as possible. However, certain constraints that fundamentally limit the scalability of Pyroscope OSS (for now) have been removed in Pyroscope Cloud.
When running Pyroscope OSS at scale, people eventually run into the limitations of the open source storage engine. It is built around BadgerDB, an embeddable key-value database written in Go. The reliance on this component means the OSS version of Pyroscope scales vertically but not horizontally.
In the cloud, we replace BadgerDB with a distributed key-value store which allows more freedom to scale Pyroscope horizontally. We leverage many of the techniques used by Honeycomb and many Grafana projects (i.e. Loki, Tempo) but with particular adjustments made for the unique requirements of profiling data (stay tuned for future blog post on this).
This means that with Pyroscope Cloud you don't need to limit the number of applications, the number of profiles, or the tag cardinality in order to get the most out of Pyroscope!
This feature is used for profiling your applications across various environments. Most agents support tags, which let you differentiate between environments (e.g. staging vs production) and other metadata (e.g. pod, namespace, region, version, git commit, pr, etc.). Using single, comparison, or diff view in combination with these tags lets you easily understand and debug performance issues, answering questions like the ones below (a sketch of how such tags are attached follows the questions):
Which tags are consuming the most cpu and memory resources?
How did the performance of my application change between versions?
What are our most and least frequently used code paths?
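To attach tags like these from the Go agent, the configuration looks roughly like the following sketch (field names are from the Pyroscope Go client as we recall them; the application name, server address, and tag values are placeholders):

package main

import (
	"log"
	"os"

	"github.com/pyroscope-io/client/pyroscope"
)

func main() {
	// Static tags are attached to every profile from this process and can
	// then be sliced in single, comparison, or diff view (e.g. staging vs
	// production, or one version vs another).
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "my.golang.app",           // placeholder
		ServerAddress:   "https://pyroscope.cloud", // placeholder
		AuthToken:       os.Getenv("PYROSCOPE_AUTH_TOKEN"),
		Tags: map[string]string{
			"env":     os.Getenv("ENV"),     // e.g. staging / production
			"version": os.Getenv("GIT_SHA"), // e.g. git commit
			"region":  os.Getenv("REGION"),
		},
	})
	if err != nil {
		log.Fatalf("failed to start pyroscope: %v", err)
	}

	// ... run your application ...
}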
This feature is for when you want to profile something in a more adhoc manner. Most commonly, people use it either to upload previously recorded profiles or to save a profile for a particular script. Where many used to save profiles to a random folder on their computer, they can now use our adhoc page to store them more efficiently, share them with others, and view them with the same powerful UI that they use for continuous profiling.
Tracing exemplars for transaction-level visibility#
This feature is used when you want to correlate profiling data with tracing data. While traces will often tell you where your application is running slow, profiling gives more granular detail into why, and which particular lines of code are responsible for the performance issues. This view gives you a heatmap of span durations. We also have integrations with a few popular tracing tools.
Profile upload API for automated workflows and migrations#
Over time we've found that some of the major companies in various sectors have built their own internal profiling systems, which often ultimately dump a large collection of profiles into a storage system like S3.
Pyroscope Cloud's API is built to accept many popular formats for profiling data and then store them in a way that is optimized for querying and analysis. This means that you can redirect your existing profiling data to Pyroscope Cloud and then use the same UI that you use for continuous profiling to analyze it.
Pprof Ingestion
JFR Ingestion
Collapsed Ingestion
# First get API_TOKEN from https://pyroscope.cloud/settings/api-keys
# Ingest profiles in pprof format
curl \
  -H "Authorization: Bearer $API_TOKEN" \
  --data-binary @cpuprofile.pb.gz \
  "https://pyroscope.cloud/ingest?format=pprof&from=1680290810&until=1680290820&name=my-app-name-pprof"
In order to migrate from Pyroscope OSS to Pyroscope Cloud, you can use our remote write feature to send your data to Pyroscope Cloud. This will allow you to continue using Pyroscope OSS while you migrate your data to Pyroscope Cloud.
You can also get started directly with Pyroscope Cloud, by signing up for a free account at pyroscope.cloud.
CI/CD Integrations (GitHub, GitLab, CircleCI, etc.): We've heard from many people using Pyroscope to profile their test suites, and we have plans (link) for a UI specifically geared towards analyzing this data
AWS Lambda is a popular serverless computing service that allows you to write code in any language and run it on AWS.
In this case "serverless" means that rather than having to manage a server or set of servers, you can instead run your code on-demand on highly-available machines in the cloud.
Lambda manages your "serverless" infrastructure for you.
However, the tradeoff that happens as a result of using AWS Lambda is that because AWS handles so much of the infrastructure and management for you, it ends up being somewhat of a "black box" with regards to:
Cost: You have little insight into why your Lambda function costs so much or which functions are responsible
Performance: You often run into hard-to-debug latency or memory issues when running your Lambda function
Reliability: You have little insight into why your Lambda function is failing as often as it is
Depending on the availability of resources, these issues can balloon over time until they become an expensive foundation that is hard to analyze and fix after the fact, once much of your infrastructure relies on these functions.
Continuous Profiling for Lambda: A window into the serverless "black box" problem#
Continuous Profiling is a method of analyzing the performance of an application giving you a breakdown of which lines of code are consuming the most CPU or memory resources and how much of each resource is being consumed.
Since, by definition, a Lambda function is a collection of many lines of code which consume resources (and incur costs) on demand, it makes sense that profiling is the perfect tool to use to understand how you can optimize your Lambda functions and allocate resources to them.
While you can already use our various language-specific integrations to profile your Lambda functions, with the naive approach, adding Pyroscope will add extra overhead to the critical path due to how the Lambda Execution Lifecycle works:
However, we've introduced a more optimal solution which gives you insight into the Lambda "black box" without adding extra overhead to the critical path of your Lambda Function: our Pyroscope AWS Lambda Extension.
Pyroscope Lambda extension adds profiling support without impacting critical path performance#
This solution makes use of the extension to delegate profiling-related tasks to an asynchronous path, which allows the critical path to continue running while the profiling-related activities are performed in the background.
You can then use the Pyroscope UI to dive deeper into the various profiles and make the necessary changes to optimize your Lambda function!
How to add Pyroscope's Lambda Extension to your Lambda Function#
Pyroscope's Lambda extension works with our various agents, and documentation on how to integrate with them can be found in the integrations section of our documentation.
Once you've added the agent to your code, there are just two more steps, covered in our documentation, to get up and running with the Lambda extension.
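As an illustration, a Go Lambda function instrumented this way looks roughly like the sketch below; the local relay address is an assumption on our part, so check the Lambda extension docs for the exact server address and layer configuration:

package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/pyroscope-io/client/pyroscope"
)

// handler is a placeholder for your function's business logic.
func handler(ctx context.Context) (string, error) {
	return "ok", nil
}

func main() {
	// The in-process agent sends profiles to the Pyroscope Lambda extension
	// running alongside the function; the extension relays them to your
	// Pyroscope server off the critical path. The local address below is an
	// assumption; see the extension docs for the actual value.
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "my.lambda.function",    // placeholder
		ServerAddress:   "http://localhost:4040", // assumed local relay address
	})
	if err != nil {
		log.Printf("failed to start pyroscope: %v", err)
	}

	lambda.Start(handler)
}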
Stop screenshotting Flamegraphs and start embedding them#
Typically a flamegraph is most useful when you're able to click into particular nodes or stack traces to understand the program more deeply.
After several blog posts that featured flamegraphs as a key element, we found that screenshots of
flamegraphs were missing this key functionality compared to interactive flamegraphs.
As a result, we created flamegraph.com to have a place where users can upload, view, and share flamegraphs.
We recently released an update to flamegraph.com that makes it easy to embed flamegraphs in your blog or website. The steps to embed a flamegraph are:
Upload a flamegraph or flamegraph diff to flamegraph.com
Click the "Embed" button
Copy the "Copy" button to copy the embed code snippet
Paste the embed code snippet into your blog or website
Grafana is an open-source observability and monitoring platform used by individuals and organizations to monitor their applications and infrastructure. Grafana leverages the three pillars of observability (metrics, logs, and traces) to deliver insights into how well your systems are doing. Nowadays, observability involves a whole lot more than metrics, logs, and tracing; it also involves profiling.
In this article, I will:
Describe how to leverage continuous profiling in Grafana by installing the Pyroscope flamegraph panel and datasource plugin
Show how to configure the plugins properly
Explain how to set up your first dashboard that includes profiling
Give a sneak peek of an upcoming feature that will let you link profiles to logs, metrics, and traces
If you're new to flamegraphs and would like to learn more about what they are and how to use them, see this blog post.
Grafana provides you with tools to visualize your metrics, view logs, and analyze traces, but it is incomplete without the added benefits of profiling. Continuous profiling is super critical when you're looking to debug an existing performance issue in your application. It enables you to monitor your application's performance over time and provides insights into the parts of your application that are consuming the most resources. Continuous profiling is used to locate and fix memory leaks, clean up unused code, and understand the call tree of your application. This results in a more efficient application.
Using Pyroscope in Grafana provides you with complete observability without leaving your Grafana dashboard. Grafana leverages the powerful features of Pyroscope to take complete control of your application’s end-to-end observability and makes things like debugging easy. You can now see your profiles alongside corresponding logs, metrics, and traces to tell the complete story of your application.
It costs nothing to migrate your application profiles from Pyroscope's UI dashboard into Grafana. Simply open Grafana and install both the Pyroscope panel and datasource plugins, and you're all set!
A flamegraph is a complete visualization of hierarchical data (e.g. stack traces, file system contents) with a metric, typically resource usage, attached to the data. Flamegraphs were originally invented by Brendan Gregg, who was frustrated by how hard it was to view, read, and understand stack traces using regular profilers when debugging performance issues. The flamegraph was created to fix this exact problem.
Flamegraphs allow you to view a call stack in a much more visual way. They give you insight into your code performance and allow you to debug efficiently by drilling down to the origin of a bug, thereby increasing the performance of your application.
Coming from a background working as a frontend developer at Grafana, I'm no stranger to open source performance monitoring. I was part of the team responsible for the overall user experience of Grafana, and performance was one of the key considerations. Along the way, I learned about a debugging technique known as profiling for monitoring application performance and have loved it ever since.
“Profiling” is a dynamic method of analyzing the complexity of a program, such as CPU utilization or the frequency and duration of function calls. With profiling, you can locate exactly which parts of your application are consuming the most resources. “Continuous profiling” is a more powerful version of profiling that adds the dimension of time. By understanding your system's resources over time, you can then locate, debug, and fix issues related to performance.
As a frontend developer, my experience with profiling was limited to the browser. However, in the course of my study, I discovered a new pattern of profiling that seemed exciting: continuous profiling. Similar to how you use the profiler in the dev console to understand frontend performance issues, continuous profiling allows you to profile servers written in various languages 24/7 and understand resource usage at any particular point in time.
Since then, many major performance monitoring solutions have joined them in releasing continuous profiling products. As time has gone on, the continuous profiling space has become increasingly popular, with companies and VCs making major investments in continuous profiling to keep up with demand for this type of monitoring.