
· 7 min read

Go-Gopher-PSD-With-larger-Newspaper_01

If you're of the opinion that AI isn't going to solve all the world's problems, you're probably right. And if you think it could, well, you might be onto something too.

In the world of software engineering, particularly observability, where complexity often outpaces comprehension, AI isn't just a fancy tool; it's becoming an inevitable necessity.

During the Grafana Hackathon, the Pyroscope team embraced this reality. We tackled a genuine challenge head-on with AI: making flamegraph analysis intuitive, even for those new to profiling.

We've seen that AI excels at tasks like language interpretation. So, why not leverage it to demystify flamegraphs? This led us to our Hackathon mission:

Demonstrate how AI can transform the user experience in analyzing and interpreting flamegraphs.

We did this by giving humans and AI the same flamegraph and asking them to interpret it as shown below. We then compared the results and analyzed the differences.

A Direct Challenge: How Well Can You Interpret This Flamegraph?#

Before diving into AI’s capabilities, let's set a baseline. Here’s a flamegraph for you to analyze. See if you can answer these key profiling questions:

  1. Performance Bottleneck: What's causing the slowdown?
  2. Root Cause: Why is it happening?
  3. Recommended Fix: How would you resolve it?
[Interactive Pyroscope flamegraph: frame width represents CPU time per function]

This is where many face challenges, particularly beginners. But understanding these flamegraphs is key to simplifying the code that powers them.

AI’s Flamegraph Interpretation: An Eye-Opening Comparison#

Now for the fun part: let's see how our AI interprets this same flamegraph. We use a prompt that is roughly equivalent to the questions above.

Click the button to see the AI's interpretation of the flamegraph below#

[Interactive Pyroscope flamegraph: frame width represents CPU time per function]

The prompt we gave it:

interpret this flamegraph for me and answer the following three questions:

- **Performance Bottleneck**: What's slowing things down?
- **Root Cause**: Why is this happening?
- **Recommended Fix**: How can we resolve it?
[ ... specially compressed flamegraph data ]

How does its analysis stack up against yours? Statistically it probably did either better or worse than you (obviously)...

Bots vs. Brains: Who's better at Flamegraph Interpretation?#

We didn't stop at theory. We put AI to a real-world test, sending the same flamegraph to a diverse group of individuals, categorizing them by their expertise in flamegraph analysis, and comparing their responses to the AI's.

Distribution of participants by skill level:#

image

The Results Are In: AI is better than (most) humans at interpreting flamegraphs#

image

  • Flamegraph Experts: 83% passed. They demonstrated high accuracy and detailed understanding, quickly pinpointing issues and interpreting them correctly.

  • Flamegraph Advanced: 70% passed. Their responses varied; some were spot on, while others didn't dig far enough into the flamegraph to identify the root cause.

  • Non-Technical Professionals: 23% passed. This group most frequently took the "I don't know" option, especially on the root cause and recommended fix questions (some very entertaining guesses, though!).

  • AI Interpreter: 100% passed (10 iterations with the same prompt). The AI consistently outperformed beginners and advanced users, providing accurate, albeit less detailed and nuanced, interpretations than the experts.

These initial results at least point towards a great opportunity to add value for most users by incorporating AI.

We will definitely be exploring this further via more formal testing and would love to hear your thoughts on this as well -- where do you see it fitting in best to your workflow?

AI in Pyroscope: A clear win for UX#

While we still have the rest of the week to tweak our project for the Hackathon, what we've learned so far is that AI's strength lies not just in analysis but, more importantly, in "filling the gaps": augmenting and enhancing the user experience no matter what level you're at.

It only takes using a flamegraph successfully once to move from the beginner to the advanced category. The thing is, from a product standpoint our biggest challenge has been building a user experience that spans from beginner to expert and remains useful for both.

However, with just a little bit of prompt engineering we can use AI to bridge this gap and provide an endless array of tailored experiences for each user.

AI Tailored Responses for Diverse User Needs#

Explain the flamegraph to a beginner with no flamegraph knowledge...#

Explain the flamegraph in the form of a HN thread...#

Explain the flamegraph in the form of IT department humor...#

Explain the flamegraph in the form of a poem...#

Explain the flamegraph in the form of a Michael Scott quote...#

Whether you're a beginner, an expert, or just someone bored at work trying to find an entertaining way to do an otherwise boring task -- all it takes is a little prompt engineering to tune the experience to your liking.

With just a few lines of code we can adjust these prompts (or allow users to adjust them) to guide everyone from beginner to expert; the sketch below illustrates the idea.
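As a rough illustration (a hypothetical sketch, not our actual implementation; the persona presets and the idea of forwarding the result to an LLM are assumptions), tailoring the experience boils down to prepending a persona-specific instruction to the same questions and compressed flamegraph data:

```go
package main

import "fmt"

// Hypothetical persona presets; real prompts would be tuned per audience.
var personas = map[string]string{
	"beginner": "Explain the flamegraph to a beginner with no flamegraph knowledge.",
	"expert":   "Give a terse, expert-level analysis of the flamegraph.",
	"poem":     "Explain the flamegraph in the form of a poem.",
}

// buildPrompt prepends the persona instruction to the three standard
// profiling questions and the compressed flamegraph data.
func buildPrompt(persona, flamegraphData string) string {
	return fmt.Sprintf(
		"%s\nThen answer: 1) Performance Bottleneck: what's slowing things down? 2) Root Cause: why is this happening? 3) Recommended Fix: how can we resolve it?\n\n%s",
		personas[persona], flamegraphData,
	)
}

func main() {
	// The flamegraph payload is elided here, just as in the prompt above.
	prompt := buildPrompt("beginner", "[ ... specially compressed flamegraph data ]")
	fmt.Println(prompt) // send this to the LLM of your choice
}
```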

Your Turn: Test AI's analysis of your code!#

It's worth noting that the flamegraph used for this post is a current representation of our distributor. As you can see, our distributors have bottlenecks in two notorious tasks that most companies are likely not doing perfectly either: regular expressions, which is technically the bottleneck, though compression/parsing is also an acceptable answer (fixes coming soon!).

However, while AI was successful in analyzing this particular flamegraph, there are probably 51,331,542 more cases where AI fails (or succeeds) spectacularly and we'd love for you to find and share real examples with us!

How to use our AI-powered flamegraph interpreter on your own flamegraphs#

Option 1: Upload pprof to Flamegraph.com#

Upload a pprof file containing one flamegraph to flamegraph.com and click the "ai explainer" button. You can most easily get a pprof file from the Go runtime, but Pyroscope exports pprof from all languages via the export button.

Note: You do have to export the pprof format specifically and upload it to flamegraph.com separately; exporting directly to flamegraph.com via the flamegraph.com button will not (yet) work

image

Option 2: View flamegraph in Grafana Cloud (coming soon):#

Send profiling data to Grafana Cloud Profiles, open the Cloud Profiles app (you can sign up for a free account), and click the "ai explainer" button

image

Option 3: Wait a bit...#

Wait until we release an open source version of this tool (pending positive feedback from the community)

All feedback / ideas welcome!#

Let us know the good, bad and ugly of your experience with AI-powered flamegraph analysis:

  • On Twitter: @PyroscopeIO and let the world know how you're using AI in flamegraph analysis.
  • On Slack: Join the conversation in our community channel
  • On GitHub: Feel free to add to the discussion to share detailed feedback or suggest enhancements

Appendix#

In case you were wondering how we classified people into groups, we had them self-categorize on the following scale:

  1. Flamegraph Experts: Comprising experienced software engineers and DevOps engineers who have used flamegraphs before
  2. Flamegraph Advanced: Junior developers, technical support staff, frontend engineers, or other engineers new to Pyroscope or continuous profiling tools
  3. Non-Technical Professionals: From sales, marketing, HR, and project management; they know their way around observability tooling but are seldom if ever directly involved in coding or debugging

· 4 min read

pyroscope_cloud

Greetings Pyroscope Community!

We have some exciting news to share with you today. As you know, Pyroscope has always been committed to delivering top-notch continuous profiling solutions, empowering developers to optimize their code and infrastructure for enhanced performance. Today, we have a game-changing update to announce.

We are thrilled to inform you that Pyroscope Cloud will no longer be accepting new signups. Instead, we invite you to join us on a new and improved platform that will revolutionize the way you approach profiling: Pyroscope within Grafana Cloud!

Why the merge? By joining forces with Grafana, we are uniting the strengths of Pyroscope and Phlare to accelerate both the adoption and the value of continuous profiling.

With Grafana Cloud Profiles, the Pyroscope project has integrated into the Grafana ecosystem. The integration of the Pyroscope and Phlare projects has resulted in a powerful and comprehensive profiling solution that goes beyond the capabilities of Pyroscope Cloud. By leveraging Grafana's expertise in data visualization and Grafana Cloud's managed observability stack, we are taking profiling to the next level, expanding from one to three different views for profiling data:

Grafana Cloud Profiles app plugin view#

This is the view you're likely familiar with if you're a Pyroscope OSS or Pyroscope Cloud user; it is now included in the Grafana UI as an app plugin. That said, the ceiling for what is possible with this app plugin is significantly higher as part of the Grafana ecosystem, which is already connected to other data sources like logs, metrics, and traces. Stay tuned for more app plugin functionality that uses those other data sources to enhance profiling data as well as overall insights.

app_plugin_view

Dashboard view#

This is a view that anyone who has used Grafana before is likely familiar with. However, now that you can add profiles natively to your dashboards you can get a much more comprehensive and real-time view of your profiling data alongside your other mission-critical dashboard items.

dashboard_view

Explore view#

Finally, we have the Explore view, which is convenient for making targeted queries against not only your profiling data but also your other observability signals, which you can now view side by side with profiles.

explore_view

What to expect with Pyroscope in Grafana Cloud#

We're excited to take this next step in Pyroscope's journey, but we want to do so in a way that is convenient for our community. Here's what you can expect:

  • Seamless Transition: Existing Pyroscope Cloud users can continue to use the pyroscope.cloud service without disruption. You will have ample time to migrate your profiles to the new platform at your convenience
  • Enhanced Scalability: Pyroscope is now horizontally scalable to accommodate workloads of any size, ensuring optimal performance for projects of all scales
  • Broad Language Support: The Pyroscope SDKs have been updated and standardized, allowing you to collect profiles from major languages and platforms such as Go, Python, Ruby, Java, eBPF, .NET, PHP, Node.js, and Rust
  • Robust Visualization Options: Grafana's data visualization expertise has enriched the frontend and UI of Pyroscope in Grafana Cloud, providing multiple ways to visualize and analyze profiling data. You'll have a comprehensive set of tools at your disposal, including the App plugin, Explore view, and Dashboard view.

To learn more about this exciting development, we encourage you to read the official announcement on the Grafana blog.

How to get started in Grafana Cloud Profiles#

Step 1: Log into your Grafana Cloud account. (If you don’t already have one, you can sign up for free. )

Step 2: Find Pyroscope in your stacks

pyroscope_stack

Step 3: Follow the client instructions on how to send profiles from your application

What's next for Grafana Cloud Profiles?#

By converging the efforts of Pyroscope and Grafana, we are excited to advance the fields of application profiling and observability. We've seen how by optimizing resource utilization and gaining a deeper understanding of your systems through profiling, you can deliver better software faster.

As always, your feedback and input are invaluable to us. Please share your success stories, feature requests, and any other insights you may have.

Stay tuned for more exciting updates and developments on Pyroscope within Grafana Cloud. We can't wait to embark on this profiling journey with you!

Happy profiling!

· 9 min read

thumbnail_image

Profiling today looks very different than it did just a few years ago. As people move to more cloud-native workloads, continuous profiling has evolved into a key piece of many companies' observability suites. At Pyroscope, we've been a huge part of this evolution thanks to an ever-expanding community that has provided great insight into the use cases where profiling is most valuable and how we can continue to improve that experience.

As a result, over the past few years we've released several products to help developers improve their applications' performance.

  1. Continuous Profiling: Our most popular product, a tool for continuously profiling your applications across your entire system and then storing and querying that data efficiently

  2. Adhoc Profiling: In cases where you may not need to profile constantly, or where you'd like to save a snapshot of a profile, our adhoc tool allows you to capture and save specific profiles to refer back to later

  3. Profiling Exemplars: Profiles linked to particular meaningful units such as HTTP requests or trace spans

Introducing CI Profiling#

Now, we're excited to announce the latest addition to the Pyroscope family - CI Profiling. Continuous Integration and Delivery (CI/CD) pipelines are critical for modern software development, but they can also be a source of frustration and inefficiency. Waiting for long test runs, dealing with frequent failures and timeouts, and wasting resources are all common problems associated with CI/CD pipelines. These issues can be compounded when multiple developers are working on the same codebase or when teams are working across multiple repositories. That's why we've developed this new feature that can help:

Continuous Profiling with Pyroscope in your CI/CD pipelines.

· 9 min read
Dmitry Filimonov
ChatGPT

Go 1.20 Experiment with Memory Arenas

caution

Go arenas are an experimental feature. The API and implementation are completely unsupported, and the Go team makes no guarantees about compatibility or whether arenas will even continue to exist in any future release.

See this GitHub discussion for more details.

Introduction#

Go 1.20 introduces an experimental concept of "arenas" for memory management, which can be used to improve the performance of your Go programs. In this blog post, we'll take a look at:

  • What are arenas
  • How do they work
  • How can you determine if your programs could benefit from using arenas
  • How we used arenas to optimize one of our services

What Are Memory Arenas?#

Go is a programming language that utilizes garbage collection, meaning that the runtime automatically manages memory allocation and deallocation for the programmer. This eliminates the need for manual memory management, but it comes with a cost:

The Go runtime must keep track of every object that is allocated, leading to increased performance overhead.

In certain scenarios, such as when an HTTP server processes requests with large protobuf blobs (which contain many small objects), the Go runtime can spend a significant amount of time tracking each of those individual allocations and then deallocating them, causing significant performance overhead.

Arenas offer a solution to this problem by reducing the overhead associated with many smaller allocations. In this protobuf blob example, a large chunk of memory (an arena) can be allocated before parsing, enabling all parsed objects to be placed within the arena and tracked as a collective unit.

Once parsing is completed, the entire arena can be freed at once, further reducing the overhead of freeing many small objects.
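Here's a minimal sketch of what this looks like with the experimental arena API (the `Point` type and slice sizes are illustrative, and the API may change or disappear, per the caution above):

```go
package main

import "arena"

// Point stands in for the many small objects parsed out of a blob.
type Point struct{ X, Y int64 }

func main() {
	a := arena.NewArena() // allocate one large chunk of memory up front
	defer a.Free()        // free every arena-allocated object in one step

	// Objects allocated from the arena are tracked as a collective unit
	// rather than individually by the garbage collector.
	points := arena.MakeSlice[Point](a, 0, 1024)
	for i := 0; i < 1024; i++ {
		points = append(points, Point{X: int64(i), Y: int64(i)})
	}

	p := arena.New[Point](a) // a single struct allocated from the arena
	p.X, p.Y = 1, 2
}
```

Build and run with `GOEXPERIMENT=arenas go run .`; without the experiment flag, the arena package isn't available.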

arenas drawio2

· 2 min read
Example banner

What is sandwich view?#

Sandwich view is a mode of viewing flamegraphs popularized by Jamie Wong in the Speedscope project. Its function is relatively simple: the typical flamegraph breaks down resource utilization by function, but it can be difficult to see how much time is spent in the function itself versus how much time is spent in the functions it calls. Sandwich view solves this problem by splitting a flamegraph into two sections:

  • callers: the functions that called the function in question (its "parents")
  • callees: the functions that the function in question called (its "children")

Finding performance issues with standard Flamegraph mode#

A typical use case for leveraging flamegraphs is to identify opportunities for optimization. With a typical flamegraph, the most common workflow is to identify the function node with the largest width and then look at the functions it calls to see if there is any low-hanging fruit for optimization. For example, in the flamegraph below, we can see that rideshare/car.OrderCar is the widest function and thus a good place to start looking for optimization opportunities.

[Interactive Pyroscope flamegraph: frame width represents CPU time per function]

Finding performance issues with sandwich view Flamegraph mode#

However, you'll also notice that Time.Since() shows up frequently towards the end of almost every path.

Example banner

Sandwich view helps you focus on functions like this, analyze your application, and determine which is easier to optimize:

  • Time.Since(): a node with a shorter width that gets called frequently across many code paths, discovered with sandwich view
  • rideshare/car.OrderCar: a node with a longer width that gets called infrequently in a single code path, discovered with standard flamegraph view

How to use sandwich view for your flamegraphs#

If you want to try it out, simply go to your Pyroscope UI or upload a flamegraph to flamegraph.com and select the "sandwich" view icon in the new flamegraph toolbar:

Example banner

then select a function to see its callers and callees. We have many more view modes planned for the future, so stay tuned or let us know what you'd like to see!

· 6 min read

What is eBPF?#

At its root, eBPF takes advantage of the kernel's privileged ability to oversee and control the entire system. With eBPF you can run sandboxed programs in a privileged context such as the operating system kernel. To better understand the implications and learn more, check out this blog post, which goes into much more detail. For profiling, this typically means running a program that pulls stack traces for the whole system at a constant rate (e.g. 100Hz).

image

As you can see in the diagram, some of the most popular use cases for eBPF are related to Networking, Security, and most relevant to this blog post — observability (logs, metrics, traces, and profiles).

Landscape of eBPF profiling#

Over the past few years there has been significant growth in both the profiling space and the eBPF space, and a few notable companies and open source projects are innovating at the intersection of the two.

This collective growth reflects the rapidly growing interest in the space, as projects like Pyroscope, Pixie, and Parca have all gained significant traction over this time period.

It's also worth noting that the growth of profiling is not limited to eBPF: profiling tools are now so prevalent that you can find one for almost any language or runtime. As a result, profiling is increasingly treated as a first-class citizen in observability suites.

For example, OpenTelemetry has kicked off efforts to standardize profiling in order to enable more effective observability. For more information on those efforts, check out the #otel-profiling channel on the CNCF Slack!

Pros and cons of eBPF and non-eBPF profiling#

When it comes to modern continuous profiling, there are two ways of getting profiling data:

  • User-space level: Popular profilers like pprof, async-profiler, rbspy, py-spy, pprof-rs, dotnet-trace, etc. operate at this level
  • Kernel level: eBPF profilers and Linux perf are able to get stack traces for the whole system from the kernel

Pyroscope is designed to be language agnostic and supports ingesting profiles originating from either or both of these methods.

However, each approach comes with its own set of pros and cons:

Pros and Cons of native-language profiling

Pros

  • Ability to tag application code in a flexible way (e.g. tagging spans, controllers, functions)
  • Ability to profile specific parts of code (e.g. Lambda functions, test suites, scripts)
  • Ability to simply profile other types of data (e.g. memory profiling, goroutines)
  • Consistency of access to symbols across all languages
  • Simplicity of use in local development

Cons

  • Complexity of getting a fleet-wide view in large multi-language systems
  • Constraints on the ability to auto-tag infrastructure metadata (e.g. Kubernetes)

Pros and Cons of eBPF profiling

Pros

  • Ability to get fleet-wide, whole-system metrics easily
  • Ability to auto-tag metadata that's available when profiling the whole system (e.g. Kubernetes pods, namespaces)
  • Simplicity of adding profiling at the infrastructure level (e.g. multi-language systems)

Cons

  • Requires particular Linux kernel versions
  • Constraints on tagging user-level code
  • Constraints on performant ways to retrieve certain profile types (e.g. memory, goroutines)
  • Difficulty of local development

Pyroscope's solution: Merge eBPF profiling and native-language profiling#

We believe there are benefits to both eBPF and native-language profiling, and our long-term focus is to integrate them together seamlessly in Pyroscope. The cons of eBPF profiling are the pros of native-language profiling and vice versa. As a result, the best way to get the most value out of profiling is to combine the two.

Profiling compiled languages (Golang, Java, C++, etc.)#

When profiling compiled languages, like Golang, the eBPF profiler is able to get very similar information to the non-eBPF profiler.

[Interactive Pyroscope flamegraph: frame width represents CPU time per function]

Profiling interpreted languages (Ruby, Python, etc.)#

With interpreted languages like Ruby or Python, stack traces in their runtimes are not easily accessible from the kernel. As a result, the eBPF profiler is not able to parse user-space stack traces for interpreted languages. You can see how the kernel interprets stack traces of compiled languages (Go) versus interpreted languages (Ruby/Python) in the examples below.

[Interactive Pyroscope flamegraph: frame width represents CPU time per function]

How to use eBPF for cluster level profiling#

Using Pyroscope's auto-tagging feature in the eBPF integration, you can get a breakdown of CPU usage by Kubernetes metadata. In this case, we can see which namespace is consuming the most CPU resources in our demo instance after adding Pyroscope with two lines of code:

```shell
# Add Pyroscope eBPF integration to your kubernetes cluster
helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf
```

image

and you can also see the flamegraph representing CPU utilization for the entire cluster:

image

Internally, we use a variety of integrations to get both a high-level overview of what's going on in our cluster and a very detailed view for each runtime that we use:

  • We use our eBPF integration for our kubernetes cluster
  • We use the ruby gem, pip package, go client, and java client with tags for our k8s services and GitHub Actions test suites
  • We use our otel-profiling integrations (go, java) to get span-specific profiles inside our traces
  • We use our lambda extension to profile the code inside lambda functions

The next evolution: merging kernel and user-space profiling#

With the help of our community, we've charted out several promising paths to improving our integrations by merging eBPF and user-space profiles. One of the most promising approaches is using:

  • Non-eBPF language-specific integrations for more granular control and analytic capabilities (i.e. dynamic tags and labels)
  • eBPF integration for a comprehensive view of the whole cluster


Stay tuned for more progress on these efforts. In the meantime, check out the docs to get started with eBPF or the other integrations!

· 7 min read

floating_cloud_01

We started Pyroscope a few years ago because we had seen, first-hand, that profiling was a powerful tool for improving performance, but that, at the time, it was also not very user-friendly. Since then, we've been working hard not only to make Pyroscope easier to use, but also to make it easier to get value out of it.

As it stands today, Pyroscope has evolved to support an increasingly wide array of our community's day-to-day workflows by adding a valuable extra dimension to their observability stack:

Application Developers

  • Resolve spikes / increases in CPU usage
  • Locate and fix memory leaks and memory errors
  • Understand call trees of your applications
  • Clean up unused / dead code

SREs

  • Create performance-driven culture in dev cycle
  • Spot performance regressions in codebase proactively
  • Configure monitoring / alerts for infrastructure
  • Optimize test suites and team productivity

Engineering Managers

  • Get real-time analysis of resource utilization
  • Make efficient cost allocations for resources
  • Use insights for better decision making

Why we built a cloud service#

As our community has grown to include this diverse set of companies, users, and use-cases, we've had more people express interest in getting all the value from using Pyroscope, but without some of the costs that come with maintaining and scaling open source software. Some of the other reasons we decided to build a cloud service include:

  • Companies who have less time/resources to dedicate to setting up Pyroscope
  • Companies operating at scale who need an optimized solution that can handle the volume of data that is produced by profiling applications at scale
  • Users who are less technical and want a solution that's easy to use and requires little to no configuration
  • Users who want access to the latest features and bug fixes as soon as they are released (with zero downtime)
  • Users who want additional access to the Pyroscope team's profiling expertise and support (past our community Slack and GitHub)

And from our side, we believe that a cloud product will:

  • Make it easier for more companies to adopt Pyroscope
  • Provide more feedback to help prioritize features on our roadmap
  • Provide more resources to invest in Pyroscope's open source projects
  • Make it easier to offer integrations with other tools in the observability stack (e.g. Grafana, Honeycomb, GitLab, GitHub, etc.)

Plus, we got to solve a lot of really cool challenges along the way!

Introducing Pyroscope Cloud#

Today we are excited to announce the general availability of Pyroscope Cloud, our hosted version of Pyroscope!

Pyroscope Cloud enables you to achieve your observability goals by removing concerns around setup, configuration, and scaling. It's designed to be easy to use and gives you a significant amount of insight into your application's performance with very minimal configuration.

Some notable features for the cloud include:

  • Horizontal scalability
  • Support for high-cardinality profiling data
  • Zero-downtime upgrades
  • Data encryption at rest and in transit
  • Compliance with SOC 2
  • Extra support options beyond public Slack / Github
  • Tracing integrations (Honeycomb and Jaeger)

Pyroscope Cloud's Major Scaling Improvements#

Similar to Pyroscope Open Source Software (OSS), the cloud service is designed to store, query, and analyze profiling data as efficiently as possible. However, certain limitations that fundamentally constrain the scalability of Pyroscope OSS (for now) have been removed in Pyroscope Cloud.

When running Pyroscope OSS at scale, people eventually run into the limitations of the open source storage engine. It is built around BadgerDB, an embeddable key-value database written in Go. The reliance on this component means the OSS version of Pyroscope scales vertically but not horizontally.

In the cloud, we replace BadgerDB with a distributed key-value store, which allows more freedom to scale Pyroscope horizontally. We leverage many of the techniques used by Honeycomb and many Grafana projects (e.g. Loki, Tempo), with particular adjustments made for the unique requirements of profiling data (stay tuned for a future blog post on this).

This means that with Pyroscope Cloud you don't need to worry about limiting the number of applications, the number of profiles, or the tag cardinality you need to get the most out of Pyroscope!

Pyroscope Cloud's Major Features#

We've built Pyroscope Cloud with several different use cases in mind:

Continuous profiling for system-wide visibility#

This feature is used for profiling your applications across various environments. Most agents support tags, which let you differentiate between environments (e.g. staging vs. production) and other metadata (e.g. pod, namespace, region, version, git commit, PR, etc.). Using single, comparison, or diff view in combination with these tags lets you easily understand and debug performance issues; a minimal tagging sketch follows the questions below.

Tag Explorer View

  • Which tags are consuming the most cpu and memory resources?
  • How did the performance of my application change between versions?
  • What is our most/least frequently used code paths?
  • Which libraries are consuming the most resources?
  • Where are memory leaks originating?
  • etc.
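For a sense of how these tags get attached in the first place, here's a minimal sketch using the Go agent (field names reflect the pyroscope-io/client SDK at the time of writing; the application name, server address, token, and tag values are placeholders):

```go
package main

import "github.com/pyroscope-io/client/pyroscope"

func main() {
	// Start the Pyroscope agent; static tags differentiate environments
	// and flow through to single, comparison, and diff views.
	pyroscope.Start(pyroscope.Config{
		ApplicationName: "my-app",                  // placeholder
		ServerAddress:   "https://pyroscope.cloud", // placeholder
		AuthToken:       "API_TOKEN",               // placeholder
		Tags: map[string]string{
			"env":    "production",
			"region": "us-east-1",
		},
	})

	// ... application code to be profiled ...
}
```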

Adhoc profiling for deep-dive debugging#

This feature is for when you want to profile something in a more adhoc manner. Most commonly, people use it to upload previously recorded profiles or to save a profile for a particular script. Where many used to save profiles to a random folder on their computer, they can now use our adhoc page to store them more efficiently, share them with others, and view them with the same powerful UI that they use for continuous profiling.

ebpf adhoc diff

Tracing exemplars for transaction-level visibility#

This feature is for when you want to correlate profiling data with tracing data. While traces often tell you where your application is running slow, profiling gives more granular detail into why, and which particular lines of code are responsible for the performance issues. This view gives you a heatmap of span durations. We also have integrations with a few popular tracing tools:

Tracing exemplars

Profile upload API for automated workflows and migrations#

Over time we've found that some of the major companies in various sectors have built their own internal profiling systems, which often ultimately dump a large collection of profiles into a storage system like S3.

Pyroscope Cloud's API is built to accept many popular formats for profiling data and then store them in a way that is optimized for querying and analysis. This means that you can redirect your existing profiling data to Pyroscope Cloud and then use the same UI that you use for continuous profiling to analyze it.

```shell
# First get API_TOKEN from https://pyroscope.cloud/settings/api-keys
# Ingest profiles in pprof format
curl \
  -H "Authorization: Bearer $API_TOKEN" \
  --data-binary @cpuprofile.pb.gz \
  "https://pyroscope.cloud/ingest?format=pprof&from=1704405833&until=1704405843&name=my-app-name-pprof"
```
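If you'd rather script this than shell out to curl, here's a minimal Go sketch hitting the same ingest endpoint (the URL and parameters are exactly those from the curl example above; error handling kept deliberately terse):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Open a gzipped pprof profile, as produced by the Go runtime.
	f, err := os.Open("cpuprofile.pb.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Same endpoint and parameters as the curl example above.
	url := "https://pyroscope.cloud/ingest?format=pprof&from=1704405833&until=1704405843&name=my-app-name-pprof"
	req, err := http.NewRequest(http.MethodPost, url, f)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("API_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```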

How to get started with Pyroscope Cloud#

Migrating from Pyroscope OSS#

In order to migrate from Pyroscope OSS to Pyroscope Cloud, you can use our remote write feature to send your data to Pyroscope Cloud. This allows you to continue using Pyroscope OSS while you migrate your data.

remote_write_diagram_01_optimized

You can also get started directly with Pyroscope Cloud, by signing up for a free account at pyroscope.cloud.

What's Next for Pyroscope Cloud#

  • CI/CD Integrations (GitHub, GitLab, CircleCI, etc.): We've heard from many teams using Pyroscope to profile their test suites, and we have plans (link) for a UI specifically geared towards analyzing this data
  • More integrations (Tempo, PagerDuty, etc.)
  • More features (Alerting, etc.)
  • More documentation (Tutorials, etc.)

· 4 min read

Profile AWS Lambda Functions

What is AWS Lambda?#

AWS Lambda is a popular serverless computing service that allows you to write code in any language and run it on AWS. In this case, "serverless" means that rather than having to manage a server or set of servers, you can instead run your code on-demand on highly available machines in the cloud.

Lambda manages your "serverless" infrastructure for you including:

  • Server maintenance
  • Automatic scaling
  • Capacity provisioning
  • and more

AWS Lambda functions are a "black box"#

However, the tradeoff that happens as a result of using AWS Lambda is that because AWS handles so much of the infrastructure and management for you, it ends up being somewhat of a "black box" with regards to:

  • Cost: You have little insight into why your Lambda function costs so much or which functions are responsible
  • Performance: You often run into hard-to-debug latency or memory issues when running your Lambda function
  • Reliability: You have little insight into why your Lambda function is failing as often as it is

Depending on the availability of resources, these issues can balloon over time until they become an expensive foundation that is hard to analyze and fix after the fact, once much of your infrastructure relies on these functions.

Continuous Profiling for Lambda: A window into the serverless "black box" problem#

Continuous profiling is a method of analyzing the performance of an application, giving you a breakdown of which lines of code are consuming the most CPU or memory resources and how much of each resource is being consumed. Since, by definition, a Lambda function is a collection of many lines of code that consume resources (and incur costs) on demand, profiling is the perfect tool for understanding how to optimize your Lambda functions and allocate resources to them.

While you can already use our various language-specific integrations to profile your Lambda functions, the naive approach of adding Pyroscope directly introduces extra overhead on the critical path due to how the Lambda Execution Lifecycle works:

image

However, we've introduced a more optimal solution which gives you insight into the Lambda "black box" without adding extra overhead to the critical path of your Lambda Function: our Pyroscope AWS Lambda Extension.

Pyroscope Lambda extension adds profiling support without impacting critical path performance#

lambda_image_04-01

This solution makes use of the extension to delegate profiling-related tasks to an asynchronous path which allows for the critical path to continue to run while the profiling related activities are being performed in the background. You can then use the Pyroscope UI to dive deeper into the various profiles and make the necessary changes to optimize your Lambda function!

How to add Pyroscope's Lambda Extension to your Lambda Function#

Pyroscope's lambda extension works with our various agents; documentation on how to integrate with them can be found in the integrations section of our documentation. Once you've added the agent to your code, there are just two steps to get up and running with the profiling lambda extension:

Step 1: Add a new Layer in "function settings"#

Add a new layer using the latest "layer name" from our releases page.

lambda_add_a_layer_01-01

Step 2: Add environment variables to configure where to send the profiling data#

You can send data to either Pyroscope Cloud or any running Pyroscope server. This is configured via environment variables like this:

lambda_env_variables_01-01

Lambda Function profile#

Here's an interactive flamegraph output of what you will end up with after you add the extension to your Lambda Function:

image

While this flamegraph is exported for the purposes of this blog post, in the Pyroscope UI you have additional tools for analyzing profiles, such as:

  • Labels: to view function CPU performance or memory over time using FlameQL
  • Time controls: to select and filter for particular time periods of interest
  • Diff view: Compare two profiles and see the differences between them
  • And many more!

· 3 min read

Stop screenshotting Flamegraphs and start embedding them#

Typically a flamegraph is most useful when you're able to click into particular nodes or stack traces to understand the program more deeply. After several blog posts where flamegraphs were a key piece of the story, we found that screenshots of flamegraphs were missing this key functionality compared to being able to interact with them.

As a result, we created flamegraph.com to have a place where users can upload, view, and share flamegraphs.

We recently released an update to flamegraph.com that makes it easy to embed flamegraphs in your blog or website. The steps to embed a flamegraph are:

  1. Upload a flamegraph or flamegraph diff to flamegraph.com
  2. Click the "Embed" button
  3. Click the "Copy" button to copy the embed code snippet
  4. Paste the embed code snippet into your blog or website

clicking_embed_button_high_res

· 6 min read
warning

Pyroscope is now part of Grafana Labs! We have consolidated efforts around our Grafana plugins to make one recommended way of using Pyroscope, so this blog post is now outdated. However, you can find updated docs on how to add profiling to Grafana with Pyroscope in the Grafana profiling documentation.

Grafana is an open-source observability and monitoring platform used by individuals and organizations to monitor their applications and infrastructure. Grafana leverages the three pillars of observability (metrics, logs, and traces) to deliver insights into how well your systems are doing. Nowadays, observability involves a whole lot more than metrics, logs, and tracing; it also involves profiling.

In this article, I will:

  1. Describe how to leverage continuous profiling in Grafana by installing the Pyroscope flamegraph panel and datasource plugin
  2. Show how to configure the plugins properly
  3. Explain how to set up your first dashboard that includes profiling
  4. Give a sneak peek of an upcoming feature that will let you link profiles to logs, metrics, and traces

If you're new to flamegraphs and would like to learn more about what they are and how to use them, see this blog post.

Introduction#

pillars-of-observability-complete

Grafana provides you with tools to visualize your metrics, view logs, and analyze traces, but it is incomplete without the added benefits of profiling. Continuous profiling is super critical when you're looking to debug an existing performance issue in your application. It enables you to monitor your application's performance over time and provides insights into the parts of your application that are consuming the most resources. Continuous profiling is used to locate and fix memory leaks, clean up unused code, and understand the call tree of your application. The result is a more efficient application.

Benefits of using Pyroscope in Grafana#

Unified view for complete observability#

Using Pyroscope in Grafana provides you with complete observability without leaving your Grafana dashboard. Grafana leverages the powerful features of Pyroscope to take complete control of your application’s end-to-end observability and makes things like debugging easy. You can now see your profiles alongside corresponding logs, metrics, and traces to tell the complete story of your application.

Zero migration cost#

It costs nothing to migrate your application profiles from Pyroscope's UI dashboard into Grafana. Simply open Grafana, install both the Pyroscope panel and datasource plugins, and you're all set!

left-right: flamegraph in Pyroscope, flamegraph in Grafana