
5 posts tagged with "go"


· 6 min read

Introduction#

In this article, I will introduce you to flamegraphs, how to use them, how to read them, what makes them unique, and their use cases.

What is a flamegraph#

A flamegraph is a complete visualization of hierarchical data (e.g. stack traces, file system contents, etc.) with a metric, typically resource usage, attached to the data. Flamegraphs were originally invented by Brendan Gregg, who was frustrated by how hard it was to view, read, and understand stack traces with regular profilers when debugging performance issues. The flamegraph was created to solve exactly this problem.

Flamegraphs allow you to view a call stack in a much more visual way. They give you insight into your code performance and allow you to debug efficiently by drilling down to the origin of a bug, thereby increasing the performance of your application.

note

Technically, a flamegraph has its root at the bottom with child nodes shown above their parents, while an icicle graph has its root at the top with child nodes shown below their parents. However, for the purposes of this article, we will use the flamegraph terminology for both.

How are flamegraphs generated?#

tree-flamegraph

Flamegraphs are generated as a by-product of software profilers. Profiling data is represented in a tree-like structure, where one node connects to another to show their parent-child relationship. This corresponds to a call stack, where one function calls another until it reaches the last function in the stack. The tree structure is then converted into a collection of boxes stacked together according to those relationships. This is known as a flamegraph.

example-flamegraph
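To make the tree-to-boxes conversion concrete, here is a minimal, hypothetical Go sketch (not how Pyroscope or any particular profiler implements it) that folds sampled stack traces into a tree of nodes with sample counts. A flamegraph renderer would then draw each node as a box whose width is proportional to its count:

package main

import "fmt"

// node is one box in the flamegraph: a function name, the number of
// samples that passed through it, and its children keyed by function name.
type node struct {
    name     string
    samples  int
    children map[string]*node
}

func newNode(name string) *node {
    return &node{name: name, children: map[string]*node{}}
}

// add walks one stack trace (outermost frame first) down the tree,
// incrementing the sample count of every frame along the way.
func (n *node) add(stack []string) {
    n.samples++
    if len(stack) == 0 {
        return
    }
    child, ok := n.children[stack[0]]
    if !ok {
        child = newNode(stack[0])
        n.children[stack[0]] = child
    }
    child.add(stack[1:])
}

func main() {
    // Three sampled stacks from an imaginary program.
    stacks := [][]string{
        {"main", "handler", "queryDB"},
        {"main", "handler", "render"},
        {"main", "handler", "queryDB"},
    }
    root := newNode("total")
    for _, s := range stacks {
        root.add(s)
    }
    // queryDB appears in 2 of the 3 samples, so its box would span
    // roughly two thirds of the flamegraph's width.
    fmt.Println(root.children["main"].children["handler"].children["queryDB"].samples) // 2
}

Each node's sample count is what determines its box width; whether children are drawn above or below their parents is just the rendering direction discussed in the note above.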

Example use cases for flamegraphs#

Flamegraphs are often seen as more of a high level or abstract concept, so here are some situations where flamegraphs would be useful.

Using flamegraphs during pager duty response#

Imagine a scenario where you’re the only engineer on your team who is on-call for the week. Days pass and, thankfully, you don’t get any outage alerts. On the last day of your on-call rotation, you get a PagerDuty alert showing a spike in latency in the US-east region. You start by checking the metrics and then the logs. In the course of your investigation, you notice the problem goes even deeper in the stack.

In a scenario like this, you begin to look for ways to visualize CPU usage and identify what is consuming the most CPU. That is exactly where flamegraphs shine!

Using flamegraphs for debugging customer complaints#

Let’s assume you work in a team fully focused on improving the performance of your company’s product. While doing your normal morning routine, you notice an escalation from customer support. A customer just reported an unusual delay in load time. Looking at your graph, you notice a spike in latency in the US-east region. In the course of your investigation, you begin to notice the problem goes even deeper in the stack.

In a scenario like this, you begin to look for ways to visualize CPU usage and identify what is consuming the most CPU. That is also where flamegraphs come to your rescue!

Why are flamegraphs important?#

Flamegraphs are one of the best ways to visualize performance data. They give you a full view of your computing resources and how they are utilized, all in a single pane.

In an industry where software changes fast, it is important to be able to understand profiles quickly. Flamegraphs do a good job of visually representing the state of your application: when something goes wrong, you’re able to quickly see the difference between an ideal “efficient” state and a buggy “inefficient” state.

Flamegraphs on Pyroscope#

flamegraph on pyroscope

Pyroscope, an open-source continuous profiling platform, provides you with sleek flamegraphs to easily visualize your performance profile data. Pyroscope simplifies the entire process of collecting, converting, and analyzing profiles, allowing you to focus on the things you care about most. Pyroscope also extends the functionality of flamegraphs by allowing you to perform actions such as:

  • flamegraph comparison
  • sorting of stack frames from top to bottom and vice versa
  • flamegraph export
  • flamegraph collapsing, etc.

How to interact with a flamegraph using Pyroscope#

Pyroscope provides several ways of interacting with its flamegraphs:

pyroscope-hover

  • Mouse hover for information: To easily view the information for a particular profile node, hover over the node with the mouse. This displays a tooltip showing the full function name, the number of samples present in that node, and its corresponding percentage.

click-to-zoom

  • Click to expand a node: Because of the tree-like structure of profiles, it can be tricky to display all the relationships at once. Flamegraphs make it easy to navigate from a parent to a child node. By clicking a particular node, you can expand it and view its subtree.

search-pyroscope

  • Search: In the Pyroscope UI, there is a search box right above the flamegraph. It allows you to search for any term, including function names.

How to interpret a flamegraph#

parts-of-flamegraph

You can read a flamegraph by first understanding its features. They include:

  • Nodes: The rectangular boxes that make up the graph; each box represents a function in the call stack.
  • Width: The width of each node shows how often that function was present in the collected stacks. In other words, the wider the box, the more time was spent in that function (including the functions it calls).
  • X-axis: The x-axis shows the population of samples. It does not show the passage of time; rather, it displays the entire collection of stack traces.
  • Y-axis: The y-axis indicates the depth of the call stack. On Pyroscope, the head/root node is positioned at the top by default. The box at the tip of the flame shows the function that was on-CPU when its stack trace was collected.
  • Background color: The colors are generally not significant. They are assigned more or less at random to help the eye distinguish one function call from another.

Playground#

With everything you've learned so far about flamegraphs, feel free to play around and interpret this interactive flamegraph embedded below.

Summary#

Now that you’ve learned about flamegraphs and how to use them, I encourage you to take a bold step and make performance profiling a priority in your observability journey. Metrics, logs, and traces tell you part of your observability story, but flamegraphs make the story complete. Try Pyroscope today so that the next time you encounter a performance issue, you have flamegraphs to your rescue.

· 5 min read

Coming from a background working as a frontend developer at Grafana, I'm no stranger to open-source performance monitoring. I was part of the team responsible for the overall user experience of Grafana, and performance was one of the key considerations. Along the way, I learned about a debugging technique known as profiling for monitoring application performance, and I've loved it ever since.

chrome browser profiler

What is continuous profiling#

“Profiling” is a dynamic method of analyzing the complexity of a program, such as CPU utilization or the frequency and duration of function calls. With profiling, you can locate exactly which parts of your application are consuming the most resources. “Continuous profiling” is a more powerful version of profiling that adds the dimension of time. By understanding your system's resources over time, you can then locate, debug, and fix issues related to performance.

As a frontend developer, my experience with profiling was limited to the browser. However, in the course of my study, I discovered an exciting new pattern of profiling: continuous profiling. Similar to how you use the profiler in the dev console to understand frontend performance issues, continuous profiling allows you to profile servers written in various languages 24/7 and understand resource usage at any particular point in time.
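For a concrete reference point, Go ships profiling hooks in its standard library. The sketch below (standard library only, not Pyroscope-specific) exposes a server's profiling data over HTTP; a continuous profiler then collects from an endpoint like this around the clock instead of you running a profiler by hand:

package main

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // Your normal application handlers would go here.
    http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })

    // CPU, heap, goroutine, and other profiles are now served at
    // http://localhost:6060/debug/pprof/ for any profiler to collect.
    http.ListenAndServe("localhost:6060", nil)
}

With an endpoint like this in place, one-off profiling is a go tool pprof invocation away; continuous profiling automates that collection and keeps the history.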

While continuous profiling is new to many, the concept is actually relatively old. In 2010, Google released a paper titled “Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers”, which makes the case for the value of adding continuous profiling to your applications.

Industry traction for continuous profiling#

Since then, many major performance monitoring solutions have joined them in releasing continuous profiling products. As time has gone on, the continuous profiling space has become increasingly popular, with companies and VCs making major investments in continuous profiling to keep up with demand for this type of monitoring.

continuous profiling trends

It’s not just major cloud providers and venture capitalists who are excited about continuous profiling; our industry leaders are super pumped about it too!

Chris Aniszczyk, CTO of the Cloud Native Computing Foundation (CNCF) agrees that continuous profiling is an integral part of the observability stack.

This poll by Michael Hausenblas, solution engineering lead at AWS, is good evidence that continuous performance profiling is becoming part of the culture for most engineering teams.

Even Carl Bergquist, Principal Engineer at Grafana, predicts that the popular FOSDEM conference will see more profiling talks moving forward.

And like Liz Fong-Jones, Principal developer advocate at Honeycomb, many in the tech world are “buzzing” about continuous profiling!

Simply put, continuous profiling is the future of application performance assessment.

Continuous profiling using Pyroscope#

At Pyroscope, we profile our own servers using Pyroscope, and we have found many cases where profiling saved us significant time and money by identifying performance issues. At any organization, understanding resource utilization and allocating resources efficiently can be the difference between a healthy company with happy end-users (and employees) and a chaotic one where everyone is running around putting out fires while end-users suffer.

There is no better tool than continuous profiling to be the difference-maker in these situations. No matter what your role is in your organization, there are valuable benefits to having continuous profiling as part of your observability stack.

user_personas_04-01

Pyroscope also provides you with multi-language support, ad-hoc profiling, profile sharing, and more!

Pyroscope GIF Demo

What makes Pyroscope unique?#

Pyroscope is open source, which means it’s easier to get started with and try out than cloud providers’ profilers. Pyroscope’s custom storage engine is language-agnostic, so profiling data from rbspy, eBPF, or even newer profilers such as Bloomberg's Memray can easily be sent to Pyroscope and be compressed, stored, and queried efficiently.

agent_server_diagram_for_blog_post_1-01

Pyroscope is also built to be easy for everyone to use. The Pyroscope UI in particular was designed to be intuitive for anyone to explore, analyze, and understand profiles.

Whether you are operating at a small scale or enterprise level, the power of continuous profiling can quickly and easily reduce infrastructure costs and improve scalability. With Pyroscope, you can build high-performance applications that increase customer satisfaction and put value back into your business.

How to get started with Pyroscope#

As someone relatively new to profiling myself, the easiest way for me to get started with continuous profiling was to install Pyroscope. To learn how to use continuous profiling in your code or application, I would suggest running the most relevant dockerized example from the examples folder on GitHub, or taking a look at the installation guide here.
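If you want to see the shape of the Go integration before running the examples, here is a minimal sketch of push-mode instrumentation. It assumes the Pyroscope Go client package; the exact import path and available config options depend on your Pyroscope version, so treat it as illustrative and follow the installation guide for specifics:

package main

import (
    // Import path for the Pyroscope Go client at the time of writing;
    // check the installation guide for the version you are using.
    "github.com/pyroscope-io/client/pyroscope"
)

func main() {
    // Start the profiler once at startup; it keeps sending profiles
    // to the Pyroscope server in the background while the app runs.
    pyroscope.Start(pyroscope.Config{
        ApplicationName: "my.first.app",
        ServerAddress:   "http://localhost:4040", // the server started with `pyroscope server`
    })

    // ... the rest of your application ...
}

After that, profiles show up under the chosen application name in the Pyroscope UI on port 4040.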

Pyroscope has offered me useful features that make profiling fun and exciting. With Pyroscope, I’ve been able to understand how my application, server, and code are running. The best part is that I can use Pyroscope without paying any money; it’s completely free!

If you need any help getting started, join the community slack or follow Pyroscope on Github and Twitter!

· 4 min read

image

Why we added adhoc profiling#

While most profilers are built for more static or adhoc analysis (i.e., profiling a script), Pyroscope's continuous profiling gives you the opportunity to jump around to any point in time. This fluid profiling is beneficial for understanding performance issues in your application. However, as we continued to improve the Pyroscope UI/UX, we identified situations where static profiling is the better fit than running a profiler continuously, such as profiling a script or attaching to a running process.

Our goal is to make Pyroscope a one-stop-shop for all profiling needs. That means supporting all languages, all flamegraph formats, continuously profiling servers, and, of course, quickly profiling an adhoc script.

Introducing Adhoc profiling#

That being said, we are excited to officially release Adhoc profiling mode for Pyroscope! With adhoc mode, you get all the convenience and simplicity of profiling a script, as well as Pyroscope's stellar visualization and UI functionality.

Exporting interactive Flamegraph HTML from Pyroscope Adhoc mode#

You can use the pyroscope adhoc command to profile a script and export the HTML file for an interactive profile that you can view, play with, and share with your team.

For example, to profile a Golang script like this one you simply run:

pyroscope adhoc go run adhoc-push.go

which will produce a folder containing an .html file with an interactive flamegraph for each of the pprof profiles.

image

This command profiles the script passed in and then exports the result as a simple standalone HTML file that looks just like the Pyroscope UI:

image

Note: This HTML export can also be achieved through the export dropdown available in the flamegraph toolbar of your continuous profiling flamegraphs:

image

This is the simplest way to use Pyroscope adhoc mode. The new feature is great for quickly viewing and sharing profiles in a format with more functionality than a simple screenshot.

Analyzing Pyroscope-generated Adhoc Flamegraphs#

HTML file export allows you to easily share profiles with other people. Pyroscope also permanently stores all of your profiles in the ~/.pyroscope/ directory so that you can revisit or compare them later. To do that, run:

pyroscope server

and go to port :4040 to access the flamegraphs you have created via the pyroscope adhoc command.


Viewing Flamegraph diffs between two adhoc profiles#

Similar to the diff view for continuous profiling, adhoc mode also supports calculating the diff between two flamegraphs. This functionality requires server-side code, so it is only available for files saved in the ~/.pyroscope directory. We plan to improve this over time, potentially even compiling these functions to WASM.

image

Coming soon!#

As of January 26, 2022, adhoc mode is meant for three main use cases:

  • Profiling scripts using Pyroscope: Running pyroscope adhoc command and being able to view/share the resulting flamegraphs
  • Viewing profiles exported from Pyroscope: Dragging and dropping files that have been exported to json format and then analyzing those flamegraphs
  • Getting a shareable standalone HTML flamegraph that has the same UI and analysis features as Pyroscope web interface

We plan to expand functionality and support more languages, formats and use cases:

  • Supporting drag-and-drop for pprof files
  • Adding ability to comment on stored files
  • Adding descriptions / annotations to flamegraphs
  • Getting a shareable link to flamegraphs
  • UI improvements
  • and much more!

Let us know if you have any recommendations for how to improve adhoc mode!

Follow us on Github or Twitter or join our slack for updates on this!

· 6 min read

How we improved performance of our Go application#

Recently we released a new feature where users can run Pyroscope in pull mode. It allows you to pull profiling data from applications, and it has various discovery mechanisms so that you can easily integrate with things like Kubernetes and start profiling all of your pods with minimal setup.

For Pyroscope, the difference between push and pull mode is that:

  • Push mode: The application sends a POST request with profiling data to the Pyroscope server, which returns a simple response
  • Pull mode: The Pyroscope server sends a GET request to targets (identified in a config file), and the targets return profiling data in the response (see the conceptual sketch below)

push_vs_pull_diagram_07
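For intuition, here is a simplified, hypothetical sketch (not Pyroscope's actual scraping code) of what a pull-mode collection boils down to: the server issues a GET request against each target's pprof endpoint and reads the profile bytes out of the response. The target addresses and endpoint path here are placeholders:

package main

import (
    "fmt"
    "io"
    "net/http"
    "time"
)

// scrape pulls a 10-second CPU profile from one target's standard
// Go pprof endpoint and returns the raw pprof bytes.
func scrape(target string) ([]byte, error) {
    client := &http.Client{Timeout: 15 * time.Second}
    resp, err := client.Get("http://" + target + "/debug/pprof/profile?seconds=10")
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return io.ReadAll(resp.Body)
}

func main() {
    // Targets would normally come from the config file or a discovery
    // mechanism such as the Kubernetes API; these are placeholders.
    targets := []string{"app-1:6060", "app-2:6060"}
    for _, t := range targets {
        data, err := scrape(t)
        if err != nil {
            fmt.Println("scrape failed:", t, err)
            continue
        }
        fmt.Printf("pulled %d bytes of pprof data from %s\n", len(data), t)
    }
}

In practice the server does much more (target discovery, scheduling, transcoding, storage), but this is the core request/response loop the bullets above describe.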

One of the major benefits of pull mode is creating meaningful tags. For example, in Kubernetes workflows you can tag profiles with popular metadata fields:

  • pod_name: Name of the pod
  • container_name: Name of the container
  • namespace: Namespace of the pod
  • service_name: Name of the service
  • etc

Then using Pyroscope's query language, FlameQL, you can filter profiles by those tags over time to see where the application is spending most of its time.

Early on, we had a user who ran Pyroscope in pull mode with about a thousand profiling targets and roughly 1 TB of raw profiling data per day, but their Pyroscope server was running into performance issues.

This was surprising to us because in push mode we've seen Pyroscope handle similar amounts of traffic without issues. Therefore, we suspected that the performance issue had to do with one of the key architectural differences between push mode and pull mode.

While pprof is a great format for representing profiling data, it's not the most efficient format for storing it. So we transcode pprof into an internal format that is optimized for storage in our custom storage engine.

  • Push mode: pprof transcoding is done on each profiling target, so the load is distributed across many targets
  • Pull mode: pprof transcoding moves to the server side, so the load is concentrated on a single server

So we immediately suspected that this performance regression had something to do with the increased load on the pulling server.

pprof_transcoding_04

Using profiling to spot performance issues in Go#

The Pyroscope server is written in Go, and it continuously profiles itself. So when these kinds of issues happen, we're usually able to find them quickly.

In this screenshot you can see that the two functions taking up a lot of time are FindFunction and FindLocation. When Pyroscope transcodes pprof profiles into our internal flamegraph format, these functions are called as part of that process. However, it seemed suspicious that such simple functions were consuming so much CPU time.

screenshot of pyroscope showing the performance issue

FindFunction and FindLocation consume a significant amount of CPU

To understand what it takes to "find" a Location or a Function, we looked at how pprof objects are structured:

anatomy of a pprof profile

The id fields of both the Location and Function arrays ascend in consecutive numerical order for each new object

Note that location and function fields are actually arrays, containing the locations and functions respectively. The objects in these arrays are identified by IDs.

The FindFunction and FindLocation functions are almost identical: they both search through their respective arrays for objects by ID.

func FindFunction(x *Profile, fid uint64) (*Function, bool) {
    // this sort.Search function is the expensive part
    idx := sort.Search(len(x.Function), func(i int) bool {
        return x.Function[i].Id >= fid
    })
    if idx < len(x.Function) {
        if f := x.Function[idx]; f.Id == fid {
            return f, true
        }
    }
    return nil, false
}

And if you look closely at these functions, they already seem quite optimized: they use sort.Search, the Go standard library's implementation of the binary search algorithm. We initially assumed binary search would be the fastest approach here, because it's typically the fastest way to find an element in a sorted array.

However, looking at the flamegraph, this was the bottleneck that was slowing down the whole system.

Performance Optimization #1: Caching the data in a hash map#

In our first attempt at fixing the issue, we tried caching. Instead of performing the binary search every time we needed to find a function, we cached the data in a hash map.
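An illustrative sketch of that first attempt, reusing the Profile, Function, and FindFunction definitions from the snippet above (a simplified stand-in, not the exact code we shipped):

// functionCache remembers functions by ID so that repeated lookups
// skip the binary search. Create one per profile being transcoded:
//     cache := &functionCache{byID: map[uint64]*Function{}}
type functionCache struct {
    byID map[uint64]*Function
}

func (c *functionCache) find(x *Profile, fid uint64) (*Function, bool) {
    if f, ok := c.byID[fid]; ok {
        // Cache hit: a map access (runtime.mapaccess2_fast64) instead of sort.Search.
        return f, true
    }
    // Cache miss: fall back to the binary search and remember the result.
    f, ok := FindFunction(x, fid)
    if ok {
        c.byID[fid] = f
    }
    return f, ok
}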

That did improve performance a little, but we only traded one relatively expensive operation (binary search, green) for another (map lookups, red) that used slightly less CPU but was still expensive.

second_fg_image

The green nodes are functions where we decreased CPU usage and the red nodes are functions where we added CPU usage with this optimization. While we removed FindFunction and FindLocation, we added runtime.mapaccess2_fast64.

Performance Optimization #2: Eliminating the need for binary search#

As I mentioned earlier, objects in the function and location arrays are sorted by ID. Upon closer inspection, we discovered that not only were the arrays sorted, but the IDs also started at 1 and ascended in consecutive numerical order (1, 2, 3, 4, 5). So if you want the object with ID 10, you simply look at the object at position 9.

So, although we initially thought binary search was the fastest way to find functions and locations in their respective arrays, it turned out we could eliminate the search altogether by indexing objects directly at position ID - 1.
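A sketch of the resulting lookup, again reusing the Profile and Function types from above and relying on IDs that start at 1 and ascend consecutively (illustrative, not the exact upstream code):

// FindFunctionFast exploits the fact that IDs start at 1 and ascend
// consecutively, so the object with ID fid lives at index fid-1.
// No searching is needed at all.
func FindFunctionFast(x *Profile, fid uint64) (*Function, bool) {
    idx := int(fid) - 1
    if idx < 0 || idx >= len(x.Function) {
        return nil, false
    }
    f := x.Function[idx]
    if f.Id != fid {
        // Defensive check in case a profile does not follow the convention.
        return nil, false
    }
    return f, true
}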

This completely removed the performance overhead caused by the FindFunction and FindLocation functions.

third_fg_image

The green nodes are functions where we decreased CPU usage and the red nodes are functions where we added CPU usage with this optimization. We removed FindFunction and FindLocation and also removed runtime.mapaccess2_fast64.

What happens when you profile a profiler?#

In retrospect, we probably should have started by looking at the specification for a pprof object before assuming that binary search was the best way to find objects. But we wouldn't have even known to look at the specification until we started profiling our own code.

It's just the nature of developing software that as complexity increases over time, more performance issues sneak their way into the codebase.

With continuous profiling enabled, however, you can spot these performance issues and understand which parts of your code are consuming the most resources.

If you'd like to learn more about how to get started with Pyroscope and learn where your code's bottlenecks are, check out the Pyroscope documentation.


· 5 min read

Continuous Profiling for Golang applications#

Profiling a Golang Rideshare App with Pyroscope#

golang_example_architecture_05_1

Note: For documentation on Pyroscope's golang integration visit our website for golang push mode or golang pull mode

Background#

In this example we show a simplified, basic use case of Pyroscope. We simulate a "ride share" company which has three endpoints found in main.go:

  • /bike : calls the OrderBike(search_radius) function to order a bike
  • /car : calls the OrderCar(search_radius) function to order a car
  • /scooter : calls the OrderScooter(search_radius) function to order a scooter
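A hedged sketch of what those handlers might look like in main.go (the real example in the repository may differ; OrderBike, OrderCar, and OrderScooter are assumed to be defined elsewhere in the package, and the port and search-radius values are placeholders):

package main

import "net/http"

// The real main() also calls pyroscope.Start, shown in the
// "Tagging static region" section below, before registering handlers.
func main() {
    http.HandleFunc("/bike", func(w http.ResponseWriter, r *http.Request) {
        OrderBike(1) // smallest radius: the cheapest endpoint
        w.Write([]byte("bike ordered"))
    })
    http.HandleFunc("/scooter", func(w http.ResponseWriter, r *http.Request) {
        OrderScooter(2)
        w.Write([]byte("scooter ordered"))
    })
    http.HandleFunc("/car", func(w http.ResponseWriter, r *http.Request) {
        OrderCar(3) // largest radius: the endpoint that dominates the flamegraph
        w.Write([]byte("car ordered"))
    })
    http.ListenAndServe(":5000", nil)
}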

We also simulate running 3 distinct servers in 3 different regions (via docker-compose.yml)

  • us-east-1
  • us-west-1
  • eu-west-1

One of the most useful capabilities of Pyroscope is the ability to tag your data in a way that is meaningful to you. In this case, we have two natural divisions, and so we "tag" our data to represent those:

  • region: statically tags the region of the server running the code
  • vehicle: dynamically tags the endpoint

Tagging static region#

Tagging something static, like the region, can be done in the initialization code in the main() function:

    pyroscope.Start(pyroscope.Config{
        ApplicationName: "ride-sharing-app",
        ServerAddress:   serverAddress,
        Logger:          pyroscope.StandardLogger,
        Tags:            map[string]string{"region": os.Getenv("REGION")},
    })

Tagging dynamically within functions#

Tagging something more dynamic, like we do for the vehicle tag, can be done inside our utility FindNearestVehicle() function using pyroscope.TagWrapper():

func FindNearestVehicle(search_radius int64, vehicle string) {
    pyroscope.TagWrapper(context.Background(), pyroscope.Labels("vehicle", vehicle), func(ctx context.Context) {
        // Mock "doing work" to find a vehicle
        var i int64 = 0
        start_time := time.Now().Unix()
        for (time.Now().Unix() - start_time) < search_radius {
            i++
        }
    })
}

What this block does is:

  1. Add the label pyroscope.Labels("vehicle", vehicle) to the profiling data
  2. Execute the mock "find a vehicle" work inside the anonymous function
  3. Remove the label again (behind the scenes) once the block ends, since that work is complete

Resulting flamegraph / performance results from the example#

Running the example#

To run the example run the following commands:

# Pull latest pyroscope image:
docker pull pyroscope/pyroscope:latest

# Run the example project:
docker-compose up --build

# Reset the database (if needed):
# docker-compose down

This example runs all the code mentioned above and also sends some mock load to the 3 servers as well as their respective 3 endpoints. If you select our application, ride-sharing-app.cpu, from the dropdown, you should see a flamegraph that looks like this (below). After giving the flamegraph 20-30 seconds to update and clicking the refresh button, we see our 3 functions at the bottom of the flamegraph taking up CPU resources proportional to the size of their respective search_radius parameters.

Where's the performance bottleneck?#

golang_first_slide

The first step when analyzing a profile output from your application is to take note of the largest node, which is where your application is spending the most resources. In this case, it happens to be the OrderCar function.

The benefit of using the Pyroscope package is that we can now investigate further as to why the OrderCar() function is problematic. Tagging both region and vehicle allows us to test two reasonable hypotheses:

  • Something is wrong with the /car endpoint code
  • Something is wrong with one of our regions

To analyze this we can select one or more tags from the "Select Tag" dropdown:

image

Narrowing in on the Issue Using Tags#

Knowing there is an issue with the OrderCar() function, we naturally select that tag first. Then, after inspecting multiple region tags, it becomes clear from the timeline that there is an issue with the us-west-1 region, which alternates between high-CPU and low-CPU periods.

We can also see that the mutexLock() function is consuming almost 70% of CPU resources during this time period.

golang_second_slide-01

Comparing two time periods#

Using Pyroscope's "comparison view", we can select two different time ranges from the timeline and compare the resulting flamegraphs. The pink section on the left of the timeline produces the left flamegraph, and the blue section on the right produces the right flamegraph.

When we select a period of low CPU utilization and a period of high CPU utilization, we can see clearly different behavior in the mutexLock() function: it takes 33% of CPU during low-CPU periods and 71% of CPU during high-CPU periods.

golang_third_slide-01

Visualizing Diff Between Two Flamegraphs#

While the difference in this case is stark enough to see in the comparison view, sometimes the diff between two flamegraphs is easier to see when they are overlaid on each other. Without changing any parameters, we can simply select the diff view tab and see the difference represented in a color-coded diff flamegraph.

golang_fourth_slide-01

More use cases#

We have been beta testing this feature with several different companies, and here are some of the ways we've seen them tag their performance data:

  • Tagging Kubernetes attributes
  • Tagging controllers
  • Tagging regions
  • Tagging jobs from a queue
  • Tagging commits
  • Tagging staging / production environments
  • Tagging different parts of their testing suites
  • Etc...

Future Roadmap#

We would love for you to try out this example and see how you can adapt it to your own Golang application. While this example focused on CPU profiling, Go also supports memory profiling. Continuous profiling has become an increasingly popular tool for monitoring and debugging performance issues (arguably the fourth pillar of observability).

We'd love to continue improving our Golang integrations, so let us know what features you would like to see.