Finding a .NET Core memory leak in a production environment is a high-stakes challenge that often happens when you cannot attach a debugger. Whether your Linux container is hitting a memory limit or your Windows service is slowly consuming all available RAM, you need a way to inspect the managed heap without stopping the world. This guide walks you through the exact process of using dotnet-dump and SOS extensions to isolate leaking objects and find their root causes.
By the end of this tutorial, you will know how to capture a process dump, analyze object statistics, and trace references back to the source of the leak. We will focus on .NET 8 and .NET 9 environments, though these techniques apply to all modern .NET versions. If your application is crashing with an OutOfMemoryException, the steps below are your primary line of defense.
TL;DR — Install the CLI tool with dotnet tool install -g dotnet-dump. Capture a dump with dotnet-dump collect -p [PID]. Open it with dotnet-dump analyze, run dumpheap -stat to find the heaviest object types, and use gcroot [Address] to find why the Garbage Collector isn't freeing them.
Understanding Managed Memory Leaks
A .NET Core memory leak usually occurs within the Managed Heap. Unlike C++ where you might forget to free() memory, in C# the leak happens when you unintentionally keep a reference to an object alive. Because the Garbage Collector (GC) only deletes objects that are "unreachable," a single static list or an un-disposed event listener can hold thousands of objects in memory indefinitely.
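The classic pattern looks like the sketch below: a static collection that roots everything added to it. GlobalCache and UserSession are illustrative names, not a real API.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: a static field is a GC root, so everything
// reachable from it stays alive for the lifetime of the process.
for (int i = 0; i < 10_000; i++)
{
    // Sessions are added but never removed: the GC sees them as reachable.
    GlobalCache.Sessions.Add(new UserSession());
}

GC.Collect(); // even a forced full collection frees nothing here
Console.WriteLine(GlobalCache.Sessions.Count); // prints 10000

static class GlobalCache
{
    // The static root that keeps every session alive indefinitely.
    public static readonly List<UserSession> Sessions = new();
}

class UserSession
{
    public byte[] Data = new byte[1024]; // ~1 KB per "session"
}
```

Running dumpheap -stat against a process like this would show the UserSession count climbing with every loop iteration.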
The dotnet-dump tool is a global CLI tool that functions as a cross-platform replacement for Windows-specific debuggers like WinDbg. It includes the SOS (Son of Strike) extension, which provides the commands necessary to inspect the internal state of the .NET Runtime. When you use this tool, you are looking at the "raw" memory of the application at a specific point in time to see exactly what the GC sees.
In modern cloud-native development, understanding the GC's behavior is vital. As of .NET 8, the runtime has become more aggressive with memory management, but it still cannot override a developer's decision to keep a reference. You must use diagnostic tools to prove which specific object types are bloating your memory footprint.
When to Use dotnet-dump Over Other Tools
You should use dotnet-dump when you are dealing with production environments where you cannot install a full IDE or a profiling suite. It is particularly effective for Linux containers (Docker/Kubernetes) because it does not require lldb or other heavy native dependencies. If your metrics show a rising sawtooth, where each garbage collection reclaims less memory than the last, or a steady climb that never returns to baseline, it is time for a dump.
Consider these three real-world scenarios for a .NET Core memory leak investigation:
- The Slow Growth: Memory increases by roughly 10MB every hour. This is often a collection (such as a ConcurrentDictionary) that is being added to but never cleared.
- The Sudden Spike: Memory jumps from 200MB to 2GB in seconds. This usually indicates a large data processing task, such as reading a massive file into a byte[] instead of streaming it.
- The Non-Returning High: After a high-traffic event, memory remains high even though traffic has dropped. This can be a fragmentation issue or a Gen 2 collection problem.
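The Sudden Spike scenario can often be avoided by streaming instead of buffering. A minimal sketch, where the temp file and its contents are invented purely for illustration:

```csharp
using System;
using System.IO;

// Write a small sample file so the snippet is self-contained.
string path = Path.Combine(Path.GetTempPath(), "sample.txt");
File.WriteAllLines(path, new[] { "a", "b", "c" });

// Spike-prone: the entire file lands on the heap at once
// (on the Large Object Heap if it exceeds 85,000 bytes).
byte[] everything = File.ReadAllBytes(path);

// Stream-friendly: only one buffered line is in memory at a time.
int lines = 0;
using (var reader = new StreamReader(path))
{
    while (reader.ReadLine() is not null)
        lines++;
}

Console.WriteLine(lines); // prints 3
Console.WriteLine(everything.Length > 0); // prints True
File.Delete(path);
```

For a 2GB input file, the streaming loop keeps memory flat while the ReadAllBytes call would allocate the full 2GB in one shot.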
For external documentation, the official Microsoft dotnet-dump documentation covers the latest CLI flags and updates.
Step-by-Step: Analyzing the Memory Leak
Step 1: Install and Capture
First, ensure you have the diagnostic tools installed on the machine where the process is running. Open your terminal and run:
dotnet tool install -g dotnet-dump
Find the Process ID (PID) of your running .NET application. You can use dotnet-dump ps to list them. Once you have the PID, capture the dump. On Linux, a "Full" dump is usually required to see the managed heap effectively.
# Capture the dump
dotnet-dump collect -p 1234 --type Full
Step 2: Start the Analysis
Once you have the .dmp file (or a coredump on Linux), start the interactive shell. This doesn't require the app to be running anymore; you are analyzing a snapshot.
dotnet-dump analyze core_20231027_101522
Step 3: Finding the Culprit with dumpheap
The most important command for a .NET Core memory leak is dumpheap -stat. This sorts all objects on the heap by their type and tells you how much memory each type consumes.
# Command
> dumpheap -stat
# Example Output Snippet
MT Count TotalSize Class Name
00007f81 1,204 56,432 System.String
00007f82 450 128,900 System.Byte[]
00007f83 10,000 4,500,000 MyNamespace.LeakyService+UserSession
Look at the TotalSize column. In this example, UserSession is consuming 4.5MB across 10,000 instances. If you know you only have 10 active users, you have found your leak.
Step 4: Finding the GC Root
Now that you know what is leaking, you need to know why. Pick the Method Table (MT) address for the leaky type and list all individual addresses for those objects.
> dumpheap -mt 00007f83
# Output:
Address MT Size
000001f2800012a8 00007f83 450
Take one of those addresses and run gcroot. This command shows the chain of references keeping that object alive.
> gcroot 000001f2800012a8
# Output trace:
-> MyNamespace.GlobalCache
-> System.Collections.Generic.List<UserSession>
-> MyNamespace.UserSession
The output above tells us that a static class named GlobalCache holds a list that contains our UserSession. This is the "smoking gun." To fix this leak, you must remove the session from the GlobalCache when the user logs out.
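A minimal sketch of that fix, assuming GlobalCache is a static dictionary keyed by session ID (the names follow the example output above; the real code will differ):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fix: pair every Add with a Remove so the static root
// no longer keeps the session alive after logout.
var sessionId = Guid.NewGuid();

GlobalCache.Sessions[sessionId] = new UserSession(); // login
Console.WriteLine(GlobalCache.Sessions.Count);       // prints 1

GlobalCache.Sessions.Remove(sessionId);              // logout: break the root chain
Console.WriteLine(GlobalCache.Sessions.Count);       // prints 0
// With no remaining references, the next GC can reclaim the session,
// and gcroot would report no roots for its address.

static class GlobalCache
{
    public static readonly Dictionary<Guid, UserSession> Sessions = new();
}

class UserSession { }
```

If sessions can end without an explicit logout (crashed clients, timeouts), an expiring cache is safer than manual removal, since a missed Remove recreates the leak.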
Common Pitfalls in Dump Analysis
A major pitfall is the Large Object Heap (LOH). Objects larger than 85,000 bytes are placed in a special heap that is not compacted by default. You might see high memory usage not because of a leak, but because of fragmentation. In dumpheap -stat, look for Free objects. If Free consumes a large percentage of your heap, you aren't leaking objects; you are leaving "holes" in memory that the GC can't easily fill.
Furthermore, ensure you are analyzing the dump with a toolset that matches the source. If you capture a dump on a Linux ARM64 container, you must analyze it with a runtime and SOS build compatible with that architecture. Analyzing a Linux dump on Windows used to be impossible, but dotnet-dump now supports many cross-OS scenarios; just make sure a matching .NET runtime version is available on the analysis machine.
Optimization Tips and Summary
To prevent a .NET Core memory leak from reaching production, consider these practical tips:
- Unsubscribe from Events: Event handlers are the most common cause of managed leaks. If an object subscribes to an event on a long-lived service, it will never be collected. Always unsubscribe with -= in your Dispose method.
- Use ArrayPool for Large Buffers: Instead of creating a new byte[] array for every request, use ArrayPool<byte>.Shared. This reduces pressure on the LOH and prevents fragmentation.
- Limit Static Collections: If you must use a static Dictionary, give it a maximum size or an eviction policy (for example, MemoryCache with expiration).
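The ArrayPool tip can be sketched as follows; the 100,000-byte size is arbitrary, chosen only because it crosses the 85,000-byte LOH threshold:

```csharp
using System;
using System.Buffers;

// Rent a large buffer from the shared pool instead of allocating a
// fresh byte[], which would land on the LOH at this size.
byte[] buffer = ArrayPool<byte>.Shared.Rent(100_000);
try
{
    // Rent may return a larger array than requested; only use the
    // first 100,000 bytes and never rely on buffer.Length for logic.
    Console.WriteLine(buffer.Length >= 100_000); // prints True
}
finally
{
    // Returning the buffer lets the next caller reuse it,
    // so repeated requests cause no new LOH allocations.
    ArrayPool<byte>.Shared.Return(buffer);
}
```

The try/finally matters: a buffer that is rented but never returned is simply a regular allocation with extra steps.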
In summary:
- Identify leaks using dumpheap -stat to find high-count/high-size types.
- Use gcroot to find the reference chain preventing Garbage Collection.
- Check the LOH for fragmentation if total memory is high but object counts are low.
- Always capture Full dumps to ensure the managed heap data is included.
Frequently Asked Questions
Q. What causes .NET memory leaks most often?
A. The most common causes are static references (collections that never shrink), un-disposed event handlers where the publisher outlives the subscriber, and captured variables in long-running anonymous lambdas or timers. These keep objects "rooted," preventing the GC from reclaiming them.
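The publisher-outlives-subscriber case can be sketched as follows (LongLivedService and Subscriber are hypothetical names for illustration):

```csharp
using System;

// Sketch: an event subscription makes the long-lived publisher hold a
// reference to the subscriber, rooting it until the handler is removed.
var publisher = new LongLivedService();

var subscriber = new Subscriber(publisher);
Console.WriteLine(publisher.HandlerCount); // prints 1 (subscriber is rooted via the event)

subscriber.Dispose();                      // -= breaks the reference chain
Console.WriteLine(publisher.HandlerCount); // prints 0 (subscriber is now collectible)

class LongLivedService
{
    public event EventHandler? Tick;
    public int HandlerCount => Tick?.GetInvocationList().Length ?? 0;
}

class Subscriber : IDisposable
{
    private readonly LongLivedService _service;

    public Subscriber(LongLivedService service)
    {
        _service = service;
        _service.Tick += OnTick; // the service now references this instance
    }

    private void OnTick(object? sender, EventArgs e) { }

    // Without this unsubscribe, every Subscriber lives as long as the service.
    public void Dispose() => _service.Tick -= OnTick;
}
```

In a dump, this leak shows up as a growing count of subscriber instances whose gcroot chain passes through the publisher's event field.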
Q. How can I see memory usage without taking a full dump?
A. You can use dotnet-counters monitor -p [PID] to see real-time statistics of Gen 0, 1, 2, and LOH sizes. This is a lightweight way to confirm a leak is happening before you commit to the disk space required for a full dump analysis.
Q. Can dotnet-dump analyze native memory leaks?
A. No, dotnet-dump and SOS extensions are primarily designed for the Managed Heap. If your leak is in native code (e.g., P/Invoke, native drivers, or Interop), you will need tools like Valgrind on Linux or PerfView/UMDH on Windows.