
Unmanaged Memory Consumption during trace > 300MB

Sep 29, 2011 at 2:28 PM

Hi,

Is there anyone else having this issue who might have a solution?

I'm tracing (processing) real-time kernel events, and after about one minute or so the memory consumption increases from about 25 MB to more than 300 MB, about 95% of it unmanaged memory. The processor is at 100%.

Sep 30, 2011 at 12:26 PM

Found the Problem: Managed is too slow ^^

Event processing using the bcl.TraceEvent API has too much overhead.

Removing most of the (API-internal) handlers and working with the copied event data in a separate Task() solves the problem.

Oct 3, 2011 at 11:57 PM

I really wish people would not make sweeping generalizations like 'Managed is too slow'. Like most sweeping generalizations, it is almost certainly wrong.

First, the original complaint was memory size, not speed.   Second, that 300MB is NOT because the code is managed. 

Ideally, people would respond by attaching a heap profiler (like ClrProfiler) and discovering the problem.

The problem is most likely a known issue.  You will find in the KernelTraceEventParser the following variables

        internal HistoryDictionary<string> fileIDToName;
        internal HistoryDictionary<int> threadIDtoProcessID;

These allow better printing of kernel events; however, they currently never discard entries.

This is fine for an ETL file (which is finite); however, for real time it causes a 'leak'. This is on the TODO list to fix.

In the meantime, you may not need these maps, in which case you can simply turn them off. However, if you need their functionality you should contact me for further details.
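
To illustrate the issue (a sketch only, not the library's actual implementation): a lookup map that is filled from events but never evicts anything behaves like the class below, which is harmless for a finite ETL file but grows without bound in a long-running real-time session.

    using System.Collections.Generic;

    // Illustration of the pattern behind fileIDToName / threadIDtoProcessID:
    // entries are added on every relevant event and never removed.
    class FileNameMap
    {
        private readonly Dictionary<ulong, string> fileIDToName =
            new Dictionary<ulong, string>();

        // Called whenever a file-name event arrives.
        public void OnFileName(ulong fileID, string name)
        {
            // Add-only: fine for a finite ETL file, a 'leak' for a
            // real-time session that runs for a long time.
            fileIDToName[fileID] = name;
        }

        // Used to print a readable name for later file I/O events.
        public string Lookup(ulong fileID)
        {
            string name;
            return fileIDToName.TryGetValue(fileID, out name) ? name : "<unknown>";
        }
    }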

Oct 4, 2011 at 8:49 AM

Hi,

During my research I had already seen these history dictionaries, so I tried using a memory profiler and watched the size of the managed components. I couldn't find any problems with the managed stuff. The overall managed memory consumption was something like 20-30 MB. The biggest amount was the unmanaged memory block (~300 MB). So there was no real memory problem on the managed side.

I figured out that enabling certain event types in the kernel source causes the memory leak. The reason is that it's simply an enormous number of events, which makes the bridge from unmanaged to managed a bottleneck, because the unmanaged side has to wait until the managed side finishes processing and returns from the callback. (The kernel logger fills up an internal buffer, and when it reaches its threshold it logs an event you can see in the system error event log.)

A solution was to clone all the event data and process it in one task per event. But this in turn increased processor usage to an unusable level.
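
For reference, here is a minimal sketch of the 'copy the event data and hand it off' idea with a single long-running consumer instead of one Task per event. All types, fields, and method names here are hypothetical illustrations, not the poster's code and not the TraceEvent API.

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    // Minimal copy of just the fields the consumer needs.
    struct EventRecord
    {
        public int ProcessID;
        public byte Opcode;
        public DateTime TimeStamp;
    }

    class EventPipeline
    {
        // Bounded queue: if the consumer falls behind, TryAdd drops events
        // instead of letting memory grow without limit.
        private readonly BlockingCollection<EventRecord> queue =
            new BlockingCollection<EventRecord>(10000);

        // Called from the ETW callback: copy and return as quickly as
        // possible so the unmanaged side is not kept waiting.
        public void OnEvent(int pid, byte opcode, DateTime timeStamp)
        {
            queue.TryAdd(new EventRecord { ProcessID = pid, Opcode = opcode, TimeStamp = timeStamp });
        }

        // One long-running consumer task does the expensive work.
        public Task Start()
        {
            return Task.Factory.StartNew(() =>
            {
                foreach (EventRecord e in queue.GetConsumingEnumerable())
                    Process(e);
            }, TaskCreationOptions.LongRunning);
        }

        private void Process(EventRecord e)
        {
            // Filtering, aggregation, logging, etc. goes here.
        }
    }

The single consumer avoids the per-event Task overhead described above; the trade-off of the bounded queue is that events are dropped rather than buffered indefinitely when the consumer cannot keep up.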

We thought about using an unmanaged component which does initial filtering of uninteresting events, but there is still another problem with the file name events we need to track:

If I were able to receive the FileIoName events correctly, it would be possible to further reduce the number of events I need to receive, because I wouldn't need to process the FileIoInit events to guess file names. The problem is described in the other thread I started: http://bcl.codeplex.com/discussions/274260

Hope you see my point now.

Kind Regards

 - Chris -

 

PS: Sometimes it's just true that managed processing is simply slower :)

Oct 5, 2011 at 2:53 PM

Your last post is much more helpful. As I replied on the other thread, you have a problem if you want to track File I/O and use a real-time provider until Windows 8.

You indicated that the problem seems to be that events are generated faster than they can be processed. What events are you turning on? Typically you have < 10K events/sec, which means you have 100us to process each event. This is a lot of time, especially if you can quickly discard the events. I would have expected a well-designed processor which used TraceEventSource to work fine at that rate. (The overhead of dispatching an event in TraceEvent is < 100 instrs, which should only take < 1us; the rest is under your control.)
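
A back-of-the-envelope check of those numbers (a sketch, assuming the quoted rate of ~10K events/sec and ~1us of dispatch overhead):

    using System;

    class EventBudget
    {
        static void Main()
        {
            const double eventsPerSecond = 10000;             // "typically < 10K events/sec"
            double budgetPerEventUs = 1e6 / eventsPerSecond;  // => 100 us per event
            const double dispatchCostUs = 1.0;                // "< 100 instrs ... < 1us"

            Console.WriteLine("Budget per event:      {0:F0} us", budgetPerEventUs);
            Console.WriteLine("Left for your handler: {0:F0} us", budgetPerEventUs - dispatchCostUs);
        }
    }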

Vance

> PS: Sometimes it's just true that managed processing is simply slower :)

I am certainly willing to concede that managed code can be slower. What I am debating is the generalization 'my slowdown must be caused by managed overhead'. This is a VERY unhealthy mindset because it causes you to not even try. For example, you gave up and created a 'sledgehammer' design (lots of Tasks) because you believed managed code was simply 'too slow' to keep up. However, the evidence that you have presented so far does not support that. It only suggests that you do have a bottleneck, but it does not indicate why. The 'obvious' response should be to attach a profiler and find out exactly why. That WOULD decide if it really was 'managed code overhead' or some other issue. Only by correctly attributing the cause can you possibly hope to resolve the problem, and THIS is why I don't like it when people jump to conclusions without measuring (enough).

OK I will get off my soapbox now...   

Oct 5, 2011 at 4:41 PM
Edited Oct 5, 2011 at 5:07 PM

Thank you!

Sad to say, we are moving in another direction now to track file access, for those name-resolution reasons.

But let me add another comment on the measuring issue:

I know that simply passing things into managed code, discarding as much as possible, and returning would be fast enough. I removed ALL processing and the system ran fine. Then I reattached some event handlers which just added the opcode ID to a list and returned. The list was dumped and cleared every 500 entries. WOOSCH -> 300MB...

The console dump was done in a separate task, so there was no blocking on that side.

Delegate resolution seems to have a little overhead.

Oct 6, 2011 at 11:26 AM

As you have seen, when you are dealing with a perf-critical code path, details matter.

This is probably moot now, but my generic advice is to attach a profiler; it would have told you what the problem was. It is not likely that it is delegate resolution.

I am speculating now, but I strongly suspect that your problem is that you are allocating objects of non-trivial lifetime on EVERY event (e.g. putting them into your list). At the rate you have to service requests, this is a problem. TraceEvent was very carefully designed so that it does NOT do this (it reuses the events it gives back). For example, if you are using TraceEvent.Clone() on most events, you will undoubtedly have problems at high rates. Allocating something on every event (and letting it live just long enough to be a problem for the GC) needs to be avoided on your hot path. Typically it is not hard to avoid such allocations; you just have a data structure that gets updated in place rather than added to.
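
A sketch of the two patterns being contrasted (names are illustrative, not the poster's code): the first allocates an object that survives on every event, the second updates a pre-built structure in place.

    using System.Collections.Generic;

    class IoStats
    {
        // Pattern to avoid on the hot path: every event allocates a string
        // and keeps it alive in the list, creating steady GC pressure.
        private readonly List<string> seenEvents = new List<string>();
        public void RecordByAllocating(int processID, byte opcode)
        {
            seenEvents.Add(string.Format("pid={0} opcode={1}", processID, opcode));
        }

        // Preferred pattern: a small structure (one entry per process)
        // updated in place, so there is no per-event garbage.
        private readonly Dictionary<int, long> eventsPerProcess =
            new Dictionary<int, long>();
        public void RecordInPlace(int processID)
        {
            long count;
            eventsPerProcess.TryGetValue(processID, out count);
            eventsPerProcess[processID] = count + 1;
        }
    }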

My real advice, however, is: don't guess, measure. As we have seen, it is very easy to misdiagnose a performance problem (we still don't know the root cause!). You can tell people who have 'done perf' for a while because they want data and are very careful about drawing conclusions (I am violating that rule a bit above, but I make it clear it is a guess. I am just showing you that there are other explanations, and we need data to resolve which one is correct).

Good luck on your project.

Vance