What is PerfMonitor?

PerfMonitor is a tool for collecting and accessing windows tracing information (also known as Event Tracing for Windows or ETW). Built into many parts of windows including the kernel and the .NET Common Language Runtime (CLR) are logging routines that will generate very detailed information about what the computer is doing. PerfMonitor is a tool that allows you to turn this logging on (on a machine wide basis) and collect the data into Event Trace Log (ETL) files. PerfMonitor also allows you to convert the binary ETL files to an XML file. See PerfMonitor for more on donloading the tool or the Download page for getting the binary.

Quick Start

Perhaps the fastest way to get a sence of the kind of data that is available using PerfMonitor is to quickly collect some. Try the command:

     perfMonitor /stacks runPrint perfMonitor list


This command tells perfMonitor to:
  • Turn on event monitoring (by default for OS event and .NET Common Language Runtime (CLR)). By default this goes to two files call perfMonitorOutput.etl and perfMonitorOutput.kernel.etl. The /stacks option tells it to collect stack traces on events where that is supported.
  • Run a command for which you want to collect data. In this case we use the command 'perfMonitor list'.
  • Turn off event monitoring.
  • 'Print' the data from the binary *.etl files in a way that can be read by humans. By default this goes into a file called perfMonitorOutput.xml.

You can then browse the data using a text editor, or an XML viewer. Because the XML files tend to be very large, it is NOT recommended that you display the data using internet explorer (notepad is better) By browsing the file you will see the wealth of detailed information that was collected including:
  • Every process that was active during data collection (including those started before logging began!). Each process knows the command line which started it and the exit code (if it ended).
  • Every thread that was active or created in the system.
  • Every DLL or EXE that was loaded into some process, and where it lives in the address space of the process that loaded it. Because we asked for stack traces we also know the call stack where the DLL was loaded.
  • Every Disk read or write that happened, including its size the the file name associated with it.
  • Every page fault in the system (again with a stack trace)
  • A sample of the instruction pointer (IP) every millisecond. Coupled with the DLL load information this tells you where the CPU is spending its time. Because we provided the /stacks option we also got a stack trace where this sample was taken. This is an extremely powerful method of sample-based profiling.
  • Every Garbage Collection the CLR performed, when it started and ended, and statistics about how much memory was reclaimed.
  • The memory extent of the native code generated by the CLR. This allows the instruction pointers samples to be mapped to the symbolic name of the method that was running.

In short, if you want detailed data on what your computer is doing, ETW can give it to you.

Analyzing data

In addition to the 'print' functionality, PerfMonitor has a few simple analysis built into it. The simplest is the 'stat's command that gives you the counts of various events that ETW collected. It is useful to see 'what data is out there' and to diagnose issues with data collection (am I getting the events I expect).
     perfMonitor stats


Perfmonitor also has a number of CLR specific analyses it can do. In particular it has summarys of the .NET runtime garbage collector as well as Just In Time compiler. For example the command
     perfMonitor gcTime

Will give you detailed information on what was going on in the GC for every process that loaded the runtime. It will give number of collections (and of what type), amount of memory allocated, heap sizes before an after collection, pause times and other information related to the .NET Garbage collector. Simmilarly the command
     perfMonitor jitTime

Will tell you every method you Just in Time compiled of every application that was running during the collection interfval, and what DLL it came from. On V4.0 runtimes, it will also tell you the amount of time spent generating that code. This can be used to quickly determine whether the NGEN.EXE tool would be helpful for your application.

More options for collect data

One of major functions of PerfMonitor is to provide a mechanism for users to control the collection of ETW data by acting as an ETW Controller. This is exposed in the following PerfMonitor commands:
  • perfMonitor [options] start [<fileName>]
  • perfMonitor [options] stop
  • perfMonitor [options] run [command line]

For those that are familiar with XPerf, the collection functionality of XPERF and PerfMonitor are very simmilar. The start command turns on (machine wide) logging, and the stop command turns logging off. Only one session can be on simultaneously. If you try to start a session when one is already started, it closes the existing session and opens a new one. The effect of turning on a session is immediate and takes effect on live processes as well as new processes that are created. A filename can be specified (suggested extension is .ETL), but it will default to 'perfMonitorOutput.etl if not provided. ETW allows a huge array of events to be turned on, which can be a daunting task, so perfMonitor simplifies this by choosing a good set of default events, that are useful in a broad variety of performance investigations but are not too expensive to collect. This includes OS kernel operations associated with processes, threads, DLLs, page faults, diskIO, and CPU sampling as well as CLR events associated with Garbage Collections, the CLR Thread pool, lock contention, exceptions, and compilation of managed code to native code.

Due to a limitation in implementation, kernel event cannot be collected into the same ETL file as all other events. PerfMonitor works around this by making two separate files (e.g. *.etl and *.kernel.etl). Whenever it is manipulating etl files, it looks for the *.kernel.etl 'shadow' file and combines the two files into one when doing its processing. Thus typically perfMonitor generates two files.

PerfMonitor supports the following options for fine tuning the data collected
  • /noKernel - Don't collect OS kernel events (just the CLR events). For example if you were only interested in GC statistics you might use this option
  • /noCLR - Don't collect .NET CLR Runtime events. If the process you are interested in is not managed, but there are other managed processes running, you can use this to avoid collecting events from uninteresting processes.
  • /provider:GuidOrName this option allows you to turn on any arbitrary event provider by specifying its GUID or its registered name.
  • /keyAny:<hex> providers have 'keywords that tell which specific sets of events to turn on. THis is expressed as a 64 bit bitvector. This option allows you to specify this
  • /level:<num> each event also has a 'verbosity' associated with it. This option lets set the verbosity. 1 is the least verbose, 5 is the most verbose, 0 is the 'default'

These options only have an effect if the Kernel provider is turned on (that is /noKernel was NOT specified);
  • /noProfile - The sample based profiling every 1msec is one of the more common events in the log. You can reduce the log size significantly by turning off this profiling if you are not interested in it.
  • /cswitch - The kernel and fire an event every time one thread loses the CPU to another thread (a context switch or cswitch). These events are VERY valuable to understand what is causing delays (because whenever a thread waits, it context switches), but are also quite volumnous (3X more common than sample profile), so they are off by default.
  • /stacks - (Vista+) For kernel OS events, it is possible to ask the kernel to crawl the stack and log that stack trace on every event that it logs. This gives a very useful hierarchical view of where an event occurs. It generally increases the log size by a factor of 5 or more because every event has another stack trace event associated with it (which has up to 96 call frames in it)

Once started, the ETW session created by 'perfMonitor start' will live until it is explictly stopped with 'perfMontior stop'. It is relatively easy for a session to be left running accidentially (eg when a script fails and does not call 'perfMonitor stop'), so perfmonitor provide the following command:
  • perfMonitor run <command>

This command starts monitoring, runs a command, and stops monitoring. It has error handling so that in most cases it will not leave monitoring on (eg if you hit Ctrl-C it will stop monitoring before exiting). However if perfMonitor is terminated by an external agent (eg from Task manager), it won't be able to do this cleanup. Generally it is safer and more convinient to use the run command when you can do so.

Last edited Apr 7, 2010 at 12:27 AM by VanceMorrison, version 8

Comments

KevinBurton Dec 2, 2010 at 1:36 AM 
If you do 'perfMonitor start' and the events are collected in perMonitorOutput.etl, how do I get the human readable output? I don't see that runPrint is an option for 'perfMonitor start'.

Thank you for a great tool.