.NET Internals: The Profiling API

Brian Long (www.blong.com)

Table of Contents

Click here to download the files associated with this article.

If you find this article useful then please consider making a donation. It will be appreciated however big or small it might be and will encourage Brian to continue researching and writing about interesting subjects in the future.


Introduction

Tucked away in a dark corner of the .NET Framework SDK is some documentation on an API that lets you find out lots of useful information about the behaviour of your .NET applications. The API lets you dynamically obtain many different types of information as your applications execute, using a well structured and reasonably straightforward approach. It enables you to build execution flow trace tools, code timing tools, exception analysis tools, memory usage analysis tools, hotspot and bottleneck identification tools to name but a few.

This paper introduces the Profiling API, a part of .NET that has not yet received the coverage it rightly deserves. It certainly deserves to be better known due to the power it imparts on the developer. Before looking at the Profiling API we start by briefly looking at the more traditional mechanism for extracting metrics from .NET applications as they execute.

Note: coverage of certain system settings assumes you are working on an NT-based platform, particularly either Windows 2000, Windows XP or Windows Server 2003. This should be a safe assumption as the .NET Framework SDK is not supported on Windows 95, Windows 98 or Windows Me, although it should be noted that the .NET Framework itself is supported on Windows 98 and Windows Me.

.NET System Metrics

There are many aspects of your programs behaviour that could be useful to be informed of. The traditional advice for finding information about running .NET applications, such as memory consumption, JIT behaviour, processor usage, etc. is to use performance counters.

Certainly if you search the .NET Framework SDK documentation for the word "profiling" you will be mostly directed to the use of performance counters through the System.Diagnostics.PerformanceCounter and System.Diagnostics.PerformanceCounterCategory classes. There is scant reference to the Profiling API in the regular documentation; the only exception is a single page that specifies how to enable a profiler written using the Profiling API.

Performance Counters

According to the documentation, performance counters can be used to monitor everything from memory management to just-in-time (JIT) compiler performance. They tell you about the resources your application uses, which is an indirect measure of your application's performance. You use these counters to understand how your application works internally rather than to directly measure your application's performance. Various Windows services offer performance counters to provide information in a wide variety of areas and .NET adds many more to the list.

From unmanaged Win32 applications you can access performance counter information by querying the Windows registry or by using the PDH (Performance Data Helper) library. Unfortunately for Delphi programmers there is no import unit supplied with Delphi for the PDH library.

As mentioned above, managed applications can access performance counters using the System.Diagnostics.PerformanceCounter and System.Diagnostics.PerformanceCounterCategory classes.

Performance counters are categorised by the performance object that implements them. To access a performance counter in a performance object you refer to it as \object\counter or \\machine\object\counter (you can access performance counters on other machines).

The list of .NET-related performance counters available on my Windows 2000 machine running .NET 1.1 is as follows:

\.NET CLR Data\SqlClient: Current # pooled and nonpooled connections
\.NET CLR Data\SqlClient: Current # pooled connections
\.NET CLR Data\SqlClient: Current # connection pools
\.NET CLR Data\SqlClient: Peak # pooled connections
\.NET CLR Data\SqlClient: Total # failed connects
\.NET CLR Data\SqlClient: Total # failed commands
\.NET CLR Networking\Connections Established
\.NET CLR Networking\Bytes Received
\.NET CLR Networking\Bytes Sent
\.NET CLR Networking\Datagrams Received
\.NET CLR Networking\Datagrams Sent
\.NET CLR Memory\# Gen 0 Collections
\.NET CLR Memory\# Gen 1 Collections
\.NET CLR Memory\# Gen 2 Collections
\.NET CLR Memory\Promoted Memory from Gen 0
\.NET CLR Memory\Promoted Memory from Gen 1
\.NET CLR Memory\Gen 0 Promoted Bytes/Sec
\.NET CLR Memory\Gen 1 Promoted Bytes/Sec
\.NET CLR Memory\Promoted Finalization-Memory from Gen 0
\.NET CLR Memory\Promoted Finalization-Memory from Gen 1
\.NET CLR Memory\Gen 0 heap size
\.NET CLR Memory\Gen 1 heap size
\.NET CLR Memory\Gen 2 heap size
\.NET CLR Memory\Large Object Heap size
\.NET CLR Memory\Finalization Survivors
\.NET CLR Memory\# GC Handles
\.NET CLR Memory\Allocated Bytes/sec
\.NET CLR Memory\# Induced GC
\.NET CLR Memory\% Time in GC
\.NET CLR Memory\Not Displayed
\.NET CLR Memory\# Bytes in all Heaps
\.NET CLR Memory\# Total committed Bytes
\.NET CLR Memory\# Total reserved Bytes
\.NET CLR Memory\# of Pinned Objects
\.NET CLR Memory\# of Sink Blocks in use
\.NET CLR Interop\# of CCWs
\.NET CLR Interop\# of Stubs
\.NET CLR Interop\# of marshalling
\.NET CLR Interop\# of TLB imports / sec
\.NET CLR Interop\# of TLB exports / sec
\.NET CLR Exceptions\# of Exceps Thrown
\.NET CLR Exceptions\# of Exceps Thrown / sec
\.NET CLR Exceptions\# of Filters / sec
\.NET CLR Exceptions\# of Finallys / sec
\.NET CLR Exceptions\Throw To Catch Depth / sec
\.NET CLR Loading\Current Classes Loaded
\.NET CLR Loading\Total Classes Loaded
\.NET CLR Loading\Rate of Classes Loaded
\.NET CLR Loading\Current appdomains
\.NET CLR Loading\Total Appdomains
\.NET CLR Loading\Rate of appdomains
\.NET CLR Loading\Current Assemblies
\.NET CLR Loading\Total Assemblies
\.NET CLR Loading\Rate of Assemblies
\.NET CLR Loading\% Time Loading
\.NET CLR Loading\Assembly Search Length
\.NET CLR Loading\Total # of Load Failures
\.NET CLR Loading\Rate of Load Failures
\.NET CLR Loading\Bytes in Loader Heap
\.NET CLR Loading\Total appdomains unloaded
\.NET CLR Loading\Rate of appdomains unloaded
\.NET CLR LocksAndThreads\Total # of Contentions
\.NET CLR LocksAndThreads\Contention Rate / sec
\.NET CLR LocksAndThreads\Current Queue Length
\.NET CLR LocksAndThreads\Queue Length Peak
\.NET CLR LocksAndThreads\Queue Length / sec
\.NET CLR LocksAndThreads\# of current logical Threads
\.NET CLR LocksAndThreads\# of current physical Threads
\.NET CLR LocksAndThreads\# of current recognized threads
\.NET CLR LocksAndThreads\# of total recognized threads
\.NET CLR LocksAndThreads\rate of recognized threads / sec
\.NET CLR Jit\# of Methods Jitted
\.NET CLR Jit\# of IL Bytes Jitted
\.NET CLR Jit\Total # of IL Bytes Jitted
\.NET CLR Jit\IL Bytes Jitted / sec
\.NET CLR Jit\Standard Jit Failures
\.NET CLR Jit\% Time in Jit
\.NET CLR Jit\Not Displayed
\.NET CLR Remoting\Remote Calls/sec
\.NET CLR Remoting\Total Remote Calls
\.NET CLR Remoting\Channels
\.NET CLR Remoting\Context Proxies
\.NET CLR Remoting\Context-Bound Classes Loaded
\.NET CLR Remoting\Context-Bound Objects Alloc / sec
\.NET CLR Remoting\Contexts
\.NET CLR Security\Total Runtime Checks
\.NET CLR Security\% Time Sig. Authenticating
\.NET CLR Security\# Link Time Checks
\.NET CLR Security\% Time in RT checks
\.NET CLR Security\Stack Walk Depth

As you can see, the list is vast and varied and so there is a lot of scope for extracting useful information at selected points in your application run.

Note: performance counters are supported only on Windows NT-based platforms.

The Performance Monitor

Windows NT-based systems include a tool called the Performance Monitor, which can be accessed through the Control Panel and also by simply running perfmon.exe. The Performance Monitor is the Microsoft Management Console (MMC) with an instance of the System Monitor Control ActiveX embedded within it. The control allows you to select various performance counters and have their values dynamically plotted onto a graph.

When adding a counter to the graph you are presented with this dialog, listing all performance objects and all available counters within the selected object.

Once you have selected one or more objects the graph will start updating:

The Profiling API

Performance counters provide much information but it is difficult to relate the results with the program code unless you use them directly within your application. Sometimes you need more usable information from various applications without adding code to them and this is where the Profiling API comes in.

You can write a profiler that can be used against any arbitrary .NET application without that application knowing it is being profiled. The profiler operates on an event system instead of a polling system (as with the performance counters). As interesting things happen the profiler is immediately notified, and can therefore build up real-time analysis of the profilee (the application being profiled).

The provision of this API is a huge step forward by Microsoft in the effort they are putting into making development of useful tools and utilities much easier. Building profilers for normal Win32 applications is an extremely difficult task but the profiling API makes doing the same for .NET almost a triviality.

Profiling Basics

The first thing you need to appreciate with the Profiling API is that it is all based around COM interfaces. Whilst many pundits will have you believe that with the advent of .NET COM is dead and that if you do not yet know COM you should not bother learning it, .NET profilers must be built as in-process COM servers. This means we can write .NET profilers in any COM-capable unmanaged language such as Delphi.

The profiler must implement an interface defined by the Profiling API. This interface has a variety of methods that are triggered like events when interesting things happen. There are many of these events in the interface and the profiler specifies which groups of events it is interested in receiving. Doing this avoids having every event repeatedly triggered during the profiling session, which may heavily impact the application's performance.

A simple mechanism is employed to indicate to the CLR that profiling is required by a particular profiler. When managed applications are subsequently run the CLR loads the profiler COM server into the managed application's address space, is told which events to fire and proceeds to do so throughout the rest of that application's lifetime.

Note: .NET applications can be multi-threaded and so the profiler is treated as a free-threaded COM server. This means that you must ensure that it is written in a thread-safe manner.

Documentation

The regular .NET Framework SDK documentation has no coverage of the Profiling API because it is assumed that it is only of interest to tools developers who wish to build profiling tools. Certainly there are a growing number of commercial profilers made available from various tool vendors:

SDK documentation for tool developers can be located in a subdirectory below the main .NET Framework SDK directory, which itself may be a subdirectory of a Visual Studio.NET installation. The Framework SDK can be installed by itself or it can be installed implicitly by installing Visual Studio.NET and it ends up in different locations depending on what you did.

Either way, the subdirectory below the Framework SDK directory is called Tool Developers Guide (Reference 1), so you can use Windows Explorer to search for it if you have trouble locating it manually. Below Tool Developers Guide are two directories: docs and samples. As its name suggests docs is the one with the documentation.

There are three key files in this directory (all Microsoft Word documents) that will be of use in building fully functional profilers, each describing various COM interfaces. In order of merit they are:

These three documents all serve as references. To find advice on using the APIs you should look at Matt Pietrek's 2001 MSDN Magazine column (see Reference 2) or John Robbins' excellent debugging book (see Reference 3).

The Profiler Callback Interface

In Profiling.doc you will find the description of the callback interface ICorProfilerCallback, which must be implemented by each profiler. This contains 69 methods, each of which is an event, although 3 of the methods will never be called in .NET 1.x (they are there for use in future versions of .NET). Two of these events are special and are always called; the rest are called only if requested (and if the event actually occurs in the managed application).

ICorProfilerCallback is defined using IDL in the file corprof.idl (and also using C in corprof.h) in the Framework SDK include directory. You will find a translation of the interface into Delphi syntax in the unit corprof.pas in the files that accompany this paper. The job of building a profiler involves implementing this large interface and adding code into the appropriate event methods that you arrange to have called.

Here is the interface in full, so far as I have translated it. Notice that the safecall calling convention is employed to avoid having to mess about with HResult result values and also where appropriate the final out parameter has been turned into a function result value.


type
  ICorProfilerCallback = interface;
  PULONGArray = ^TULONGArray;
  TULONGArray = array[0..16387] of ULONG;
  ICorProfilerCallback = interface(IUnknown)
    ['{176FBED1-A55C-4796-98CA-A9DA0EF883E7}']
    // STARTUP/SHUTDOWN EVENTS
    procedure Initialize(const pICorProfilerInfoUnk: IUnknown); reintroduce; safecall;
    procedure Shutdown; safecall;
    // COR_PRF_MONITOR_APPDOMAIN_LOADS
    procedure AppDomainCreationStarted(appDomainId: ULONG); safecall;
    procedure AppDomainCreationFinished(appDomainId: ULONG; hrStatus: HResult); safecall;
    procedure AppDomainShutdownStarted(appDomainId: ULONG); safecall;
    procedure AppDomainShutdownFinished(appDomainId: ULONG; hrStatus: HResult); safecall;
    // COR_PRF_MONITOR_ASSEMBLY_LOADS
    procedure AssemblyLoadStarted(assemblyId: ULONG); safecall;
    procedure AssemblyLoadFinished(assemblyId: ULONG; hrStatus: HResult); safecall;
    procedure AssemblyUnloadStarted(assemblyId: ULONG); safecall;
    procedure AssemblyUnloadFinished(assemblyId: ULONG; hrStatus: HResult); safecall;
    // COR_PRF_MONITOR_MODULE_LOADS
    procedure ModuleLoadStarted(moduleId: ULONG); safecall;
    procedure ModuleLoadFinished(moduleId: ULONG; hrStatus: HResult); safecall;
    procedure ModuleUnloadStarted(moduleId: ULONG); safecall;
    procedure ModuleUnloadFinished(moduleId: ULONG; hrStatus: HResult); safecall;
    procedure ModuleAttachedToAssembly(moduleId: ULONG; assemblyId: ULONG); safecall;
    // COR_PRF_MONITOR_CLASS_LOADS
    procedure ClassLoadStarted(classId: ULONG); safecall;
    procedure ClassLoadFinished(classId: ULONG; hrStatus: HResult); safecall;
    procedure ClassUnloadStarted(classId: ULONG); safecall;
    procedure ClassUnloadFinished(classId: ULONG; hrStatus: HResult); safecall;
    // COR_PRF_MONITOR_FUNCTION_UNLOADS
    procedure FunctionUnloadStarted(functionId: ULONG); safecall; //currently inactive
    // COR_PRF_MONITOR_JIT_COMPILATION
    procedure JITCompilationStarted(functionId: ULONG; fIsSafeToBlock: Integer); safecall;
    procedure JITCompilationFinished(functionId: ULONG; hrStatus: HResult; fIsSafeToBlock: Integer); safecall;
    function JITInlining(callerId: ULONG; calleeId: ULONG): {fShouldInline} Bool; safecall;
    // COR_PRF_MONITOR_FUNCTION_UNLOADS
    procedure JITFunctionPitched(functionId: ULONG); safecall;
    // COR_PRF_MONITOR_CACHE_SEARCHES
    function JITCachedFunctionSearchStarted(functionId: ULONG): {bUseCachedFunction} Bool; safecall;
    procedure JITCachedFunctionSearchFinished(functionId: ULONG; AResult: COR_PRF_JIT_CACHE); safecall;
    // COR_PRF_MONITOR_THREADS
    procedure ThreadCreated(threadId: ULONG); safecall;
    procedure ThreadDestroyed(threadId: ULONG); safecall;
    procedure ThreadAssignedToOSThread(managedThreadId: ULONG; osThreadId: ULONG); safecall;
    // COR_PRF_MONITOR_REMOTING
    procedure RemotingClientInvocationStarted; safecall;
    procedure RemotingClientSendingMessage(const pCookie: TGuid; fIsAsync: Bool); safecall;
    procedure RemotingClientReceivingReply(const pCookie: TGuid; fIsAsync: Bool); safecall;
    procedure RemotingClientInvocationFinished; safecall;
    procedure RemotingServerReceivingMessage(const pCookie: TGuid; fIsAsync: Bool); safecall;
    procedure RemotingServerInvocationStarted; safecall;
    procedure RemotingServerInvocationReturned; safecall;
    procedure RemotingServerSendingReply(const pCookie: TGuid; fIsAsync: Bool); safecall;
    // COR_PRF_MONITOR_CODE_TRANSITIONS
    procedure UnmanagedToManagedTransition(functionId: ULONG; reason: COR_PRF_TRANSITION_REASON); safecall;
    procedure ManagedToUnmanagedTransition(functionId: ULONG; reason: COR_PRF_TRANSITION_REASON); safecall;
    // COR_PRF_MONITOR_SUSPENDS
    procedure RuntimeSuspendStarted(suspendReason: COR_PRF_SUSPEND_REASON); safecall;
    procedure RuntimeSuspendFinished; safecall;
    procedure RuntimeSuspendAborted; safecall;
    procedure RuntimeResumeStarted; safecall;
    procedure RuntimeResumeFinished; safecall;
    procedure RuntimeThreadSuspended(threadId: ULONG); safecall;
    procedure RuntimeThreadResumed(threadId: ULONG); safecall;
    // COR_PRF_MONITOR_GC
    procedure MovedReferences(cMovedObjectIDRanges: ULONG; var oldObjectIDRangeStart, newObjectIDRangeStart, cObjectIDRangeLength: TULONGArray); safecall;
    procedure ObjectsAllocatedByClass(cClassCount: ULONG; var classIds, cObjects: TULONGarray); safecall;
    procedure ObjectReferences(objectId: ULONG; classId: ULONG; cObjectRefs: ULONG; var objectRefIds: TULONGArray); safecall;
    procedure RootReferences(cRootRefs: ULONG; var rootRefIds: TULONGArray); safecall;
    // COR_PRF_MONITOR_OBJECT_ALLOCATED
    procedure ObjectAllocated(objectId, classId: ULONG); safecall;
    // COR_PRF_MONITOR_EXCEPTIONS
    procedure ExceptionThrown(thrownObjectId: ULONG); safecall;
    procedure ExceptionSearchFunctionEnter(functionId: ULONG); safecall;
    procedure ExceptionSearchFunctionLeave; safecall;
    procedure ExceptionSearchFilterEnter(functionId: ULONG); safecall;
    procedure ExceptionSearchFilterLeave; safecall;
    procedure ExceptionSearchCatcherFound(functionId: ULONG); safecall;
    procedure ExceptionOSHandlerEnter(__unused: ULONG); safecall; //currently inactive
    procedure ExceptionOSHandlerLeave(__unused: ULONG); safecall; //currently inactive
    procedure ExceptionUnwindFunctionEnter(functionId: ULONG); safecall;
    procedure ExceptionUnwindFunctionLeave; safecall;
    procedure ExceptionUnwindFinallyEnter(functionId: ULONG); safecall;
    procedure ExceptionUnwindFinallyLeave; safecall;
    procedure ExceptionCatcherEnter(functionId: ULONG; objectId: ULONG); safecall;
    procedure ExceptionCatcherLeave; safecall;
    // COR_PRF_MONITOR_CLR_EXCEPTIONS
    procedure ExceptionCLRCatcherFound; safecall;
    procedure ExceptionCLRCatcherExecute; safecall;
    // COR_PRF_MONITOR_CCW
    procedure COMClassicVTableCreated(wrappedClassId: ULONG; const implementedIID: TGuid; pVTable: Pointer; cSlots: ULONG); safecall;
    procedure COMClassicVTableDestroyed(wrappedClassId: ULONG; const implementedIID: TGuid; pVTable: Pointer); safecall;
  end;

The special event methods are Initialize and Shutdown. Initialize is called as soon as the CLR has initialised and loaded up the profiler, and it is where you request the event categories you wish to be triggered throughout the program run. Shutdown is called as the CLR is closing down during application termination.

Initialize is passed an IUnknown reference to another interface as a parameter, this one being ICorProfilerInfo (also defined in corprof.idl, corprof.h and corprof.pas). This interface (like all of them except the callback interface) is implemented in the CLR and is there to help you get more information when various events trigger, offering you 33 methods to choose from. You should save this reference away so it can be used in the event methods that are subsequently called. For example:


procedure TSomeProfiler.Initialize(const pICorProfilerInfoUnk: IUnknown);
begin
  CorProfilerInfo := pICorProfilerInfoUnk as ICorProfilerInfo;
end;

Note: both ICorProfilerCallback and ICorProfilerInfo are documented in Profiling.doc.

Profiling Event Categories

The various categories of events that can be requested are identified by bitmask constants defined in corprof.pas, which can be combined together using the OR operator:

Category Constant from corprof.pas Immutable
AppDomains being loaded/unloaded COR_PRF_MONITOR_APPDOMAIN_LOADS No
Assemblies being loaded/unloaded COR_PRF_MONITOR_ASSEMBLY_LOADS No
Modules being loaded/unloaded COR_PRF_MONITOR_MODULE_LOADS No
Classes being loaded/unloaded COR_PRF_MONITOR_CLASS_LOADS No
Thread creation/destruction COR_PRF_MONITOR_THREADS No
Precompiled function location COR_PRF_MONITOR_CACHE_SEARCHES No
JITted functions being unloaded COR_PRF_MONITOR_FUNCTION_UNLOADS No
JIT compilation & inlining COR_PRF_MONITOR_JIT_COMPILATION No
Functions being called and exiting COR_PRF_MONITOR_ENTERLEAVE No
Exception generation, searching & handling COR_PRF_MONITOR_EXCEPTIONS No
CLR exception handling COR_PRF_MONITOR_CLR_EXCEPTIONS No
CLR suspensions/resumptions COR_PRF_MONITOR_SUSPENDS No
Garbage collection COR_PRF_MONITOR_GC Yes
Object allocation COR_PRF_MONITOR_OBJECT_ALLOCATED No
Remoting COR_PRF_MONITOR_REMOTING Yes
Managed/unmanaged code transitions COR_PRF_MONITOR_CODE_TRANSITIONS Yes
CCW proxy creation/destruction COR_PRF_MONITOR_CCW No
All events COR_PRF_MONITOR_ALL No

You initially request the appropriate event categories in ICorProfilerCallback.Initialize by passing the required flags along to the ICorProfilerInfo.SetEventMask method. The Immutable column in the table above indicates whether you can dynamically enable and disable the event category notification arbitrarily at runtime; as you can see certain events can only be requested upon profiler initialisation.

Additional constants are available that control the managed code capabilities, all of which are immutable:

Constant Meaning
COR_PRF_ENABLE_REJIT Enables ability to force methods to be re-JITted via ICorProfilerInfo.SetFunctionReJIT
COR_PRF_ENABLE_INPROC_DEBUGGING Enables use of in-process debugging interfaces
COR_PRF_ENABLE_JIT_MAPS Tells CLR to maintain Native to IL maps
COR_PRF_ENABLE_OBJECT_ALLOCATED Enable use of COR_PRF_MONITOR_OBJECT_ALLOCATED category
COR_PRF_DISABLE_INLINING Globally disables JIT from using method inlining
COR_PRF_DISABLE_OPTIMIZATIONS Disables JIT code optimisations

As an example, to request events that indicate when objects get allocated you might use code like this:


procedure TSomeProfiler.Initialize(const pICorProfilerInfoUnk: IUnknown);
begin
  CorProfilerInfo := pICorProfilerInfoUnk as ICorProfilerInfo;
  CorProfilerInfo.SetEventMask(
    COR_PRF_ENABLE_OBJECT_ALLOCATED or COR_PRF_MONITOR_OBJECT_ALLOCATED);
end;

Building A Base Profiler Class

To avoid having to implement each of the 69 ICorProfilerCallback methods in every profiler it makes sense to implement a base class to do this. Descendant classes can them implement only the event methods that are important for them, making the actual job of building real profilers a little easier.

Additionally the base class should do any rote work that is needed for every profiler, such as saving the ICorProfilerInfo interface passed into the Initialize method. However doing this in the base class would put a requirement on every profiler to call the inherited Initialize method each time they add extra start-up code in their own overridden Initialize method. To reduce the risk of forgetting to do this the base class introduces two virtual methods, OnInitialize and OnShutDown, which are called by Initialize and Shutdown respectively. New descendant classes override these OnXXXX methods and have no obligation to do anything in particular in them.

A base profiler class called TBaseProfiler is supplied in the unit BaseProfiler.pas. As well as what has so far been described the class declares a DWord field called EventMask, initialised to COR_PRF_MONITOR_NONE. Descendant classes assign appropriate event category constants to this field in OnInitialize and the base class proceeds to request these events after OnInitialize returns.

Additionally TBaseProfiler declares a Windows critical section, CritSec, which is initialised in Initialize and deleted in Shutdown. This can be used by profilers to ensure thread safety in their event methods (remember that .NET profilers are free-threaded COM objects) and this in itself is made easier by the helper methods SyncEnter and SyncExit.

Note: the Profiling API documentation states that there are some methods it is unsafe to block in (i.e. hold up the execution by performing a lengthy process or waiting for something to happen). Generally the advice is not to call into managed code or allocate managed heap when this rule is in place, so waiting for a critical section to become available should pose no problem.


type
  TBaseProfiler = class(TComObject, ICorProfilerCallback)
  protected
    CritSec: TRTLCriticalSection;
    CorProfilerInfo: ICorProfilerInfo;
    EventMask: DWord;

    //Helper methods
    procedure OnInitialize; virtual;
    procedure OnShutDown; virtual;
    procedure SyncEnter;
    procedure SyncExit;

    //ICorProfilerCallback interface methods

    // STARTUP/SHUTDOWN EVENTS
    procedure Initialize(const pICorProfilerInfoUnk: IUnknown); reintroduce; safecall;
    procedure Shutdown; safecall;
    ...
  end;
...
procedure TBaseProfiler.OnInitialize;
begin
end;

procedure TBaseProfiler.OnShutDown;
begin
end;

procedure TBaseProfiler.SyncEnter;
begin
  EnterCriticalSection(CritSec);
end;

procedure TBaseProfiler.SyncExit;
begin
  LeaveCriticalSection(CritSec);
end;

procedure TBaseProfiler.Initialize(const pICorProfilerInfoUnk: IUnknown);
begin
{$ifdef NOISE_ON_START_AND_STOP}
  MessageBeep(Cardinal(-1));
  Sleep(100);
  MessageBeep(Cardinal(-1));
{$endif}
  CorProfilerInfo := pICorProfilerInfoUnk as ICorProfilerInfo;
  InitializeCriticalSectionAndSpinCount(CritSec, 5000);
  EventMask := COR_PRF_MONITOR_NONE;
  OnInitialize;
  CorProfilerInfo.SetEventMask(EventMask);
end;

procedure TBaseProfiler.Shutdown;
begin
  OnShutDown;
  DeleteCriticalSection(CritSec);
  CorProfilerInfo := nil;
{$ifdef NOISE_ON_START_AND_STOP}
  MessageBeep(Cardinal(-1));
  Sleep(100);
  MessageBeep(Cardinal(-1));
{$endif}
end;
...

You can see in this section of the class that the Initialize and Shutdown methods have conditionally compiled code to make a little double-beep noise as they execute. This gives us a trivial mechanism by which we can tell if a profiler has been loaded, neatly sidestepping the requirement to debug it and set breakpoints. This debug-by-ear technique is used by a number of low-level developers and is promoted by John Robbins (see Reference 3).

The base class actually has more up its metaphorical sleeve, but this is enough for us to get going with building and testing some sample profilers.

Building A Trivial Profiler

You can build a new .NET profiler by making an in-proc COM server project (described in the Delphi IDE as an ActiveX Library, on the ActiveX page of the File | New | Other... dialog). Add the BaseProfiler.pas unit to the library project and then make use of the TemplateProfiler.pas unit as a template to create new profilers from. The unit looks like this:


unit TemplateProfiler;

interface

uses
  Windows, ActiveX, Classes, ComObj, BaseProfiler, CorProf;

type
  TTemplateProfiler = class(TBaseProfiler, ICorProfilerCallback)
  protected
    procedure OnInitialize; override;
    procedure OnShutDown; override;
  end;

const
  //Update these constants for each new profiler
  ProfilerCoClassName = 'TemplateProfiler';
  ProfilerDescription = 'Template profiler';
  ProfilerGUID: TGUID = '{57518450-308C-44D1-B4D1-C67491ABC427}';
  ProfilerClass: TComClass = TTemplateProfiler;

implementation

uses
  ComServ;

procedure TTemplateProfiler.OnInitialize;
begin
  EventMask := COR_PRF_MONITOR_NONE;
end;

procedure TTemplateProfiler.OnShutDown;
begin

end;

initialization
  TComObjectFactory.Create(ComServer, ProfilerClass, ProfilerGUID,
    ProfilerCoClassName, ProfilerDescription, ciMultiInstance, tmFree);
end.

Add it to the project and then save it with a new name (to avoid messing up the original template, leaving it available for future use). Then follow these steps:

  1. Change the name of the class. This is easily done with the Code Explorer; locate the class name, right-click on it and choose Rename.

  2. Modify the ProfilerCoClassName constant. Note that this constant becomes the second half of this COM object's ProgID, the first half being the project name. For example in a project called Profiler.dll, setting ProfilerCoClassName to MyProfiler yields a ProgID of Profiler.MyProfiler. This fact is mentioned here as you will likely be using the profiler's ProgID later to enable its use.

  3. Modify the ProfilerDescription constant to briefly describe the profiler.

  4. Modify the ProfilerGUID constant to be a new, unique GUID to identify the COM object by. You can easily get a GUID using Ctrl+Shift+G in the Delphi editor, but it will need to be tweaked to have the correct syntax by removing the square brackets that will surround it.

  5. Modify the ProfilerClass constant to reflect the profiler's class type.

These constants are used in the initialisation section by the COM object class factory and are required for successful operation.

You are now free to set up an appropriate event mask and start overriding the event methods that will be triggered by the .NET CLR. To give you the idea the supplied SampleProfiler.pas unit contains the TSampleProfiler class, which methods look like this:


type
  TSampleProfiler = class(TBaseProfiler, ICorProfilerCallback)
  protected
    procedure OnInitialize; override;
    procedure OnShutDown; override;
    procedure ClassLoadStarted(classId: ULONG); safecall;
  end;

const
  //Update these constants for each new profiler
  ProfilerCoClassName = 'SampleProfiler';
  ProfilerDescription = 'Sample profiler';
  ProfilerGUID: TGUID = '{34F20DB8-FA3C-4356-945D-5D7819E81C8B}';
  ProfilerClass: TComClass = TSampleProfiler;
...
procedure TSampleProfiler.OnInitialize;
begin
  EventMask := COR_PRF_MONITOR_CLASS_LOADS;
end;

procedure TSampleProfiler.OnShutDown;
begin
  //Nothing to do
end;

procedure TSampleProfiler.ClassLoadStarted(classId: ULONG);
begin
  SyncEnter;
  Beep(200 + Random(300), 20);
  SyncExit
end;

When a .NET application is executed every class that is loaded from an assembly by the CLR causes a short, random frequency noise to occur. If you were to load a large .NET application, such as the SharpDevelop development environment, you will get a lengthy static-like noise from your speaker. You can get the same behaviour with both Microsoft Visual Studio.NET and Borland C#Builder, despite them not being .NET applications. As soon as you open or create a C# application these environments host a CLR session, since the form designer and Object Inspector are .NET objects.

Of course this sample code is all well and good but we do not yet have the information we need to enable the profiler and test it out.

Enabling The Profiler

When a managed application starts up, the .NET CLR checks the values of a couple of environment variables to see if a profiler should be loaded up into the process address space before proceeding. These variables are mentioned on the single page in the regular Framework SDK documentation that refers to the Profiling API (the Enabling Profiling page in the Debugging and Profiling Applications section).

To inform the CLR that profiling is required you set the COR_ENABLE_PROFILING environment variable with a value of 1. To indicate which profiler should be loaded you set COR_PROFILER to either the ProgID or the ClassID of the profiler object.

To profile a single application you can set these variables from a command prompt and then launch your application from there. It will then be profiled, as will any other .NET applications you launch in the same session (unless you set COR_ENABLE_PROFILING to 0 or to a blank value).

For example to enable the sample profiler from above using its ProgID, and profile a trivial Hello World type application, you could do this:

To enable it using its ClassID you could do this:

To enable profiling of all .NET applications you could set the environment variables globally using the System Properties dialog. You can access this from Control Panel (the System applet), your Windows desktop (right-click on My Computer and select Properties, or select My Computer and press Alt+Enter) or from anywhere by using the Windows key in conjunction with Break or Pause (Windows+Break).

On the Advanced page there is an Environment Variables... button that lets you add variables for all user accounts (the System variables group) or for the current user account (the User variables group).

Note: in order to profile ASP.NET applications, which are loaded into IIS whose environment would be very tricky to modify, you are obliged to set global environment variables to enable profiling.

When you use global environment variables every .NET application will be profiled by your profiler. You can alter this default behaviour using various schemes. For example you could have your profiler use an INI file that lists the applications it should profile. When the profiler starts up it can examine the process it is loaded into and compare it with the list. If it is loaded into an application that does not need profiling then it can leave the EventMask profiler field set to COR_PRF_MONITOR_NONE, thereby making its presence have no effect on the application.

You could alternatively use registry settings or even your own environment variable to achieve the same effect; environment variables can be read using the GetEnvironmentVariable routine in the SysUtils unit (added in Delphi 6) or the GetEnvironmentVariable Win32 API.

Testing And Debugging Profilers In Delphi

When developing a .NET profiler in Delphi you will naturally need to debug it from time to time. This can be accomplished quite straightforwardly in the same way you debug any normal DLL but there a couple of caveats (we'll look into those in a moment).

The application you wish to profile should be entered as the Host Application in the Run | Parameters... dialog.

When you press F9 (or choose Run | Run) in the IDE, the managed executable will be launched and will load your profiler into its address space. However that will only happen if the program has the two key profiling environment variables set in its environment.

You can ensure this is the case by either setting the variables globally or by setting them in a command prompt that is then used to manually launch the IDE. The IDE will get those environment variables in its own environment and will pass them onto the program being debugged.

However a better way is to use the new Environment Variables page in the environment options dialog (added in Delphi 6). This lets you specify a custom environment for applications launched from the IDE, which is a much simpler way to get what we require.

With that issue dealt with we can now look at the other one. A problem with the Delphi debugger means that typically when you wish to debug an in-process COM server loaded into a host application your initial attempts will be destined to complete failure. For some reason the debugger insists on you setting the Include remote debug symbols option on the Linker page of the project options dialog. This causes the linker to generate a big symbol file (named after your project with an .rsm extension) and then breakpoints, stepping and so on work fine in the IDE.

Profiler Event ID Parameters

When building anything other than a trivial profiler you will want more information from each event method that gets called. If you examine the various methods in ICorProfilerCallback you will see that most of them receive some form of ID parameter that identifies the item to which the event relates:

All these ID values are actually the memory addresses of the item in question. However since the layout of each item is undocumented they should be treated as opaque handles that identify an individual class or object or module etc.

The ICorProfilerInfo interface has various methods that can be used to turn an ID into something more useful such as a name, and the TBaseProfiler class implements a variety of helper methods that will do the hard work of translating various IDs into textual names. The job of translating appDomainIds, assemblyIds and moduleIds into names is reasonably straightforward thanks to the ICorProfilerInfo interface methods GetAppDomainInfo, GetAssemblyInfo and GetModuleInfo.


function TBaseProfiler.GetAppDomainFromAppDomainId(AppDomainId: ULONG;
  var AppDomainName: WideString): Boolean;
var
  CopiedChars, ProcessID: Cardinal;
  CAppDomainName: array[0..256] of WideChar;
begin
  Result := False;
  try
    CorProfilerInfo.GetAppDomainInfo(AppDomainId, SizeOf(CAppDomainName),
      CopiedChars, CAppDomainName, ProcessID);
    if CopiedChars < SizeOf(CAppDomainName) then
    begin
      AppDomainName := CAppDomainName;
      Result := True
    end
  except
  end
end;

function TBaseProfiler.GetAssemblyFromAssemblyId(AssemblyId: ULONG;
  var AssemblyName: WideString): Boolean;
var
  CopiedChars, AppDomainID, ModuleID: Cardinal;
  CAssemblyName: array[0..256] of WideChar;
begin
  Result := False;
  try
    CorProfilerInfo.GetAssemblyInfo(AssemblyId, SizeOf(CAssemblyName),
      CopiedChars, CAssemblyName, AppDomainID, ModuleID);
    if CopiedChars < SizeOf(CAssemblyName) then
    begin
      AssemblyName := CAssemblyName;
      Result := True
    end
  except
  end
end;

function TBaseProfiler.GetModuleFromModuleId(ModuleId: ULONG;
  var ModuleName: WideString): Boolean;
var
  CopiedChars: Cardinal;
  CModuleName: array[0..MAX_PATH] of WideChar;
begin
  Result := False;
  try
    CorProfilerInfo.GetModuleInfo(ModuleId, nil, SizeOf(CModuleName),
      @CopiedChars, CModuleName, nil);
    if CopiedChars < SizeOf(CModuleName) then
    begin
      ModuleName := CModuleName;
      Result := True
    end
  except
  end
end;

Managed threads don't have names so there is nothing much you can do with a threadId so the remaining translation task involves finding the class name from a classid or an objected, and finding the class and method names from a functionId.

Unfortunately the ICorProfilerInfo interface doesn't offer enough to complete these jobs on its own. The best it can offer for the classId is GetClassIDInfo, which will give you the moduleId of the module containing the class as well as a metadata token for the class. For an objectId it does offer the GetClassFromObject method that returns the classId for the object's class type, but again you can only get a metadata token from that. In the case of the functionId it offers GetTokenAndMetaDataFromFunction, which returns a metadata token for the method.

The Unmanaged Metadata API

In order to make use of these metadata tokens we need to resort to another API, the Unmanaged Metadata API. In particular we need to IMetaDataImport interface from cor.h (translated in cor.pas), which is documented in Metadata Unmanaged API.doc. IMetaDataImport allows you to manipulate metadata in a managed assembly with its 62 methods.

As well as supplying a metadata token for the method, the ICorProfilerInfo.GetTokenAndMetaDataFromFunction method returns an IMetaDataImport interface. You can then use IMetaDataImport.GetMethodProps to readily turn the metadata token into the method name.

In the case of finding a class name there is an extra step. ICorProfilerInfo.GetClassIDInfo gives you a class metadata token and a moduleId but no IMetaDataImport interface. For that you need to pass the moduleId into ICorProfilerInfo.GetModuleMetaData. With the returned IMetaDataImport interface you can then call its GetTypeDefProps method to extract the class name.


function TBaseProfiler.GetClassAndMethodFromFunctionId(FunctionId: ULONG;
  var ClassName, MethodName: WideString): Boolean;
var
  MethodMetaToken, ClassMetaToken, CopiedChars: Cardinal;
  MetaDataImport: IMetaDataImport;
  CMethodName, CClassName: array[0..256] of WideChar;
begin
  Result := False;
  try
    CorProfilerInfo.GetTokenAndMetaDataFromFunction(
      FunctionId, IMetaDataImport, IUnknown(MetaDataImport), MethodMetaToken);
    MetaDataImport.GetMethodProps(MethodMetaToken, @ClassMetaToken,
      CMethodName, SizeOf(CMethodName), @CopiedChars,
      nil, nil, nil, nil, nil);
    if CopiedChars < SizeOf(CMethodName) then
    begin
      MethodName := CMethodName;
      MetaDataImport.GetTypeDefProps(ClassMetaToken,
        CClassName, SizeOf(CClassName), @CopiedChars, nil, nil);
      if CopiedChars < SizeOf(CClassname) then
      begin
        ClassName := CClassName;
        Result := True
      end
    end
  except
  end
end;

function TBaseProfiler.GetClassFromClassId(ClassId: ULONG;
  var ClassName: WideString): Boolean;
var
  ClassMetaToken, ModID: Cardinal;
  MetaDataImport: IMetaDataImport;
  CopiedChars: Cardinal;
  CClassName: array[0..256] of WideChar;
begin
  Result := False;
  try
    CorProfilerInfo.GetClassIDInfo(ClassID, ModID, ClassMetaToken);
    MetaDataImport := IMetaDataImport(CorProfilerInfo.GetModuleMetaData(
      ModId, ofRead, IMetaDataImport));
    MetaDataImport.GetTypeDefProps(ClassMetaToken,
      CClassName, SizeOf(CClassName), @CopiedChars, nil, nil);
    if CopiedChars < SizeOf(CClassName) then
    begin
      ClassName := CClassName;
      Result := True
    end
  except
  end
end;

function TBaseProfiler.GetClassFromObjectId(ObjectId: ULONG;
  var ClassName: WideString): Boolean;
var
  ClassID: Cardinal;
begin
  Result := False;
  try
    ClassID := CorProfilerInfo.GetClassFromObject(ObjectID);
    Result := GetClassFromClassId(ClassID, ClassName);
  except
  end
end;

You might notice that all of these ID translations methods have an exception handler to hide any exceptions that might occur. The reason for this is that the Profiling API interface methods are all declared using the safecall calling convention and so if any error-indicating HResult is passed back they will raise an EOleSysError exception with the HResult stored as the exception's ErrorCode property.

If you try and translate one of the IDs into a name before the CLR has fully loaded its information you will indeed get a bad HResult causing an exception so the code hides these problems. In particular, trying to get the name of an assembly when it has started loading, but before it has finished loading will generate error CORPROF_E_DATAINCOMPLETE or $80131351 (the ID is not fully loaded/defined yet).

You should take note that there are a variety of possible HResult error return codes that could cause exceptions to occur in your profiler code, so be warned! You can find a list of the definitions of these errors in the C header file CorError.h (translated into cor.pas):


const
  //**** Common Language Runtime Profiling Services errors
  // Function not yet compiled.
  CORPROF_E_FUNCTION_NOT_COMPILED = $80131350;
  // The ID is not fully loaded/defined yet.
  CORPROF_E_DATAINCOMPLETE        = $80131351;
  // The Module is not configured for updateable methods.
  CORPROF_E_NOT_REJITABLE_METHODS = $80131352;
  // The Method could not be updated for re-jit.
  CORPROF_E_CANNOT_UPDATE_METHOD  = $80131353;
  // The Method has no associated IL
  CORPROF_E_FUNCTION_NOT_IL       = $80131354;
  // The thread has never run managed code before
  CORPROF_E_NOT_MANAGED_THREAD    = $80131355;
  // The function may only be called during profiler init
  CORPROF_E_CALL_ONLY_FROM_INIT   = $80131356;
  // Inprocess debugging must be enabled during init
  // Also returned when BeginInprocDebugging not called
  // before using the inprocess debugging services
  CORPROF_E_INPROC_NOT_ENABLED    = $80131357;
  // Can't get a JIT map becuase they are not enabled
  CORPROF_E_JITMAPS_NOT_ENABLED   = $80131358;
  // If a profiler tries to call BeginInprocDebugging more than
  // once, it will get this error.
  CORPROF_E_INPROC_ALREADY_BEGUN  = $80131359;
  // States that inprocess debugging not allowed at this point
  // (for example during GC callbacks or RuntimeSuspention callbacks
  CORPROF_E_INPROC_NOT_AVAILABLE  = $8013135A;
  // This is a general error used to indicated that the information
  CORPROF_E_NOT_YET_AVAILABLE     = $8013135B;

A Logging Profiler

A more realistic sample profiler is supplied as TLoggingProfiler in LoggingProfiler.pas. This implements most of the profiling event methods, turns the ID parameter into something more useful and logs a string describing the event. Here is a selection of the event methods from this profiler:


procedure TLoggingProfiler.OnInitialize;
begin
  NestingLevel := 0;
  EventMask := COR_PRF_ENABLE_REJIT or COR_PRF_DISABLE_INLINING;
                                    or COR_PRF_MONITOR_FUNCTION_UNLOADS;
                                    or COR_PRF_MONITOR_CLASS_LOADS;
                                    or COR_PRF_MONITOR_MODULE_LOADS;
                                    or COR_PRF_MONITOR_ASSEMBLY_LOADS;
                                    or COR_PRF_MONITOR_APPDOMAIN_LOADS;
                                    or COR_PRF_MONITOR_JIT_COMPILATION;
                                    or COR_PRF_MONITOR_GC;
                                    or COR_PRF_ENABLE_OBJECT_ALLOCATED;
                                    or COR_PRF_MONITOR_OBJECT_ALLOCATED;
                                    or COR_PRF_MONITOR_THREADS;
                                    or COR_PRF_MONITOR_CODE_TRANSITIONS;
                                    or COR_PRF_MONITOR_ENTERLEAVE;
                                    or COR_PRF_MONITOR_CCW;
                                    or COR_PRF_MONITOR_SUSPENDS;
                                    or COR_PRF_MONITOR_CACHE_SEARCHES;
  Log('CLR has started')
end;
...
procedure TLoggingProfiler.AppDomainCreationStarted(appDomainId: ULONG);
var
  AppDomainName: WideString;
begin
  SyncEnter;
  if GetAppDomainFromAppDomainId(appDomainId, AppDomainName) then
    Log('AppDomain creation started,  name: %s', [AppDomainName])
  else
    Log('AppDomain creation started,  ID: $%x', [appDomainID]);
  Inc(NestingLevel);
  SyncExit  
end;
...
procedure TLoggingProfiler.AssemblyLoadFinished(assemblyId: ULONG;
  hrStatus: HResult);
var
  AssemblyName: WideString;
begin
  SyncEnter;
  Dec(NestingLevel);
  if GetAssemblyFromAssemblyId(assemblyId, AssemblyName) then
    Log('Assembly load finished, name: %s, status: $%x', [AssemblyName, hrStatus])
  else
    Log('Assembly load finished, ID: $%x, status: $%x', [assemblyID, hrStatus]);
  SyncExit    
end;
...
procedure TLoggingProfiler.ClassLoadStarted(classId: ULONG);
var
  ClassName: WideString;
begin
  SyncEnter;
  if GetClassFromClassId(classId, ClassName) then
    Log('Class load started,  name: %s', [ClassName])
  else
    Log('Class load started,  ID: $%x', [classID]);
  Inc(NestingLevel);
  SyncExit  
end;
...
procedure TLoggingProfiler.JITCompilationFinished(functionId: ULONG;
  hrStatus: HResult; fIsSafeToBlock: Integer);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  Dec(NestingLevel);
  if GetClassAndMethodFromFunctionId(functionID, ClassName, MethodName) then
    Log('JIT compilation finished, name: %s.%s, status: $%x',
      [ClassName, MethodName, hrStatus])
  else
    Log('JIT compilation finished, ID: $%x, status: $%x', [functionID, hrStatus]);
  SyncExit    
end;
...
procedure TLoggingProfiler.RuntimeSuspendStarted(
  suspendReason: COR_PRF_SUSPEND_REASON);
var
  Msg: String;
begin
  SyncEnter;
  Msg := 'CLR suspend started for ';
  case suspendReason of
    COR_PRF_SUSPEND_OTHER: Msg := Msg + 'some reason';
    COR_PRF_SUSPEND_FOR_GC: Msg := Msg + 'garbage collection';
    COR_PRF_SUSPEND_FOR_APPDOMAIN_SHUTDOWN: Msg := Msg + 'an AppDomain shutdown';
    //not implemented for the v1 JITter
    COR_PRF_SUSPEND_FOR_CODE_PITCHING: Msg := Msg + 'code pitching';
    COR_PRF_SUSPEND_FOR_SHUTDOWN: Msg := Msg + 'shutdown';
    COR_PRF_SUSPEND_FOR_INPROC_DEBUGGER: Msg := Msg + 'in-process debugging';
    COR_PRF_SUSPEND_FOR_GC_PREP: Msg := Msg + 'garbage collection preparation';
  end;
  Log(Msg);
  Inc(NestingLevel);
  SyncExit
end;
...
procedure TLoggingProfiler.ObjectsAllocatedByClass(cClassCount: ULONG;
  var classIds, cObjects: TULONGarray);
begin
  SyncEnter;
  Log('Classes of objects in generation 0 = %d', [cClassCount]);
  SyncExit
end;

Logging Output

You can see by examining the content of some of the above methods that the breadth of information supplied to profilers is wide and varied. You might also note that all the methods log their event details using the Log method. Log is another helper method (actually a pair of overloaded methods) in the base profiler class TBaseProfiler. It currently operates by building up a string using Format and passing it to the OutputDebugString API but this could readily be changed to support logging to a file, for example.

The methods also make use of a NestingLevel variable to try and indicate events that occur during some other operation. When, say, an assembly starts being loaded the nesting level is incremented (i.e. the output will be indented) and when the assembly stops loading the nesting level is decremented (unindented).

Debug strings generated by a program running within a debugger will be picked up by the debugger. In the case of the Delphi IDE you must ensure you have the Output messages option is enabled on the Event log page of the debugger setting dialog (Tools | Debugger Options...).

If it is set then you will see the messages in the Event Log:

You can just about see the nesting going on in the Event Log window, but it is a bit difficult due to the font used. You also tend to get a bit of "noise" in the Event Log; each debug string is prefixed with ODS (as in OutputDebugString) and suffixed with the information about the process name and process ID.

When running an application either outside the IDE, or from within Delphi but with IDE Debugging disabled (see Reference 4 for an article that shows how to reveal a hidden menu item to do this automatically) you can still keep an eye on debug strings using DebugView, one of a number of great system utilities available from http://www.sysinternals.com.

Now you can see the nesting a lot more clearly. The implementation of the Log methods is reasonably straightforward as shown below. The code shows that the nesting level value is doubled and that many spaces are placed before the output message in the debug string. Also, to ensure DebugView picks up the strings correctly the code is careful to append a line feed character to every one of them.


procedure TBaseProfiler.Log(const Msg: String);
var
  LogMsg: String;
begin
  LogMsg := StringOfChar(#32, NestingLevel * 2) + Msg + #10;
  OutputDebugString(PChar(LogMsg))
end;

procedure TBaseProfiler.Log(const Msg: String; Args: array of const);
begin
  Log(Format(Msg, Args))
end;

Handling Multiple Managed Threads

Since .NET applications have multiple threads (for example the garbage collector uses a separate thread to run object finalizers on, as detailed in Reference 5) it is useful to be able to identify events happening on different managed threads. In this example I took steps to ensure that each thread could record its own log output nesting level. The simple approach taken in this case is to have the NestingLevel variable declared in thread-local storage (TLS):


threadvar
  NestingLevel: Integer;

This means that no matter how many different threads call into the profiler each one has their own copy of the nesting level variable.

Note: this approach of using TLS in the profiler object to support the different managed threads works fine in .NET 1.x, since there is a one to one correspondence between a managed thread and a Win32 thread; any managed thread will run on exactly one Win32 OS thread. However in future versions of .NET this is unlikely to be the case, which is why there is an event called ThreadAssignedToOSThread.

Currently this event is called exactly once for each managed thread, but in the future it may be called several times. The upshot of all this is that the threadvar solution in the sample is a simple, short-term option that will fail to work correctly in future .NET releases.

For a long-term solution you should allocate some data structure for each managed thread during the ThreadCreated event and dispose of it during the ThreadDestroyed event. You can then use ICorProfilerInfo.GetCurrentThreadID to identify which managed thread is active at any given time.

An Exception Tracking Profiler

Another profiler is supplied as TExceptionProfiler in ExceptionProfiler.pas. This example logs details about all exceptions that occur in a .NET application using all the available exception-related events. Here is a portion of the class:


type
  TExceptionProfiler = class(TBaseProfiler, ICorProfilerCallback)
  protected
    // STARTUP/SHUTDOWN EVENTS
    procedure OnInitialize; override;
    procedure OnShutDown; override;
    // COR_PRF_MONITOR_EXCEPTIONS
    procedure ExceptionThrown(thrownObjectId: ULONG); safecall;
    procedure ExceptionSearchFunctionEnter(functionId: ULONG); safecall;
    procedure ExceptionSearchFunctionLeave; safecall;
    procedure ExceptionSearchFilterEnter(functionId: ULONG); safecall;
    procedure ExceptionSearchFilterLeave; safecall;
    procedure ExceptionSearchCatcherFound(functionId: ULONG); safecall;
    procedure ExceptionUnwindFunctionEnter(functionId: ULONG); safecall;
    procedure ExceptionUnwindFunctionLeave; safecall;
    procedure ExceptionUnwindFinallyEnter(functionId: ULONG); safecall;
    procedure ExceptionUnwindFinallyLeave; safecall;
    procedure ExceptionCatcherEnter(functionId: ULONG; objectId: ULONG); safecall;
    procedure ExceptionCatcherLeave; safecall;
    // COR_PRF_MONITOR_CLR_EXCEPTIONS
    procedure ExceptionCLRCatcherFound; safecall;
    procedure ExceptionCLRCatcherExecute; safecall;
  end;

const
  ProfilerCoClassName = 'ExceptionProfiler';
  ProfilerDescription = 'Exception profiler';
  ProfilerGUID: TGUID = '{0B97459B-865C-4E6B-BDBC-8EF2980A4059}';
  ProfilerClass: TComClass = TExceptionProfiler;
...
procedure TExceptionProfiler.OnInitialize;
var
  EventMask: Cardinal;
begin
  NestingLevel := 0;
  EventMask := COR_PRF_MONITOR_EXCEPTIONS or COR_PRF_MONITOR_CLR_EXCEPTIONS;
end;

procedure TExceptionProfiler.ExceptionThrown(thrownObjectId: ULONG);
var
  ClassName: WideString;
begin
  SyncEnter;
  if GetClassFromObjectId(thrownObjectId, ClassName) then
    Log('Exception thrown, name %s', [ClassName])
  else
    Log('Exception thrown, ID: $%x', [thrownObjectID]);
  SyncExit;
end;

procedure TExceptionProfiler.ExceptionSearchFunctionEnter(functionId: ULONG);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  if GetClassAndMethodFromFunctionId(functionId, ClassName, MethodName) then
    Log('Exception handler search: entering method %s.%s', [ClassName, MethodName])
  else
    Log('Exception handler search: entering method, ID: $%x', [functionID]);
  Inc(NestingLevel);
  SyncExit;
end;

procedure TExceptionProfiler.ExceptionSearchCatcherFound(functionId: ULONG);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  if GetClassAndMethodFromFunctionId(functionId, ClassName, MethodName) then
    Log('Exception handler search: found catcher in method %s.%s', [ClassName, MethodName])
  else
    Log('Exception handler search: found catcher in method, ID: $%x', [functionID]);
  SyncExit;
end;

And here is some output from the profiler:

This could be improved by producing a stack trace at the point the exception is thrown. This is very achievable using the Debugging API (documented in DebugRef.doc) but is beyond the scope of this paper.

An Execution Flow Trace Profiler

Another example profiler is a function call trace profiler, which allows you to see the flow of execution through your .NET application as one function calls another, which calls some others and returns and so on. If you were paying attention earlier on when we saw the list of event methods in ICorProfilerCallback you might have noticed a distinct lack of events for tracing function calls. However by the same token there is a profiler constant for tracing functions as the execution flow enters and leaves them (COR_PRF_MONITOR_ENTERLEAVE). So how does this aspect of profiling operate?

Special Event Methods

In order to set up function tracing you need to call ICorProfilerInfo.SetEnterLeaveFunctionHooks. This routine takes three arguments, each of which is a reference to a standalone event routine you must create. The first one is called when a method is entered, the second when it is exited and the third when it is exited in an optimised way using a tail call. This last optimised method exit mechanism is not currently used in .NET 1.x and so the function can be a simple stub if you wish.

Each of the routines will be passed a functionId, which is passed according to the stdcall calling convention. Not only must these event routines be standalone, but according to the documentation they must be declared in a special way. In C++ they must be declared as __declspec(naked), which means the compiler will generate no prolog or epilog code (the code that sets up the stack before the routine executes and tidies it away afterwards). Additionally the routines must preserve any CPU registers they use and restore them before returning.

Naked Delphi Routines

In short we need to write some assembly code, but if we plan things correctly we only need to write it once. This is the basic structure of a truly naked Delphi routine that fits the bill:


procedure NakedEnter; assembler;
asm
  push eax
  push ecx
  push edx
  ...
  pop edx
  pop ecx
  pop eax
  ret 4
end;

There are a few things to note about the code:

We can get this routine to call a regular Delphi routine if desired, but we need to pass the functionId to it. That normally translates to pushing it onto the stack for most calling conventions but the Delphi default register calling convention passes the first three arguments in registers if they fit. The first argument is passed in EAX, the second in EDX and the third in ECX.

At the start of the routine the parameter is 4 bytes above the stack pointer, since the stack pointer points just past the last thing pushed on the stack and the stack operates in a top-down manner (it fills up from higher memory addresses and the stack pointer is decremented accordingly as this occurs). After pushing the three 4-byte registers it ends up 16 (or $10) bytes above the stack pointer so the code would look like this:


procedure NakedEnter; assembler;
asm
  push eax
  push ecx
  push edx
  mov eax, [esp + $10]
  call SomeRoutine 
  pop edx
  pop ecx
  pop eax
  ret 4
end;

We can take this a bit further if we like by calling a method of the profiler class. You may know that in a method Self is actually a hidden parameter. It is the first parameter and so needs to be passed in EAX, meaning the functionId would be passed in EDX as the truly second parameter, but what will look like the only parameter when the method is written out in high-level Delphi. So the following code would call a method called FuncEnter that was compiled with the register calling convention, assuming Profiler is a global variable holding the profiler instance.


procedure NakedEnter; assembler;
asm
  push eax
  push ecx
  push edx
  mov edx, [esp + $10]
  mov eax, Profiler
  call TTraceProfiler.FuncEnter
  pop edx
  pop ecx
  pop eax
  ret 4
end;

Another alteration we can make is to add the parameter declaration back into this naked routine. This will cause the compiler to generate prolog code that sets up a stack frame, so the parameter can be referenced by name and also tidy up the stack in the epilog code, but testing shows this does not seem to cause a problem for .NET applications:


procedure NakedEnter(functionID: ULONG); assembler; stdcall;
asm
  push eax
  push ecx
  push edx
  mov edx, functionID
  mov eax, Profiler
  call TTraceProfiler.FuncEnter
  pop edx
  pop ecx
  pop eax
end;

This is the pattern for each of the three naked routines, meaning we can declare our own event methods to trace function calls:


type
  TTraceProfiler = class(TBaseProfiler, ICorProfilerCallback)
  protected
    procedure OnInitialize; override;
    procedure FuncEnter(functionID: ULONG);
    procedure FuncLeave(functionID: ULONG);
    procedure FuncTailCall(functionID: ULONG);
  end;

const
  ProfilerCoClassName = 'TraceProfiler';
  ProfilerDescription = 'Trace profiler';
  ProfilerGUID: TGUID = '{0403B90F-6E5B-423C-A08D-760D2D851D2A}';
  ProfilerClass: TComClass = TTraceProfiler;
...
procedure TTraceProfiler.OnInitialize;
begin
  EventMask := COR_PRF_MONITOR_ENTERLEAVE or COR_PRF_DISABLE_INLINING;
  //Store reference to profiler object, so external proc.s can call its methods
  Profiler := Self;
  CorProfilerInfo.SetEnterLeaveFunctionHooks(NakedEnter, NakedLeave, NakedTailCall);
end;

procedure TTraceProfiler.FuncEnter(functionID: ULONG);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  if GetClassAndMethodFromFunctionId(functionID, ClassName, MethodName) then
    Log('%s.%s entered', [ClassName, MethodName])
  else
    Log('Method id: $%x entered', [functionID]);
  Inc(NestingLevel);
  SyncExit;
end;

procedure TTraceProfiler.FuncLeave(functionID: ULONG);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  Dec(NestingLevel);
  if GetClassAndMethodFromFunctionId(functionID, ClassName, MethodName) then
    Log('%s.%s exited', [ClassName, MethodName])
  else
    Log('Method id: $%x exited', [functionID]);
  SyncExit;
end;

procedure TTraceProfiler.FuncTailCall(functionID: ULONG);
var
  ClassName, MethodName: WideString;
begin
  SyncEnter;
  Dec(NestingLevel);
  if GetClassAndMethodFromFunctionId(functionID, ClassName, MethodName) then
    Log('%s.%s exited (tailcall)', [ClassName, MethodName])
  else
    Log('Method id: $%x exited', [functionID]);
  SyncExit;
end;
 

Exercises For The Reader

These sample profilers should give you a flavour of what you can do with the profiling API, but they only scratch the surface at present. The following short sections contain some ideas that you might wish to use to improve the samples and move towards a realistic profiling tool.

Caching Information

Currently every ID is converted to a name by negotiating with the ICorProfilerInfo and IMetaDataImport interfaces every time it is required. It is more common to record information about each ID as the item it represents is loaded/created. That way information about the ID can be retrieved locally when it is given to an event method on subsequent occasions, hopefully improving performance and thereby reducing the profiler's impact on the profilee.

There are caveats to this; primarily you need to take great care with objectIds. Since an objectId is the address of the object then a garbage collection sweep can render many objectIds meaningless. You can use the MovedReferences garbage collection event to identify which objects have moved and update your record of their objectIds accordingly.

Logging To A File

The sample profilers would be more practical if they logged their output to a file. In the case of different managed threads executing in managed applications each thread's output could be directed to a separate file to keep them all separate.

Timing Code

The execution flow tracing profiler could be made a practical tool by adding in code to perform timing of various operations, such as time spent in each method. This way you could identify the hotspots in your application and concentrate your efforts on looking at improving your coding and algorithms used in those methods.

Summary

This paper has looked at the .NET Profiling API, a rich support network for receiving a multitude of diverse events from your .NET applications. Whilst only showing a few simple examples this paper should still give you a strong indication of the power of this API and what you can do with it without much effort.

The supplied profiler framework makes implementing new profilers a very quick exercise, and you are encouraged to develop the samples further.

References

  1. The .NET Framework SDK Tool Developer Guide documentation directory.
    Wherever the .NET Framework SDK has been installed this is the Tool Developers Guide\docs subdirectory. Pertinent documents are Profiling.doc, Metadata Unmanaged API.doc and DebugRef.doc.

  2. The .NET Profiling API and the DNProfiler Tool, Matt Pietrek, MSDN Magazine December 2001, http://msdn.microsoft.com/msdnmag/issues/01/12/hood/default.aspx.

  3. Debugging Applications for Microsoft.NET and Microsoft Windows Chapters 10 & 11, John Robbins, Microsoft Press.

  4. Undocumented Delphi IDE Features, Brian Long, http://www.blong.com/Undocumented/IDEFeatures.htm.
    Includes information on how to use a normally hidden IDE menu item in Delphi 6 and 7 that allows you to run applications with IDE debugging disabled.

  5. Object Destructors and Finalizers in .NET Using C# and Delphi for .NET, Brian Long, http://bdn.borland.com/article/0,1410,29365,00.html.
    This in-depth article looks at the role and usage of destructors (or equivalent) in the managed world of .NET. It examines in detail the areas of garbage collection, finalizers, the dispose pattern and language specifics, and uses both C# and Delphi for .NET for source examples .

About Brian Long

Brian Long used to work at Borland UK, performing a number of duties including Technical Support on all the programming tools. Since leaving in 1995, Brian has spent the intervening years as a trainer, trouble-shooter and mentor focusing on the use of the C#, Delphi and C++ languages, and of the Win32 and .NET platforms. In his spare time Brian actively researches and employs strategies for the convenient identification, isolation and removal of malware. If you need training in these areas or need solutions to problems you have with them, please get in touch or visit Brian's Web site.

Brian authored a Borland Pascal problem-solving book in 1994 and occasionally acts as a Technical Editor for Wiley (previously Sybex); he was the Technical Editor for Mastering Delphi 7 and Mastering Delphi 2005 and also contributed a chapter to Delphi for .NET Developer Guide. Brian is a regular columnist in The Delphi Magazine and has had numerous articles published in Developer's Review, Computing, Delphi Developer's Journal and EXE Magazine. He was nominated for the Spirit of Delphi award in 2000.