Implementing SOS with SPT - Part 2 of N - DumpStackObjects

This is part 2 of N of my SOS series.

For part 2, I’m going to talk about DumpStackObjects.  This one is fairly straightforward, and we already have most of the required functionality built into SPT, so all that is left is to hook it up.

The general algorithm for scraping the stack is this:

  1. Find the stack limits (stackBase, stackLimit)
    • stackBase can be pulled from the current thread’s TEB (thread environment block)
    • stackLimit is the value of [e/r]sp for that thread.
  2. Get the bounds of the GC heap segments.  This algorithm can be found in CSDbgExt::EnumHeapSegments.
  3. Start at stackLimit, moving 1 pointer at a time, until we hit stackBase
    1. Read a pointer at our current stack location
    2. Use IClrProcess::IsValidObject to check if the address is valid or not
    3. If valid, also make sure it’s within the GC heap segments found in (2)
    4. We can find the MT using IXCLRDataProcess3::GetObjectData.  For strings, we can read their value easily using IXCLRDataProcess3::GetObjectStringData.
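
Here's a rough sketch of the loop in step 3, in C++.  This is not the actual SPT code: HeapSegment, InGcHeap and ScanStackForObjects are names I made up for illustration, the stack is assumed to have already been read out of the target into a vector of pointer-sized values, and the isValidObject callback stands in for IClrProcess::IsValidObject.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one GC heap segment, covering [start, end).
struct HeapSegment {
    std::uintptr_t start;
    std::uintptr_t end;
};

// True if addr falls inside any of the segments gathered in step 2.
bool InGcHeap(const std::vector<HeapSegment>& segments, std::uintptr_t addr) {
    for (const HeapSegment& seg : segments) {
        if (addr >= seg.start && addr < seg.end) {
            return true;
        }
    }
    return false;
}

// Walks the stack one pointer-sized slot at a time (stackLimit up to stackBase)
// and returns every slot value that looks like a managed object.
// isValidObject is a placeholder for IClrProcess::IsValidObject.
std::vector<std::uintptr_t> ScanStackForObjects(
    const std::vector<std::uintptr_t>& stackWords,
    const std::vector<HeapSegment>& segments,
    const std::function<bool(std::uintptr_t)>& isValidObject) {
    std::vector<std::uintptr_t> found;
    for (std::uintptr_t candidate : stackWords) {
        if (candidate == 0) continue;                   // null is never an object
        if (!InGcHeap(segments, candidate)) continue;   // cheap range check first
        if (!isValidObject(candidate)) continue;        // then the expensive check
        found.push_back(candidate);
    }
    return found;
}
```

The sketch runs the heap range check before the validity check only because it's cheaper; both have to pass, so the order doesn't change the result.  The same function could be fed the register values instead of stack slots.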

Once we’ve done this, there’s one more thing to consider: some registers may also contain CLR objects.  We can evaluate all the registers in a similar way to #3 above.

You can find the source for this on github here.

Fun with .NET remoting - Building a RealProxy implementation

[Steve’s note: I wrote this post a year or so ago and never posted it.  It’s still relevant today, but I don’t recommend using shady things like this in production code without understanding the implications.]

A side project I worked on recently involved building an abstraction layer around DB access, specifically calling stored procedures.  My goal was to reduce code redundancy and add type-safety to the parameters of stored proc calls.  The road I went down involved creating an interface for a group of stored procs, with the interface methods mapping to the stored procs.  I might resurrect this as an example and post the code on github, because it actually worked out really well.

What's a RealProxy?

The .NET remoting infrastructure is built around 3 key classes:

  1. MarshalByRefObject
    • This is the "opt-in" marker for by-ref object marshaling.
  2. RealProxy
    • This is the base type for method/constructor call dispatch over remoting.  The CLR has special knowledge of this class, and this is the managed entry/exit point for calls dispatched through a TransparentProxy.  Calls enter and leave managed code via RealProxy.PrivateInvoke.
  3. __TransparentProxy
    • The other piece of the proxy pair.  The CLR has special knowledge of this class in terms of type casting and method invocation.

Implementing RealProxy only requires you to implement Invoke(IMessage msg) and do whatever you want inside it.  In my above example, I took the method name and parameters and handed them off to a SqlCommand and ran ExecuteReader, ExecuteNonQuery, etc.  The implementation details for the synchronous case aren't that exciting.

Supporting BeginInvoke/EndInvoke

One big feature I wanted to implement was asynchronously executing SQL via BeginInvoke/EndInvoke on the proxy methods.  Unfortunately, Microsoft made this impossible to do without resorting to some sketchy reflection.  Inside RealProxy.PrivateInvoke, there's a block of code that looks like this:

if (!this.IsRemotingProxy() && ((msgFlags & 1) == 1))
{
    Message m = reqMsg as Message;
    AsyncResult ret = new AsyncResult(m);
    ret.SyncProcessMessage(retMsg);
    retMsg = new ReturnMessage(ret, null, 0, null, m);
}

There are a few things going on here:

  • The constants for msgFlags are defined in System.Runtime.Remoting.Messaging.Message. We'll need this value later on.  The interesting values are:
    • BeginAsync (e.g. a call to BeginInvoke) = 1
    • EndAsync (EndInvoke) = 2
    • OneWay (e.g. a method with a [OneWay] attribute on it) = 8

  • The consequence of the block of code above is that the remoting infrastructure will run BeginInvoke synchronously for any RealProxy that isn't a RemotingProxy.  I'll get into details on this in a bit.
  • IsRemotingProxy checks the _flags field on RealProxy.  This is set in the RealProxy constructor if the type is RemotingProxy.  We can't inherit from RemotingProxy (it's internal), but we can use reflection in our own constructor to set this to RealProxyFlags.RemotingProxy (1).  Once we do this, a "normal" Invoke implementation won't work correctly for async invocations.

How proxy invocation works

So, back to msgFlags and how asynchronous method invocation works with the remoting infrastructure.  As I mentioned above, there are two possible code paths for an async invoke on a RealProxy.  The first is the "normal" code path that you get when you invoke a delegate bound to a non-RemotingProxy instance.  In this case, the code path looks like this:

  1. someDelegate.BeginInvoke(...)
  2. Native CLR
  3. RealProxy.PrivateInvoke, msgFlags = 1 (BeginAsync)
  4. PrivateInvoke calls our Invoke(...)
    • Since our Invoke isn't aware of anything async (in fact, with the public API there's no way to know), it runs synchronously.
  5. PrivateInvoke sets up an AsyncResult instance with the result from Invoke(...) and returns it.  Note that at this point everything has run synchronously: our BeginInvoke was exactly the same as an Invoke, except for the last step of returning an IAsyncResult.  All our work is already done.
  6. We call someDelegate.EndInvoke(iar)
  7. RealProxy.PrivateInvoke, msgFlags = 2 (EndAsync)
  8. PrivateInvoke calls EndInvokeHelper, which takes care of returning the return value or throwing an exception if one occurred.

The pro of this code path is that you (the RealProxy implementer) don't need to worry about the async code path; in fact, you can't.  The con is that there's no way to get a truly async invocation with this.

Now, the alternate code path is "enabled" by telling the infrastructure that we're a RemotingProxy.  This code path looks roughly like this:

  1. someDelegate.BeginInvoke(...)
  2. Native CLR
  3. RealProxy.PrivateInvoke, msgFlags = 1
  4. PrivateInvoke calls our Invoke, this time however, it's expecting an IAsyncResult in return.
    • We set up whatever async stuff we want to do here and start it.
    • FUN FACT: You can actually return whatever you want here!  The CLR is missing the type check to make sure your ReturnMessage actually returns an IAsyncResult.  The chaos that ensues from not returning the correct type is entertaining.  I made a post about it on StackOverflow a while ago here.
  5. PrivateInvoke returns, we have an IAsyncResult
  6. We call someDelegate.EndInvoke(iar)
  7. PrivateInvoke calls Invoke, msgFlags = 2
  8. Invoke handles getting the correct return value or throwing an exception if one occurred.  Possibly through RealProxy.EndInvokeHelper as above.

As you can see here, there's a lot more burden on us to "get things right" with the implementation.  We now need to handle synthesizing an IAsyncResult, invoking a callback (if passed in), raising an exception in the EndInvoke (if thrown), and getting the value back.  Also, since none of this is exposed publicly via the API, it makes things even more difficult because we now need to rely on reflection.  I'm going to drill into step 4 of the second code path and explain how it works.

Implementing an async-aware Invoke

The first problem of implementing an async-aware Invoke is getting the msgFlags.  This is the core of Invoke, and we'll be using it to branch to synchronous/BeginInvoke/EndInvoke/OneWay code paths.  Getting the value is pretty easy: looking at System.Runtime.Remoting.Messaging.Message, you can see there's a method called GetCallType.  Unfortunately, this isn't exposed in any of the public interfaces Message implements, and Message itself is internal, so we need to use reflection to call it. (int)msg.GetType().GetMethod("GetCallType").Invoke(msg, null) will do the trick.

At this point, we need to branch for a few conditions.  The 4 main cases are: Begin, End, OneWay, Synchronous.  The simplest way is to create a method for each one and call them in Invoke depending on call type.  Begin is the most interesting case because it requires the most work, so I'm going to focus on that.
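
The branching itself doesn't depend on anything in the remoting API, so here's a minimal sketch of it in C++ (the language SPT is written in) rather than in the C# a real Invoke override would use.  The BeginAsync, EndAsync and OneWay values are the ones listed earlier; Sync = 0 is my assumption, and DispatchByCallType plus the strings it returns are placeholders for calling one handler method per case.

```cpp
#include <string>

// Call-type values.  1, 2 and 8 are quoted from the post above;
// kSync = 0 is an assumption for "none of the async flags set".
constexpr int kSync = 0;
constexpr int kBeginAsync = 1;
constexpr int kEndAsync = 2;
constexpr int kOneWay = 8;

// Hypothetical dispatcher: in a real proxy each case would call a
// dedicated handler; here each case just names the handler it would pick.
std::string DispatchByCallType(int callType) {
    switch (callType) {
        case kBeginAsync: return "begin";
        case kEndAsync:   return "end";
        case kOneWay:     return "one-way";
        case kSync:       return "sync";
        default:          return "unsupported";
    }
}
```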

BeginInvoke is responsible for 3 main tasks: creating the IAsyncResult to return, starting the async work, and handling the task completion.  It turns out the framework exposes a class, System.Runtime.Remoting.Messaging.AsyncResult, that wraps a lot of the more complicated async completion logic.  For example, the message we get in Invoke doesn't contain the last two "extra" parameters to BeginInvoke, the AsyncCallback delegate and state.  AsyncResult will get these for us (it accomplishes this through Message.GetAsyncBeginInfo), and once we give the AsyncResult the IMessage result, it'll handle invoking the callback if needed.

I’ll write more about EndInvoke at a later point, but it’s a fairly simple implementation.

Implementing SOS with SPT - Part 1 of N - DumpObj

I decided to have some fun re-implementing SOS methods using SPT.  I chose DumpObj to start with because it’s one of the more complicated methods in SOS.  Hopefully this will be part 1 of a multipart series.

Let’s start by looking at some sample output from !do on a hashtable…(sorry for the weird layout, I’m going to try to change my blog theme at some point soon).

0:036> !do 11faa984 
Name:        System.Collections.Hashtable
MethodTable: 7370d288
EEClass:     7337c4a0
Size:        52(0x34) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7370ae80  4000c01        4 ...ashtable+bucket[]  0 instance 11faa9b8 buckets
7370c770  4000c02       18         System.Int32  1 instance        2 count
7370c770  4000c03       1c         System.Int32  1 instance        1 occupancy
7370c770  4000c04       20         System.Int32  1 instance        2 loadsize
737070b4  4000c05       24        System.Single  1 instance 0.720000 loadFactor
7370c770  4000c06       28         System.Int32  1 instance        2 version
73707208  4000c07       2c       System.Boolean  1 instance        0 isWriterInProgress
733223ec  4000c08        8 ...tions.ICollection  0 instance 00000000 keys
733223ec  4000c09        c ...tions.ICollection  0 instance 00000000 values
7370222c  4000c0a       10 ...IEqualityComparer  0 instance 00000000 _keycomparer
7370b350  4000c0b       14        System.Object  0 instance 00000000 _syncRoot

We can go through this line by line and figure out what’s going on.

  1. MethodTable : This can be found on an object via IXCLRDataProcess3::GetObjectData.
  2. Name : This is the type name, it can be obtained via IXCLRDataProcess3::GetMethodTableName with the method table from above.
  3. EEClass : IXCLRDataProcess3::GetMethodTableData will give you the EEClass, as well as the Module (which we need in a minute).
  4. Size : This is on the ClrObjectData we got in #1.
  5. File : Given the Module (from #3), we can get the assembly that contains it via IXCLRDataProcess3::GetModuleData.  Once we have the assembly, IXCLRDataProcess3::GetAssemblyName will give you the full path to an assembly.
  6. Name : This one is a little interesting.  As far as I can tell, IXCLRDataProcess doesn’t directly expose a way to get a field name; however, we can use IMetaDataImport to get the values.  The method of getting an instance of IMetaDataImport is fairly simple, once you know it’s implemented by the module itself.  We can use IXCLRDataProcess3::GetModule (with the module address from #3) to get an IUnknown.  That can be QI’d for IID_IMetaDataImport.  Once we have that pointer, we can call GetMemberProps to get the name.

Fields

Fields are the fun part.  The whole algorithm for iterating over a class’s fields is in ClrProcess::FindFieldByNameExImpl.  The basic idea is:

  1. Get to the root type in the inheritance hierarchy
  2. For each type, traverse the linked list of FieldDescs (obtained via GetMethodTableFieldData.FirstField), and stop traversing when you’ve seen all instance fields and static fields on that type.
  3. Once you’ve seen them all, go to the next type in the hierarchy and repeat #2.  Note, NumInstanceFields includes the fields inherited from parent classes as well.  This is why we’re starting at the root (System.Object) and moving down, otherwise we wouldn’t know when to stop traversing each list of fields.
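
To make the "start at the root" point concrete, here's a small sketch of the traversal in C++.  It is not the code in ClrProcess::FindFieldByNameExImpl: FieldDesc and TypeInfo are simplified, hypothetical stand-ins for the data the DAC hands back, and the key assumption (taken from the note above) is that numInstanceFields counts inherited fields too, so the number of fields declared on a type is its count minus its parent's.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the DAC's field and type data.
struct FieldDesc {
    std::string name;
    const FieldDesc* next;  // next FieldDesc in this type's list
};

struct TypeInfo {
    const TypeInfo* parent;         // nullptr for the root (System.Object)
    const FieldDesc* firstField;    // head of this type's FieldDesc list
    std::size_t numInstanceFields;  // includes inherited instance fields
    std::size_t numStaticFields;    // statics declared on this type
};

// Returns field names in root-to-leaf order, stopping each list by count.
std::vector<std::string> CollectFieldNames(const TypeInfo* type) {
    // Build the inheritance chain, then flip it so the root comes first.
    std::vector<const TypeInfo*> chain;
    for (const TypeInfo* t = type; t != nullptr; t = t->parent) {
        chain.push_back(t);
    }
    std::reverse(chain.begin(), chain.end());

    std::vector<std::string> names;
    std::size_t inheritedInstance = 0;
    for (const TypeInfo* t : chain) {
        // Fields declared here = own instance fields + own statics.
        std::size_t remaining =
            (t->numInstanceFields - inheritedInstance) + t->numStaticFields;
        for (const FieldDesc* fd = t->firstField;
             remaining > 0 && fd != nullptr; fd = fd->next, --remaining) {
            names.push_back(fd->name);
        }
        inheritedInstance = t->numInstanceFields;
    }
    return names;
}
```

Going leaf-to-root instead would leave no way to tell how many of a type's NumInstanceFields actually live in its own list, which is the problem the root-first order avoids.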

Once we have all the fields, we can start drawing each line of output.  Most are fairly trivial to get.

  1. MethodTable : Using the ClrFieldDescData (from IXCLRDataProcess3::GetFieldDescData) saved during iterating over the fields, we can get the FieldMethodTable.
  2. Field : This is the token for the field.
  3. Offset : This is simply the Offset from the field data.  However, instance fields need to be offset by sizeof(void*) to account for the type handle (MethodTable *) at the start of every object.  Static fields do not need to be offset since they aren’t actually stored on the object instance.
  4. Type : This is the same as the typename from above, using the FieldMethodTable.
  5. VT : As far as I can tell, there’s no simple “IsValueType” flag anywhere.  However, you can use the FieldType to figure it out.  It corresponds to the CorElementType enum in corhdr.h.  Anything before and including ELEMENT_TYPE_R8 is a value type, as well as ELEMENT_TYPE_VALUETYPE.  Everything else can be counted as class.
  6. Attr : I’ve only used two values here, “shared” (for IsStatic, Is[Context,Thread]Local), or “instance” for anything else.
  7. Value : The algorithm to read the values falls into two categories, which further break down into more subcategories.  The top level categories are instance vs static.  We’re not even going to try to get values for context local and thread local fields.

    Instance Fields

    Instance fields break down into 3 more subcategories, class, primitive, and value type.

    • Class is the simplest, just read one pointer at obj+offset+sizeof(void*) and display that value.
    • ValueType is similar, but with one less level of indirection.  ValueTypes are stored inline in the containing object, not as a pointer to another object on the heap.  The address shown under value is simply obj+offset+sizeof(void*).  This address could then be passed to !dumpvc to get the value.
    • Primitive.  For these, we need to compute the size of the field via ClrProcess::GetSizeForType.  This method is simply a switch statement with the built-in primitives and their respective sizes.  Once we have a size, we can read S bytes at the correct offset (as above) into the object.
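
As a rough illustration of the VT check and the instance-field reads, here's a C++ sketch.  The enum values are the CorElementType constants from corhdr.h (only the subset needed here).  IsValueType, SizeForPrimitive and InstanceFieldAddress are names invented for this example; SizeForPrimitive only approximates what a switch like ClrProcess::GetSizeForType might look like, and the pointer size is passed in explicitly on the assumption that the target may not match the debugger's bitness.

```cpp
#include <cstddef>
#include <cstdint>

// Subset of CorElementType (values as defined in corhdr.h).
enum CorElementType : std::uint8_t {
    ELEMENT_TYPE_BOOLEAN = 0x02,
    ELEMENT_TYPE_CHAR = 0x03,
    ELEMENT_TYPE_I1 = 0x04,
    ELEMENT_TYPE_U1 = 0x05,
    ELEMENT_TYPE_I2 = 0x06,
    ELEMENT_TYPE_U2 = 0x07,
    ELEMENT_TYPE_I4 = 0x08,
    ELEMENT_TYPE_U4 = 0x09,
    ELEMENT_TYPE_I8 = 0x0a,
    ELEMENT_TYPE_U8 = 0x0b,
    ELEMENT_TYPE_R4 = 0x0c,
    ELEMENT_TYPE_R8 = 0x0d,
    ELEMENT_TYPE_STRING = 0x0e,
    ELEMENT_TYPE_VALUETYPE = 0x11,
    ELEMENT_TYPE_CLASS = 0x12,
};

// The rule from the VT column: everything up to and including R8,
// plus ELEMENT_TYPE_VALUETYPE, counts as a value type.
bool IsValueType(CorElementType type) {
    return type <= ELEMENT_TYPE_R8 || type == ELEMENT_TYPE_VALUETYPE;
}

// Size in bytes of a built-in primitive; 0 means "not a primitive".
std::size_t SizeForPrimitive(CorElementType type) {
    switch (type) {
        case ELEMENT_TYPE_BOOLEAN:
        case ELEMENT_TYPE_I1:
        case ELEMENT_TYPE_U1: return 1;
        case ELEMENT_TYPE_CHAR:
        case ELEMENT_TYPE_I2:
        case ELEMENT_TYPE_U2: return 2;
        case ELEMENT_TYPE_I4:
        case ELEMENT_TYPE_U4:
        case ELEMENT_TYPE_R4: return 4;
        case ELEMENT_TYPE_I8:
        case ELEMENT_TYPE_U8:
        case ELEMENT_TYPE_R8: return 8;
        default: return 0;
    }
}

// obj + offset + sizeof(void*) in the target: skips the MethodTable pointer.
std::uint64_t InstanceFieldAddress(std::uint64_t obj, std::uint32_t offset,
                                   std::uint32_t targetPointerSize) {
    return obj + offset + targetPointerSize;
}
```

For a class field you'd read one pointer at that address, for a value type you'd display the address itself, and for a primitive you'd read SizeForPrimitive bytes from it.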

    Static Fields

    Static fields break down similarly, but there is already logic in ClrProcess::GetStaticFieldValue to handle it.  The complication of static fields is we need to consider all app domains.  ClrProcess::GetStaticFieldValues handles all of this for us.  We can simply call it and display the results.  One thing I’ve noticed is that SOS displays “NotInit” on some fields.  I’m not sure if it’s just doing a null check here, or if there’s more logic to figure out whether a domain has initialized a static field yet.

    Putting that all together with some pretty formatting, we’ve got the same information as SOS’s !dumpobj.

    I’ve pushed this to my github repo in case anyone wants to check it out.

    Diving into SDbgExt2, the core interface

    The SDbgCore static library is the foundation of SDbgExt2.  It contains the definition of IXCLRDataProcess3 (I think it’s the 3rd revision, I just made the name up), as well as the core classes and interfaces IClrProcess and ISDbgExt.  It also contains many helper interfaces such as IClrObject(Array), and lots of callback interfaces.  We’ll go through these one by one.

    IXCLRDataProcess3

    This is the interface that makes everything work.  In the previous version (.NET 4.0), almost all functions were executed via a single Request(…) method on the interface.  In .NET 4.5, these requests were split into strongly typed functions.

    I started by reverse engineering the VTable via the embedded symbols in mscordacwks.dll.  The interface is implemented by ClrDataAccess, and is the 3rd interface in the VTable (hence IXCLRDataProcess3).  Next, I began reverse engineering the parameters.  This is obviously more difficult, but you can get an idea of the parameters in two ways. 

    First, since most functions are similar to their request counterparts in .NET 4, you can get an idea of the parameters via the request object.  Second, by inspecting the call sites in SOS, we can get an idea of the parameters being passed in.  It turns out that a large chunk of functions follow a few very similar signatures, so figuring out the parameters isn’t actually that tough.  Also as a side note, the x86 calling convention (stdcall) here makes it a lot easier to reverse engineer than x64, so I did most of my reversing on the x86 version.

    For the output structures, again I leveraged .NET 4.0’s structures.  Obviously there were some changes / additions / removals, but again by looking at the call site in SOS, it’s not too hard to figure out which fields do what.  Some of these structures are obviously incomplete, but there’s enough there to support the functionality I need.

    The result of this is IXCLRDataProcess3.idl, which MIDL compiles to a C++ interface and type library.

    ClrProcess

    ClrProcess is the lowest level new interface in SDbgExt2.  It contains various utility methods, field accessors, heap enumeration, sort-of-reflection, and thread inspection.  I tried to keep this interface as low-level as possible, while providing reusable methods for what would be used by the higher level methods.

    For example, accessing a static field (see ClrProcess::GetStaticFieldValue) via IXCLRDataProcess3 is a fairly complicated process.  This is the simplified algorithm:

    1. You need to find the actual field address (the FieldDesc)
      1. This can be done by scanning the method table recursively, and using IMetaDataImport to resolve a token to a name.
    2. Given a FieldDesc and app domain, figure out the module that contains it.
    3. Figure out if the module has data in the domain-neutral store or not.
      1. If so, use that to get the domain local module data
    4. Otherwise, use the app domain’s domain local module data.
    5. Using the DLM data, use the GCStaticDataStart, or NonGCStaticDataStart (for non-reference types) to find the base offset
    6. Add the field’s static offset to the base offset to get where in memory the field is
    7. Read from the memory offset to get the field value.
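
Steps 3 through 6 boil down to picking a base address and adding an offset.  Here's a minimal C++ sketch of just that part, under some loud assumptions: DomainLocalModule, StaticFieldInfo and StaticFieldAddress are hypothetical names, the domain-neutral lookup is modeled as a pointer that is null when the module has no data there, and the reference/non-reference split follows the description in step 5.  It leaves out everything about locating the FieldDesc and reading target memory.

```cpp
#include <cstdint>

// Hypothetical stand-in for the domain local module (DLM) data.
struct DomainLocalModule {
    std::uint64_t gcStaticDataStart;     // base for reference-type statics
    std::uint64_t nonGcStaticDataStart;  // base for non-reference statics
};

// Hypothetical stand-in for the bits of the FieldDesc needed here.
struct StaticFieldInfo {
    std::uint32_t offset;  // the field's static offset
    bool isReferenceType;
};

// domainNeutral is null when the module has no data in the domain-neutral
// store; in that case the app domain's own DLM data is used instead.
std::uint64_t StaticFieldAddress(const DomainLocalModule* domainNeutral,
                                 const DomainLocalModule& appDomainLocal,
                                 const StaticFieldInfo& field) {
    const DomainLocalModule& dlm =
        domainNeutral != nullptr ? *domainNeutral : appDomainLocal;
    const std::uint64_t base = field.isReferenceType
                                   ? dlm.gcStaticDataStart
                                   : dlm.nonGcStaticDataStart;
    return base + field.offset;
}
```

The returned address is where step 7 would read the field's value from.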

    Next up: more details on implementing ClrProcess

    Introducing SPT for .NET 4.5

    This has been a post long in the making.  I wrote this in December/January as a complete re-do of SPT for .NET 4.0, but never released it publicly.  (Side note: interestingly enough, in the meantime MS released ClrMD, which is similar to the direction I was going but still significantly different.  I’m focusing on a lower-level interface to be plugged into WinDBG, while ClrMD is focused more on a managed vector.)

    I’ve decided this time around to release the entire library under the GPL.  For anyone who would like to use this in commercial software, contact me and we can probably work something out.  I’ll be pushing the full source tree to my github account shortly.  https://github.com/steveniemitz/SDbgExt2.

    The initial release supports most of the original SPT commands:

    • DumpAspNetRequests
    • DumpDictionary
    • DumpSqlConnectionPools (aka !sqlblame)
    • DumpThreadPoolQueues
    • FindHttpContext
    • GetDelegateMethod

    In the next few days, I’ll be posting more about SDbgExt2 (and the WinDBG extension, SPT), its inner workings, and how the new IXCLRDataProcess3 interface works.  My main design goals for SDbgExt2 were:

    • Easily support a managed wrapper interface
    • Separate the core debugging interface from the “value add” interface.  As you can see, SDbgCore is a fairly small static library, which SPT links against.
    • Provide a cleaner code base which is easier to maintain and enhance in the future, as well as which abstracts IXCLRDataProcess as much as possible.
    • Have near-100% code coverage via unit tests from the start.  This is mostly achieved with pre-canned mini-dumps which the unit tests can operate on.  (Side note: for now I’m not releasing the mini-dumps as they contain machine-specific data, I may in the future release them if I can prove they contain no personal data, but for now the unit tests are being released as more of a reference guide.)

    I’m also posting pre-compiled, ready to run SPT.dll binaries for use in WinDBG (or elsewhere).

    Download here:  x86 / x64

    As always, feel free to post here or email me if you have any questions / comments.