Tuesday, May 19, 2009

Minimize Code Explosion of Generic Type

Generics were added to the .NET Framework in version 2.0 and greatly increase the reusability of commonly used algorithms. It is well known that the JIT compiler generates a concrete type for each set of generic type arguments at run time, so code explosion is possible if many concrete types are created.

What kind of explosion?
In the .NET compilation model, C#/VB code is first compiled into IL, and the JIT compiler then compiles that IL into native code on demand. The JIT compiler also generates the concrete types for the specified type arguments. So there is only one copy of the IL, with the generic type parameters still in place.
What gets duplicated is the native code generated by the JIT compiler: there is a copy of each compiled method for each concrete type.
The other data that multiplies is the EEClass and MethodTable, which are type-specific data. Strictly speaking, such data is not duplicated, because it belongs to a specific concrete type.
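
The "one copy of IL, many concrete types" idea can be seen from managed code with a little reflection. This is only a sketch: the class name ClosedTypeDemo and its helper method are made up for illustration, and reflection shows type identity only, not how much native code was generated.

```csharp
using System;
using System.Collections.Generic;

public static class ClosedTypeDemo
{
    // True when two closed generic types come from the same open definition
    // (the single copy of IL) and are nevertheless distinct runtime types.
    public static bool SameDefinitionDifferentType(Type a, Type b)
    {
        return a.GetGenericTypeDefinition() == b.GetGenericTypeDefinition() && a != b;
    }

    public static void Main()
    {
        Type ints = typeof(List<int>);
        Type objects = typeof(List<object>);

        // Both closed types point back at the one open definition, List<T>.
        Console.WriteLine(ints.GetGenericTypeDefinition() == typeof(List<>));
        Console.WriteLine(SameDefinitionDifferentType(ints, objects));
    }
}
```

Both lines should report that the comparison holds: the two lists share a definition but are separate types at run time.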

How .net tries to avoid explosion

The .NET Framework uses two techniques to minimize code explosion.
1. Different invocations of a generic method with the same type argument share the same copy of native code. This only applies when the invocations are in the same AppDomain.
2. The CLR treats all reference type arguments as identical. It can do this because reference variables are (loosely speaking) pointers to objects on the heap, so they can all be manipulated in the same way.
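
Code sharing itself is not directly observable from C#, but one consequence of the second technique is easy to check: even when the native body may be shared between reference type arguments, each call still knows its exact type argument. The sketch below assumes nothing beyond the base class library; the names SharedCodeDemo and Describe are hypothetical.

```csharp
using System;

public static class SharedCodeDemo
{
    // For reference type arguments the CLR may run one shared native body,
    // yet typeof(T) still reports the exact type argument of each call.
    public static string Describe<T>()
    {
        return typeof(T).FullName;
    }

    public static void Main()
    {
        Console.WriteLine(Describe<object>());   // System.Object
        Console.WriteLine(Describe<Delegate>()); // System.Delegate
        Console.WriteLine(Describe<int>());      // System.Int32
    }
}
```

So sharing native code does not merge the types themselves; it only avoids compiling the same instructions again for every reference type.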

Verify the optimization
To verify that these optimizations actually behave that way, we create the following sample and debug it with WinDbg.

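// Requires: using System.Collections.Generic;
static void Main()
{
    List<int> intList = new List<int>();                          // value type argument
    List<object> objList = new List<object>();                    // reference type argument
    List<System.Delegate> delList = new List<System.Delegate>();  // another reference type argument
}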

Enter sxe ld:mscorlib to make WinDbg break when the application loads the mscorlib module.
When WinDbg breaks, enter .loadby sos mscorwks to load sos.dll.
Enter .chain to confirm that the SOS extension has been loaded successfully:
C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos: image 2.0.50727.3053, API 1.0.0, built Fri Jul 25 22:08:38 2008
[path: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll]

Enter !bpmd Test.exe Test.Program.Main to set a managed breakpoint on the Main method.
Enter the p command several times until the !dso command shows System.Collections.Generic.List`1 objects on the managed stack. The output below shows the objects we are interested in:
ESP/REG  Object   Name
0019e3a4 01e6bc04 System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
0019e5c4 01e6bbec System.Collections.Generic.List`1[[System.Object, mscorlib]]
0019e5c8 01e6bbc8 System.Collections.Generic.List`1[[System.Int32, mscorlib]]

Input !do 01e6bc04 to dump the first object and we get:
Name: System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
MethodTable: 008126a4
EEClass: 698fca68
Size: 24(0x18) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT    Field   Offset                 Type VT     Attr    Value Name
69b140bc  40009d8        4      System.Object[]  0 instance 01e6bc1c _items
69b42b38  40009d9        c         System.Int32  1 instance        0 _size
69b42b38  40009da       10         System.Int32  1 instance        0 _version
69b40508  40009db        8        System.Object  0 instance 00000000 _syncRoot
69b140bc  40009dc        0      System.Object[]  0   shared   static _emptyArray
Domain:Value dynamic statics NYI
002efb90:NotInit
...

Enter !dumpmt -md 008126a4 to dump the method table for this object. We get:
EEClass: 698fca68
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
--------------------------------------
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69aef320   69913310   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
69b03f00   69913318   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].System.Collections.IList.Add(System.Object)

We do the same to dump the method tables for the 2nd and 3rd objects. The output is:
EEClass: 698fca68
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Object, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
--------------------------------------
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69aef320   69913310   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
69b03f00   69913318   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].System.Collections.IList.Add(System.Object)

EEClass: 698f6c3c
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Int32, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
--------------------------------------
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69fd3b60   699ac468   PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
69fd2f80   699ac470   PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].System.Collections.IList.Add(System.Object)


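From the output, we can see that the methods for objList and delList are the same: in both tables the Add entry is 69aef320 (MethodDesc 69913310) and is reported as List`1[[System.__Canon, mscorlib]].Add. The method for intList is different (entry 69fd3b60, MethodDesc 699ac468). So we've verified that concrete types built from reference type arguments share their code.
Although the code is shared, and the two reference type instantiations even report the same EEClass (698fca68) in the output above, they are still different types: in the CLR each concrete type has its own MethodTable. List<int> reports a different EEClass (698f6c3c).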

Using the same debugging technique, we can also verify that generic instances defined with the same type argument in different scopes have the same EEClass.
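
A rough managed-code counterpart of that check is to compare runtime type handles. This is a sketch under assumptions: TypeHandleDemo and its methods are hypothetical names, and RuntimeTypeHandle.Value is treated purely as an opaque identity for the type. It does not expose the EEClass, so it complements the debugger session rather than replacing it.

```csharp
using System;
using System.Collections.Generic;

public static class TypeHandleDemo
{
    // RuntimeTypeHandle.Value is an opaque handle to the runtime's type data.
    public static IntPtr HandleOf(Type t)
    {
        return t.TypeHandle.Value;
    }

    // The same closed type, named from a different method (a different scope).
    public static IntPtr FromAnotherScope()
    {
        return HandleOf(typeof(List<object>));
    }

    public static void Main()
    {
        IntPtr objHandle = HandleOf(typeof(List<object>));
        IntPtr delHandle = HandleOf(typeof(List<Delegate>));

        Console.WriteLine(objHandle != delHandle);          // distinct types, distinct handles
        Console.WriteLine(objHandle == FromAnotherScope()); // same type, same handle
    }
}
```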

References:
Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects
