Introduction
This blog entry contains notes on debugging memory problems in an IronPython
application on Windows. I also provide scripts that are helpful in the
process. I should also mention that it is specific to IronPython 1.*.
The Debugger I was using is windbg.exe.
It is a GUI application, but feels more like a command line. It is certainly more
powerful than Visual Studio Debugger.
First steps
First thing after starting windbg should be loading
SOS.dll, which is an extension to windbg providing commands specific to
debugging .NET apps (all starting with !). There is !help command that provides description for
these commands (invoke after loading sos).
.load sos
!help
If you have problems it might be because of the wrong sos.dll version.
There is a different version for each .net runtime. Since IronPython runs in
.net 2.0 you need sos for that specific version. You can find it in the .net sdk
directory and specify the full path for the load command.
Say freeze ;)
Windbg can attach to a process by id (F6
shortcut, but there is also a menu item for it). From that moment on, the
debugee is freezed until you detach from it.
Before we actually do any debugging we need to have something to debug. Prepare
application so that after it does some considerable leaking, there is a moment in its execution
that you would attach windbg to it. This moment should be after doing thorough
garbage collection to ensure that during debugging we are dealing only with
alive
objects. You could have code like this:
from System import GC
from System.Diagnostics import Process
from System.Threading import Thread
for _ in range(2):
GC.Collect()
GC.WaitForPendingFinalizers()
print "attach now to process id %d" %\
Process.GetCurrentProcess().Id
Thread.Sleep(100000)
Next thing is to find objects that leak. In our case the biggest ones were Dictionaries
so it was enough to just list all dictionary objects. Sometimes it is unclear which
objects leak, this
blogpost
suggest a strategy for finding out leaking objects. For this task you will find the
dumpdiff.py helpful.
This following listing shows output of !dumpheap command the heap with given
substring in the type name. This is always the first command after attaching
to a process.
!dumpheap -type System.Collections.Generic.Dictionary
...
1b489a58 000e2e00 124
02713600 000e2e00 134720
02b756b0 000e2e00 2503008
24b496a0 000e2e00 2503008
...
Now copy one of the addresses and issue gcroot command to find some of the
chains of references holding the object alive. I'll take the last address.
It is worth mentioning that 24b496a0 points to
an array that holds the entries of the dictionary.
!gcroot 24b496a0
...
13ab6360(<Unloaded Type>)->
13ab63bc(IronPython.Runtime.FieldIdDict)->
13ab63c8(System.Collections.Generic.Dictionary`2[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]])->
13ab6424(System.Collections.Generic.Dictionary`2+Entry[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]][])->
186787d4(<Unloaded Type>)->
...
The previous listing shows only a fragment of the output of the !gcroot command.
Arrows indicate directions of references. In the output of !gcroot command, the root
(a static object or value on some stack) will be at the top whereas the
object you invoked the !gcroot with will be towards the bottom.
Difficulties in debugging an IronPython app
Python is a dynamic
language and that makes it more difficult to debug IronPython than C# for
example (with windbg). There are two problems. Firstly, types defined in
IronPython appear under windbg as Unloaded Type. Even if the type
names would be shown (no clue why they are not), they would be C# types (or
should I say .NET native types) like OldType and UserType: the guts of the
IronPython implementation, not the classes defined in our IronPython code.
Secondly, the attributes
of an object are not fields. They are stored in a dictionary, and to make
things worse, that dictionary is not keyed by strings but by SymbolIDs.
Somewhere in memory lies a table containing strings to which SymbolIDs hold indexes.
To find out the attribute name or the type name there are
many commands needed. I have created
two scripts to automate it. Explaining the windbg scripts is out of scope for this
entry, but I think I will post about it soon.
attr script
The first script, attr, takes two addresses as arguments, one to the
array holding the dictionary entries, second to the value of the
attribute, and prints the name of the attribute. Below I include the same
fragment of the gcroot output plus example use of the attr. It will display the name under
which 186787d4 is kept in the __dict__ of 13ab6360. Note that I pass to
attr the address of the array not 13ab6360.
...
13ab6360(<Unloaded Type>)->
13ab63bc(IronPython.Runtime.FieldIdDict)->
13ab63c8(System.Collections.Generic.Dictionary`2[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]])->
13ab6424(System.Collections.Generic.Dictionary`2+Entry[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]][])->
186787d4(<Unloaded Type>)->
...
$$>a<c:\path\attr 13ab6424 186787d4
Name: System.String
MethodTable: 790fc6cc
EEClass: 790fc62c
Size: 110(0x6e) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: _GUIThreadTestCase__guiControl
The above listing assumes you save the attr script in c:\path, and the
dollars are required, pounds will not do.
Make sure you invoke it giving an address of a dictionary entry
array keyed by SymbolIds like in the above example.
So the above gives the name of one attribute. To get more attributes
you can invoke !dumparray on the entry array, which will display
all the values there which you can then check individually with
attr script (or write another script to automate it ^^).
!dumparray -details <entryArrayAddress>
The attr script may not work if the executable you are debugging is
running from a path that contains spaces. There is one constant in the
script that would have to be increased to make up for that additional space.
usrtype script
The next script is less reliable that the first one, it worked half
the time I invoked it on a Unloaded Type object. I think
it works for UserType, so I named it usrtype. It takes just one
argument which is the address of an object and returns the name of
its Python type.
$$>a<c:\path\usrtype <address>
I again assume you have it saved in c:\path.
!dumpobj
Now the most frequently used command, !dumpobj.
It displays the values of the fields of an object. Of course
it displays only the fields defined in the .net class. It
knows what the object's type is, because each object on
the managed heap has the pointer to its type in its first
4 bytes (or 8 if it is 64 bit machine).
Since all the leaking I was chasing ware caused by
design
flaw in implementation of events in IronPython, I usually
looked for object of type ReflectedEvent in the chain displayed
by the !gcroot. ReflectedEvent stores the name of the event. It
was usually also useful to inspect the IronPython.Runtime.Calls.Method
further down the chain,
from which you can get the name of the method handling the event
as well as the class this method is defined in.
Tips & Tricks
I hope you will find them useful.
Testing that the memory leak is gone
So you got rid of the memory leak, but how do you test
that it is gone? I mean an automated unit test!
obj = objectThatMightLeak()
weakRefToObj = WeakReference(obj)
del obj
for _ in range(2):
GC.Collect()
GC.WaitForPendingFinalizers()
assert !weakRefToObj.IsAlive
WeakReference will allow GC to collect the object, and after
collection it will answer if the object was garbage collected. The del obj line is very important,
ware it not there the test would have no chance of passing.
Marking objects
Let's say you suspect that some object is not being garbage collected, and its class
is defined in Python.
Because of the dynamic nature of Python, its class is not represented as .NET type, hence you
can't simply find it with !dumpheap -type <type_name> command.
My mother works in laboratory, when she analyzes samples, she sometimes marks them with
special substance to make some elements stand out. You can mark Python objects in a similar way.
from System import GopherStyleUriParser
pythonObject.tag = GopherStyleUriParser()
GopherStyleUriParser is a good enough mark, there is low probability you
will have a second one on the heap, and because it is .NET type you can find it easily with
!dumpheap -type Gopher. Then invoke !gcroot on it
and you will find that the chain goes through your object.
Update: Previously I was recommending using ArithmeticException instead of GopherStyleUriParser for the tag; don't do this. As soon as you would get rid of the other leaks, the object would be kept alive by the instance of ArithmeticException, which captures the stack trace.
Nuking events
That one is a dirty hack. All the trouble I had with memory leaks were because of the
events, how about nuking them?
The reason the IronPython team didn't just use the
.net implementation of events was that they wanted to allow removing handlers by object equality,
rather then by referential equality.
By referential equality I mean comparing objects' references
(is operator in Python).
So how about defining an object that is equal to everything?
class EqualityForAll(object):
def __eq__(_, __):
return True
That is a very fine fellow, which we can use to get all the handlers out of
an event.
e = EqualityForAll()
for _ in irange(100)
control.OnClick -= e
Ok, that will kill at most 100 handlers, but there is no way you can
find out how many there are. And not, it will not throw an exception if
there are no more handlers. If I remember correctly from looking at the
implementation code (open source is great), there are some exception being
thrown under the hood, but are swallowed higher up. This makes the operation
costly if there are no handlers, as the exception in .NET are much more
expensive than in CPython.
Summary
I thought that finding and stopping the leaks would be easy, but it took many days.
Perhaps fixing the IronPython implementation could have taken less time. Anyway,
if you think that the bug should be fixed, please vote for it. It was given a low
priority and it is not easy to do, so without votes it might not make it to IronPython 2.0.