Programmer's Weblog: Debugging memory problems in an IronPython app

Introduction

This blog entry contains notes on debugging memory problems in an IronPython application on Windows. I also provide scripts that are helpful in the process. I should also mention that it is specific to IronPython 1.*.

The Debugger I was using is windbg.exe. It is a GUI application, but feels more like a command line. It is certainly more powerful than Visual Studio Debugger.

First steps

First thing after starting windbg should be loading SOS.dll, which is an extension to windbg providing commands specific to debugging .NET apps (all starting with !). There is !help command that provides description for these commands (invoke after loading sos).

 
.load sos 
!help

If you have problems it might be because of the wrong sos.dll version. There is a different version for each .net runtime. Since IronPython runs in .net 2.0 you need sos for that specific version. You can find it in the .net sdk directory and specify the full path for the load command.

Say freeze ;)

Windbg can attach to a process by id (F6 shortcut, but there is also a menu item for it). From that moment on, the debugee is freezed until you detach from it.

Before we actually do any debugging we need to have something to debug. Prepare application so that after it does some considerable leaking, there is a moment in its execution that you would attach windbg to it. This moment should be after doing thorough garbage collection to ensure that during debugging we are dealing only with alive objects. You could have code like this:

from System import GC
from System.Diagnostics import Process
from System.Threading import Thread
for _ in range(2):
    GC.Collect()
    GC.WaitForPendingFinalizers()
print "attach now to process id %d" %\
        Process.GetCurrentProcess().Id
Thread.Sleep(100000)

Next thing is to find objects that leak. In our case the biggest ones were Dictionaries so it was enough to just list all dictionary objects. Sometimes it is unclear which objects leak, this blogpost suggest a strategy for finding out leaking objects. For this task you will find the dumpdiff.py helpful.

This following listing shows output of !dumpheap command the heap with given substring in the type name. This is always the first command after attaching to a process.

!dumpheap -type System.Collections.Generic.Dictionary
...
1b489a58 000e2e00      124     
02713600 000e2e00   134720     
02b756b0 000e2e00  2503008     
24b496a0 000e2e00  2503008  
...

Now copy one of the addresses and issue gcroot command to find some of the chains of references holding the object alive. I'll take the last address. It is worth mentioning that 24b496a0 points to an array that holds the entries of the dictionary.

!gcroot 24b496a0
...
13ab6360(<Unloaded Type>)->
13ab63bc(IronPython.Runtime.FieldIdDict)->
13ab63c8(System.Collections.Generic.Dictionary`2[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]])->
13ab6424(System.Collections.Generic.Dictionary`2+Entry[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]][])->
186787d4(<Unloaded Type>)->
...

The previous listing shows only a fragment of the output of the !gcroot command. Arrows indicate directions of references. In the output of !gcroot command, the root (a static object or value on some stack) will be at the top whereas the object you invoked the !gcroot with will be towards the bottom.

Difficulties in debugging an IronPython app

Python is a dynamic language and that makes it more difficult to debug IronPython than C# for example (with windbg). There are two problems. Firstly, types defined in IronPython appear under windbg as Unloaded Type. Even if the type names would be shown (no clue why they are not), they would be C# types (or should I say .NET native types) like OldType and UserType: the guts of the IronPython implementation, not the classes defined in our IronPython code.

Secondly, the attributes of an object are not fields. They are stored in a dictionary, and to make things worse, that dictionary is not keyed by strings but by SymbolIDs. Somewhere in memory lies a table containing strings to which SymbolIDs hold indexes.

To find out the attribute name or the type name there are many commands needed. I have created two scripts to automate it. Explaining the windbg scripts is out of scope for this entry, but I think I will post about it soon.

attr script

The first script, attr, takes two addresses as arguments, one to the array holding the dictionary entries, second to the value of the attribute, and prints the name of the attribute. Below I include the same fragment of the gcroot output plus example use of the attr. It will display the name under which 186787d4 is kept in the __dict__ of 13ab6360. Note that I pass to attr the address of the array not 13ab6360.

... 13ab6360(<Unloaded Type>)->
13ab63bc(IronPython.Runtime.FieldIdDict)->
13ab63c8(System.Collections.Generic.Dictionary`2[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]])->
13ab6424(System.Collections.Generic.Dictionary`2+Entry[[IronPython.Runtime.SymbolId, IronPython],[System.Object, mscorlib]][])->
186787d4(<Unloaded Type>)->
...
$$>a<c:\path\attr 13ab6424 186787d4
Name: System.String
MethodTable: 790fc6cc
EEClass: 790fc62c
Size: 110(0x6e) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: _GUIThreadTestCase__guiControl

The above listing assumes you save the attr script in c:\path, and the dollars are required, pounds will not do.

Make sure you invoke it giving an address of a dictionary entry array keyed by SymbolIds like in the above example.

So the above gives the name of one attribute. To get more attributes you can invoke !dumparray on the entry array, which will display all the values there which you can then check individually with attr script (or write another script to automate it ^^).

!dumparray -details <entryArrayAddress>

The attr script may not work if the executable you are debugging is running from a path that contains spaces. There is one constant in the script that would have to be increased to make up for that additional space.

usrtype script

The next script is less reliable that the first one, it worked half the time I invoked it on a Unloaded Type object. I think it works for UserType, so I named it usrtype. It takes just one argument which is the address of an object and returns the name of its Python type.

$$>a<c:\path\usrtype <address>

I again assume you have it saved in c:\path.

!dumpobj

Now the most frequently used command, !dumpobj. It displays the values of the fields of an object. Of course it displays only the fields defined in the .net class. It knows what the object's type is, because each object on the managed heap has the pointer to its type in its first 4 bytes (or 8 if it is 64 bit machine).

Since all the leaking I was chasing ware caused by design flaw in implementation of events in IronPython, I usually looked for object of type ReflectedEvent in the chain displayed by the !gcroot. ReflectedEvent stores the name of the event. It was usually also useful to inspect the IronPython.Runtime.Calls.Method further down the chain, from which you can get the name of the method handling the event as well as the class this method is defined in.

Tips & Tricks

I hope you will find them useful.

Testing that the memory leak is gone

So you got rid of the memory leak, but how do you test that it is gone? I mean an automated unit test!

obj = objectThatMightLeak()
weakRefToObj = WeakReference(obj)
del obj
for _ in range(2):
GC.Collect()
GC.WaitForPendingFinalizers()
assert !weakRefToObj.IsAlive

WeakReference will allow GC to collect the object, and after collection it will answer if the object was garbage collected. The del obj line is very important, ware it not there the test would have no chance of passing.

Marking objects

Let's say you suspect that some object is not being garbage collected, and its class is defined in Python. Because of the dynamic nature of Python, its class is not represented as .NET type, hence you can't simply find it with !dumpheap -type <type_name> command.

My mother works in laboratory, when she analyzes samples, she sometimes marks them with special substance to make some elements stand out. You can mark Python objects in a similar way.

from System import GopherStyleUriParser
pythonObject.tag = GopherStyleUriParser()

GopherStyleUriParser is a good enough mark, there is low probability you will have a second one on the heap, and because it is .NET type you can find it easily with !dumpheap -type Gopher. Then invoke !gcroot on it and you will find that the chain goes through your object.

Update: Previously I was recommending using ArithmeticException instead of GopherStyleUriParser for the tag; don't do this. As soon as you would get rid of the other leaks, the object would be kept alive by the instance of ArithmeticException, which captures the stack trace.

Nuking events

That one is a dirty hack. All the trouble I had with memory leaks were because of the events, how about nuking them?

The reason the IronPython team didn't just use the .net implementation of events was that they wanted to allow removing handlers by object equality, rather then by referential equality. By referential equality I mean comparing objects' references (is operator in Python).

So how about defining an object that is equal to everything?

class EqualityForAll(object):
def __eq__(_, __):
return True

That is a very fine fellow, which we can use to get all the handlers out of an event.

e = EqualityForAll()
for _ in irange(100)
control.OnClick -= e

Ok, that will kill at most 100 handlers, but there is no way you can find out how many there are. And not, it will not throw an exception if there are no more handlers. If I remember correctly from looking at the implementation code (open source is great), there are some exception being thrown under the hood, but are swallowed higher up. This makes the operation costly if there are no handlers, as the exception in .NET are much more expensive than in CPython.

Summary

I thought that finding and stopping the leaks would be easy, but it took many days. Perhaps fixing the IronPython implementation could have taken less time. Anyway, if you think that the bug should be fixed, please vote for it. It was given a low priority and it is not easy to do, so without votes it might not make it to IronPython 2.0.

2 comments:

leppie said...: Thanks for the tips :); February 5, 2008 at 7:21 AM
Anonymous said...: My code has no bugs. I just clicked the compile button on a 10,000 line source file I wrote from scratch and it works perfectly.

I am very interested in IronPython though and this might be useful to me.; February 5, 2008 at 7:27 AM

Programmer's Weblog

Monday, February 4, 2008

Debugging memory problems in an IronPython app