Sunday, February 7, 2010

IronPython Dictionaries Memory Cost

Maintaining a mapping as a dictionary can be quite expensive in terms of memory, more than one would expect. A couple of days ago I have checked with windbg how much a single entry in a dictionary mapping pairs to ints costs. The code I measured is:

d = {(1, 2): 3}

I have executed it in IronPython 2.6.

The graph below depicts what I saw on the heap. Each box represents a single word in memory. In a 32 bit system, a word is 4 bytes.

Every object in .NET runtime has an overhead of two words. These are the pointer to its class and some house-keeping data, which I don't know much about. Allegedly some part of it is used for object locking.

The graph shows objects below buckets array, these are the only that count. A dictionary is a hashtable which is implemented with an array of buckets. Each item in the dictionary is kept in a bucket. The size of the Bucket object determines the size of an entry in the dictionary.

There are 23 boxes in the Bucket's subtree, which means its size is 23 * 4 = 92 bytes. Quite a lot. A million numbers in that dictionary would take almost 100 megabytes!

The main reason it comes out as that many is the generality of the dictionary. The fact that it can store objects of arbitrary types means that the numbers must be boxed. .NET generic collections, when specialized for ints, would store numbers in place saving a lot of space.