IronPython Dictionaries Memory Cost
Sunday, February 7, 2010
Maintaining a mapping as a dictionary can be quite expensive in terms of memory, more than one would expect. A couple of days ago I used windbg to check how much memory a single entry costs in a dictionary that maps pairs to ints. The code I measured is:
d = {(1, 2): 3}
I ran it in IronPython 2.6.
The graph below depicts what I saw on the heap. Each box represents a single word in memory. In a 32 bit system, a word is 4 bytes.
Every object in the .NET runtime has an overhead of two words: a pointer to its class and some housekeeping data, which I don't know much about. Allegedly part of it is used for object locking.
Only the objects shown below the buckets array count. A dictionary is a hashtable implemented with an array of buckets, and each item in the dictionary is kept in a bucket. The size of the Bucket object therefore determines the size of an entry in the dictionary.
There are 23 boxes in the Bucket's subtree, which means its size is 23 * 4 = 92 bytes. Quite a lot. A million entries in that dictionary would take 92 megabytes, almost 100!
The main reason the count is so high is the generality of the dictionary. Because it can store objects of arbitrary types, the numbers must be boxed. .NET generic collections, when specialized for ints, would store the numbers in place, saving a lot of space.
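The arithmetic above can be spelled out in a few lines of Python. This is only a sketch of the calculation: the constant and function names are made up for illustration, and the figures (23 words per entry, 4 bytes per word) are the ones counted on the heap above.

```python
# Figures taken from the measurement above; names are illustrative only.
WORD_BYTES = 4        # one word on a 32 bit system
WORDS_PER_ENTRY = 23  # boxes counted in the Bucket's subtree


def dict_cost_bytes(entries):
    """Estimated heap cost of `entries` dictionary entries, in bytes."""
    return entries * WORDS_PER_ENTRY * WORD_BYTES


print(dict_cost_bytes(1))              # bytes for a single entry
print(dict_cost_bytes(10 ** 6) / 1e6)  # megabytes for a million entries
```

The estimate ignores the unused buckets a hashtable keeps around, so the real footprint is likely to be somewhat higher.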
Labels: IronPython
Unobtrusive highlighting of trailing whitespace in Vim
Sunday, September 13, 2009
Many programmers highlight trailing whitespace in red to expose that unnecessary gunk that is otherwise hard to spot. I never cared much about trailing whitespace, as it was never a problem for me, but now the distributed version control systems tend to complain about it.
Highlighting the trailing whitespace is effective but has the unwanted side effect of a red thing appearing under the cursor when typing a space at the end of a line (most of the time I type a space at the end of a line, it seems).
This set of commands highlights trailing whitespace when a buffer is entered and when insert mode is left; while in insert mode, the whitespace right before the cursor is left alone. Inspired by the Vim tip wiki.
highlight ExtraWhitespace ctermbg=red guibg=red
au ColorScheme * highlight ExtraWhitespace guibg=red
au BufEnter * match ExtraWhitespace /\s\+$/
au InsertEnter * match ExtraWhitespace /\s\+\%#\@<!$/
au InsertLeave * match ExtraWhitespace /\s\+$/
Some notes: ctermbg=red is for terminal vim, guibg is for gvim. The second line prevents any color scheme from overriding the highlight settings.
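To see what the basic /\s\+$/ pattern picks out without opening Vim, here is a rough Python equivalent. It is a sketch only: the function name and the sample text are made up, and the cursor-aware variant used in insert mode (\%#\@<!) has no counterpart outside the editor, so it is not modelled.

```python
import re

# Spaces or tabs at the end of a line, roughly Vim's /\s\+$/.
TRAILING_WS = re.compile(r"[ \t]+$")


def lines_with_trailing_whitespace(text):
    """Return (line_number, line) pairs for lines ending in spaces or tabs."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        if TRAILING_WS.search(line):
            hits.append((number, line))
    return hits


sample = "clean line\ntrailing space \n\tindented, clean\ntrailing tab\t\n"
for number, line in lines_with_trailing_whitespace(sample):
    print(number, repr(line))
```

Leading whitespace, as in the indented line, is not reported; only whitespace that runs up to the end of the line is.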
OK, that feels better.
Labels: Vim
Extreme programming too extreme?
Tuesday, June 9, 2009
Is extreme programming as a methodology over? Has Kent Beck killed it himself? Until now he insisted that you not write a single line of code without a failing test. In his book about test driven development he even talked about his daughter, who allegedly cannot imagine writing code without tests first!
Extreme programming is a set of good practices but the problem is the emphasis on taking them to the extreme thus ignoring the cost/benefit ratio. There is an obvious trade-off here.
It would be optimal if people wrote just the right amount of tests, not too few and not too many, to maximise the ROI. That of course is very hard to judge, but for sure the answer is not 100% coverage in all cases, as Kent kindly observes in his post.
Extreme programming is flawed as a methodology, but I want to argue that it is good for learning the craft.
People are naturally inclined not to test, not to integrate continuously but to program a feature for days without syncing with the main code base, to underestimate features and then put in long hours, etc. Extreme programming is like a correction program for these unwholesome inclinations. :)
Aristotle noted in the Nicomachean Ethics that the best way to reach the golden mean is to aim at the opposite extreme.
In the dimension of testing, for example, people will naturally tend to write too few tests, usually none. Practising extreme programming forces us to go to the other extreme: from no tests to 100% coverage no matter the cost. This teaches you that you can write tests you didn't previously realize were possible, but more importantly it changes the mindset of the programmer. After practising extreme programming you will find yourself uncomfortable without the safety net of tests, and while you will not necessarily want 100% coverage you will be in a much better position to judge how much testing is really needed.
I am glad that I have been part of an extreme programming team for almost 2 years now. We might have been too extreme, but too extreme in the good direction. The 4:1 ratio of tests to code that we have might be overkill, but it is certainly better than no tests at all. I feel similarly about other practices.
Labels: extremeprogramming
Applied computer science
Friday, May 22, 2009
"Don't worry about people stealing an idea. If it's original, you will have to ram it down their throats." It looks like one of those optimistic sentences that are supposed to lift spirits. And some people must believe in it, I suppose, because otherwise it would not circulate.
Each time I stumble upon it, it reminds me of NP problems. In general, finding a creative solution to a problem seems to me to be a lot like finding a solution to an instance of an NP problem.
Just a reminder: an NP problem is one for which there exists a nondeterministic Turing machine solving it in polynomial time. Most people think that P != NP, in which case problems that are in NP but not in P take more than polynomial time to solve, yet only polynomial time to check whether a given solution is correct.
So here is a whole class of problems, for all of which it is hard to find a solution, but easy to see that a given solution is good. Boy was it worth taking those computational complexity classes at the university ;).
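The gap between finding and checking can be sketched with subset sum, an NP problem I picked for illustration (it is not mentioned above). The function names are made up. Checking a proposed answer is a single pass over it, while the naive search below may try every subset. That only shows the brute-force search is slow, not that nothing faster exists.

```python
from itertools import combinations


def verify(numbers, target, indices):
    """Check a proposed solution: one pass over the chosen indices."""
    distinct = len(set(indices)) == len(indices)
    return distinct and sum(numbers[i] for i in indices) == target


def search(numbers, target):
    """Find a solution by brute force: up to 2**n - 1 subsets are tried."""
    for size in range(1, len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify(numbers, target, indices):
                return indices
    return None


numbers = [3, 34, 4, 12, 5, 2]
solution = search(numbers, 9)
print(solution, verify(numbers, 9, solution))
```

Anyone handed the indices can confirm them at a glance, just as a good idea is easy to recognise once somebody else has found it.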
Sony PRS 505 ebook reader review
Saturday, May 16, 2009
I have been using the Sony PRS 505 for more than a month now (see how it compares with other readers). It came with 100 classic books, all of which look very tempting. I have already read over 1000 pages of David Copperfield.
(BTW, the Reader shows the beginning of the first chapter of the new "IronPython in Action" book, by my colleagues Michael Foord and Christian Muirhead.)
They say that the battery life is 5000 page turns, but mine died after about 1000 pages of the aforementioned David Copperfield. I have been playing with it, and sometimes going back and forth, but for sure that was not 4000 page turns' worth of playing. Anyway, the battery life is very good, but the 5000 page turns claim is misleading at best.
The device comes with software for Windows. That is not really a problem for a Linux user, because you can mount the reader like a USB stick and copy the books directly; that is how I do it at least. There is the calibre project, which is supposed to be heaps better software for the reader than the official one, but it doesn't start on my system and I don't see any reason to investigate. I'm fine with cp.
I have had it for over a month and think it was worth every single pound I paid for it. I am writing the review now because I have finally bought an ebook on-line and put it on the reader. That feels right. In the meantime, though, I had to buy one dead tree book simply because there was no ebook available. Hopefully that will not happen often as Amazon's Kindle gains popularity.
My biggest wish is that the screen were larger. It looks like the upcoming Kindle DX is going to be the right size, though it is going to have a lower resolution than the Sony Reader. I bet that other manufacturers are already working on a larger version to compete with the DX.
The small size is not really right for books with code and mathematical formulae (most CS papers and technical books). I found that putting the reader into horizontal mode (buried deep in the settings) helps a lot with that.
Labels: review
Unlock Huawei E220 - soap on a rope
Sunday, May 10, 2009
I have successfully unlocked my T-Mobile Huawei E220 modem. I followed the tutorial at unlockE220.com. However, at step 4 I could not find the specified string. I suspect that might be due to a firmware upgrade on the device. Anyway, I have worked around it, so please follow the tutorial, and if you get stuck at step 4, read what follows here.
The unlock code is supposed to be a string of 8 digits. First let's find all the strings in the Flash.bin created in one of the earlier steps; I used the strings command line program (in cygwin) to do it.
strings Flash.bin > 'code-candidates.txt'
Now code-candidates.txt contains all the strings, one per line. We are only interested in the ones composed solely of digits and at least 8 characters long. Here is some Python code that reads the file and prints all possible codes.
# Print every line that consists only of digits and is at least 8 characters long.
with open('code-candidates.txt') as candidates:
    for line in candidates:
        line = line.strip()
        # Skip lines containing anything other than the digits 0-9.
        if not all('0' <= c <= '9' for c in line):
            continue
        if len(line) >= 8:
            print(line)
Assuming you saved the above code fragment as find.py, the command line to print the codes is:
strings Flash.bin > 'code-candidates.txt'
python find.py
In my case it printed 10 or so codes. Try them with the unlock program provided at unlock220.com. Surprisingly, the unlock program reported success for 3 of the codes, but my SIM card was accepted only after I tried the last one.
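If the strings program is not at hand, the two steps can be folded into one script. The following is a sketch under assumptions: the function name is made up, and the printable-run regex (at least 4 characters of printable ASCII) only approximates what strings prints by default, so the output may differ slightly from the pipeline above.

```python
import re


def candidate_codes(data, min_len=8):
    """Return printable runs in `data` that are all digits and at least min_len long."""
    # Runs of 4 or more printable ASCII bytes, roughly what `strings` emits.
    runs = re.findall(rb"[\x20-\x7e]{4,}", data)
    codes = []
    for run in runs:
        text = run.decode("ascii").strip()
        if len(text) >= min_len and text.isdigit():
            codes.append(text)
    return codes


# Made-up bytes standing in for the contents of Flash.bin.
sample = (b"\x00\x01hello world\x00" + b"12345678\x00"
          + b"abc1234567890\xff" + b"1234\x00" + b"987654321\x00")
print(candidate_codes(sample))
```

For the real file, read it in binary mode, for example with open('Flash.bin', 'rb'), and pass the bytes to candidate_codes.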
Twitter outage
Sunday, March 29, 2009
Twitter has been down for me for about a week or so. Every time I logged in it would say "something is terribly wrong". I thought it was down for everybody - you know twitter :) - until I read Menno's post where he reports on massive tweeting at PyCon. I have just reset my password and everything is right again. Odd.
Labels: twitter
