Programmer's Weblog

pinfruit - my mnemonic web app for memorizing numbers

Thursday, January 13, 2011

I have created a web app for memorizing numbers. It implements the mnemonic major system. The system works on the principle that it is easier to remember words, sentences and stories than numbers. The web app helps you find sequences of words that encode a number in this system. I called the app pinfruit.

My first take on implementing the system was incorrect. I simply associated digits with letters. According to Wikipedia, the mnemonic major system assigns digits to sounds rather than to letters of the alphabet. And rightly so: I immediately noticed that it is easier to work with sound associations.

My mistake originated in the fact that I first learned the technique in Poland, and in Polish it does not matter whether you assign digits to letters or to sounds, because the two correspond. Unfortunately, this is not the case in English, which is the major difficulty in learning the language.

I have used pinfruit mainly to memorize PINs and phone numbers, but also user identifiers for bank sites, which often come as numbers.

The trick is to create a memorable (vivid or absurd) sentence or story for a number. My old phone number, 07514590312, can be encoded as "husky leader leaps midway Hanoi", "school weather helps Madonna" or "sickly water leaps hometown". I refer you to the Wikipedia article for a detailed description of the system.
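To make the sound-to-digit idea concrete, here is a minimal sketch of decoding a phrase back into its number. It is not pinfruit's code: the tiny `WORD_SOUNDS` table is hand-written for illustration and `decode_phrase` is a hypothetical name. Only the digit assignments follow the standard major system.

```python
# Standard major-system digit assignments, keyed by consonant *sound*.
SOUND_TO_DIGIT = {
    "s": "0", "z": "0",
    "t": "1", "d": "1", "th": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "j": "6", "sh": "6", "ch": "6",
    "k": "7", "g": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}

# Hand-written consonant sounds for a few words (illustrative data only, an
# assumption of this sketch). Vowels and h, w, y carry no digit and are omitted.
WORD_SOUNDS = {
    "husky": ["s", "k"],
    "leader": ["l", "d", "r"],
    "leaps": ["l", "p", "s"],
    "midway": ["m", "d"],
    "hanoi": ["n"],
    "school": ["s", "k", "l"],   # spelled "ch", pronounced "k"
    "weather": ["th", "r"],
    "helps": ["l", "p", "s"],
    "madonna": ["m", "d", "n"],  # double "n" is a single sound
}


def decode_phrase(phrase):
    """Return the digit string encoded by the consonant sounds of a phrase."""
    digits = []
    for word in phrase.lower().split():
        for sound in WORD_SOUNDS[word]:
            digits.append(SOUND_TO_DIGIT[sound])
    return "".join(digits)


print(decode_phrase("husky leader leaps midway Hanoi"))  # 07514590312
```

Words like "school" and "Madonna" show why a letter-based table goes wrong in English: the spelling suggests sounds that are not actually pronounced.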


From layers to hexagonal architecture

Sunday, August 15, 2010

The traditional layered architecture consists of data (persistence), domain and presentation layers.

Separating the presentation layer from the domain layer is rather easy. The presentation layer objects query the domain for data to display on the screen. The persistence layer is another story. The fact that it appears below the domain layer prohibits any of the classes in it knowing about the domain entities.

But consider a Fetcher class, for example: a class responsible for querying some domain object from the database. It has the domain type in its signature and thus has to be defined in or above the domain layer. The data access object leaks out of the persistence layer.

This problem has been recognized and here we have an architecture which works around it by separating out the domain objects package. But the result is not as clear and compelling as the layered architecture.

Civilisation has progressed, and an architecture with nothing below the domain model has been discovered. One can see this in the DDD sample.

The persistence concern is implemented as part of the infrastructure layer. Infrastructure is the vertical layer that depends on the three other layers. The domain knows nothing about the infrastructure.

Another example is the Hexagonal Architecture.

The domain sits at the core of the application, with the persistence aspect implemented by an adapter. The adapter layer corresponds to the infrastructure and interfaces layers of the DDD sample architecture diagram combined. The important part is that the domain does not depend on anything else.

Such layering is achieved by defining service interfaces in the domain layer for persistence purposes. These abstract interfaces are implemented by the adapters in the outer layer of the application and injected into the objects that need them.

As a result, nothing in the domain layer needs to import anything from the Hibernate package, or whatever persistence technology is being used. The domain layer has a custom-made, abstract interface to the persistence service, in its own terms. The domain code can be expressed without the details of the persistence technology.
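The arrangement can be sketched in a few lines of Python. All the names here (`Cargo`, `CargoRepository`, `RoutingService`, `InMemoryCargoRepository`) are made up for illustration, and the in-memory adapter merely stands in for a real persistence technology. The point is only the direction of the dependencies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# --- domain layer: imports nothing from any persistence technology ---

@dataclass
class Cargo:
    tracking_id: str
    destination: str


class CargoRepository(ABC):
    """Persistence interface defined by the domain, in the domain's terms."""

    @abstractmethod
    def find(self, tracking_id): ...

    @abstractmethod
    def store(self, cargo): ...


class RoutingService:
    """Domain service that receives its repository by injection."""

    def __init__(self, repository):
        self._repository = repository

    def reroute(self, tracking_id, new_destination):
        cargo = self._repository.find(tracking_id)
        cargo.destination = new_destination
        self._repository.store(cargo)
        return cargo


# --- adapter layer: depends on the domain, never the other way round ---

class InMemoryCargoRepository(CargoRepository):
    """Stand-in adapter; a real one would wrap a database or an ORM."""

    def __init__(self):
        self._rows = {}

    def find(self, tracking_id):
        return self._rows[tracking_id]

    def store(self, cargo):
        self._rows[cargo.tracking_id] = cargo


repository = InMemoryCargoRepository()
repository.store(Cargo("ABC123", "Hamburg"))
service = RoutingService(repository)
print(service.reroute("ABC123", "Helsinki").destination)  # Helsinki
```

Swapping the adapter for another one requires no change to `RoutingService`, which is the property the hexagonal layout is after.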




Saturday, March 27, 2010

Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce is a book I had long been waiting for. GOOS demonstrates the techniques and highlights the patterns as they show up in an application grown through the book. A small but real-world project is developed from scratch in a way where tests play as important a role as object-oriented design.

It is an excellent opportunity to see how two skilled developers grow an application. The authors practise TDD in a style characterised by extensive mocking to avoid breaking encapsulation. In design they use fine-grained classes. Altogether, the style demonstrated in the book is very consistent.


IronPython Dictionaries Memory Cost

Sunday, February 7, 2010

Maintaining a mapping as a dictionary can be quite expensive in terms of memory, more than one would expect. A couple of days ago I checked with WinDbg how much a single entry costs in a dictionary that maps pairs to ints. The code I measured is:

d = {(1, 2): 3}

I executed it in IronPython 2.6.

The graph below depicts what I saw on the heap. Each box represents a single word in memory. On a 32-bit system, a word is 4 bytes.

Every object in the .NET runtime has an overhead of two words. These are the pointer to its class and some housekeeping data, which I don't know much about. Allegedly some part of it is used for object locking.

The graph shows the objects below the buckets array; these are the only ones that count. A dictionary is a hashtable, which is implemented with an array of buckets. Each item in the dictionary is kept in a bucket. The size of the Bucket object determines the size of an entry in the dictionary.

There are 23 boxes in the Bucket's subtree, which means its size is 23 * 4 = 92 bytes. Quite a lot. A million numbers in that dictionary would take almost 100 megabytes!
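The arithmetic can be spelled out as a quick check. The constants are the figures quoted above; the variable names are mine.

```python
WORD_SIZE_BYTES = 4      # one word on a 32-bit system
WORDS_PER_ENTRY = 23     # boxes counted in the Bucket's subtree
ENTRIES = 10 ** 6        # a million entries

bytes_per_entry = WORDS_PER_ENTRY * WORD_SIZE_BYTES
total_bytes = bytes_per_entry * ENTRIES

print(bytes_per_entry)             # 92
print(total_bytes / 10 ** 6)       # 92.0 (megabytes, decimal)
```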

The main reason the figure comes out so high is the generality of the dictionary. The fact that it can store objects of arbitrary types means that the numbers must be boxed. .NET generic collections, when specialized for ints, would store the numbers in place, saving a lot of space.



Extreme programming too extreme?

Tuesday, June 9, 2009

Is extreme programming as a methodology over? Has Kent Beck killed it himself? Until now he has insisted that you not write a single line of code without a failing test. In his book about test-driven development he even talked about his daughter, who allegedly cannot imagine writing code without writing tests first!

Extreme programming is a set of good practices, but the problem is the emphasis on taking them to the extreme, thus ignoring the cost/benefit ratio. There is an obvious trade-off here.

It would be optimal if people wrote just the right number of tests, not too few and not too many, to maximise the ROI. That of course is very hard to judge, but for sure the answer is not 100% coverage in all cases, as Kent kindly observes in his post.

Extreme programming is flawed as a methodology, but I want to argue that it is good for learning the craft.

People have natural inclinations towards not testing, not integrating continuously but programming a feature for days without syncing with the main code base, underestimating features and then putting in long hours, etc. Extreme programming is like a correction program for these unwholesome inclinations. :)

Aristotle noted in the Nicomachean Ethics that the best way to reach the golden mean is to aim at the opposite extreme.

In the dimension of testing, for example, people will naturally tend to write too few tests, usually none. Practising extreme programming forces us to go to the other extreme: from no tests to 100% coverage, no matter the cost. This teaches you that you can write tests you didn't previously realize were possible, but more importantly it changes the mindset of the programmer. After practising extreme programming you will find yourself uncomfortable without the safety net of tests, and while you will not necessarily want 100% coverage, you will be in a much better position to judge how much testing is really needed.

I am glad that I have been part of an extreme programming team for almost 2 years now. We might have been too extreme, but too extreme in the good direction. The 4:1 ratio of tests to code that we have might be overkill, but it is certainly better than no tests at all. I feel similarly about other practices.



Sony PRS 505 ebook reader review

Saturday, May 16, 2009

I have been using the Sony PRS 505 for more than a month now (see how it compares with other readers). It came with 100 classic books, all of which look very tempting. I have already read over 1000 pages of David Copperfield.

(BTW, the Reader shows the beginning of the first chapter of the new "IronPython in Action" book, by my colleagues Michael Foord and Christian Muirhead.)

They say that the battery life is 5000 page turns, but mine died after about 1000 pages of the aforementioned David Copperfield. I have been playing with it, and sometimes going back and forth, but for sure that was not 4000 page turns' worth of playing. Anyway, the battery life is very good, but the 5000 page turns figure is misleading at best.

The device comes with software for Windows. That is not really a problem for a Linux user, because you can mount the reader like a USB stick and copy the books directly; that is how I do it at least. There is the calibre project, which is supposed to be heaps better software for the reader than the official one, but it doesn't start on my system and I don't see any reason to investigate -- I'm fine with cp.

I have had it for over a month and think it was worth every single pound I paid for it. I am writing the review now because I have finally bought an ebook on-line and put it on the reader. That feels right. In the meantime, though, I had to buy one dead tree book simply because there was no ebook available. Hopefully that will not happen often as Amazon's Kindle gains popularity.

My biggest wish is that the screen were larger. It looks like the upcoming Kindle DX is going to be the right size, though it is going to have a lower resolution than the Sony Reader. I bet that other manufacturers are already working on a larger version to compete with the DX.

The small size is not really right for books with code and mathematical formulae (most CS papers and technical books). I found that putting the reader into horizontal mode (buried deep in the settings) helps a lot with that.



Beautiful Code: Resolver One

Tuesday, January 20, 2009

Resolver One, aka the Python Spreadsheet, has a beautiful design at its core, which I would like to present to the programmer community. No knowledge of Python is necessary to understand this article.

The goal of the application is to provide seamless integration of a spreadsheet (formulae in the cells) with sequential code. In MS Excel the VBA integration is anything but seamless. In the rest of this post I explain how Resolver One integrates Python with spreadsheet functionality.


To ensure that we are on the same page, I'll pinpoint some facts about spreadsheets. In a traditional spreadsheet there are two types of content a cell can hold:

  • constant: like number, date or text
  • formula: an expression usually referencing other cells

A user can enter this content by typing into a cell. Additionally, a user can specify formatting for a cell by various GUI means; for example, make the cell's font bold by clicking a toolbar button.

How do you add sequential code to it? Resolver's solution is to turn the spreadsheet into a sequential program. The user's custom code then merges with the rest of the spreadsheet thus expressed.

Spreadsheet as a program

In Resolver One, what you see in the grid is the result of executing the code displayed in the coding pane.

Following is the code that appears in a document created with the File | New command. I have pruned it by dropping the comments and the import statements that are not required.

from Library.Workbook import Workbook

workbook = Workbook()

Constants = {}
Formatting = {}
workbook.Populate(Constants, Formatting)

The code above creates a workbook with three empty worksheets.

Where did the code come from? Resolver One generated it from the model. By model I mean the data structure that remembers the constants, formatting and formulae input by the user through the GUI.

The drill is: Resolver One generates code from the model, executes the program, takes the resulting workbook object and displays it in the tabbed grid view. This is called recalculation.
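A toy version of that loop may help. This is only a sketch of the idea, not Resolver One's implementation: the `Workbook` class, the shape of `model`, and the functions `generate_code` and `recalculate` are all invented here, and cells are addressed by an assumed (column, row) tuple.

```python
class Workbook:
    """Toy stand-in for a workbook; only what this sketch needs."""

    def __init__(self):
        self.sheets = {}

    def populate(self, constants):
        for sheet, cells in constants.items():
            self.sheets.setdefault(sheet, {}).update(cells)


def generate_code(model):
    """Turn the model (what the user entered) into program text."""
    lines = [
        "workbook = Workbook()",
        "workbook.populate(%r)" % (model["constants"],),
    ]
    for sheet, cell, expression in model["formulae"]:
        lines.append("workbook.sheets[%r][%r] = %s" % (sheet, cell, expression))
    return "\n".join(lines)


def recalculate(model):
    """Generate code from the model, run it, return the resulting workbook."""
    namespace = {"Workbook": Workbook}
    exec(generate_code(model), namespace)
    return namespace["workbook"]


# A model with one constant and one formula that doubles it.
model = {
    "constants": {"Sheet1": {(1, 1): 1}},
    "formulae": [("Sheet1", (2, 1), "workbook.sheets['Sheet1'][(1, 1)] * 2")],
}
workbook = recalculate(model)
print(workbook.sheets["Sheet1"][(2, 1)])  # 2
```

Changing the model and calling `recalculate` again produces a fresh workbook object, which is what the grid would then display.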

Every time the user changes the model through GUI commands, a new recalculation is triggered. To give you a feel for the generated code, the next snippet shows it after I changed the model by typing into some cells in the grid: I put the number 1 into A1 and the formula =A1*2 into B1.

from Library.Workbook import Workbook

workbook = Workbook()

Constants = {
    'Sheet1': {
        (1, 1): 1,
    },
}
Formatting = {}
workbook.Populate(Constants, Formatting)

workbook["Sheet1"].B1 = workbook["Sheet1"].A1*2

User Code

All the code generated from the model is divided among three uneditable sections in the coding pane:

  • imports and worksheet creation
  • constants and formatting
  • formulae

In addition, there are three editable sections for user code, one after each of the generated ones. The user code has the same standing as the generated code that creates the workbook to display; the merge is perfect :).
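The interleaving can be pictured with a toy program assembled from six sections. Everything here is made up for illustration: a plain dict stands in for the workbook, and the section contents are not real Resolver One output.

```python
# Generated sections come from the model; user sections are typed by the user.
sections = [
    ("generated", "workbook = {'Sheet1': {}}"),              # imports and worksheet creation
    ("user", "bonus = 5"),
    ("generated", "workbook['Sheet1']['A1'] = 1"),           # constants and formatting
    ("user", "workbook['Sheet1']['A2'] = bonus"),
    ("generated", "workbook['Sheet1']['B1'] = workbook['Sheet1']['A1'] * 2"),  # formulae
    ("user", "workbook['Sheet1']['B2'] = workbook['Sheet1']['B1'] + bonus"),
]

# The sections run top to bottom as one program in one namespace.
program = "\n".join(code for _, code in sections)
namespace = {}
exec(program, namespace)
workbook = namespace["workbook"]
print(workbook["Sheet1"])  # {'A1': 1, 'A2': 5, 'B1': 2, 'B2': 7}
```

Because all six sections share one namespace, user code can read and change whatever the generated code before it has set up.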


That could have been all; buttons aren't strictly necessary to discuss here, but they fit in quite well. A button can be created in the user code and set as the value of a cell. A button has a click handler that a user defines as a function. This handler is, of course, not executed during the recalculation. So when is it executed, and more importantly, what can it do?

The first question is quite easy. The handler gets called when the user clicks the button. The user can't click it until they can see it, which is after the recalculation has finished and the workbook is displayed in the grid.

There is nothing magic in what a button can do. Python has lexical scopes, so the handler has access to anything that was in scope where the function was defined; in particular, this means the workbook object. For instance, a handler could create two hundred new buttons and place them all over the grid.

When a user clicks a button, the handler is called, after which Resolver One refreshes the grid to display any changes made to the displayed workbook. There is no recalculation, because the model didn't change. This also means that any changes made by the button handler will disappear after the next recalculation finishes, because a different workbook object will then be displayed.
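Here is a small sketch of that behaviour in plain Python. The `Button` class and the `recalculate` function are hypothetical stand-ins, not Resolver One's API; a dict plays the role of the workbook.

```python
class Button:
    """Toy button: stores a click handler and calls it on click()."""

    def __init__(self, on_click):
        self.on_click = on_click

    def click(self):
        self.on_click()


def recalculate():
    """Stand-in for a recalculation: builds a brand-new workbook each time."""
    workbook = {"A1": 1}

    def handler():
        # Lexical scoping: the handler sees the workbook it was defined next to.
        workbook["A1"] += 1

    workbook["C1"] = Button(handler)
    return workbook


displayed = recalculate()
displayed["C1"].click()
print(displayed["A1"])  # 2: the handler changed the displayed workbook

displayed = recalculate()
print(displayed["A1"])  # 1: a new workbook object replaced the old one
```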

Note: Resolver One 1.4, the next major release, will provide a means of changing the model from a button handler.


So that is the architecture of Resolver One, brought to you by one of the programmers on the team. Every other feature in the application revolves around this foundation. Of course the model contains more stuff, for which code needs to be generated, but it all ends up in one of the three generated sections in the coding pane. To fully grok the different features of Resolver One, you need to understand recalculation.