Relations between Coins and other datasets

7.16.11

Image classification splits into two distinct problems, which differ in how many categories and classes are involved. The most studied are the multi-category datasets, where subsets of images differ drastically in appearance from one another.

Sample images from the Caltech-101 dataset.

These datasets, while difficult, are relatively easy compared to the multi-class datasets. This is because images belonging to one category look very different from images in the other categories. For instance, a soccer ball is very different from an airplane when their color histograms are compared.
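To make the color histogram comparison concrete, here is a minimal sketch in plain Python. The coarse binning, the histogram intersection score, and the toy pixel lists are all my own assumptions for illustration; they are not taken from any particular paper or dataset.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized, coarsely binned RGB histogram.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    The bin count is an arbitrary choice for this sketch.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity between two normalized histograms (1 = same, 0 = disjoint)."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

# Toy "images" (hypothetical data): a black and white ball, and a plane on blue sky.
ball = [(255, 255, 255)] * 6 + [(0, 0, 0)] * 4
plane = [(90, 160, 255)] * 9 + [(255, 255, 255)] * 1

ball_hist = color_histogram(ball)
plane_hist = color_histogram(plane)
similarity = histogram_intersection(ball_hist, plane_hist)
```

A low score between the two toy images is what makes the multi-category case comparatively easy: the color distributions barely overlap, so even this crude feature separates them.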

Birds and flowers have recently become popular datasets because, while the images can still be very different, they are more similar to one another than those in prior datasets. For this reason they belong to the multi-class type of classification.

Flowers dataset.

Caltech-UCSD Birds 200.

Coins dataset.

As can be seen, coins are perhaps an extreme example of the multi-class image classification problem. While a numismatic expert may argue that coins from different regions of the world could be split by category, and that denominations within a region are the classes, to the untrained eye (i.e. computers) they all look pretty much the same. Unlike flowers, where color and petal shape can aid classification, coins have neither. Coins also have no “poses”, which have been shown to be an important feature for birds. These missing cues (color, petal shape, pose) have all been used in the recent past to create state-of-the-art classification algorithms for the Bird and Flower datasets. While birds and flowers have a relatively small number of visual characteristics, coins have complex scenes that have been painstakingly catalogued by humans.

Chrome + Mac + I/O Error

9.15.10

A month ago I started getting repeated kernel errors reporting that an “I/O Error” had occurred. Each time this happened the entire computer would freeze for about a minute and then resume as if nothing had happened. I was/am using Snow Leopard and the latest version of Google Chrome on a MacBook Pro.

I tried everything:
  • Fixing any disk errors
  • Fixing any permission errors
  • Reinstalling Snow Leopard
  • Reinstalling Chrome
Finally, after much frustration, I removed the application settings and the problem went away. A month later, however, the problem came back. I almost tried all of the above methods again, but instead of removing all the settings I just cleared the cache. I haven’t had a problem since!

Transitioning to Python from Matlab – Day 2

5.20.10

A great limitation of Matlab is the very high price of both individual licenses and the distributed/parallel computing toolbox. Python already has several good alternatives for this. That said, the documentation and capabilities of these options are highly convoluted.

Parallel Python

Parallel Python (PP) is a great start for getting code to run on several computers. It works on a versatile client/server model, where generic servers must be launched to do the actual computations and the user must write a client which interacts with those servers. The nice thing about this module is how quickly you are up and running. Basically you pass a function and its arguments to PP, and it handles the distribution and the gathering of results. Load balancing, passing extra data, and anything that can’t be done with a single function are outside the realm of this module.
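Here is a rough sketch of what that client side can look like. It assumes the third-party `pp` module is installed; `sum_of_squares` and `run_on_pp` are names I made up for the example, and the work function is just a stand-in for a real computation.

```python
def sum_of_squares(n):
    """Toy work function: the kind of self-contained job PP can ship out."""
    return sum(i * i for i in range(n))

def run_on_pp(inputs, ppservers=()):
    """Submit one job per input and gather the results in order.

    `ppservers` is a tuple of "host:port" strings for machines that are
    already running a PP server; left empty, jobs run on the local cores.
    """
    import pp  # third-party module, imported here so the rest works without it

    job_server = pp.Server(ppservers=ppservers)
    # submit() takes the function and a tuple of its arguments
    jobs = [job_server.submit(sum_of_squares, (n,)) for n in inputs]
    # calling a job object blocks until its result is available
    return [job() for job in jobs]
```

Calling `run_on_pp([10, 100, 1000])` should return one result per input. Anything the work function needs beyond its arguments (helper functions, imported modules) has to be declared to `submit` separately, which is where the single-function limitation starts to show.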

MPI4Py

MPI is something that I have never used; its documentation is sparse, and overall it is very mysterious. That said, the potential is great. There are several MPI implementations for Python, and I picked this one because it seemed the easiest to install and start using. Essentially, when using MPI4Py your script is launched in parallel on multiple CPU cores and/or computers, and the MPI layer makes sure that all the instances can talk to each other. What is cool is that variables in your program are local to an instance, but at the same time can be shared seamlessly with your other instances. The MPI4Py documentation is very good, but the actual MPI standard is very odd and worth learning more about, particularly because it can be used with computer clusters.
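A small sketch of the local-versus-shared idea, assuming `mpi4py` and an MPI runtime are installed. `local_chunk` and `main` are my own illustrative names, and the data is made up.

```python
def local_chunk(data, rank, size):
    """The slice of `data` that the instance with this rank works on."""
    return data[rank::size]

def main():
    from mpi4py import MPI  # third-party, needs an MPI runtime

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # id of this instance (0 .. size - 1)
    size = comm.Get_size()   # number of instances launched

    # Only rank 0 builds the data; bcast shares it with every instance
    data = list(range(100)) if rank == 0 else None
    data = comm.bcast(data, root=0)

    # `partial` is local to each instance
    partial = sum(local_chunk(data, rank, size))

    # reduce combines the local values; only root receives the total
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print(total)
```

To try it, add a call to `main()` at the bottom of the script and launch it with something like `mpiexec -n 4 python script.py`. Every instance runs the same file; only the rank decides which slice of the work it takes.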