Addressing level of expertise


The question/answer model for classifying coins is reasonable for applications where the person who trained the system and the user of the system have equal expertise. For instance, in the bird dataset presented in [1], the questions and answers were mostly understood by the users, and thus the “human” element can be used to increase classification accuracy. However, the paper notes that the deterministic experiments yielded higher accuracy than the tests with actual humans. Was this due to an incomplete understanding of the questions and answers? The paper addressed this issue by allowing users to state how confident they were in their answers. While this is a reasonable method, little work has been done, from the psychological standpoint, on how accurate such self-reported confidence is. Is it fair to be confident about confidence?
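
To make the concern concrete, the following is a minimal sketch of how a stated confidence level could enter a classifier. It is not the model used in [1]; it assumes a simple noise model in which each confidence level maps to a fixed probability that the answer is correct. The `RELIABILITY` table, the `update` function, and the two example classes are all hypothetical.

```python
# Hypothetical probability that an answer is correct at each stated
# confidence level. These numbers are assumptions, not measured values.
RELIABILITY = {"guessing": 0.5, "probably": 0.75, "definitely": 0.95}


def update(prior, truth, answer, confidence):
    """Bayes update of a class posterior after one yes/no answer.

    prior:      dict mapping class -> prior probability
    truth:      dict mapping class -> correct yes/no answer (bool) for that class
    answer:     the user's yes/no answer (bool)
    confidence: key into RELIABILITY
    """
    r = RELIABILITY[confidence]
    # A class is supported with weight r if the answer matches it, 1 - r otherwise.
    unnormalized = {
        c: p * (r if truth[c] == answer else 1.0 - r) for c, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical two-class example for the question "Is the bird red?"
prior = {"cardinal": 0.5, "blue jay": 0.5}
truth = {"cardinal": True, "blue jay": False}

sure = update(prior, truth, answer=True, confidence="definitely")
unsure = update(prior, truth, answer=True, confidence="guessing")
print(sure)
print(unsure)
```

Under this assumed model, a "guessing" answer leaves the posterior unchanged, while a "definitely" answer moves it almost entirely to one class. The posterior is therefore only as trustworthy as the reliability table itself, which is exactly the quantity that self-reported confidence may not measure well.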

Returning to the coins, a few intuitions seem important to note. First, coins are ultimately annotated by students who copy from primary sources written by experts. Second, ideally the expert uses a more precise vocabulary than the student (where “precise” is a metric not yet defined). Third, ideally the entropy of the answers given by experts is far higher than that of the answers given by students. That is, students answer questions in more general terms than experts, so an expert’s answer is far more helpful in classifying a coin than a student’s answer. The problem arises because the annotations carry the experts’ vocabulary, and the dataset implicitly offers no graceful method for generalizing the questions/answers. A question trivially present in the database, such as “Name the figure on the reverse of the coin from this list”, has no clear general form. That is, not without some knowledge of linguistics, because the general form of the question could be “Is the figure on the reverse of the coin male, female, or other?”
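
The third intuition can be illustrated with a small calculation. The sketch below computes the Shannon entropy of the answers given to the same question at two levels of expertise. The eight coins and both answer lists are invented for illustration and do not come from the dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical expert-level answers to "who is on the reverse?" for eight coins.
expert_answers = ["Ares", "Luna", "Athena", "Diana",
                  "Nero", "Trajan", "Hadrian", "Faustina"]

# Hypothetical student-level answers for the same eight coins.
student_answers = ["male", "female", "female", "female",
                   "male", "male", "male", "female"]

expert_bits = entropy(expert_answers)
student_bits = entropy(student_answers)
print(expert_bits, student_bits)
```

If the eight coins are equally likely and each answer is given without error, the entropy of the answer distribution equals the expected number of bits the answer contributes toward identifying the coin. In this toy case the expert answer contributes 3 bits and singles out one coin, while the student answer contributes 1 bit and only halves the candidates.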

One way of generalizing questions/answers is to use WordNet. Generalization is not a trivial issue, but it has been researched at length. Instead of asking the question “Figure on the obverse of coin” and offering over 100 answers to choose from, more abstract questions could be asked by abstracting the answers, and the answer could then be refined further as needed. Below is a tree of the abstraction of answers, where the green circles represent actual answers encoded in the dataset and the black circles represent abstract answers.

Many terms are not included in this chart, and many of the generalized answers are not intuitive, nor is it clear what questions can be posed such that the answers make sense. These are points for further research. What does become apparent is that several answers encoded in the dataset can be generalized into much broader terms. For example, instead of offering the answers “Ares” and “Luna” side by side, it is more intuitive (a metric that needs further defining) for a student to first be asked to choose between “Greek deities” and “Roman deities”.
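
The coarse-to-fine questioning described above might be sketched as follows. The `PARENT` table is a small hand-written hierarchy standing in for WordNet hypernym links; it is not derived from WordNet or from the coin dataset, and the names `children`, `path_to_root`, `refine`, and `student` are hypothetical.

```python
# Hypothetical hierarchy: each answer maps to its more general parent.
# None marks the root. Leaves play the role of answers stored in the dataset.
PARENT = {
    "figure": None,
    "deity": "figure",
    "ruler": "figure",
    "Greek deity": "deity",
    "Roman deity": "deity",
    "Ares": "Greek deity",
    "Athena": "Greek deity",
    "Luna": "Roman deity",
    "Diana": "Roman deity",
    "Nero": "ruler",
    "Trajan": "ruler",
}


def children(node):
    """Direct specializations of a node, in alphabetical order."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def path_to_root(answer):
    """Chain of increasingly general answers, from `answer` up to the root."""
    path = [answer]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def refine(choose, node="figure"):
    """Ask one multiple-choice question per level, starting at the root.

    `choose` receives the list of options and returns one of them,
    or None when the user cannot decide. The most specific answer
    reached so far is returned.
    """
    while children(node):
        picked = choose(children(node))
        if picked is None:
            break
        node = picked
    return node


def student(options):
    """A simulated student who recognizes a Roman deity but not which one."""
    known = {"deity", "Roman deity"}
    for option in options:
        if option in known:
            return option
    return None


print(path_to_root("Ares"))
print(refine(student))
```

Here the simulated student stops at “Roman deity”, which still narrows the candidate coins, whereas a flat list of specific names would have yielded no usable answer. An expert who can name the figure simply continues down to a leaf.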

This is not a new idea and has been partially explored in [2], where image classification is improved by the use of WordNet ontologies. What has not been done, however, is to add the interactive nature of question and answer, which is crucial for difficult datasets such as coins.
  1. “Visual Recognition with Humans in the Loop”, Branson, Wah, Schroff, Babenko, Welinder, Perona, Belongie. 2010.
  2. “Exploiting ontologies for automatic image annotation”, Srikanth, Varner, Bowden, Moldovan. SIGIR 2005.