Answer similarity

8.28.11

In the question/answer model for coin identification, similarity is an important metric. Prior question/answer systems have shown good results using image similarity and across-question similarity, but similarity between answers has mostly been ignored.

Let’s assume that for any given question there is a discrete set of possible answers, one of which is the “correct” one. In prior work, a single wrong answer can drastically skew the accuracy of the results. This is particularly the case in the Coin dataset. If a coin is annotated as having a “young male” on the obverse side but the user mistakes it for “Zeus”, identifying the coin becomes almost impossible. Current systems recover from such situations by asking the user more questions so as to dampen the effect of the incorrect answer. This is clumsy, however: if every question carries a fixed error rate, then the more questions are asked, the more likely it is that another incorrect answer is given.
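To make the fixed-error-rate argument concrete, here is a minimal sketch. It assumes that every question is answered incorrectly with the same probability and that answers are independent of one another; both are simplifying assumptions, and the 10% rate is a made-up illustrative value, not a measurement from the Coin dataset.

```python
def prob_at_least_one_wrong(error_rate, num_questions):
    """Probability that at least one of num_questions answers is wrong.

    Assumes (hypothetically) a fixed, independent per-question error rate.
    """
    return 1.0 - (1.0 - error_rate) ** num_questions


if __name__ == "__main__":
    # Illustrative 10% per-question error rate (an assumption).
    for n in (1, 5, 10):
        print(n, round(prob_at_least_one_wrong(0.10, n), 3))
```

Under these assumptions the chance of at least one wrong answer only grows with the number of questions, which is why asking more questions is an awkward way to recover from a mistake.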

For this reason, it would be useful to have a metric that measures how similar answers are, so that “young male” confused with “Zeus” is not treated as completely incorrect. Several issues arise, each of which is interesting in its own right as a possible research path:
  • “young male” vs. “boy”. WordNet can be used to find the similarity of single words, not phrases. How to measure phrase similarity is an open problem.
  • “man” vs. “Zeus”. WordNet has only limited coverage of proper nouns. This seems to be a problem suited to ontological studies.
  • “Zeus” vs. “Jupiter”. This is the most curious issue because it cuts across several problems. First, it resembles the ontological problem, since both could be thought of as “man” and thus share some similarity; alternatively, the two could be treated as equivalent, like a pseudonym. Second, and much more interesting, there exists no historical/mythical genealogical database similar to WordNet. It would be very helpful if the similarity between “Henry II” and “Henry III” could be calculated, because the two are much more similar to each other than either is to “Buddha”, who shares no historical or mythical connection with them.
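One way to picture such a metric is a small sketch over a hand-built taxonomy. Everything below is hypothetical: the `PARENT` tree, the `ALIASES` table, and the choice of a Wu-Palmer-style score (twice the depth of the deepest shared ancestor, divided by the sum of the two answers' depths) are illustrative assumptions, not part of any existing coin-identification system or of WordNet itself. Whole phrases such as “young male” are simply treated as single nodes, which sidesteps rather than solves the phrase problem.

```python
# Hypothetical toy taxonomy: each answer maps to its parent concept.
PARENT = {
    "entity": None,
    "figure": "entity",
    "animal": "entity",
    "eagle": "animal",
    "male": "figure",
    "young male": "male",
    "boy": "young male",
    "man": "male",
    "Zeus": "man",
}

# Hypothetical hand-coded equivalences ("pseudonyms").
ALIASES = {"Jupiter": "Zeus"}


def path_to_root(answer):
    """Return the chain [answer, parent, ..., root] after resolving aliases."""
    node = ALIASES.get(answer, answer)
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def answer_similarity(a, b):
    """Wu-Palmer-style similarity in (0, 1]; 1.0 means equivalent answers."""
    path_a = path_to_root(a)
    path_b = path_to_root(b)
    ancestors_b = set(path_b)
    # Deepest ancestor shared by both answers (first hit walking up from a).
    shared = next(node for node in path_a if node in ancestors_b)
    shared_depth = len(path_to_root(shared))  # the root has depth 1
    return 2.0 * shared_depth / (len(path_a) + len(path_b))


if __name__ == "__main__":
    for pair in [("young male", "Zeus"), ("Zeus", "Jupiter"), ("Zeus", "eagle")]:
        print(pair, round(answer_similarity(*pair), 3))
```

With this toy tree, “young male” vs. “Zeus” scores well above “Zeus” vs. “eagle”, and “Zeus” vs. “Jupiter” scores 1.0 because of the alias. A genealogical relation such as “Henry II” vs. “Henry III” would need a different kind of graph and is not covered by this sketch.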