POTW 2/26/07: Overview of the TREC 2003 Question Answering Track by Voorhees
Overview of the TREC 2003 Question Answering Track by Voorhees is a nice introduction to the QA task, as defined by TREC. The first two sections lay out the two main tasks:
- The Passages Task
- Main Task - Divided into three sub-tasks
  - Factoids
  - Lists
  - Definitions
The passages task tests whether a QA system can find factoids in fairly short spans of text (250 characters). Compared to the factoid sub-task under the main task, this approach allows answers to be a little more “loose”. An answer was judged correct if it contained the right answer, the document it came from supported the answer, and the answer was “responsive”. Responsive, in my understanding, means that the answer has the correct units, or that it refers to the actual item of interest and not a replica or imitation. The evaluation measured accuracy: the fraction of answers judged correct by human judges.
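To make the scoring concrete, here is a minimal sketch of accuracy as described above: the fraction of answers that the judges marked correct. The `JudgedPassage` record, its field names, and the choice to count an over-length passage as incorrect are my own assumptions for illustration, not something taken from the paper.

```python
from dataclasses import dataclass

MAX_PASSAGE_CHARS = 250  # passage length limit mentioned above


@dataclass
class JudgedPassage:
    """One returned passage plus the human judgments on it (hypothetical record)."""
    question_id: str
    passage: str
    contains_answer: bool  # the passage contains the right answer
    supported: bool        # the source document supports the answer
    responsive: bool       # correct units, the real item and not a replica, etc.


def is_correct(judged: JudgedPassage) -> bool:
    # Assumption: a passage over the length limit is simply counted as wrong.
    return (
        len(judged.passage) <= MAX_PASSAGE_CHARS
        and judged.contains_answer
        and judged.supported
        and judged.responsive
    )


def accuracy(judgments: list[JudgedPassage]) -> float:
    """Fraction of judged passages that count as correct."""
    if not judgments:
        return 0.0
    return sum(is_correct(j) for j in judgments) / len(judgments)
```

With four judged passages of which two meet all three criteria, `accuracy` returns 0.5.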
The main task was broken up into three sub-tasks, and participants had to do all three. The factoid task is pretty much the same as the passages task, but stricter in that exact answers had to be returned, not just passages. The list task requires systems to return a list of items that together make up the answer. For example, if the question was “what are the 50 states that make up the United States?” a correct answer would have to enumerate all 50 states. Finally, the definition task required the system to define things, such as “Who is George Bush?” or “what is information retrieval?” Much of section 2.3 details how the definition task was evaluated. If anything, it makes you think about how you, as a person, go about finding a correct answer to a question. Many people, I think, take for granted that Google or something similar gives them a correct answer to their questions.
Section 4 is an extensive discussion of how the evaluation was completed.
Having worked on a QA system, I would say there are a couple of other areas of interest in QA. It should be noted first, though, that QA is a very hard problem: not only do most systems rely on an IR system for retrieval, which is less than perfect, but they then need to find the exact answer in the retrieved passage. At any rate, I think these other areas warrant research and are more interesting in some ways. First are questions that require longer answers, which may span multiple passages, or that are targeted towards a more demanding audience. For instance, how could you answer “why, in scientific terms, is the sky blue?” in 250 characters or less? Second are questions that require essay-style answers, or answers that fall into positive and negative sections, such as “what are the pros and cons of tariffs on China?” Granted, these are much harder, but I think they are much more realistic in some ways. To me, definition questions are fairly quickly answered by searches on Google or Wikipedia. To some extent this is true for list questions as well.
Now that we have a nice intro to the QA task, we will start to look into how people implement QA systems.

