POTW 3/4/07: Answer Mining by Combining Extraction Techniques with Abductive Reasoning by Harabagiu et al.
This week's paper, “Answer Mining by Combining Extraction Techniques with Abductive Reasoning”, lays out, at a high level, the capabilities of the highest-performing QA system at TREC 2003: Language Computer Corporation's (LCC) QA system. The first section or two lay the groundwork for the competition, much as the Voorhees paper from last week already did. The real meat of the paper starts in the section titled “The architecture of the QA system”.
The system is divvied up into several components, as shown in Figure 1 of the paper: question processing, document processing, factoid answer processing, list answer processing, and definition answer processing. All documents are processed the same way, and passages are retrieved based on the keywords in the question. Depending on the type of question, passages that do not contain a candidate of the expected answer type are filtered out. For list questions, passages containing more candidates of the expected answer type are favored.
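To make the retrieval and filtering steps a bit more concrete, here is a minimal Python sketch. It is not LCC's code: the question-stem rules, the stopword list, and the idea that each passage arrives pre-tagged with a list of entity types (standing in for a named-entity tagger) are all my own assumptions. It shows keyword retrieval, dropping passages without a candidate of the expected answer type, and favoring passages with more such candidates for list questions.

```python
import re
from typing import List, Set, Tuple

# Hypothetical question-stem rules; real question processing is far richer.
ANSWER_TYPE_RULES = [
    (re.compile(r"^\s*when\b", re.I), "DATE"),
    (re.compile(r"^\s*who\b", re.I), "PERSON"),
    (re.compile(r"^\s*how (?:many|much)\b", re.I), "QUANTITY"),
    (re.compile(r"^\s*(?:what|which) city\b", re.I), "CITY"),
]

# Hypothetical stopword list used to pull keywords out of the question.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "did", "do",
             "what", "which", "who", "when", "how", "many", "much", "city"}


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def expected_answer_type(question: str) -> str:
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if pattern.search(question):
            return answer_type
    return "UNKNOWN"


def question_keywords(question: str) -> Set[str]:
    return {w for w in tokens(question) if w not in STOPWORDS}


def retrieve_passages(question: str,
                      passages: List[Tuple[str, List[str]]],
                      is_list: bool = False) -> List[str]:
    """Each passage is (text, entity_types); entity_types is an assumed
    stand-in for the output of a named-entity tagger."""
    keywords = question_keywords(question)
    answer_type = expected_answer_type(question)
    scored = []
    for text, entity_types in passages:
        overlap = len(keywords & set(tokens(text)))
        if overlap == 0:
            continue  # keyword retrieval: must share at least one keyword
        type_count = entity_types.count(answer_type)
        if answer_type != "UNKNOWN" and type_count == 0:
            continue  # answer-type filter: no candidate of the right type
        scored.append((text, overlap, type_count))
    if is_list:
        # List questions: favor passages with more expected-type candidates.
        scored.sort(key=lambda s: (s[2], s[1]), reverse=True)
    else:
        scored.sort(key=lambda s: s[1], reverse=True)
    return [text for text, _, _ in scored]
```

For "When did the Titanic sink?", a passage mentioning the Titanic but containing no DATE entity is thrown out even though it matches a keyword, which is the point of the answer-type filter.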
For factoid questions, LCC used its CICERO LITE system to perform the extractions, and answers were selected based on those extractions and/or the expected answer type. The extraction process had to recognize a variety of semantic classes, such as quantities, dates, people, and cities. The paper then discusses the types of questions answered by these approaches, as well as some special cases related to manner of death (kind of morbid, but in my experience death is often a fascination of this kind of research). From there, it discusses the role of theorem proving in the algorithm (see page 5), but the details are left to another paper (guess what next week's paper will be?). I must admit I don't fully understand pages 5 and 6 just yet, mostly, I think, because I'm not familiar with the syntax they use, so maybe reading the next paper will make it clearer.
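Since I can't reproduce CICERO LITE or the theorem prover, here is only a toy illustration of the extraction side: a couple of regular expressions stand in for semantic-class extractors (DATE and QUANTITY), and the answer is simply the most frequent extraction of the expected type across passages. The patterns, class names, and the frequency-voting rule are all assumptions for illustration, not the paper's method.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Toy regex extractors standing in for an information-extraction system such
# as CICERO LITE; the patterns and class names here are assumptions.
EXTRACTORS = {
    # Four-digit years from 1500 to 2099.
    "DATE": re.compile(r"\b(?:1[5-9]|20)\d{2}\b"),
    # A number followed by one of a few hard-coded units.
    "QUANTITY": re.compile(r"\b\d[\d,]*\s(?:people|passengers|miles|tons)\b"),
}


def extract(text: str) -> Dict[str, List[str]]:
    """Return every match for each semantic class found in the text."""
    return {cls: pattern.findall(text) for cls, pattern in EXTRACTORS.items()}


def answer_factoid(expected_type: str, passages: List[str]) -> Optional[str]:
    """Pick the most frequent extraction of the expected answer type."""
    votes: Counter = Counter()
    for passage in passages:
        votes.update(extract(passage).get(expected_type, []))
    if not votes:
        return None  # no candidate of the expected type was extracted
    return votes.most_common(1)[0][0]
```

Frequency voting is just the simplest way to rank candidates; the paper instead hands candidates to an abductive justification step, which this sketch makes no attempt to model.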
Page 6 continues with a discussion of finding answers to definition questions, which relies on matching text against 38 internally developed patterns, some of which are listed in Table 5 of the paper.
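To give a feel for what pattern-based definition extraction looks like, here is a small sketch with three hypothetical patterns (an appositive, a copula, and a parenthetical). These are my own illustrative guesses at the general style, not the 38 patterns from the paper or the ones in Table 5.

```python
import re
from typing import List


def definition_patterns(term: str) -> List["re.Pattern[str]"]:
    """Hypothetical definition patterns, not the paper's actual set."""
    t = re.escape(term)
    return [
        # Appositive: "TERM, a/an/the ... ," or ending in a period.
        re.compile(rf"\b{t},\s+((?:a|an|the)\s+[^,.]+)[,.]", re.I),
        # Copula: "TERM is/was a/an/the ..."
        re.compile(rf"\b{t}\s+(?:is|was)\s+((?:a|an|the)\s+[^,.]+)", re.I),
        # Parenthetical: "TERM (...)"
        re.compile(rf"\b{t}\s+\(([^)]+)\)", re.I),
    ]


def find_definitions(term: str, sentences: List[str]) -> List[str]:
    """Collect unique definition snippets, in order of first appearance."""
    found: List[str] = []
    for sentence in sentences:
        for pattern in definition_patterns(term):
            for match in pattern.finditer(sentence):
                snippet = match.group(1).strip()
                if snippet not in found:
                    found.append(snippet)
    return found
```

For example, given "TREC, an annual evaluation run by NIST, includes a QA track." and "TREC is a series of workshops.", the sketch returns the two snippets "an annual evaluation run by NIST" and "a series of workshops".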
Page 7 finishes off with a discussion of their approach to list questions, which uses a threshold cutoff: it measures the similarity between the first and last answers in the list and each answer in between, and truncates the list at the point where that similarity reaches a threshold.
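I'm not certain of the exact formula, so the following is just one possible reading of the cutoff. It assumes each ranked answer carries a numeric confidence score and treats "similarity" as where an answer's score falls between the first (best) and last (worst) scores. Answers are kept until that relative position drops below a threshold; the 0.5 default is an arbitrary choice for illustration.

```python
from typing import List, Tuple


def cut_list(ranked: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """ranked: (answer, score) pairs sorted by descending score.

    Assumed interpretation: each answer's position between the first and
    last scores is normalized to [0, 1], and the list stops at the first
    answer whose normalized position falls below the threshold.
    """
    if len(ranked) < 2:
        return [answer for answer, _ in ranked]
    first, last = ranked[0][1], ranked[-1][1]
    span = first - last
    if span == 0:
        return [answer for answer, _ in ranked]  # all equal: keep everything
    kept = []
    for answer, score in ranked:
        relative = (score - last) / span  # 1.0 for the first, 0.0 for the last
        if relative < threshold:
            break
        kept.append(answer)
    return kept
```

With scores 0.9, 0.8, 0.7, 0.3, and 0.1, the first three answers sit at or above the midpoint between the best and worst scores and are kept, while the sharp drop to 0.3 ends the list. Anchoring the cutoff to the first and last answers makes it adapt to each question's score range rather than relying on a single global score threshold.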
The rest of the paper is performance evaluation and references.

