POTW 3/26/07: “Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition” by Osinski, Stefanowski, and Weiss

by grant.ingersoll in Algorithms, Computer Science, Natural Language Processing (NLP), clustering

Paper of the Week for March 26 is “Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition” by Osinski, et. al.  Aah, back to matrices…

Popularity: 3% [?]

POTW 3/19/07: Discussion of “An Analysis of the AskMSR Question-Answering System”

by grant.ingersoll in Algorithms, Computer Science, Information Retrieval, Microsoft, Natural Language Processing (NLP), Question Answering

Better late than never is what they say… I’ve been absolutely slammed this week trying to get out a proposal and finish up my slides for ApacheCon Europe. So, here goes on the POTW for this week. If you recall, we are reading “An Analysis of the AskMSR Question-Answering System” by Brill, Dumais and Banko.

Microsoft’s AskMSR system, as detailed in this paper, takes a different tact for trying to solve the QA problem. Unlike most systems that use sophisticated NLP techniques, AskMSR makes the bet that somewhere on the web the exact answer is written in just the right way to answer the question. Their approach is touted to be much simpler and much more efficient. Also, they suggest this approach is complementary to the other approaches, even though in this paper they are trying to see just how far they can get.

AskMSR relies on the Internet as a massively redundant data source. Section 2 of the paper lays out the System Architecture, which can be br0ken down into 4 sections:

  1. Query Reformulation
    1. Takes the input query and rewrites into a number of different possible answers. As a last resort, they also produce some less precise versions made up of non-stopwords. No POS tagger or parser is used.
  2. n-gram mining
    1. Given the rewrites, they submit them to a search engine, get the page summaries and use n-grams to calculate candidate answers and score them.
  3. Filtering
    1. Given the n-grams, AskMSR then filters and reweights the answers based on answer-type, as specified by a few in-house, handwritten filters. This is most likely one area where more advanced NLP techniques could be used to determine.
  4. n-gram tiling
    1. The tiling algorithm merges similar answers and creates longer answers from overlapping answers.

Section 3 then details the experiments that were run, how they performed and how the different components contributed. Ironically, they use Google for their search engine, but my guess is there current system does not. Section 3.2 lays out how important each of the individual components are to the process. While query rewrites weighting are pretty important (11.2% drop in performance when all are weighted equally), n-gram filtering is the single most important part, contributing nearly 18% over the baseline. Tiling rounds things off by contributing approximately 14% to the process.

Section 4 has a discussion of how the various components contribute to errors. Interestingly, many of the errors are due to their projection of answers onto the TREC-9 corpus and wouldn’t really be an issue in a live system. Another interesting error source is the inability to query the search engine for numbers without including the number in the query:

“For example, a good rewrite for the query How many islands does Fiji have wold be <<Fiji has <NUM> islands >> but we are unable to give this type of query to the search engine”

It seems to me, though, that this could be addressed by some type of bounded approach.  For instance, why not just iterate through trying some numbers, within reason.  I wonder, too, how good their potential answers would be if they just put in any number.  I tried the Google query: “Fiji number islands” and the top three results have the answer and the third one, I think would be handled by the n-gram approach.

Section 5 is a good discussion on knowing when the AskMSR system does not know.  They use a decision tree that uses a number of factors, like query length, number of stopwords, etc. The thought process behind this is that why bother trying to answer questions that you know have a very low likelihood of being answered in the first place.  For example, they know that they don’t do very well on how questions, so they could simply say they don’t know the answer or just return keyword search results for the question (in a live system).  Of course, it all depends on how tolerant your users are of incorrect answers.

Well, that’s it for this week.  Haven’t decided on next week yet.  May do another QA paper, may look into some clustering algorithms.  Anyone have anything in particular they want to look at?

Popularity: 6% [?]

POTW 3/19/07: “An Analysis of the AskMSR Question-Answering System” by Brill, et. al.

by grant.ingersoll in Algorithms, Computer Science, Information Retrieval, Natural Language Processing (NLP), Question Answering

Paper of the Week for March 19, 2007 is “An Analysis of the AskMSR Question-Answering System” by Bill, Dumais and Banko. Let’s have a look at how Microsoft does QA, shall we?

Popularity: 8% [?]

POTW 3/11/07: Discussion of “COGEX: A Logic Prover for Question Answering”

by grant.ingersoll in Algorithms, Computer Science, Information Retrieval, Natural Language Processing (NLP), Question Answering

This week we are digging deeper into the published work of Language Computer Corporation’s QA (they have an online demo on their home page) system by looking at COGEX: A Logic Prover for Question Answering by Moldovan, Clark, Harabagiu and Maiorano.  If you recall from a few weeks ago, the LCC system was the top performer at TREC 2003 (see  last week’s POTW for a discussion of the overall system.)

In this paper, LCC demonstrates that using a theorem proving strategy to validate answers found through traditional QA means can improve performance by over 30% on TREC style questions.  To support this claim, LCC takes us through, at a high level, the steps involved in creating a theorem prover for use in QA.

Section 1 introduces us to the problem at hand and why they think COGEX is useful for QA and what types of challenges were overcome to implement it.  To summarize, COGEX creates logical representations of the questions, candidates and other world and language knowledge to re-rank candidate answers and remove incorrect answers.  It should be noted that it doesn’t identify the candidates to begin with, just “proves” they are correct.  The main technical challenges in the approach occur in two main areas: creating the logically representation of the necessary world knowledge and other inputs and the high failure rates and long processing time required to apply the prover (I wonder if LCC has overcome these for their online demo, or if they don’t use the theorem prover in the demo.)

Section 2 outlines how the theorem prover is integrated into the QA system.  They have a module called the Axiom Builder which takes in various free text inputs and WordNet glosses and builds logical representations of these.  On a side note, I wonder about the storage and retrieval algorithms used for these.  I would imagine one needs a way of quickly looking up the appropriate pieces, especially when it comes to the Wordnet glosses.  All of this input is fed into the justification module, which tries to prove out the theorem, relying on some relaxation techniques if the proof fails.

Getting into the guts of the program, sections 3 and 4 discuss in detail how to create logical representations of the text and the world knowledge incorporated into the system.  For the text, they use what they call a “logic form” which is a middle ground between a syntactic parse and a deep semantic representation.  In other words, they capture the various pieces of the structure of the text that they think are important, in this case: “(1) syntactic subjects, (2) syntactic objects, (3) prepositional attachments, (4) complex nominals, and (5) adjectival/adverbial adjuncts” (page 2).  To build the logic form, they map the words to the predicates, using the base form plus part of speech info.  Nouns are supplemented with a tag that allows it to be referred to from other predicates.  From here, they use grammar rules to construct the actual representation.  Since there are so many grammar rules, they observe that the top ten most frequently used rules cover more than 90% of the cases that they need in WordNet.  If I understand this correctly, they are saying common grammar rules such as Subject Verb Object, etc. are sufficient for most of their cases.

The next part  of the paper details how to use WordNet glosses to build a bank of world knowledge for use in the theorem prover by developing lexical chains linking synsets across hierarchies.  This is useful for increasing passage retrieval and extractions, as well.  The chains codify topically related concepts and are used in the proving algorithm.  Several examples are given at the end of section 4 for those interested in seeing more details.

Before getting to how the theorem prover works in section 6, section 5 discusses how NLP Axioms are built for the various language nuggets they are interested in, such as complex nominals, possessives, etc.  One thing to note is these axioms have strengths associated with them that play a role in scoring the answers in the prover.  Weaker axioms result in weaker confidence in a given proof.

Sections 6 and 7 discuss how the axioms are used and show an in-depth example of the process in action.  Reading the syntax is a bit mind numbing, but essentially it relies on a couple key pieces: hyperresolution, paramodulation and proof by contradiction.   See the subsection of section 6 on Inference Rules for definitions of hyperresolution and paramodulation.  In my naive understanding, they are ways of reducing and substituting to make the proof more manageable.  Proof by contradiction is simply a way of proving something by assuming it is incorrect and then logically deducing that if it were incorrect than some known fact would be wrong.  In the example of section 7, they assume there is NOT a company/organization that developed the Mosaic Internet browser which is contradicted by the fact that their passage states that NCSA developed Mosaic.  One interesting thing to note here, is this type of proof would not work well for subjective things or on passages that are lying or incorrect.

Section 8 discusses results.  Namely, they claim a 30% improvement in correct answers for TREC 2002.  Quite significant results with the trade off lying in the time and effort it takes to both codify the prover and actually run it on real systems.  This is often the trade off in NLP systems in the current state of things.  It seems you can have either fast or good, but rarely can you have both when it comes to deep, complex problems such as QA.

Popularity: 6% [?]

POTW: 3/11/07: “COGEX: A Logic Prover for Question Answering” by Moldovan, et. al.

by grant.ingersoll in Algorithms, Computer Science, Information Retrieval, Natural Language Processing (NLP), Question Answering

Following on from last weeks look at Language Computer Corporations TREC 2003 entry, we are going to dig deeper into the theorem prover part of the system and look at “COGEX: A Logic Prover for Question Answering” by Moldovan, et. al.

Popularity: 3% [?]