This assignment is due Wednesday, November 30 at 11:59 PM.

  1. Goals
  2. Background
  3. Implementing WSD using Resnik’s Similarity Measure
  4. Programming
  5. Files

Goals

Through this assignment you will:


Background

Please review the class slides and the textbook readings on lexical semantics, including WordNet and word sense disambiguation. Please also read Section 5.1 of the article, which describes Resnik’s approach to word sense disambiguation in groupings in detail.

Note: You will be implementing a somewhat simplified version of Resnik’s approach as detailed below.

For additional information on NLTK’s WordNet API and information content measures, see:


Implementing Word Sense Disambiguation and Similarity using Resnik’s Similarity Measure

Based on the examples in the text, class slides, and other resources, implement a program that performs Word Sense Disambiguation based on noun groups, using Resnik’s method and WordNet-based similarity measure. Then compute similarity scores for a set of word pairs and compare them with human judgments. Specifically, your program should:

  • Load information content values for WordNet from a file.
  • Read in a file of (probe word, noun group) pairs.
  • For each (probe word, noun group) pair:
    1. Use “Resnik similarity” based on WordNet and information content to compute the preferred WordNet sense for the probe word given the noun group context.
    2. On a single line, for each (probe word, noun group word) pair:
    3. On a separate line, print out the preferred sense, by synsetID, of the word.
  • Read in a file of human judgments of similarity between pairs of words.
  • For each word pair in the file:
  • Lastly, compute and print the Spearman correlation between the similarity scores you have computed and the human-generated similarity scores in the provided file as:
NOTE: You do not need to select senses for all words, only for the probe word; this is a simplification of the word group disambiguation model in the paper.

NOTE: You may treat all the words in context groups as nouns. You are not responsible for cross-POS similarity.
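The final correlation step can be illustrated with a small sketch. This is a minimal pure-Python version of Spearman's rank correlation (Pearson correlation computed on average ranks), not the required implementation; the function names and the example scores are hypothetical. A library routine such as scipy.stats.spearmanr could likely be used instead.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman correlation as Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores, for illustration only.
human_scores = [3.9, 3.1, 0.4, 1.2]
system_scores = [9.5, 7.0, 0.0, 2.2]
rho = spearman(human_scores, system_scores)
```

Because Spearman's measure depends only on rank order, the human and system scores do not need to be on the same scale.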


Programming

Create a program hw8_resnik_wsd.sh that implements the disambiguation specified above, invoked as:

hw8_resnik_wsd.sh <information_content_file> <wsd_test_filename> <judgment_file> <output_filename>

Implementation Resources

Resnik’s similarity measure relies on two components:
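Resnik similarity is usually described as combining a concept hierarchy (here, WordNet) with information content values estimated from a corpus: the similarity of two concepts is the information content of their most informative common subsumer. The sketch below illustrates that idea, together with one plausible reading of the simplified sense selection for a probe word. It is not a reference solution. The taxonomy, the IC values, and all names (HYPERNYMS, IC, SENSES, disambiguate, and so on) are made up so that the example runs without WordNet. With NLTK, the corresponding pieces would likely be wn.synsets(word, pos=wn.NOUN), Synset.common_hypernyms, and Synset.res_similarity.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: each concept maps to its direct hypernyms.
HYPERNYMS = {
    "entity": [],
    "organism": ["entity"],
    "artifact": ["entity"],
    "animal": ["organism"],
    "device": ["artifact"],
    "mouse.animal": ["animal"],
    "mouse.device": ["device"],
    "cat.animal": ["animal"],
    "dog.animal": ["animal"],
    "keyboard.device": ["device"],
}

# Made-up information content values (IC = -log P(concept)).
IC = {
    "entity": 0.0, "organism": 1.0, "artifact": 1.0,
    "animal": 2.0, "device": 2.5,
    "mouse.animal": 6.0, "mouse.device": 6.5,
    "cat.animal": 5.0, "dog.animal": 4.5, "keyboard.device": 6.0,
}

# Hypothetical sense inventory: word -> candidate concepts.
SENSES = {
    "mouse": ["mouse.animal", "mouse.device"],
    "cat": ["cat.animal"],
    "dog": ["dog.animal"],
    "keyboard": ["keyboard.device"],
}


def ancestors(concept):
    """Return the concept itself plus every concept above it."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_informative_subsumer(c1, c2):
    """Return (subsumer, IC) for the shared ancestor with the highest IC."""
    common = sorted(ancestors(c1) & ancestors(c2))
    if not common:
        return None, 0.0
    best = max(common, key=lambda c: IC[c])
    return best, IC[best]


def word_similarity(w1, w2):
    """Resnik similarity of two words: the best value over all sense pairs."""
    best = (None, 0.0)
    for s1 in SENSES[w1]:
        for s2 in SENSES[w2]:
            candidate = most_informative_subsumer(s1, s2)
            if candidate[1] > best[1]:
                best = candidate
    return best


def disambiguate(probe, context_words):
    """Pick the probe sense with the most accumulated support."""
    support = defaultdict(float)
    for word in context_words:
        subsumer, value = word_similarity(probe, word)
        if subsumer is None:
            continue
        # Credit every probe sense that lies below the shared subsumer.
        for sense in SENSES[probe]:
            if subsumer in ancestors(sense):
                support[sense] += value
    return max(SENSES[probe], key=lambda s: support[s])


animal_reading = disambiguate("mouse", ["cat", "dog"])
device_reading = disambiguate("mouse", ["keyboard"])
```

In this toy setting, the context words decide which sense of "mouse" collects support, because only senses that sit under the shared subsumer receive credit.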

Files

All files are found in /dropbox/22-23/au571/hw8/ on patas:

Test, Gold Standard, and Example

Submission Files
