University of Arizona, Department of Computer Science

CSc 120: Rhyming Words

Background

Most of us have, at some point in our lives, struggled to find just the right rhyme to some word. Figuring out good rhymes isn’t always easy, but it might help if we could easily come up with a list of possible rhyming words to choose from. This program is a step in this direction.

This problem involves writing a program to find all the words that rhyme with some given input word. To do this, it will be necessary to reason about the pronunciation of different words; this information will be read from a file (the "pronunciation dictionary") that gives information about the pronunciations of a wide variety of words. For this assignment, we will use pieces of the file PronunciationDictionary.txt as pronunciation dictionaries. In particular, for testing and debugging your code you can use smaller subsets of this file (several such subsets are given here, and you can create your own).

This document explains how to interpret the pronunciation information given in the pronunciation dictionary.

Definitions

It turns out that there are many different kinds of rhymes (see Wikipedia for a discussion). For the purposes of this assignment, we will focus on perfect rhymes. Wikipedia defines this kind of rhyme as satisfying the following two conditions:

  1. The stressed vowel sound (i.e., the primary stress) in both words must be identical, as well as any subsequent sounds. For example, "sky" and "high"; "skylight" and "highlight".
  2. The sound phoneme that precedes the stressed vowel in the words must not be the same. For example, "bean" and "green" is a perfect rhyme, while "leave" and "believe" is not.

In the rest of this document, "rhyme" refers to perfect rhymes unless otherwise specified.

This file gives several examples of words that form perfect rhymes and words that don't, and shows how we can figure out whether or not two words rhyme.

Expected Behavior

Write a program, in a file rhymes.py, that finds words that rhyme with a given word. Your program should behave as follows:

  1. Use input() (without arguments) to read the name of the pronunciation dictionary file pfile. Do not prompt the user for input. Do not hard-code the file name into your program.

  2. Read pfile and organize the information appropriately as directed below.

  3. Use input() (without arguments) to read in a word word. Do not prompt the user for input.

  4. Collect all the words that rhyme with word. If word has more than one pronunciation (e.g., "invalid", "object", "record"), then collect all the words that rhyme with each pronunciation. It is not necessary to eliminate duplicates. Some examples are shown here.

    Ignore upper/lower-case differences when matching the input word with words in the pronunciation dictionary.

  5. Print out the words so collected (see Output format below).

Input Format

Each line of the input file describes the pronunciation for a word, and has the following format:

Output Format

Print out, one word per line, all of the words that rhyme with any of the pronunciations of the given word. They can be printed out in any order, and in any case (upper or lower).

It may happen, in some cases, that the same rhyming word is found more than once, causing the output list of words to contain duplicates. This is OK: you do not have to remove any such duplicates before printing out the rhymes. An example of this situation is for the input word SWORD, whose pronunciation is given as

S   AO1   R   D
As mentioned above, the word RECORD has three different pronunciations:
R  AH0  K  AO1  R  D
R  IH0  K  AO1  R  D
R  EH1  K  ER0  D
In this case, the first two pronunciations of RECORD, which differ in the vowel sound of the first syllable, both rhyme with SWORD. But while these two pronunciations of the word "RECORD" are different, the words themselves are the same. So when we enter them into the list of rhyming words, it looks like we have a duplicate in the list of words rhyming with SWORD.

Programming Requirements

You will need some kind of data structure to store the pronunciation(s) of various words, so that you can look up the pronunciation of any given word. Use a dictionary for this. Your dictionary should map each word to all of the different pronunciations of that word. For example:

{
    'OBJECT' : [ ['AA1' 'B' 'JH' 'EH0' 'K' 'T'], 
                 ['AH0' 'B' 'JH' 'EH1' 'K' 'T'] ],
    'OBJECTOR' : [ ['AH0' 'B' 'JH' 'EH1' 'K' 'T' 'ER0'] ],
    'SOUR' : [ ['S' 'AW1' 'ER0'], 
               ['S' 'AW1' 'R'] ],
    'BECAUSE' : [ ['B' 'IH0' 'K' 'AO1' 'Z'], 
                  ['B' 'IH0' 'K' 'AH1' 'Z'], 
                  ['B' 'IH0' 'K' 'AA1' 'Z'], 
                  ['B' 'IH0' 'K' 'AH0' 'Z'] ],
    ...
}

In this example, using this pronunciation dictionary, the words OBJECT and SOUR have two pronunciations each, so the dictionary maps each of these words to a list containing two elements; each element is a list of phonemes and gives one pronunciation of the word. OBJECTOR has just one pronunciation, so the dictionary maps it to a list containing just a single element, which is a list of phonemes making up that pronunciation. The word BECAUSE has four(!) different pronunciations, so the dictionary maps this word to a list with four elements.

Examples

Some examples of query processing, on different pronunciation dictionaries, are shown here.

Development Strategy

Use top-down incremental development.

  1. Read in the pronunciation dictionary and organize it into an appropriate data structure (see Programming Requirements above). Think about the data structure you want to use—one has been suggested for you above, but anything else that satisfies the programming requirement stated will do. Think about how you will populate this data structure as you read in words; this involves figuring out what pieces of information you need to extract from the input you read in.
  2. The next thing you have to do is to read in a word and find out which words rhyme with it. What information about this word do you need before you can figure that out? How can you find that information?
  3. Finally, what do you do once you find that information?

And when you're done...

What can you do with this?