Introduction

You should be able to complete this assignment using only the elements of Ruby presented in lecture through November 30, and/or mentioned in this write-up. If you find yourself thinking that need to use other elements of the language, it perhaps indicates you're not fully recognizing the capabilities of what's been covered.

The Usual Stuff

Make an a9 symlink that references /cs/www/classes/cs372/fall22/a9.

Test using a9/tester (or a9/t).

Problem 1. (30 points) `switched.rb`

The U.S. Social Security Administration makes available year-by-year counts of names on Social Security Card applications for births dating back as far as 1880. Over time, some names change from predominantly male to predominantly female or vice-versa. For this problem you are to create a Ruby program switched.rb to look for names that change from predominantly male to predominantly female in a given span of years.

switched.rb takes two command-line arguments: a starting year and an ending year. Here's a run:

$ ruby switched.rb 1951 1958
             1951   1952   1953   1954   1955   1956   1957   1958
Dana         1.19   1.21   1.26   1.29   1.00   0.79   0.67   0.64
Jackie       1.40   1.29   1.14   1.13   1.11   0.94   0.72   0.57
Kelly        4.27   2.72   3.74   2.13   2.32   1.77   0.98   0.51
Kim          2.59   1.82   1.47   1.08   0.61   0.30   0.17   0.12
Rene         1.42   1.32   1.16   1.23   1.14   0.88   0.87   0.89
Stacy        1.07   0.80   0.64   0.47   0.45   0.36   0.29   0.21
Tracy        1.49   1.15   1.00   0.73   0.56   0.56   0.59   0.59

First, note that all numbers in the leftmost column are greater than one and all numbers in the rightmost column are less than one. The 1.19 for Dana in 1951 indicates that in 1951 there were 1.19 times as many male babies named Dana as there were female babies named Dana. We can see that in a9/yob/1951.txt, which has the 1951 data:

$ grep Dana, a9/yob/1951.txt
Dana,F,1076
Dana,M,1284

The format of the a9/yob/YEAR.txt data files is simple: each line has the name, sex, and the associated count, separated by commas.

Note that the argument to grep, "Dana," has a trailing comma so that "Danae" doesn't turn up, too.

By 1958 things had changed—there were only 0.64 males named Dana for every female named Dana:

$ grep Dana, a9/yob/1958.txt
Dana,F,2388
Dana,M,1534

switched reads the a9/yob/YEAR.txt files for all the years in the range specified by the command line arguments and looks for names for which the male/female ratio is > 1.0 in the first year and < 1.0 in the last year. For all the names it finds, it prints the male/female ratio for all the years from the first year through the last year. Names are printed in alphabetical order.

As a specific example, Dana is included in the list for 1951 through 1958 because males/females in 1951 was 1.19 (> 1.0) and males/females in 1958 was 0.64 (<1.0). The ratios for the middle years are not examined to decide whether to include a name; they are shown only to provide a more complete picture of the data between the endpoints.

Note that there's a big shift for Kim from 1954 through 1957. I wonder if that's because the actress Kim Novak had a breakout role in 1955's Picnic.

Spring 2016 student Mai Tong noticed this for "Addison":

             1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Addison      1.11  0.82  0.68  0.53  0.53  0.44  0.36  0.28  0.25  0.21  0.13  0.06  0.03  0.02  0.02  0.02

Ms. Tong wrote,

I looked up on Google to find the reason why and I found that the name means "Son/Child of Adam", which is interesting since Addison is now commonly used as a girl's name rather than boy's. Possibly due to uncommon use for a girls name back in the early 1990s? But looking closely, there's a show called Grey's Anatomy (premiered in 2005) with a popular female-lead named Dr. Addison Montgomery which might be why the ratio dropped dramatically in 2005 and onwards.

Back to the details, if no names meet the criteria, switched prints "no names found" and exits by calling exit:

$ ruby switched.rb 2011 2012
no names found

IMPORTANT: To eliminate the less significant results, a name is included only if both the male and female counts in both the first and last year are greater than or equal to 100. By that criteria the name Lavern is not included for 1949-1951, and no other names turn up, either:

$ ruby switched.rb 1949 1951
no names found

Here's the underlying data:

$ grep Lavern, a9/yob/1949.txt
Lavern,F,93
Lavern,M,120

$ grep Lavern, a9/yob/1951.txt
Lavern,F,95
Lavern,M,85

There was a M/F shift from 120/93 in 1949 to 85/95 in 1951 but because not all four of those counts are >= 100, Lavern is not included. There's no grep shown above for 1950 because that data is inconsequential: inclusion is determined solely based on counts for the first and last year.

It's interesting to combine switched with a Bash for-loop that runs the program with a gradually shifting range. When your switched is done, try this:

for i in $(seq 1970 2012); do ruby switched.rb $i $(($i+9)); echo ===; done

Two obvious extensions to switched would be command-line options to adjust the 100-baby minimum and to look for female to male flips for a name. You might find those interesting to implement and experiment with, but neither are required.

switched does no error handling whatsoever. Behavior is only defined in the case of being given two command line arguments in the range of 1880 to 2021, and the first must be less than the second.

Implementation notes for `switched`

I intend this problem to be an exercise in using the Hash class. I encourage you to devise a data structure yourself but in case you run into trouble, here are a few thoughts on my approach to the problem: http://www.cs.arizona.edu/classes/cs372/fall22/a9/switched-hint.html

The data files are in a9/yob—two directory levels below where you'll be running switched.rb. My code opens the data file for a given year like this:

f = open("a9/yob/#{year}.txt")

Given f, you might then read one line at a time with f.gets, or read them all with f.readlines, or use f.each { ... } to process each line with the associated block.

It's easy to drown in the data on a problem like this. You might start by having your code that reads the YEAR.txt files discard data for everybody but "Dana" and then use p x (equivalent to puts x.inspect) to dump out your data structure. Then try adding in a male-only and a female-only name, like "Delbert" and "Mimi" in 1951. Alternatively, you might edit down copies of some data files to just a few lines of interest.

Watch out for bugs related to integer division. (Use .to_f to get a Float when needed.)

The M/F ratios are formatted using a %7.2f format with printf, demonstrated on the command line with ruby -e:

$ ruby -e 'printf("%7.2f\n", 1277.0/1076.0)'
   1.19

Names are left-justified in a 10-wide field using a printf format of %-10s.

You may have questions about the data files. Before mailing to us or posting on Piazza, take a look at the data files and see if you can answer the question yourself. The files are in a9/yob. Those same files will be used for testing when grading.

Problem 2. (20 points) `label.rb`

In lecture we've seen that Array.inspect, which is used by Kernel.p and by irb, does not accurately depict an array that contains multiple references to the same array and/or references itself. Example:

>> a = []

>> b = [a,a]

>> p b
[[], []]

By simply examining the output of p b we can't tell whether b references two distinct arrays or has two references to the same array.

Another problem is that if an array references itself, Ruby "punts" and shows "[...]":

>> a = []

>> a << a

>> p a
[[...]]

The problems continue if we then get a Hash involved:

>> h = {}

>> h[a] = h

>> p h
{[[...]]=>{...}}

For this problem you are to write a method label(x) that produces a labeled representation of x, where x is a string, number, array or hash. Arrays and hashes may in turn reference other arrays and hashes, and be cyclic.

Here's what label shows for the first case above.

>> a = []; b = [a,a]

>> puts label(b)
a1:[a2:[],a2]

The outermost array is labeled as a1. Its first element is an empty array, labeled a2. The second element is a reference to that same empty array, but its contents are not shown, only the label a2.

Note: For reasons described later, we'll use puts label(...) to show the string that label produces.

Here's another step, and the result:

>> c = [b,b]

>> puts label(c)
a1:[a2:[a3:[],a3],a2]

Note that the label numbers are not preserved across calls. The array that this call labels as a3 was labeled as a2 in the previous example.

To explore relationships between the contents of a, b, and c we could wrap them in an array:

>> puts label([a,b,c])
a1:[a2:[],a3:[a2,a2],a4:[a3,a3]]

Here's a simple cyclic case. The third element in a is a reference to a; representing that reference with a label lets us see the cycle.

>> a = []

>> a = [10,20]

>> a << a

>> puts label(a)
a1:[10,20,a1]

Let's try a simple hash:

>> h = {}

>> h["one"] = 1

>> h[2] = "two"

>> puts label(h)
h1:{"one"=>1,2=>"two"}

Note that hashes are labeled as "hN". The key/value pairs are shown with "thick" arrows. Pairs are separated with commas, and curly braces surround the list of pairs.

Let's add some arrays into the mix:

>> h = {}

>> a = [2,2,2]

>> a << a

>> h["twos"] = a

>> h["words"] = %w{just a test}

>> puts label(h)
h1:{"twos"=>a1:[2,2,2,a1],"words"=>a2:["just","a","test"]}

Note that arrays and hashes have separate numbering: the above labeling shows both h1 and a1, for example.

Let's try h[h] = h:

>> h[h] = h

>> puts label(h)
h1:{"twos"=>a1:[2,2,2,a1],"words"=>a2:["just","a","test"],h1=>h1}

label(h) eventually reaches the key/value pair made by h[h] = h. Because h has already been labeled as h1, the presence of that key/value pair is shown as h1=>h1. Another example:

>> a = [1,2,3]

>> a << [[[a,[a]]]]

>> a << a

>> puts label(a)
a1:[1,2,3,a2:[a3:[a4:[a1,a5:[a1]]]],a1]

One more example:

>> a = [1,2,3]

>> b = {"lets" => "abc", 1 => a}

>> 3.times { a << b }

>> a << "end"

>> puts label([a,b,{100=>200}])
a1:[a2:[1,2,3,h1:{"lets"=>"abc",1=>a2},h1,h1,"end"],h1,h2:{100=>200}]

The trivial case for label(x) is that x is not an array or hash. If so, label(x) returns x.inspect:

>> puts label(4)
4

>> puts label("testing")
"testing"

Along with the two cases immediately above, here are some simple cases that are good for getting started:

>> puts label([7])
a1:[7]

>> puts label([[7]])
a1:[a2:[7]]

>> puts label([[70],[80,"90"]])
a1:[a2:[70],a3:[80,"90"]]

Keep in mind that your solution must be able to accommodate an arbitrarily complicated structure but the only types you'll encounter are numbers, strings, arrays and hashes.

All the examples shown above use puts label(...) rather than having irb display the result of label(...). For contrast, let's look at a label call without puts:

>> label([1,"two",["3"]])
=> "a1:[1,\"two\",a2:[\"3\"]]"

label returns a string and irb uses inspect to show an unambiguous representation of results, so any quotes in the result are "escaped" with a backslash, so that the result is shown as a valid string literal. Using puts lets us avoid that clutter:

>> puts label([1,"two",["3"]])
a1:[1,"two",a2:["3"]]
=> nil

Implementation notes for `label.rb`

I think of this as a fairly hard problem to solve given only the above and what you've seen in class, but for those who wish to have a challenge, I'll say nothing here about how to approach it. If you have trouble getting started, however, see http://www.cs.arizona.edu/classes/cs372/fall22/a9/label-hint.html

IMPORTANT: You must match the sequence of label numbers that my solution produces. That essentially requires you to traverse the structure in the same order I do, which is depth-first. Here's an example that evidences that depth-first traversal:

>> puts label([[[10]],[[21,22]]])
a1:[a2:[a3:[10]],a4:[a5:[21,22]]]

Note that the deeply nested [10] was labeled with a3 before the second element of the top-level list was labeled with a4.

Also, note that I iterate through key/value pairs in a hash using h.each {|k,v| ... }.

Don't hesitate to dig into the test program for `label`

There's only one test for label.rb. It's this: ruby a9/label1.rb.

You may need to dig around in a9/label1.rb to figure out exactly what code is being executed for a particular failure and associated "diff". It may be useful to copy a9/label1.rb into your directory and hack it up, maybe adding Kernel.p calls or making other changes to track down a bug.

In various places in the output from a9/label1.rb you'll see something like this:

Line 35 in a9/label1.rb:
label([a,b,c]):
   a1:[a2:[],a3:[a2,a2],a4:[a3,a3]]

That first line tells us that the test originates at line 35 in a9/label1.rb. Here's that line:

sv("label([a,b,c])", bb)

sv, standing for "show values", is a helper method that uses Ruby's Kernel.eval method to evaluate the expression specified by the first argument. The second argument, bb, is the current set of variable bindings, which are used by eval.

Historical note

label is a simplified version of the Image(x) procedure from the Icon library. In CSC 550A/B, String and List Processing, taught in the 1980s, Ralph Griswold used Image(x) as an example. Icon's Image(x), and label(x), which you are about to write in relatively few lines of code, are incredibly handy for dealing with cycling data structures but I know of no equally powerful function that's part of the standard library in any other language.

From time to time I've asked in various IRC channels and web forums in search of something similar in other languages but those questions are often met with "Why would you need that?", "There's no reason for an array to reference itself!", etc. Of course, graphs and "a maze of twisty little passages, all alike" are immediate applications of self-referential data structures.

If you ever come across anything like the label(x) method you're about to write, let me know!

Problem 3. (30 points) `vstring.rb`

This problem builds on the VString example shown on slides 220-226. I suggest you review those slides before reading further.

For this problem you are to implement a hierarchy of four Ruby classes: VString, ReplString, MirrorString, and IspString. VString stands for "virtual string"—these classes create the illusion of very long strings but relatively little data is used to create that illusion.

VString serves as an abstract superclass of ReplString, MirrorString, and IspString; it simply provides some methods that are used by those subclasses.

ReplString represents strings that consist of zero or more replications of a "base" string. Example:

$ irb
>> load "vstring.rb"

>> s1 = ReplString.new("abc", 2)
=> ReplString("abc",2)

Note that irb used s1.inspect to produce the string 'ReplString("abc",2)'. That string shows the type, base string, and replication count.

VString subclasses support only a handful of operations: size, [n], [n,m], inspect, to_s, and each. The semantics of [n] and [n,m] are the same as for Ruby's String class with one exception, described below.

Starting with s1 created above, here are some examples:

>> s1.size         => 6

>> s1.to_s         => "abcabc"

>> s1.to_s.class   => String

>> s1[0]           => "a"

>> s1[2,4]         => "cabc"

>> s1[-5,2]        => "bc"

>> s1[-3,10]       => "abc"

>> s1[10]          => nil

A ReplString can represent a very long string:

>> s2 = ReplString.new("xy", 1_000_000_000_000)
=> ReplString("xy",1000000000000)

>> s2.size               => 2000000000000

>> s2[-1]                => "y"

>> s2[-2,2]              => "xy"

>> s2[1_000_000_000]     => "x"

>> s2[1_000_000_001]     => "y"

>> s2[1_000_000_001,7]   => "yxyxyxy"

Some operations are impractical on very long strings. For example, s2.to_s would require a vast amount of memory, but if the user asked for it, we'd let it run. Similarly, s2[n,m] has the potential to produce an impractically large result.

A MirrorString represents a string concatenated with a reversed copy of itself. Examples:

>> s3 = MirrorString.new("1234") => MirrorString("1234")

>> s3.size                => 8

>> s3.to_s                => "12344321"

>> s3[-1]                 => "1"

>> s3[2,4]                => "3443"

An IspString represents a string with instances of another string interspersed between each character:

>> s4 = IspString.new("xyz", "...")  => IspString("xyz","...")

>> s4.to_s                   => "x...y...z"

>> s4[0]                     => "x"

>> s4[1]                     => "."

>> s4[s4.size-1]             => "z"

In the above examples the strings used in the subclass constructors are instances of String but they can be an arbitrary nesting of VStrings, too:

>> s5 = IspString.new(MirrorString.new("abc"), ReplString.new(".",3))
=> IspString(MirrorString("abc"),ReplString(".",3))

>> s5.to_s
=> "a...b...c...c...b...a"

>> s6 = ReplString.new(s5,1_000_000_000_000_000)
=>ReplString(IspString(MirrorString("abc"),ReplString(".",3)),100000
0000000000)

>> s6.size
=> 21000000000000000

>> s6[s6.size-20,100]
=> "...b...c...c...b...a"

>> s6[(s5.size)*(1_000_000_000_000_000-2),50]
=> "a...b...c...c...b...aa...b...c...c...b...a"

To emphasize that the "V" in VString stands for "virtual", consider this interaction:

>> s=MirrorString.new(ReplString.new("abc",100000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000)) =>MirrorString(ReplString("abc",1000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000))

>> s.size =>6000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000

>> s2 = IspString.new(s,s) =>IspString(MirrorString(ReplString("abc",100000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000)), MirrorString(ReplString("abc",10000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000)))

>> s2.size.to_s.size
=> 328

The iterator each is available for ReplString, MirrorString, and IspString. It produces all the characters in turn, as one-character strings. It returns the receiver.

>> s1 = ReplString.new("abc",2)  => ReplString("abc",2)

>> s1.each {|x| puts x}
a
b
c
a
b
c
=> ReplString("abc",2)

Be sure to 'include Enumerable' in VString, so that methods like map, sort, and all? work:

>> MirrorString.new("abc").map { |c| c }
=> ["a", "b", "c", "c", "b", "a"]

>> MirrorString.new(ReplString.new("abc",2)).sort
=> ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"]

>> MirrorString.new(
              ReplString.new("a", 100000)).all? { |c| c == "a" }
=> true

Using Ranges to subscript VStrings is not supported:

>> s1 = MirrorString.new("ab")  => MirrorString("ab")
>> s1[0..-1]
NoMethodError: ...

We won't test with ranges.

TL;DR

Here's a quick summary of what's required of the VString subclasses and their constructors:

ReplString.new(s, n) where s is a non-empty String or a VString and n > 0.
MirrorString.new(s) where s is a non-empty String or a VString.
IspString.new(s1,s2) where s1 and s2 are non-empty Strings or are VStrings.
size, inspect, and to_s methods
Subscripting with [n] and [start,len], with the same semantics as for Ruby's String class.
Subscripting with a Range is not supported.
VString implements the iterator each and includes Enumerable.

The `String` semantics exception

There is one exception to matching the semantics of String when subscripting with [start,len]. Note this behavior of String:

>> s="abc"   => "abc"

>> s[3]      => nil

>> s[3,1]    => ""

>> s[4,1]    => nil

You might find it interesting to think about why Strings have that behavior but we won't match it with VString. Instead, if start is out of bounds, nil is produced:

>> s = ReplString.new("abc",2)
=> ReplString("abc",2)

>> s[5,10]   => "c"

>> s[6,10]   => nil

`VString` itself is never directly tested

When grading, tests will only exercise the VString subclasses; VString will not be tested directly. (Note that none of the examples above do anything with VString itself.)

Implementation Notes

Division of labor between VString and its subclasses

If you've completed 335 you might not need this advice but here's what I recommend for what to put where with respect to VString and and its subclasses:

VString
- Methods: [], to_s, and each. No initialize method.
- VString should have no instance variables. Instead, have all instance variables in the subclasses. (Recall that with the Shape example in the slides, all the data was held by Rectangle and Circle, aside from a label in Shape.)
Subclasses of VString (ReplString, MirrorString, IspString)
- Methods: initialize, size, char_at, and inspect.
- Each subclass should have instance variables that support the behavior of the instances. For example, a straightforward implementation of ReplString would have two instance variables: one to hold the base string and one to hold the replication count.

In my solution all of the subclass methods are very short—typically a line or two of code.

It may help to imagine that there will eventually be dozens of VString subclasses instead of just the three specified here. Having that mindset, and wanting to write as little code as possible to implement those dozens of subclasses, should motivate you to do as much as possible in VString and as little as possible in the subclasses. However, VString should have absolutely no code that knows anything about any of its subclasses.

Implementation of subscripting

VString itself should have this method:

def [](start, len = 1)
    ...
end

That defaulted second argument allows this one method to handle both s[n] and s[start,len].

Implement VString.[] in terms of calls to size and char_at, which in turn will resolve to the implementation of those methods in the three subclasses.

As example of resolution, consider the following code:

>> s = ReplString.new("abc",10)
>> c = s[5]

To evaluate s[5] Ruby will first see if ReplString.[] exists. It doesn't, so the call s.[](5) will resolve to VString.[]. In turn, VString.[] calls char_at to actually produce a one-character string. That char_at will resolve to ReplString.char_at.

With deeply nested constructions of VString subclasses there's a potential of subscripting having exponential behavior. The simple way to avoid that problem is to have only one place in your VString.[] method where char_at is called. (Save the value with c = char_at(...) if you need to use it in multiple places.)

Strong recommendation: VString.[](start, len = 1) can be called with a negative starting position. As a first step, normalize a negative starting position, which is relative the end of the string, to the corresponding starting position that is relative to the start of the string. For example, given s = ReplString("abc",2), s[-2] is equivalent to s[4]. By doing that normalization you'll avoid the need for code like this:

if position >= 0 then
    ...code to handle positive positions...
else
    ...code to handle negative positions...

Don't get tangled up considering various cases of nesting

Watch out for thinking like this: "What if I've got a ReplString that holds a MirrorString of a MirrorString of a ReplString of an IspString made of ..."

Instead, just keep in mind that what you can count on is that the values used in the VString constructors support size, s[n], s[start,len], and inspect. Have a duck-typing mindset and write code that uses those operations. (Yes, there's to_s, too, but it's problematic with long strings; avoid using it.)

Those double-quotes in `inspect`

Getting the double-quotes right for inspect can be a little tricky. Start by getting the following examples working:

>> s = ReplString.new("a\nb", 2)
=> ReplString("a\nb",2)

>> s2 = MirrorString.new("x\ty")
=> MirrorString("x\ty")

>> s3 = ReplString.new(s2,10)
=> ReplString(MirrorString("x\ty"),10)

Saving a little typing when testing

To save myself some typing when testing, I've got these lines at the end of my vstring.rb:

if !defined? RS
    RS=ReplString
    MS=MirrorString
    IS=IspString
end

With those in place, I can do something like this:

>> s = IS.new(MS.new(RS.new("abc",2)),"-")
=> IspString(MirrorString(ReplString("abc",2)),"-")

Code for `VString.each`

The slides don't say enough about how to put an each method in a class like VString but I'd like you to see what it looks like, so here is code for an each method for VString. Simply add it as-is to your class VString.

def each
    for i in 0...size
        yield self[i]
    end
    self
end

Note that self[i] will call VString.[], which will in turn use the char_at implementation in the subclass for the instance at hand.

Put all four classes in `vstring.rb`

Put the code for all four classes in vstring.rb. VString needs to be first.

Don't hesitate to dig into the test program for `vstring.rb`

The only test program for vstring.rb is a9/vstring1.rb. All the tester cases look something like this:

ruby a9/vstring1.rb 1m

Then, in a9/vstring1.rb, the command line argument (1m above) is used to select a test case:

case ARGV[0]
when "1r" then
    p s = ReplString.new("abc",10)
    show_ends s
    
when "1m" then
    p s = MirrorString.new("1234")
    show_ends s
    
...

We see that 1m makes a simple MirrorString.

Problem 4. Extra Credit `vstring-extra.txt`

For three points of extra credit, devise and implement another subclass for VString. For example, a fourth subclass I considered having you implement for VString was to be created like this:

XString.new("a", "bb", 10)

It represents the following sequence of characters, which ends with 10 "a"s and 20 "b"s:

abbaabbaaabbb...aaaaaaaaaabbbbbbbbbbbbbbbbbbbb

Add whatever new VString subclass you come up with to your vstring.rb and create vstring-extra.txt, a plain text file that talks about your creation and shows it action with irb.

The burden of proof for this extra credit is on you, not me!

Problem 5. (30 points) `optab.rb`

When I first conceived this problem, several years ago now, one of my favorite quotes came to mind:

Far better is it to dare mighty things, to win glorious triumphs, even though checkered by failure, than to take rank with those poor spirits who neither enjoy much nor suffer much because they live in the gray twilight that knows neither victory nor defeat.—Theodore Roosevelt

One way to learn about a language's types and operators is to manually create tables that show what type results from applying a binary operator to various pairs of types. For this problem you are to write a Ruby program, optab.rb, that generates such tables for Java, Ruby, and Haskell.

Here's a run of optab.rb:

$ ruby optab.rb ruby "*" ISA
 * | I  S  A 
---+---------
 I | I  *  * 
 S | S  *  * 
 A | A  S  *

Here's what optab.rb's three arguments mean:

The first argument, ruby, specifies Ruby as the language of interest for this run.
The second argument, "*", specifies the operator of interest. We'll make a practice of putting quotes around the operator because some operators, like * and <, are shell metacharacters. (Instead of quoting the asterisk we could also suppress its special meaning with a backslash: \*.)
The third argument, ISA, specifies types of interest. The letters I, S, and A stand for Integer, String, and Array, respectively.

optab's output is a table showing the type that results from applying the operator to various pairs of types. The row headings on the left specify the type of the left-hand operand. The column headings along the top specify the type of the right-hand operand.

Here are some notes on interpreting the table shown above:

The upper-left entry, an I, shows that Integer * Integer produces a Integer.
The lower-left entry, A, shows that Array * Integer produces an Array.
The S in the bottom of the middle row shows that Array * String produces a String.
The *'s indicate that Integer * String, String * String, and three other type combinations produce an error.

Here's an example with Java:

$ ruby optab.rb java "*" IFDCS
 * | I  F  D  C  S 
---+---------------
 I | I  F  D  I  * 
 F | F  F  D  F  * 
 D | D  D  D  D  * 
 C | I  F  D  I  * 
 S | *  *  *  *  *

I, F, D, C, and S stand for int, float, double, char, and String, respectively.

Here's how optab is intended to work:

For the specified operator and types, try each pairwise combination of types with the operator by executing that expression in the specified language and seeing what type is produced, or if an error is produced. Collect the results and present them in a table.

The table above was produced by optab.rb generating, running, and analyzing the output of twenty-five different Java programs. Here's what the first of those twenty-five Java programs looked like:

$ cat checkop.java
public class checkop {
    public static void main(String args[]) {
        f(1 * 1);
        }
    private static void f(Object o) {
        System.out.println(o.getClass().getName());
        }
    }

Note the third line, f(1 * 1); That's an int times an int because the first operation to try is I * I.

Remember: Ruby code in optab.rb wrote that Java program!

In Ruby, the expression `some-command-line` is called command expansion. It causes the shell to execute that command line. The complete output of the command is captured, turned into a string, possibly with many newlines, and is the result of `...`. (Note that ` is a "back quote".)

Here's a demonstration of using `...` to compile and execute checkop.java:

>> result = `bash -c "javac checkop.java && java checkop" 2>&1`
=> "java.lang.Integer\n"

The extra stuff with bash -c ... 2>&1 is to cause error output, if any, to be captured too.

Here's the checkop.java that's generated for I * S:

public class checkop {
    public static void main(String args[]) {
        f(1 * "abc");
        }
    private static void f(Object o) {
        System.out.println(o.getClass().getName());
        }
    }

Note that it is identical to the checkop.java generated for I * I with one exception: the third line is different: instead of being 1 * 1 it's 1 * "abc". Let's try compiling and running the checkop.java just above, generated for I * S:

$ irb
>> result = `bash -c "javac checkop.java && java checkop" 2>&1`
=> "checkop.java:3: error: bad operand types for binary operator
'*'\n                    f(1 * \"abc\");\n                     ^\n
  first type:  int\n  second type: String\n1 error\n"

javac detects incompatible types for * in this case and notes the error. java checkop is not executed because the shell conjunction operator, &&, requires that its left operand (the first command) succeed in order for execution to proceed with its right operand (the second command).

That output, "checkop.java:3: ..." can be analyzed to determine that there was a failure. Then, code maps that failure into a "*" entry in the table that's produced.

Let's try Haskell with the / operator. "D" is for Double.

$ ruby optab.rb haskell "/" IDS
 / | I  D  S 
---+---------
 I | *  *  * 
 D | *  D  * 
 S | *  *  *

For the first case, I / I, Ruby generated this file, checkop.hs:

$ cat checkop.hs
(1::Integer) / (1::Integer)
:type it

Note that just a plain 1 was good enough for Java since the literal 1 has the type int but with Haskell we use (1::Integer) to be sure the type is Integer. (Yes; Integer, not Int.)

Let's try running it. For Java we used javac and java. We'll use ghci for Haskell and redirect from checkop.hs. Be sure to specify the -ignore-dot-ghci option, too!

$ irb
>> result = `bash -c "ghci -ignore-dot-ghci < checkop.hs" 2>&1`
=> "GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for
help\nPrelude> \n:1:1: error:\n    • No instance
for (Fractional Integer) arising from a use of ‘/’\n
...lots more..."

Ouch—an error! That's going to be a "*". Here's the checkop.hs file generated for D * D:

$ cat checkop.hs
(1.0::Double) * (1.0::Double)
:type it

Let's try it:

>> result = `bash -c "ghci -ignore-dot-ghci < checkop.hs" 2>&1`
=> "GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for
help\nPrelude> 1.0\nPrelude> it :: Double\nPrelude> Leaving
GHCi.\n"

If we look close at the output above we see it, with a type: it :: Double.

In pseudo-code, here's what optab needs to do:

For each pairwise combinations of types specified on the command line...

Generate a source code file in the specified language to test the combination at hand.
Run the file using command expansion (`...`).
Analyze the command expansion result, determining either the type produced or that an error was produced.
Add an appropriate entry for the combination to the table—either a single letter for the type or an asterisk to indicate an error.

The examples above show Java and Haskell testing programs and their execution. You'll need to figure out how to do the same for Ruby, but let us know if you have trouble with that. The obvious route with Ruby is creating and running a file but you can use Kernel.eval instead. If you take the eval route, you'll probably need to do a bit of reading and figure out how to catch a Ruby exception using rescue.

I chose the names checkop.java and checkop.hs for the generated files but you can use any names you want.

Here is a complete program that generates a file named hello.java and runs it:

$ cat -n a9/mkfile.rb
     1  prog = <<X
     2  public class hello {
     3      public static void main(String args[]) {
     4          System.out.println("Hello, #{ARGV[0]}!");
     5          }
     6      }
     7  X
     8  # IMPORTANT: That X just above MUST BE IN THE FIRST COLUMN!
     9
    10  f = File.new("hello.java","w")
    11  f.write(prog)
    12  f.close
    13  result = `bash -c "javac hello.java && java hello" 2>&1`
    14  puts "Program output: (#{result.size} bytes)", result

Lines 1-7 use a "here" document to initialize the variable prog with a multi-line string literal. The <<X on line 1 starts the "here" document. The X by itself on line 7 ends the "here" document. (The analogous Python construction is a multi-line f-string.)

Note that at line 4 the first command-line argument for a9/mkfile.rb is incorporated into the Java string literal.

Let's run mkfile.rb with a single command-line argument:

$ ruby a9/mkfile.rb whm
Program output: (12 bytes)
Hello, whm!

Here's the Java source file that was created and run:

$ cat hello.java
public class hello {
    public static void main(String args[]) {
        System.out.println("Hello, whm!");
        }
    }

Copy a9/mkfile.rb into your a9 directory on lectura and try it, to help you get the idea of generating a program, running it, and then doing something with its output.

If you like to be tidy, you can use File's delete class method to delete hello.java: File.delete("hello.java").

The following table shows what types must be supported in each language, and a good expression to use for testing with that type.

Letter	Haskell	Java	Ruby
I	(1::Integer)	1	1
F	(1.0::Float)	1.0F	1.0
D	(1.0::Double)	1.0	not supported
B	True	true	true
C	'c'	'c'	not supported
S	"abc"	"abc"	"abc"
O	not supported	new Object()	not supported
A	not supported	not supported	[1]

optab.rb is not required to do any error checking at all. In particular, you can make these assumptions:

The first argument is haskell, java, or ruby.
The second argument is a valid operator in the language specified.
The third argument is a string of single-letter type specifications, and those types are supported for the language at hand.

Behavior is undefined for all other cases. I won't test any error cases.

I hope that everybody recognizes that there needs to be language-specific code for running the Java, Haskell, and Ruby tests but ONE body of code can be used to process command-line arguments, launch the language-specific tests, and build the result table.

Think in terms of an object-oriented solution, perhaps with an OpTable class that handles the language-independent elements of the problem. OpTable would have subclasses JavaOpTable, HaskellOpTable, and RubyOpTable with language-specific code.

My solution has a method OpTable.make_table that handles table generation. It calls a subclass method tryop(op, lhs, rhs) to try an operator with a pair of operands and report what's produced (a type or an error). With Java a call might be tryop("+", "1", '"abc"'); it would return "S". In contrast, RubyOpTable.tryop("+", "1", '"abc"') produces "*".

Note that testing the Java cases can be slow. With my version, ruby optab.rb java "*" IFDCS takes almost 30 seconds to run on lectura. The same test for Haskell takes about five seconds.

Problem 6. Extra Credit `optab-extra.txt`

For three points of extra credit per language, have your optab.rb support up to three additional languages of your choice. PHP, Python, and Perl come to mind as easy possibilities.

Details:

At least three types must be supported for each language.
You may introduce types in addition to those shown above.
Submit a plain text file optab.txt, that shows your extended version in action.
Demonstrate at least three operators for each language.
For Python, you can do either Python 2 or Python 3, but not both for credit.

The burden of proof for this extra credit is on you, not me!

Problem 7. Extra Credit `observations.txt`

Submit a plain text file named observations.txt with...

(a) (1 point extra credit) An estimate of how many hours it took you to complete this assignment. Put that estimate on a line by itself, like this:

Hours: 3.5

There should be only one "Hours:" line in observations.txt. (It's fine if you care to provide per-problem times, and that data is useful to us, but report it in some form of your own invention that doesn't contain the string "Hours:". Thanks!)

Feedback and comments about the assignment are welcome, too. Was it too long, too hard, too detailed? Speak up! I appreciate all feedback, favorable or not.

(b) (1-3 points extra credit) Cite an interesting course-related observation (or observations) that you made while working on the assignment. The observation should have at least a little bit of depth. Think of me thinking "Good!" as one point, "Excellent!" as two points, and "Wow!" as three points. I'm looking for quality, not quantity.

Turning in your work

Use a9/turnin to submit your work. You can run a9/turnin as often as you want. We'll grade your final pre-deadline submission.

Remember a9/turnin can also serve as a simple backup/snapshot facility. Don't hesitate to use it for such!

a9/turnin -l shows your submissions.

My solutions

To give you an idea about the size of my solutions, here's what I see as of press time:

$ wc $(grep -v txt a9/delivs)  # this command should work for you, too
  57  142 1241 switched.rb
  22   57  611 label.rb
 104  216 1821 vstring.rb
 148  328 3440 optab.rb
 331  743 7113 total

My code has few comments.

Miscellaneous

Point values of problems correspond closely to the "assignment points" mentioned in the syllabus. For example, a 10-point problem would correspond to about 1% of your final grade in the course.

If you're worried about whether a solution meets the restrictions, mail it to 372f22—we'll be happy to look it over. But don't wait too long; there may be a crunch at the end.

Feel free to use comments as you see fit, but no comments are required in your code.

In Ruby, # is comment to end of line, unless in a string literal or regular expression. There's no Ruby analog for /* ... */ in Java and {- ... -} in Haskell but you can comment out multiple lines by making them an embedded document—lines bracketed with =begin/=end starting in column 1. Example:

=begin
    Just some
    comments here.
=end

Silly-looking, huh? I agree! (It looks like a construction that escaped from the 1960s.)

Remember that late assignments are not accepted and that there are no late days; but if circumstances beyond your control interfere with your work on this assignment, there may be grounds for an extension. See the syllabus for details.

My estimate is that it will take a typical CS junior 8-12 hours to complete this assignment.

Our goal is that everybody gets 100% on this assignment AND gets it done in an amount of time that is reasonable for them.

Assignments are not take-home tests! We hope you'll make use of Piazza, email, Discord and office hours if problems arise. If you're getting toward twelve hours into this assignment and don't seem to be close to completing it, or you're simply worried about your "slope", email us! Give us a chance to speed you up!

Introduction

The Usual Stuff

Problem 1. (30 points) switched.rb

Implementation notes for switched

Problem 2. (20 points) label.rb

Implementation notes for label.rb

Don't hesitate to dig into the test program for label