Option: Make Your Own Assignment!

If you've got an idea for something you'd like to write in Ruby, you can propose that as a replacement for some or all of the problems on this assignment. To pursue this option, send me mail with a brief sketch of your idea. We'll negotiate on points and details.

The Usual Stuff

Make an a7 symlink that references /cs/www/classes/cs372/spring18/a7.
Test using a7/tester (or a7/t).
Use Ruby 2.2.4.

Don't forget about `label.rb`

In case you decided to do label.rb from assignment 6 as part of this assignment, here's a reminder to not forget it!

Don't get stuck on the regular expression problems

The first two problems, re.rb and bowl.rb, ask you to write regular expressions. The skills you practice on those problems will be a small help on optab.rb and a big help on gf.rb. They'll be of no help on vstring.rb. Don't feel like you need to finish re.rb and/or bowl.rb before starting on vstring.rb and optab.rb.

Don't hesitate to dig into the test programs

The tests for vstring.rb look like this:

ruby a7/vstring1.rb 1r
ruby a7/vstring1.rb 1m
...

The command line argument specifies the block of tests to run.

The tests for re.rb and gf.rb are similar, with command-line arguments selecting code to run in a7/re1.rb and a7/gf1.rb, respectively.

As with a6/label1.rb on assignment 6, you may need to dig around in a7/vstring1.rb et al. to figure out exactly what code is being executed for a particular failure and associated "diff". It may be useful to copy the test program into your directory and hack it up, maybe adding Kernel.p calls or making other changes to track down a bug.

Problem 1. (15 points, unevenly distributed across several sub-problems) `re.rb`

In this problem you are to write several methods. Each should return a regular expression that matches the specified strings (no more, no less) and, in some cases, creates named groups. It's fine to have the required methods use helper methods Here is an example specification:

Write a method letsdigs_re that produces a regular expression that matches strings that consist of one or more letters followed by three or more digits. The named groups lets and nums are set to the letters and digits, respectively. A dash may optionally appear between the two. If so, the named group dash is non-empty (i.e., not nil and not the empty string).

Here's a solution: (See slide 254 if you don't recognize the {3,} construct.)

def letsdigs_re
    /^(?<lets>[a-z]+)(?<dash>-?)(?<digs>\d+{3,})$/ 
end

Because the regular expression is the last expression in letsdigs_re, it is the return value. spring18/ruby/smg.txt contains a variant of show_match named smg that also shows named groups. Suggestion: Add smg to your ~/.irbrc.

Let's test letsdigs_re with smg:

>> smg(letsdigs_re, "abc123")
"<<abc123>>"
lets = <<abc>>, dash = <<>>, digs = <<123>>

>> smg(letsdigs_re, "abc-12")
no match

>> smg(letsdigs_re, "abc-1234")
"<<abc-1234>>"
lets = <<abc>>, dash = <<->>, digs = <<1234>>

A great resource for developing regular expressions is rubular.com. You can find letsdigs_re on Rubular here. Note that the "Your test string:" window has three test cases, one per line. There is a little bad news: Rubular doesn't seem to provide any way to build up a regular expression like shown on slide 269.

Here's an older resource: a video example of using show_match to gradually develop a regular expression: http://screencast.com/t/FO7OOIScCR39. (Note: That's a Flash video; I hope to redo it.)

The set of tests used for grading re.rb will be the set of tests used by a7/tester. The set will freeze 72 hours prior to the deadline.

Here are the methods you are to write for this problem:

(1 point) Write a method phone_re that produces a regular expression that matches strings that are phone numbers in any of these three forms: 555-1212, 800-555-1212, and (800) 555-1212.
(2 points) Write a method sentence_re that produces a regular expression that matches sentences, as follows: Sentences must begin with a capital letter. Sentences are composed of one or more words. Words are separated by exactly one blank. The sentence must end with a period, question mark, exclamation mark, or one of the two strings "!?" and "?!".
Two good sentences: "I shall test this!", "Xserwr AAA x."
A bad sentence: "it works!" (Doesn't start with a capital.)
See spring18/a7/sentence-hint.html if you have trouble getting started with this problem.
(3 points) Write a method hours_re that produces a regular expression that matches a specification of one or more office hours periods, like "MWF 12:00-13:00", or "T-R 9:30-12:30, F 19:00-0:00". Days are specified with one or more letters in the set [MTWRF], or a range from one day to another. Hours are in the range 0:00-23:55, with 5-minute resolution. Times before 10:00 can be optionally specified with a leading zero, like 09:45. Multiple periods are separated by a comma and a space.
(3 points) Here are some examples of ls -l output:
```
-rw-r-----   1 whm whm    543 Mar 14 18:19 ttl.rb
-rw-r--r--   1 whm whm   6555 Dec 10  2009 w.dat
lrwxrwxrwx   1 whm whm      6 Mar 19 20:56 a7/t -> tester
drwx------ 183 whm empty 1306 Mar 21 20:21 /home/whm/.
drwx--x--x  51 whm wheel  318 Jan 30 22:13 /x/closed
```
The first character indicates file type—d for directory, l for symbolic link, - for regular files. There are other types, too.
The next nine characters, which are three three-character groups, show the permissions. The first group of three characters is "user" permissions—what the owner of the file is permitted to do with the file. The next three characters are the group permissions. The third group of three is "other" permissions—what users who aren't the owner nor are in the appropriate group can do with the file. Ignoring some special cases, the first letter of the three-character groups is always either 'r' or '-'; the second is always 'w' or '-'; and the third is always 'x' or '-'. You'll never see something like 'rrw' or 'w-w'.
For this problem you're to write a method perms_re that produces a regular expression that matches lines of ls -l output for files whose "other" permissions (the third group of three) is not "---" and the file is not a symbolic link. In the lines above, w.dat and /x/closed meet that criteria.
When the match is successful, the named group perms should contain the three characters of "other" permissions (like "r--", "rwx", or "--x"), and the named group name should contain the file name, like "w.dat".
Assume that file names do not contain blanks, but you might find it interesting to consider handling that case, too.
Note that the format of the first ten characters never varies for ls -l output, but field widths for the rest of the line can vary. The upshot of this is that you can't assume that the file name begins in any particular column.
(3 points) For this problem you're to write a method vr_re that matches a string if it is a Vim range specification. A Vim range specification can be a single line specification or two line specifications separated by a comma.
Your vr_re should handle the following line specifications:
A simple line number, like 15
A dot (.) for the current line, or $ for the last line
A dot or $ followed by +N or -N, where N is a positive integer
A regular expression enclosed in slashes, like /abc/ or /^x/. Assume the regular expression doesn't contain any escaped slashes, like in /a\/b/. Here are some valid range specifications:
10 10,20 .,20 20,$ /abc/,. .-5,.+10 /begin/,/end/

When a match succeeds, the named groups from and to should be set if two line specifications are present. If there's only one, then $~["to"] should be nil or the empty string.
(3 points) Write a method path_re that produces a regular expression that matches UNIX paths and sets the named group dir to the directory name, base to the filename, minus extension, and ext to the extension, which is defined as everything in the filename to the right of the leftmost dot. If an element is not present, then $~[whatever-group] should be nil or the empty string.
Examples: (excerpt of output from ruby a7/re1.rb path < a7/re.path)
```
path: 'longest.java', dir = '', base = 'longest', ext = 'java'

path: '/etc/passwd', dir = '/etc/', base = 'passwd', ext = ''

path: '/cs/www/classes/cs372/spring16/tester/Test.rb', 
dir = '/cs/www/classes/cs372/spring16/tester/', base = 'Test', ext = 'rb'

path: '/home/whm/.bashrc', dir = '/home/whm/', base = '', ext = 'bashrc'
```

Miscellaneous

The above is skimpy on examples but you'll find plenty in a7/re.*. Those are input files for a7/re1.rb.

To be clear, re.rb, should consist of six methods. It should look something like this:

def phone_re
    ...
end

...four more here...

def path_re
    ...
end

A note on testing `re.rb`

For testing, a7/t re will test all the regular expressions but you can use the tester's -t option to test just one of the regular expressions. Example:

$ a7/t re -t phone

Along with "phone" there's "sen", "hours", "perms", "vr", and "path".

Problem 2. (3 points) `bowl.rb`

Here are the rules for the composition of the scoring string you are to match:

Each of the first n-1 frames can be classified as a strike, a spare, or an open:
- A strike is when all ten pins are knocked down with the first (and therefore only) ball of the frame.
  A strike is recorded as a single "X".
- A spare is when one or more pins are left standing after the first ball, and then are all knocked down by the second ball.
  A spare is recorded as the first ball count (the number of pins down) followed by a slash. If the first ball was a gutter ball—no pins down— it is recorded as a dash (-), not a zero. Examples of spares:
```
7/
9/
-/
```
- An open (or a "flat") is when pins are left standing after both balls of the frame.
  An open is recorded as the number of pins knocked by each of the two balls. A zero is recorded as a dash. Examples of opens:
```
9-
72
-3
--
```
The final frame, the tenth in real-life, is complicated because of bowling's bonus rules. I won't bother with those rules here, but here are the recording possibilities for the tenth frame:
- An open, like "9-" or "62".
- A spare and a "count ball". The count ball can be -, 1-9, or X. Examples: "7/9", "6/X", "-/-"
- A strike followed by two balls that are recorded like a spare or an open. Examples: "X9/", "X81", "X-/", "X--".
- Two strikes and a count ball. Examples: "XXX", "XX9", "XX-"
The strings representing frames are separated by exactly one space.

Note: If you've never been bowling, or maybe even if you have, the above description might be insufficient. I surely don't want lack of domain knowledge to be stumbling block on this problem, so come by during office hours if don't understand what's going on. This section of a Wikipedia article and/or this page might help, too.

Examples, shown with show_match from slide 232:

>> show_match bowl_re(3), "9/ 81 X9/"
=> "<<9/ 81 X9/>>"

>> show_match bowl_re(3), "9/ 81 X9/ XXX"
=> "no match"  (3-frame game specified but scores for a 4-frame game are present)

>> show_match bowl_re(1), "--"
=> "<<-->>"

>> show_match bowl_re(4), "-- X 9/ XX9"
=> "<<-- X 9/ XX9>>"

>> bowl_re(101) =~ "9- " * 100 + "9/X"
=> 0

>> bowl_re(101) =~ "0- " * 100 + "9/X"
=> nil

Note that like several of the re.rb problems, the task in this problem is simply to produce a regular expression that recognizes whether a string is a valid representation of an n-frame game. In particular, there's no computation of a numeric score.

Incidentally, I'd hoped to have a bowling outing with those of you in town during spring break. Time for that never materialized but if you'd like to go bowling sometime, mail me. If there's interest, we'll see if there's a time that works before the end of the semester. My treat, of course!

Problem 3. (22 points) `vstring.rb`

This problem builds on the VString example shown on slides 320-326. I suggest you review those slides before reading further.

For this problem you are to implement a hierarchy of four Ruby classes: VString, ReplString, MirrorString, and IspString. VString stands for "virtual string"—these classes create the illusion of very long strings but relatively little data is used to create that illusion.

VString serves as an abstract superclass of ReplString, MirrorString, and IspString; it simply provides some methods that are used by those subclasses.

ReplString represents strings that consist of zero or more replications of a "base" string. Example:

$ irb
>> load "vstring.rb"

>> s1 = ReplString.new("abc", 2)
=> ReplString("abc",2)

Note that irb used s1.inspect to produce the string 'ReplString("abc",2)'. That string shows the type, base string, and replication count.

VString subclasses support only a handful of operations: size, [n], [n,m], inspect, to_s, and each. The semantics of [n] and [n,m] are the same as for Ruby's String class with one exception, described below.

Starting with s1 created above, here are some examples:

>> s1.size         => 6

>> s1.to_s         => "abcabc"

>> s1.to_s.class   => String

>> s1[0]           => "a"

>> s1[2,4]         => "cabc"

>> s1[-5,2]        => "bc"

>> s1[-3,10]       => "abc"

>> s1[10]          => nil

A ReplString can represent a very long string:

>> s2 = ReplString.new("xy", 1_000_000_000_000)
=> ReplString("xy",1000000000000)

>> s2.size               => 2000000000000

>> s2[-1]                => "y"

>> s2[-2,2]              => "xy"

>> s2[1_000_000_000]     => "x"

>> s2[1_000_000_001]     => "y"

>> s2[1_000_000_001,7]   => "yxyxyxy"

Some operations are impractical on very long strings. For example, s2.to_s would require a vast amount of memory, but if the user asked for it, we'd let it run. Similarly, s2[n,m] has the potential to produce an impractically large result.

A MirrorString represents a string concatenated with a reversed copy of itself. Examples:

>> s3 = MirrorString.new("1234") => MirrorString("1234")

>> s3.size                => 8

>> s3.to_s                => "12344321"

>> s3[-1]                 => "1"

>> s3[2,4]                => "3443"

An IspString represents a string with instances of another string interspersed between each character:

>> s4 = IspString.new("xyz", "...")  => IspString("xyz","...")

>> s4.to_s                   => "x...y...z"

>> s4[0]                     => "x"

>> s4[1]                     => "."

>> s4[s4.size-1]             => "z"

In the above examples the strings used in the subclass constructors are instances of String but they can be an arbitrary nesting of VStrings, too:

>> s5 = IspString.new(MirrorString.new("abc"), ReplString.new(".",3))
=> IspString(MirrorString("abc"),ReplString(".",3))

>> s5.to_s
=> "a...b...c...c...b...a"

>> s6 = ReplString.new(s5,1_000_000_000_000_000)
=>ReplString(IspString(MirrorString("abc"),ReplString(".",3)),100000
0000000000)

>> s6.size
=> 21000000000000000

>> s6[s6.size-20,100]
=> "...b...c...c...b...a"

>> s6[(s5.size)*(1_000_000_000_000_000-2),50]
=> "a...b...c...c...b...aa...b...c...c...b...a"

To emphasize that the "V" in VString stands for "virtual", consider this interaction:

>> s=MirrorString.new(ReplString.new("abc",100000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000)) =>MirrorString(ReplString("abc",1000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000))

>> s.size =>6000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000

>> s2 = IspString.new(s,s) =>IspString(MirrorString(ReplString("abc",100000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000)), MirrorString(ReplString("abc",10000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000)))

>> s2.size.to_s.size
=> 328

The iterator each is available for ReplString, MirrorString, and IspString. It produces all the characters in turn, as one-character strings. It returns the receiver.

>> s1 = ReplString.new("abc",2)  => ReplString("abc",2)

>> s1.each {|x| puts x}
a
b
c
a
b
c
=> ReplString("abc",2)

Be sure to 'include Enumerable' in VString, so that methods like map, sort, and all? work:

>> MirrorString.new("abc").map { |c| c }
=> ["a", "b", "c", "c", "b", "a"]

>> MirrorString.new(ReplString.new("abc",2)).sort
=> ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"]

>> MirrorString.new(
              ReplString.new("a", 100000)).all? { |c| c == "a" }
=> true

Using Ranges to subscript VStrings is not supported:

>> s1 = MirrorString.new("ab")  => MirrorString("ab")
>> s1[0..-1]
NoMethodError: ...

We won't test with ranges.

TL;DR

Here's a quick summary of what's required of the VString subclasses and their constructors:

ReplString.new(s, n) where s is a non-empty String or a VString and n > 0.
MirrorString.new(s) where s is a non-empty String or a VString.
IspString.new(s1,s2) where s1 and s2 are non-empty Strings or are VStrings.
size, inspect, and to_s methods
Subscripting with [n] and [start,len], with the same semantics as for Ruby's String class.
Subscripting with a Range is not supported.
VString implements the iterator each and includes Enumerable.

The `String` semantics exception

There is one exception to matching the semantics of String when subscripting with [start,len]. Note this behavior of String:

>> s="abc"   => "abc"

>> s[3]      => nil

>> s[3,1]    => ""

>> s[4,1]    => nil

You might find it interesting to think about why Strings have that behavior but we won't match it with VString. Instead, if start is out of bounds, nil is produced:

>> s = ReplString.new("abc",2)
=> ReplString("abc",2)

>> s[5,10]   => "c"

>> s[6,10]   => nil

`VString` itself is never directly tested

When grading, tests will only exercise the VString subclasses; VString will not be tested directly. (Note that none of the examples above do anything with VString itself.)

Implementation Notes

Division of labor between VString and its subclasses

If you've completed 335 you might not need this advice but here's what I recommend for what to put where with respect to VString and and its subclasses:

VString
- Methods: [], to_s, and each. No initialize method.
- VString should have no instance variables. Instead, have all instance variables in the subclasses. (Recall that with the Shape example in the slides, all the data was held by Rectangle and Circle, aside from a label in Shape.)
Subclasses of VString (ReplString, MirrorString, IspString)
- Methods: initialize, size, char_at, and inspect.
- Each subclass should have instance variables that support the behavior of the instances. For example, a straightforward implementation of ReplString would have two instance variables: one to hold the base string and one to hold the replication count.

In my solution all of the subclass methods are very short—typically a line or two of code.

It may help to imagine that there will eventually be dozens of VString subclasses instead of just the three specified here. Having that mindset, and wanting to write as little code as possible to implement those dozens of subclasses, should motivate you to do as much as possible in VString and as little as possible in the subclasses. However, VString should have absolutely no code that knows anything about any of its subclasses.

Implementation of subscripting

VString itself should have this method:

def [](start, len = 1)
    ...
end

That defaulted second argument allows this one method to handle both s[n] and s[start,len].

Implement VString.[] in terms of calls to size and char_at, which in turn will resolve to the implementation of those methods in the three subclasses.

As example of resolution, consider the following code:

>> s = ReplString.new("abc",10)
>> c = s[5]

To evaluate s[5] Ruby will first see if ReplString.[] exists. It doesn't, so the call s.[](5) will resolve to VString.[]. In turn, VString.[] calls char_at to actually produce a one-character string. That char_at will resolve to ReplString.char_at.

With deeply nested constructions of VString subclasses there's a potential of subscripting having exponential behavior. The simple way to avoid that problem is to have only one place in your VString.[] method where char_at is called. (Save the value with c = char_at(...) if you need to use it in multiple places.)

Strong recommendation: VString.[](start, len = 1) can be called with a negative starting position. As a first step, normalize a negative starting position, which is relative the end of the string, to the corresponding starting position that is relative to the start of the string. For example, given s = ReplString("abc",2), s[-2] is equivalent to s[4]. By doing that normalization you'll avoid the need for code like this:

if position >= 0 then
    ...code to handle positive positions...
else
    ...code to handle negative positions...

Don't get tangled up considering various cases of nesting

Watch out for thinking like this: "What if I've got a ReplString that holds a MirrorString of a MirrorString of a ReplString of an IspString made of ..."

Instead, just keep in mind that what you can count on is that the values used in the VString constructors support size, s[n], s[start,len], and inspect. Have a duck-typing mindset and write code that uses those operations. (Yes, there's to_s, too, but it's problematic with long strings; avoid using it.)

Those double-quotes in `inspect`

Getting the double-quotes right for inspect can be a little tricky. Start by getting the following examples working:

>> s = ReplString.new("a\nb", 2)
=> ReplString("a\nb",2)

>> s2 = MirrorString.new("x\ty")
=> MirrorString("x\ty")

>> s3 = ReplString.new(s2,10)
=> ReplString(MirrorString("x\ty"),10)

Saving a little typing when testing

To save myself some typing when testing, I've got these lines at the end of my vstring.rb:

if !defined? RS
    RS=ReplString
    MS=MirrorString
    IS=IspString
end

With those in place, I can do something like this:

>> s = IS.new(MS.new(RS.new("abc",2)),"-")
=> IspString(MirrorString(ReplString("abc",2)),"-")

Code for `VString.each`

The slides don't say enough how to put an each method in a class like VString but I'd like you to see what it looks like, so here is code for an each method for VString. Simply add it as-is to your class VString.

def each
    for i in 0...size
        yield self[i]
    end
    self
end

Note that self[i] will call VString.[], which will in turn use the char_at implementation in the subclass for the instance at hand.

Put all four classes in `vstring.rb`

Put the code for all four classes in vstring.rb. VString needs to be first.

Problem 4. Extra Credit `vstring-extra.txt`

For three points of extra credit, devise and implement another subclass for VString. For example, a fourth subclass I considered having you implement for VString was to be created like this:

XString.new("a","bb", 10)

It represents the following sequence of characters, which ends with 10 "a"s and 20 "b"s:

abbaabbaaabbb...aaaaaaaaaabbbbbbbbbbbbbbbbbbbb

You can't implement XString for credit but I'll be impressed if you do it for fun.

Add whatever new VString subclass you come up with to your vstring.rb and create vstring-extra.txt, a plain text file that talks about your creation and shows it action with irb.

The burden of proof for this extra credit is on you, not me!

Problem 5. (12 points) `gf.rb`

Here's something I saw in a book:

class Fixnum
    def hours; self*60 end  # 60 minutes in an hour
end

>> 2.hours  => 120

>> 24.hours => 1440

You are to write a Ruby method gf(spec) that dynamically adds (i.e., "monkey patches") a number of such methods into the Fixnum class, as directed by spec. Example:

gf("foot/feet=1,yard(s)=3,mile(s)=5280")

Using Kernel.eval, the above call to gf adds nine methods to Fixnum. Here are six of them: foot, feet, yard, yards, mile, and miles. The other three are mentioned below.

Respectively, on a pair-wise basis, those methods return the result of multiplying the receiver (the Fixnum) by 1, 3, and 5280.

$ irb -r ./gf.rb
>> gf("foot/feet=1,yard(s)=3,mile(s)=5280")
=> true

>> 1.foot         => 1

>> 10.feet        => 10

>> 5.yards        => 15

>> 3.miles        => 15840

>> 8.mile         => 42240

>> 1.feet         => 1

It would perhaps be useful to detect plurality mismatches like 8.mile and 1.feet and produce an error but that is not done.

In addition to the six methods mentioned above, these three are added to Fixnum, too: in_feet, in_yards, and in_miles:

>> (30.feet+10.yards).in_yards   => 20.0

>> 10_000.feet.in_miles          => 1.89393939393939

Two more examples:

>> gf("second(s)=1,minute(s)=60,hour(s)=3600,day(s)=86400")
=> true

>> (12.hours+30.minutes).in_days => 0.520833333333333

>> gf("inch(es)=1,foot/feet=12") => true

>> 18.inches.in_feet             => 1.5

>> 1.foot                        => 12

>> 1.foot.in_inches              => 12.0

Note that methods later generated by gf simply replace earlier methods of the same name. After the two calls gf("foot/feet=1") and gf("foot/feet=12"), 1.foot is 12.

Here are some rules about the mappings:

An individual mapping must be in one of these two forms:
- singular/plural=integer
- singular(pluralSuffix)=integer
None of the parts may be empty.
Mappings are separated by commas.
Only lowercase letters are permitted in the names.
No whitespace is allowed.

If any part of a specification is invalid, a message is printed and false is returned; but the result is otherwise undefined. Here is an example of the output in the case of an error:

>> gf("foot/feet=1,yards=3")
bad spec: 'foot/feet=1,yards=3'
=> false

Note that the error is not pin-pointed; the specification as a whole is cited as being invalid.

Here are more examples of invalid specifications:

gf("foot/feet=1,")          # trailing comma
gf("foot/feet=1.5")         # non-integer
gf("foot/=1")               # empty plural
gf("inch()=12")             # empty plural suffix
gf("foot/feet=1,Yard(s)=3") # capital letter

This is NOT a restriction but to get more practice with regular expressions I recommend that your solution not use any string comparisons. Instead use matches (=~) to break up the specification. And, using regular expressions will probably increase the likelihood that you accept exactly what's valid.

For this problem you are to use Kernel.eval (slide 288+) to add the methods to Fixnum, but as the slides show, using eval to generate code based on data supplied by another person can be perilous! eval is a powerful tool but you've got to be careful when using it.

Keep in mind that you're writing a method named gf, not a program. Helper methods are permitted.

In assignment 3's write-up for editstr it was mentioned that the bindings for x, len, etc. were the beginnings of a simple internal DSL (Domain Specific Language). The list [x 2, len, x 3, rev, xlt "1" "x"] uses the facilities of Haskell to specify computation in a new language that's specialized for string manipulation. Similarly, this problem provides another example of an internal DSL by using the facilities of Ruby to express computations involving unit conversions.

Incidentally, if you Google for ruby eval you'll find a lot of discussion about it, but be wary of those who say, "Never use eval!" Instead, recognize that eval is a powerful tool that should be used with caution, weighing risks and benefits, and ALWAYS being careful to consider the source of all data that can possibly contribute directly or indirectly to a string that is passed to eval.

Problem 6. (22 points) `optab.rb`

When writing this problem one of my favorite quotes came to mind:

Far better is it to dare mighty things, to win glorious triumphs, even though checkered by failure, than to take rank with those poor spirits who neither enjoy much nor suffer much because they live in the gray twilight that knows neither victory nor defeat.—Theodore Roosevelt

One way to learn about a language's types and operators is to manually create tables that show what type results from applying a binary operator to various pairs of types. For this problem you are to write a Ruby program, optab.rb, that generates such tables for Java, Ruby, and Haskell.

Here's a run of optab.rb:

$ ruby optab.rb ruby "*" ISA
 * | I  S  A 
---+---------
 I | I  *  * 
 S | S  *  * 
 A | A  S  *

Here's what optab.rb's three arguments mean:

The first argument, ruby, specifies Ruby as the language of interest for this run.
The second argument, "*", specifies the operator of interest. We'll make a practice of putting quotes around the operator because some operators, like * and <, are shell metacharacters. (Instead of quoting the asterisk we could also suppress its special meaning with a backslash: \*.)
The third argument, ISA, specifies types of interest. The letters I, S, and A stand for Fixnum (I for integer), String, and Array, respectively.

optab's output is a table showing the type that results from applying the operator to various pairs of types. The row headings on the left specify the type of the left-hand operand. The column headings along the top specify the type of the right-hand operand.

Here are some notes on interpreting the table shown above:

The upper-left entry, an I, shows that Fixnum * Fixnum produces a Fixnum. (Remember that we're using I, not F, to stand for integers.)
The lower-left entry, A, shows that Array * Fixnum produces an Array.
The S in the bottom of the middle row shows that Array * String produces a String.
The *'s indicate that Fixnum * String, String * String, and three other type combinations produce an error.

Here's an example with Java:

$ ruby optab.rb java "*" IFDCS
 * | I  F  D  C  S 
---+---------------
 I | I  F  D  I  * 
 F | F  F  D  F  * 
 D | D  D  D  D  * 
 C | I  F  D  I  * 
 S | *  *  *  *  *

I, F, D, C, and S stand for int, float, double, char, and String, respectively.

Here's how optab is intended to work:

For the specified operator and types, try each pairwise combination of types with the operator by executing that expression in the specified language and seeing what type is produced, or if an error is produced. Collect the results and present them in a table.

The table just above was produced by generating and then running each of twenty-five different Java programs and analyzing their output. Here's what the first one looked like:

$ cat checkop.java
public class checkop {
    public static void main(String args[]) {
        f(1 * 1);
        }
    private static void f(Object o) {
        System.out.println(o.getClass().getName());
        }
    }

Note the third line, f(1 * 1); That's an int times an int because the first operation to test is I * I.

Remember: Ruby code wrote that Java program!

In Ruby, the expression `some-command-line` is called command expansion. It causes the shell to execute that command line. The complete output of the command is collected, turned into a string, possibly with many newlines, and is the result of `...`. (Note that ` is a "back quote".)

Here's a demonstration of using `...` to compile and execute checkop.java:

>> result = `bash -c "javac checkop.java && java checkop" 2>&1`
=> "java.lang.Integer\n"

The extra stuff with bash -c ... 2>&1 is to cause error output, if any, to be collected too.

Here's the checkop.java that's generated for I * S:

public class checkop {
    public static void main(String args[]) {
        f(1 * "abc");
        }
    private static void f(Object o) {
        System.out.println(o.getClass().getName());
        }
    }

Note that it is identical to the checkop.java generated for I * I with one exception: the third line is different: instead of being 1 * 1 it's 1 * "abc". Let's try compiling and running the checkop.java just above, generated for I * S:

$ irb
>> result = `bash -c "javac checkop.java && java checkop" 2>&1`
=> "checkop.java:3: error: bad operand types for binary operator
'*'\n                    f(1 * \"abc\");\n                     ^\n
  first type:  int\n  second type: String\n1 error\n"

javac detects incompatible types for * in this case and notes the error. java checkop is not executed because the shell conjunction operator, &&, requires that its left operand (the first command) succeed in order for execution to proceed with its right operand (the second command).

That output, "checkop.java:3: ..." can be analyzed to determine that there was a failure. Then, code maps that failure into a "*" entry in the table that's produced.

Let's try Haskell with the / operator. "D" is for Double.

$ ruby optab.rb haskell "/" IDS
 / | I  D  S 
---+---------
 I | *  *  * 
 D | *  D  * 
 S | *  *  *

For the first case, I / I, Ruby generated this file, checkop.hs:

$ cat checkop.hs
(1::Integer) / (1::Integer)
:type it

Note that just a plain 1 was good enough for Java since the literal 1 has the type int but with Haskell we use (1::Integer) to be sure the type is Integer. (Yes; Integer, not Int.)

Let's try running it. For Java we used javac and java. We'll use ghci for Haskell and redirect from checkop.hs. Be sure to specify the -ignore-dot-ghci option, too!

$ irb
>> result = `bash -c "ghci -ignore-dot-ghci < checkop.hs" 2>&1`
=> "GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for
help\nPrelude> \n:1:1: error:\n    • No instance
for (Fractional Integer) arising from a use of ‘/’\n
...lots more..."

Ouch—an error! That's going to be a "*". Here's the checkop.hs file generated for D * D:

$ cat checkop.hs
(1.0::Double) * (1.0::Double)
:type it

Let's try it:

>> result = `bash -c "ghci -ignore-dot-ghci < checkop.hs" 2>&1`
=> "GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for
help\nPrelude> 1.0\nPrelude> it :: Double\nPrelude> Leaving
GHCi.\n"

If we look close at the output above we see it, with a type: it :: Double.

In pseudo-code, here's what optab needs to do:

For each pairwise combinations of types specified on the command line...

Generate a source code file in the specified language to test the combination at hand.
Run the file using command expansion (`...`).
Analyze the command expansion result, determining either the type produced or that an error was produced.
Add an appropriate entry for the combination to the table—either a single letter for the type or an asterisk to indicate an error.

The examples above show Java and Haskell testing programs and their execution. You'll need to figure out how to do the same for Ruby, but let us know if you have trouble with that. The obvious route with Ruby is creating and running a file but you can use Kernel.eval instead. If you take the eval route, you'll probably need to do a bit of reading and figure out how to catch a Ruby exception using rescue.

I chose the names checkop.java and checkop.hs for the generated files but you can use any names you want.

Below is an example of a complete program that generates a file named hello.java and runs it. Note that the program's command-line argument is interpolated into the "here document", which is a multi-line string literal. (See slide 89.)

$ cat a7/mkfile.rb
prog = <<X
public class hello {
    public static void main(String args[]) {
        System.out.println("Hello, #{ARGV[0]}!");
        }
    }
X
# IMPORTANT: That X just above MUST BE IN THE FIRST COLUMN!

f = File.new("hello.java","w")
f.write(prog)
f.close
result = `bash -c "javac hello.java && java hello" 2>&1`
puts "Program output: (#{result.size} bytes)", result

$ ruby mkfile.rb whm
Program output: (12 bytes)
Hello, whm!

Here's the file that was created:

$ cat hello.java
public class hello {
    public static void main(String args[]) {
        System.out.println("Hello, whm!");
        }
    }

Copy a7/mkfile.rb into your a7 directory on lectura and try it, to help you get the idea of generating a program, running it, and then doing something with its output.

If you like to be tidy, you can use File's delete class method to delete hello.java: File.delete("hello.java").

The following table shows what types must be supported in each language, and a good expression to use for testing with that type.

Letter	Haskell	Java	Ruby
I	(1::Integer)	1	1
F	(1.0::Float)	1.0F	1.0
D	(1.0::Double)	1.0	not supported
B	True	true	true
C	'c'	'c'	not supported
S	"abc"	"abc"	"abc"
O	not supported	new Object()	not supported
A	not supported	not supported	[1]

optab.rb is not required to do any error checking at all. In particular, you can make these assumptions:

The first argument is haskell, java, or ruby.
The second argument is a valid operator in the language specified.
The third argument is a string of single-letter type specifications; those types are supported for the language at hand.

Behavior is undefined for all other cases. I won't test any error cases.

I hope that everybody recognizes that there needs to be language-specific code for running the Java, Haskell, and Ruby tests but ONE body of code can be used to process command-line arguments, launch the language-specific tests, and build the result table.

Think in terms of an object-oriented solution, perhaps with an OpTable class that handles the language-independent elements of the problem. OpTable would have subclasses JavaOpTable, HaskellOpTable, and RubyOpTable with language-specific code.

My solution has a method OpTable.make_table that handles table generation. It calls a subclass method tryop(op, lhs, rhs) to try an operator with a pair of operands and report what's produced (a type or an error). With Java a call might be tryop("+", "1", '"abc"'); it would return "S". In contrast, RubyOpTable.tryop("+", "1", '"abc"') produces "*".

Note that testing the Java cases can be slow. With my version, ruby optab.rb java "*" IFDCS takes almost 30 seconds to run on lectura. The same test for Haskell takes about seven seconds.

Ruby's command expansion (`...`) works on Windows but I haven't tried to work out command lines that will behave as well as the examples above, which were done on lectura. The bottom line is that you'll probably need to do much of your testing on lectura. However, if you use an eval-based approach for Ruby, you can easily get that working on Windows. If you want to write code that runs on both Windows and UNIX, you can use RUBY_PLATFORM as a simple way to see what sort of system you're running on.

Problem 7. Extra Credit `optab-extra.txt`

For three points of extra credit per language, have your optab.rb support up to three additional languages of your choice. PHP, Python, and Perl come to mind as easy possibilities.

Details:

At least three types must be supported for each language.
You may introduce types in addition to those shown above.
Submit a plain text file optab.txt, that shows your extended version in action.
Demonstrate at least three operators for each language.
For Python, you can do either Python 2 or Python 3, but not both for credit.

The burden of proof for this extra credit is on you, not me!

Problem 8. Extra Credit `observations.txt`

Submit a plain text file named observations.txt with...

(a) (1 point extra credit) An estimate of how many hours it took you to complete this assignment. Put that estimate on a line by itself, like this:

Hours: 3.5

There should be only one "Hours:" line in observations.txt. (It's fine if you care to provide per-problem times, and that data is useful to us, but report it in some form of your own invention that doesn't contain the string "Hours:". Thanks!)

Feedback and comments about the assignment are welcome, too. Was it too long, too hard, too detailed? Speak up! I appreciate all feedback, favorable or not.

(b) (1-3 points extra credit) Cite an interesting course-related observation (or observations) that you made while working on the assignment. The observation should have at least a little bit of depth. Think of me thinking "Good!" as one point, "Excellent!" as two points, and "Wow!" as three points. I'm looking for quality, not quantity.

Turning in your work

Use a7/turnin to submit your work. Each run creates a time-stamped "tar file" in your current directory with a name like aN.YYYYMMDD.HHMMSS.tz. You can run a7/turnin as often as you want. We'll grade your final pre-deadline submission.

Use a7/turnin -l to list your submissions.

To give you an idea about the size of my solutions, here's what I see as of press time:

$ wc $(grep -v txt a7/delivs)
  22   57  611 label.rb
  29   47  599 re.rb
  21   42  294 bowl.rb
 104  216 1821 vstring.rb
  35   88  870 gf.rb
 148  328 3440 optab.rb
 359  778 7635 total

My code has few comments.

Note that each of the a7.*.tz files created in your directory by a7/turnin is essentially a time-stamped snapshot of your code. (If you need to recover a file and aren't familiar with tar, perhaps mail to 372s18—it's easy to accidentally overwrite your latest versions.)

Miscellaneous

Point values of problems correspond closely to the "assignment points" mentioned in the syllabus. For example, a 10-point problem would correspond to about 1% of your final grade in the course.

Feel free to use comments to document your code as you see fit, but note that no comments are required, and no points will be awarded for documentation itself. (In other words, no part of your score will be based on documentation.)

In Ruby, # is comment to end of line, unless in a string literal or regular expression. There's no Ruby analog for /* ... */ in Java and {- ... -} in Haskell but you can comment out multiple lines by making them an embedded document—lines bracketed with =begin/=end starting in column 1. Example:

=begin
    Just some
    comments here.
=end

Remember that late assignments are not accepted and that there are no late days; but if circumstances beyond your control interfere with your work on this assignment, there may be grounds for an extension. See the syllabus for details.

My estimate is that it will take a typical CS junior from 9 to 13 hours to complete this assignment.

Our goal is that everybody gets 100% on this assignment AND gets it done in an amount of time that is reasonable for them.

Assignments are not take-home tests! We hope you'll make use of Piazza, email and office hours if you run into trouble. We're happy to help you get started with any or all of these problems but in any event, if you put 8 hours into this assignment and don't seem to be close to completing it, it's definitely time to touch base with us. Specifically mention that you've reached six hours. Give us a chance to speed you up!

Option: Make Your Own Assignment!

The Usual Stuff

Don't forget about label.rb

Don't get stuck on the regular expression problems

Don't hesitate to dig into the test programs

Problem 1. (15 points, unevenly distributed across several sub-problems) re.rb

Miscellaneous

A note on testing re.rb

Problem 2. (3 points) bowl.rb

Problem 3. (22 points) vstring.rb

TL;DR

The String semantics exception

VString itself is never directly tested

Implementation Notes

Division of labor between VString and its subclasses

Implementation of subscripting

Don't get tangled up considering various cases of nesting

Those double-quotes in inspect

Saving a little typing when testing

Code for VString.each

Put all four classes in vstring.rb

Problem 4. Extra Credit vstring-extra.txt

Problem 5. (12 points) gf.rb

Problem 6. (22 points) optab.rb

Problem 7. Extra Credit optab-extra.txt

Problem 8. Extra Credit observations.txt

Turning in your work

Miscellaneous

Don't forget about `label.rb`

Problem 1. (15 points, unevenly distributed across several sub-problems) `re.rb`

A note on testing `re.rb`

Problem 2. (3 points) `bowl.rb`

Problem 3. (22 points) `vstring.rb`

The `String` semantics exception

`VString` itself is never directly tested

Those double-quotes in `inspect`

Code for `VString.each`

Put all four classes in `vstring.rb`

Problem 4. Extra Credit `vstring-extra.txt`

Problem 5. (12 points) `gf.rb`

Problem 6. (22 points) `optab.rb`

Problem 7. Extra Credit `optab-extra.txt`

Problem 8. Extra Credit `observations.txt`