CSC 372, Spring 2018 Assignment 6 Due: Tuesday, March 27 2018 at 23:59:59

The Usual Stuff

A tiny typing saver

My ~/.bashrc has a series of aliases like this:
...
alias t5=a5/tester
alias t6=a6/tester
alias t7=a7/tester
...
With those aliases I can just type t6 switched to run the tester for switched.rb on this assignment.

I could go further and have per-problem aliases, too:

alias tsw="a6/tester switched"

A note about problems 1-4

Each of the first four problems ask you to write an iterator, which is a single method, rather than a program. The write-up for the first iterator, eo, shows a couple of ways to approach the edit-run cycle.

Each of the iterators has a "duck typing" specification that describes what the iterator requires of its argument(s), such as being subscriptable, or having a size or each method.

IMPORTANT: The first test for each iterator is a Ruby program with a name like ITERATOR-ap.rb. The "ap" stands for all points—when grading, all the points for the problem will be based on whether you pass the "ap" test. The "ap" tests use dumbed-down classes that provide only the capabilities that the iterator is supposed to require. For example, eo-ap.rb uses a class named Dumb that provides only subscripting with x[n] and a size method. If you're passing some cases for an iterator but failing the "ap" test, then your implementation is requiring more capabilities of its argument(s) than it should be, according to the duck typing specification.

For one point of extra credit, track the the total time you spend on problems 1-4 (the iterators) and report it in observations.txt with a line of this form:

Hours for iterators: 1.5

If you do that, do not subtract those hours from your overall hours. For example, if you spend 1.5 hours on the iterators and another 4 hours on the rest of the assignment, report Hours: 5.5 in your observations.

Problem 1. (2 points) eo.rb

Write a Ruby iterator eo(x) that yields every other element in x.

eo(x) returns x.

Duck typing: eo only requires x to be subscriptable with x[n] and have a size method.

Examples:

>> eo([10,20,30,40]) {|v| puts v }
10
30
=> [10, 20, 30, 40]

>> eo("testing") {|c| print "'#{c}' "}
't' 's' 'i' 'g' => "testing"

>> sum = 0; eo((1..10).to_a) {|v| sum += v}; sum
=> 25

>> eo([]) {|v| puts v }
=> []
Here's a way to approach the edit-run cycle with eo and the three other iterators.

First, add the following line to your ~/.irbrc:

$eo="eo.rb"
Then, after restarting irb, you can do this:
$ irb
>> load $eo
=> true
>> eo("abcd") {|c| puts c}
a
c
=> "abcd"
That works because load is a Ruby function. That contrasts with :load in ghci, which is an operation provided by the REPL, not the Haskell language.

You can edit in another window and then do load $eo to load up the latest.

Another angle to save a little typing is add a method like this to to your .irbrc:

def ld s
    load s.to_s + ".rb"
end
Use that ld method like this:
$ irb
>> ld :eo
=> true
By giving ld a Symbol, :eo above, it's sort of like using a string literal that needs a quote on only one end. (Atoms in Lisp are similar.) ld converts its argument to a string, tacks on the ".rb" suffix, and calls load with the result. Again, we get this sort of flexibility because load is a Ruby method, but on the other hand, I still miss the filename completion that ghci provides.

Problem 2. (4 points) tkcycle.rb

Write a Ruby iterator tkcycle(x, sizes) that yields consecutive "slices" of x based on the integers in sizes. tkcycle cycles through the sizes in sizes until it runs out of elements in x. tkcycle always returns nil.

Duck typing: tkcycle only requires x to support a "slice" operation with x[start,length] and have a size method. sizes is assumed to be a non-empty array of non-negative integers.

Examples:

>> tkcycle((1..10).to_a,[1,2]) {|s| p s}
[1]
[2, 3]
[4]
[5, 6]
[7]
[8, 9]
[10]
=> nil

>> tkcycle("just a test", [3]) {|s| puts "slice = #{s}"}
slice = jus
slice = t a
slice =  te
=> nil

>> tkcycle("just a test", [30]) {|s| puts "slice = #{s}"}
=> nil

>> tkcycle("just a test", [3,0,2]) {|s| puts ">#{s}<"}
>jus<
><
>t <
>a t<
><
>es<
Note that tkcycle("just a test", [3]) above demonstrates that tkcycle never yields a partial result: After the third block invocation, with s = " te", only "st" remains, and that's not enough for another three-element yield.

Problem 3. (4 points) vrepl.rb

Write a Ruby iterator vrepl(x) that produces an array consisting of varying numbers of repetitions of elements in x. The number of repetitions for an element is determined by the result of the block when the iterator yields that element to the block.

Duck typing: vrepl only requires x to have an each method.

Examples:

>> vrepl(%w{a b c}) { 2 }
=> ["a", "a", "b", "b", "c", "c"]

>> vrepl(%w{a b c}) { 0 }
=> []

>> vrepl((1..10).to_a) { |x| x % 2 == 0 ? 1 : 0 }
=> [2, 4, 6, 8, 10]

>> i = 0
=> 0

>> vrepl([7, [1], "4"]) { i += 1 }
=> [7, [1], [1], "4", "4", "4"]
If the block produces a negative value, zero repetitions are produced:
>> vrepl([7, 1, 4]) { -10 }
=> []

Problem 4. (4 points) mirror.rb

Write a Ruby iterator mirror(x) that yields a "mirrored" sequence of values based on the values that x.each yields.

Duck typing: mirror only requires that x implement the iterator each.

The value returned by mirror is always nil.

Examples:

>> mirror(1..3) { |v| puts v }
1
2
3
2
1
=> nil

>> mirror([1, "two", {a: "b"}, 3.0]) { |v| p v }
1
"two"
{:a=>"b"}
3.0
{:a=>"b"}
"two"
1
=> nil

>> mirror({:a=>1, :b=>2, :c=>3}) {|x| p x}
[:a, 1]
[:b, 2]
[:c, 3]
[:b, 2]
[:a, 1]
=> nil

>> mirror([]) { |v| puts v }
=> nil
Like the previous three iterators, mirror is a freestanding method.

Problem 5. (27 points) switched.rb

The U.S. Social Security Administration makes available yearly counts of first names on birth certificates back to 1885. Over time, some names change from predominantly male to predominantly female or vice-versa. For this problem you are to create a Ruby program switched.rb to look for names that change from predominantly male to predominantly female in given spans of years.

switched.rb takes two command-line arguments: a starting year and an ending year. Here's a run:

$ ruby switched.rb 1951 1958
             1951   1952   1953   1954   1955   1956   1957   1958
Dana         1.19   1.20   1.26   1.29   1.00   0.79   0.67   0.64
Jackie       1.40   1.29   1.14   1.13   1.11   0.94   0.72   0.57
Kelly        4.23   2.74   3.73   2.10   2.32   1.77   0.98   0.51
Kim          2.58   1.82   1.47   1.08   0.61   0.30   0.17   0.12
Rene         1.43   1.32   1.15   1.24   1.13   0.88   0.87   0.89
Stacy        1.06   0.81   0.62   0.47   0.44   0.36   0.29   0.21
Tracy        1.51   1.14   1.02   0.73   0.56   0.55   0.59   0.59
First, note that all numbers in the leftmost column are greater than one and all numbers in the rightmost column are less than one. The 1.19 for Dana in 1951 indicates that in 1951 there were 1.19 times as many male babies named Dana as there were female babies named Dana. We can see that in a6/yob/1951.txt, which has the 1951 data:
$ grep Dana, a6/yob/1951.txt
Dana,F,1076
Dana,M,1277
The format of the a6/yob/YEAR.txt data files is simple: each line has the name, sex, and the associated count, separated by commas.

Note that the argument to grep, "Dana," has a trailing comma so that "Danae" doesn't turn up, too.

By 1958 things had changed—there were only 0.64 males named Dana for every female named Dana:

$ grep Dana, a6/yob/1958.txt
Dana,F,2388
Dana,M,1531

switched reads the a6/yob/YEAR.txt files for all the years in the range specified by the command line arguments and looks for names for which the male/female ratio is > 1.0 in the first year and < 1.0 in the last year. For all the names it finds, it prints the male/female ratio for all the years from the first year through the last year. Names are printed in alphabetical order.

As a specific example, Dana is included in the list for 1951 through 1958 because males/females in 1951 was 1.19 (> 1.0) and males/females in 1958 was 0.64 (<1.0). The ratios for the middle years are not examined to decide whether to include a name; they are shown only to provide a more complete picture of the data between the endpoints.

Note that there's a big shift for Kim from 1954 through 1957. I wonder if that's because the actress Kim Novak had a breakout role in 1955's Picnic.

Spring 2016 student Mai Tong noticed this for "Addison":

             1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Addison      1.11  0.82  0.68  0.53  0.53  0.44  0.36  0.28  0.25  0.21  0.13  0.06  0.03  0.02  0.02  0.02

Ms. Tong wrote,

I looked up on Google to find the reason why and I found that the name means "Son/Child of Adam", which is interesting since Addison is now commonly used as a girl's name rather than boy's. Possibly due to uncommon use for a girls name back in the early 1990s? But looking closely, there's a show called Grey's Anatomy (premiered in 2005) with a popular female-lead named Dr. Addison Montgomery which might be why the ratio dropped dramatically in 2005 and onwards.

Back to the details, if no names meet the criteria, switched prints "no names found" and exits by calling exit:

$ ruby switched.rb 2011 2012
no names found
IMPORTANT: To eliminate the less significant results, a name is included only if both the male and female counts in both the first and last year are greater than or equal to 100. By that criteria the name Lavern is not included for 1949-1951, and no other names turn up, either:
$ ruby switched.rb 1949 1951
no names found
Here's the underlying data:
$ grep Lavern, a6/yob/1949.txt
Lavern,F,93
Lavern,M,121

$ grep Lavern, a6/yob/1951.txt
Lavern,F,95
Lavern,M,86
There was a M/F shift from 121/93 in 1949 to 86/95 in 1951 but because not all four of those counts are >= 100, Lavern is not included. There's no grep shown above for 1950 because that data is inconsequential: inclusion is determined solely based on counts for the first and last year.

It's interesting to combine switched with a Bash for-loop that runs the program with a gradually shifting range. When your switched is done, try this:

for i in $(seq 1940 2005); do ruby switched.rb $i $(($i+9)); echo ===; done
Two obvious extensions to switched would be command-line options to adjust the 100-baby minimum and to look for female to male flips for a name. You might find those interesting to implement and experiment with, but neither are required.

switched does no error handling whatsoever. Behavior is only defined in the case of being given two command line arguments in the range of 1885 to 2014, and the first must be less than the second.

Implementation notes for switched

I intend this problem to be an exercise in using the Hash class. I encourage you to devise a data structure yourself but in case you run into trouble, here are a few thoughts on my approach to the problem: http://www.cs.arizona.edu/classes/cs372/spring18/a6/switched-hint.html

It's easy to drown in the data on a problem like this. You might start by having your code that reads the YEAR.txt files discard data for everybody but "Dana" and then use p x (equivalent to puts x.inspect) to dump out your data structure. Then try adding in a male-only and a female-only name, like "Delbert" and "Mimi" in 1951. Alternatively, you might edit down some data files to just a few lines of interest. (After copying any of the a6/yob files into your directory, use chmod 600 FILE so that you can edit it; cp will have left it as mode 444—read only.)

Watch out for bugs related to integer division. (Use .to_f to get a Float when needed.)

Use File.open to produce a File object whose gets method can be used to read lines. Example:

$ cat fileio.rb
year = ARGV[0]

f = File.open("a6/yob/#{year}.txt")

count = 0
while line = f.gets
    count += 1
end

f.close

puts "read #{count} lines"

$ ruby fileio.rb 2001
read 30258 lines
Alternatively, you could use f.readlines() to produce an array of all the lines in the file with a single call or f.each { ... } to process each line with the associated block.

The M/F ratios are formatted using a %7.2f format with printf, demonstrated on the command line with ruby -e:

$ ruby -e 'printf("%7.2f\n", 1277.0/1076.0)'
   1.19

Names are left-justified in a 10-wide field using a printf format of %-10s.

You may have questions about the data files. Before mailing to us or posting on Piazza, take a look at the data files and see if you can answer the question yourself. The files are in a6/yob. Those same files will be used for testing when grading.

You can download http://www.cs.arizona.edu/classes/cs372/spring18/a6/yob.zip for testing on your own machine. In the same directory as your switched.rb, make a directory named a6 and then unzip yob.zip in that directory to produce a structure compatible with the File.open show above.

Problem 6. (12 points) label.rb

Deadline exception: To give you a little schedule flexibility, you may do this problem, label.rb, as part of assignment 7 rather than this assignment. If you wish to do that, simply don't submit a solution for label.rb on this assignment. You'll see a zero for this problem on assignment 6, and your overall average will be about a point or so low, but you'll make it up on assignment 7. Similarly, students who submit label.rb on this assignment should not submit label.rb on assignment 7, and will see a zero for it there.

Barring the unforeseen, assignment 7 will be due on Friday, April 6.

In lecture we've seen that Array.inspect, which is used by Kernel.p and by irb, does not accurately depict an array that contains multiple references to the same array and/or references itself. Example:

>> a = []

>> b = [a,a]

>> p b
[[], []]

By simply examining the output of p b we can't tell whether b references two distinct arrays or has two references to the same array.

Another problem is that if an array references itself, Ruby "punts" and shows "[...]":

>> a = []

>> a << a

>> p a
[[...]]
The problems continue if we then get a Hash involved:
>> h = {}

>> h[a] = h

>> p h
{[[...]]=>{...}}

For this problem you are to write a method label(x) that produces a labeled representation of x, where x is a string, number, array or hash. Arrays and hashes may in turn reference other arrays and hashes, and be cyclic.

Here's what label shows for the first case above.

>> a = []; b = [a,a]

>> puts label(b)
a1:[a2:[],a2]
The outermost array is labeled as a1. Its first element is an empty array, labeled a2. The second element is a reference to that same empty array, but its contents are not shown, only the label a2.

Note: For reasons described later, we'll use puts label(...) to show the string that label produces.

Here's another step, and the result:

>> c = [b,b]

>> puts label(c)
a1:[a2:[a3:[],a3],a2]
Note that the label numbers are not preserved across calls. The array that this call labels as a3 was labeled as a2 in the previous example.

To explore relationships between the contents of a, b, and c we could wrap them in an array:

>> puts label([a,b,c])
a1:[a2:[],a3:[a2,a2],a4:[a3,a3]]
Here's a simple cyclic case. The third element in a is a reference to a; representing that reference with a label lets us see the cycle.
>> a = []

>> a = [10,20]

>> a << a

>> puts label(a)
a1:[10,20,a1]
Let's try a simple hash:
>> h = {}

>> h["one"] = 1

>> h[2] = "two"

>> puts label(h)
h1:{"one"=>1,2=>"two"}
Note that hashes are labeled as "hN". The key/value pairs are shown with "thick" arrows. Pairs are separated with commas, and curly braces surround the list of pairs.

Let's add some arrays into the mix:

>> h = {}

>> a = [2,2,2]

>> a << a

>> h["twos"] = a

>> h["words"] = %w{just a test}

>> puts label(h)
h1:{"twos"=>a1:[2,2,2,a1],"words"=>a2:["just","a","test"]}
Note that arrays and hashes have separate numbering: the above labeling shows both h1 and a1, for example.

Let's try h[h] = h:

>> h[h] = h

>> puts label(h)
h1:{"twos"=>a1:[2,2,2,a1],"words"=>a2:["just","a","test"],h1=>h1}
label(h) eventually reaches the key/value pair made by h[h] = h. Because h has already been labeled as h1, the presence of that key/value pair is shown as h1=>h1. Another example:
>> a = [1,2,3]

>> a << [[[a,[a]]]]

>> a << a

>> puts label(a)
a1:[1,2,3,a2:[a3:[a4:[a1,a5:[a1]]]],a1]
One more example:
>> a = [1,2,3]

>> b = {"lets" => "abc", 1 => a}

>> 3.times { a << b }

>> a << "end"

>> puts label([a,b,{100=>200}])
a1:[a2:[1,2,3,h1:{"lets"=>"abc",1=>a2},h1,h1,"end"],h1,h2:{100=>200}]
The trivial case for label(x) is that x is not an array or hash. If so, label(x) returns x.inspect:
>> puts label(4)
4

>> puts label("testing")
"testing"
Along with the two cases immediately above, here are some simple cases that are good for getting started:
>> puts label([7])
a1:[7]

>> puts label([[7]])
a1:[a2:[7]]

>> puts label([[70],[80,"90"]])
a1:[a2:[70],a3:[80,"90"]]
Keep in mind that your solution must be able to accommodate an arbitrarily complicated structure but the only types you'll encounter are numbers, strings, arrays and hashes.

The above examples use puts label(...) rather than showing the result of label(...). Let's look at a label call without puts:

>> label([1,"two",["3"]])
=> "a1:[1,\"two\",a2:[\"3\"]]"
label returns a string and irb uses inspect to show an unambiguous representation of results, so any quotes in the result are "escaped" with a backslash, so that the result is shown as a valid string literal. Using puts lets us avoid that clutter:
>> puts label([1,"two",["3"]])
a1:[1,"two",a2:["3"]]
=> nil

Implementation notes for label.rb

I think of this as a fairly hard problem to solve given only the above and what you've seen in class, but for those who wish to have a challenge, I'll say nothing here about how to approach it. If you have trouble getting started, however, see http://www.cs.arizona.edu/classes/cs372/spring18/a6/label-hint.html

IMPORTANT: You must match the sequence of label numbers that my solution produces. That essentially requires you to traverse the structure in the same order I do, which is depth-first. Here's an example that evidences that depth-first traversal:

>> puts label([[[10]],[[21,22]]])
a1:[a2:[a3:[10]],a4:[a5:[21,22]]]
Note that the deeply nested [10] was labeled with a3 before the second element of the top-level list was labeled with a4.

Also, note that I iterate through key/value pairs in a hash using h.each {|k,v| ... }.

Don't hesitate to dig into the test program for label

There's only one test for label.rb. It's this: ruby a6/label1.rb.

You may need to dig around in a6/label1.rb to figure out exactly what code is being executed for a particular failure and associated "diff". It may be useful to copy a6/label1.rb into your directory and hack it up, maybe adding Kernel.p calls or making other changes to track down a bug.

In various places in the output from a6/label1.rb you'll see something like this:

Line 35 in a6/label1.rb:
label([a,b,c]):
   a1:[a2:[],a3:[a2,a2],a4:[a3,a3]]
That first line tells us that the test originates at line 35 in a6/label1.rb. Here's that line:
sv("label([a,b,c])", bb)
sv, standing for "show values", is a helper method that uses Ruby's Kernel.eval method to evaluate the expression specified by the first argument. The second argument, bb, is the current set of variable bindings, which are used by eval.

Historical note

label is a simplified version of the Image(x) procedure from the Icon library. In CSC 550A/B, String and List Processing, taught in the 1980s, Ralph Griswold used Image(x) as an example. Icon's Image(x), and label(x), which you are about to write in relatively few lines of code, are incredibly handy for dealing with cycling data structures but I know of no equally powerful function that's part of the standard library in any other language.

From time to time I've asked in various IRC channels and web forums in search of something similar in other languages but those questions are often met with "Why would you need that?", "There's no reason for an array to reference itself!", etc. Of course, graphs and "a maze of twisty little passages, all alike" are immediate applications of self-referential data structures.

If you ever come across anything like the label(x) method you're about to write, let me know!

Problem 7. (ZERO points) pancakes.rb

Let's see who will write a Ruby version of the Haskell pancake printer from assignment 4 just for the fun of it!

The Ruby version is a program that reads lines from standard input, one "order" per line, echoes the order, and then shows the pancakes.

Example:

$ cat a6/pancakes.1
3 1 / 3 1 5
3 1 3
1 5/          1 1 1/11 3 15     /3 3 3     3/1
1
$ ruby pancakes.rb < a6/pancakes.1
Order: 3 1 / 3 1 5

     ***  
***   *   
 *  ***** 

Order: 3 1 3

*** 
 *  
*** 

Order: 1 5/          1 1 1/11 3 15     /3 3 3     3/1

                        ***   
      *   ***********   ***   
  *   *       ***       ***   
***** * *************** *** * 

Order: 1

* 

$ 
A blank line is printed after the "Order:" line and again after the stacks.

Make these assumptions:

If an order specifies an even-width pancake, the message shown below is printed. Processing then continues with the next order in the input, if any.

$ ruby pancakes.rb < a6/pancakes.2
Order: 1 3 1 / 1 2 3

Even-width pancake.  Order ignored.

Order: 51 49

*************************************************** 
 *************************************************  

$ 
In case you want to play "Beat the Teacher", I'll tell you that it took me about 25 minutes to write pancakes.rb, sketching on paper included. If you care to, let me know how long it takes you. Think about it all you want "for free", but start the clock the moment a tangible artifact is produced, like a mark on a piece of paper.

Problem 8. Extra Credit observations.txt

Submit a plain text file named observations.txt with...

(a) (1 point extra credit) An estimate of how many hours it took you to complete this assignment. Put that estimate on a line by itself, like this:

Hours: 3.5
There should be only one "Hours:" line in observations.txt. (It's fine if you care to provide per-problem times, and that data is useful to us, but report it in some form of your own invention that doesn't contain the string "Hours:". Thanks!)

Feedback and comments about the assignment are welcome, too. Was it too long, too hard, too detailed? Speak up! I appreciate all feedback, favorable or not.

(b) (1-3 points extra credit) Cite an interesting course-related observation (or observations) that you made while working on the assignment. The observation should have at least a little bit of depth. Think of me thinking "Good!" as one point, "Excellent!" as two points, and "Wow!" as three points. I'm looking for quality, not quantity.

Turning in your work

Use a6/turnin to submit your work. Each run creates a time-stamped "tar file" in your current directory with a name like aN.YYYYMMDD.HHMMSS.tz. You can run a6/turnin as often as you want. We'll grade your final pre-deadline submission.

Use a6/turnin -l to list your submissions.

To give you an idea about the size of my solutions, here's what I see as of press time:

$ wc $(grep -v txt a6/delivs)  # this command should work for you, too
   8   17   93 eo.rb
  19   37  332 tkcycle.rb
   8   29  158 vrepl.rb
  17   23  205 mirror.rb
  57  142 1241 switched.rb
  22   57  611 label.rb
  39  112  851 pancakes.rb
 170  417 3491 total
My code has few comments.

Note that each of the a6.*.tz files created in your directory by a6/turnin is essentially a time-stamped snapshot of your code. (If you need to recover a file and aren't familiar with tar, perhaps mail to 372s18—it's easy to accidentally overwrite your latest versions.)

Miscellaneous

Point values of problems correspond closely to the "assignment points" mentioned in the syllabus. For example, a 10-point problem would correspond to about 1% of your final grade in the course.

Feel free to use comments to document your code as you see fit, but note that no comments are required, and no points will be awarded for documentation itself. (In other words, no part of your score will be based on documentation.)

In Ruby, # is comment to end of line, unless in a string literal or regular expression. There's no Ruby analog for /* ... */ in Java and {- ... -} in Haskell but you can comment out multiple lines by making them an embedded document—lines bracketed with =begin/=end starting in column 1. Example:

=begin
    Just some
    comments here.
=end

Remember that late assignments are not accepted and that there are no late days; but if circumstances beyond your control interfere with your work on this assignment, there may be grounds for an extension. See the syllabus for details.

My estimate is that it will take a typical CS junior from 7 to 9 hours to complete this assignment.

Our goal is that everybody gets 100% on this assignment AND gets it done in an amount of time that is reasonable for them.

Assignments are not take-home tests! We hope you'll make use of Piazza, email and office hours if you run into trouble. We're happy to help you get started with any or all of these problems but in any event, if you put six hours into this assignment and don't seem to be close to completing it, it's definitely time to touch base with us. Specifically mention that you've reached six hours. Give us a chance to speed you up!