CSC 372, Fall 2022 Assignment 2 Due: Friday, September 2, 2022 at 23:59:59

Introduction

Aside from one problem, bench.txt, there's no programming at all on this assignment. It's a combination of web research, creative thought, pondering, and a little writing.

For each of the problems whose name ends in .txt (like morefacts.txt and quote.txt) you are to answer by turning in a plain ASCII text file. Use an editor like Sublime, Vim, Emacs, Notepad++, etc. DO NOT submit Word documents, PDFs, Rich Text files, HTML documents, etc. As a double-check, your .txt files should look perfectly fine when displayed with cat on lectura. Also, the file command should show them as "ASCII text" or maybe "ASCII English text", or something similar.

Important: Create a symbolic link to the fall22/a2 directory

The directory /cs/www/classes/cs372/fall22/a2 on lectura has a couple of shell scripts, turnin and run-bench, and other files that you'll need to use to complete the assignment. The instructions in this write-up assume that you're working on lectura AND that you've made a symbolic link to the fall22/a2 directory.

Imagining that you're maintaining your assignment 2 files on lectura in a directory named ~/csc372/assignment2, here's how to make that symbolic link in the right place and then examine it with ls:

% cd ~/csc372/assignment2
% ln -s /cs/www/classes/cs372/fall22/a2 a2
% ls -l a2
lrwxrwxrwx 1 whm whm 31 Aug 25 01:01 a2 -> /cs/www/classes/cs372/fall22/a2

If we now do ls -l a2/ (don't forget the slash!), we'll see files that are actually in /cs/www/classes/cs372/fall22/a2: (files present, sizes, and dates may vary)

% ls -l a2/
total 9
-rw-r----- 1 whm whm 1051 Aug 25 15:18 bench-avg.py
-rw-rw-r-- 1 whm whm  105 Aug 23 18:07 delivs
-rwxr-xr-x 1 whm whm  735 Aug 25 01:36 run-bench
-rwxrwxr-x 1 whm whm  493 Aug 23 15:37 turnin
-rw-rw---- 1 whm whm  968 Aug 25 16:09 whm-bench.txt

If you haven't worked with symbolic links, see slides 134+ in my 352 UNIX slides.

Problem 1. (6 points) morefacts.txt

When covering slide 27 in the intro set I said a sentence or two about various languages of historical or current significance. For this problem I'd like you to find three languages that are not mentioned on that slide and tell me a sentence or two about each.

I'll compile all the answers and post them on Piazza. Follow this format for your answers:

  1. The full response for each language should be a single line of text.
  2. Begin the line with the language's name followed by the year it appeared and then a colon and a space, followed by one or more sentences with whatever you want to say.
  3. End each line with an attribution that is either "anonymous" or your name in some reasonable form, like "-- W. Mitchell" or "-- William M." or "-- whm". (No fanciful identities, ok?)
As examples of both the format and the sort of thing I'm looking for, here are three of mine, one with anonymous attribution. I'll use cat -n to display my morefacts.txt and demonstrate that it contains three lines. (cat -n is sort of like a visual wc(1). Experiment: Try cat both with and without the -n option.)

I'll also see what file(1) says. Note that my shell prompt is just a percent sign.

% cat -n morefacts.txt
     1  Ada 1980: The DoD's attempt to have one language for military embedded systems, instead of 450. -- William Mitchell
     2  Java 1995: The most rapidly adopted language of all time.   In Spring 1997 I gave one lecture in 372 about Java as a rising language; by Fall 1998 it was being taught in 127A. -- William Mitchell
     3  Scala 2003: Proof that Germans should stick to beer and BMWs. -- anonymous
% file morefacts.txt
morefacts.txt: ASCII text
%

Feel free to use Google, Wikipedia, etc., for research on this question but needless to say, no posts anywhere soliciting ideas.

Above I say, "the year it appeared" but that's often subject to debate. Feel free to believe Wikipedia or go with another source.

Just to be clear, you may use anonymous attribution to keep your classmates from knowing you wrote a particular entry but what you submit must be original, not something you found on the net.

Some of you may find that you share a first, middle, or last name with a programming language.

Problem 2. (2 points) quote.txt

We started the class with a quote from Alan J. Perlis about programming languages. For this problem I'd like you to find a quote you like about programming languages by somebody other than Perlis.

Put the quote in quote.txt in any format you'd like. Here's an example using Perlis' quote:

% cat -n quote.txt
     1  "A language that doesn't affect the way you
     2  think about programming is not worth knowing."
     3                            --Alan J. Perlis
% file quote.txt
quote.txt: ASCII text
%
Quotes that "wow" me and/or the TAs might get an extra point.

Problem 3. (3 points) pyph.txt

Slide 38 in the intro set raises the idea of the philosophy of a language. In a nutshell, I think of the philosophy of a language as what the language treats as important, or not. For this problem I'd like you to identify three significant elements of the philosophy of Python.

Important: Write for a reader who doesn't know anything about Python, just like you probably don't know anything about FLPL (a predecessor of Lisp).

I'm hereby prohibiting the following two items of "low-hanging fruit" in Python's philosophy: (1) dynamic typing and (2) support for object-oriented programming.

For this problem it's fine to work with classmates and friends, Google for "what is the philosophy of Python", etc., but what you submit must be stated in your own words. If you do significant work with others, mention them in pyph.txt

Some of you may know of import this in Python. There are good starting points there but most would need elaboration to meet the criteria of writing for a reader who doesn't know Python.

Problem 4. (3 points) jshell.txt

I think it's pretty stupid that it took the developers/maintainers of Java twenty years to provide a REPL for Java but that's the sad fact: JShell was released as part of JDK 9, in 2017.

For this problem I'd like you to spend maybe fifteen minutes experimenting with JShell and write a few sentences about your observations. You might talk about what you liked, what you expected to work but didn't, what you'd like to change, surprising behaviors, etc. If your 210 instructor showed or encouraged JShell, did it help or hurt when learning Java? If you didn't see JShell in 210, do you think it would have helped you learn Java?

If you've got Java installed on your machine, running jshell at a command-line prompt should start it up. If not, you can run jshell on lectura.

Some interaction with JShell is shown on slides 25 and 53 in the Haskell set. Here's a little more interaction with JShell:

% jshell
|  Welcome to JShell -- Version 11.0.16
|  For an introduction type: /help intro

jshell> 3 + 4
$1 ==> 7

jshell> $1 + "abc"
$2 ==> "7abc"

jshell> int i, j = 7;
i ==> 0
j ==> 7

jshell> /var
|    int $1 = 7
|    String $2 = "7abc"
|    int i = 0
|    int j = 7

jshell> /exit (or control-D to exit)
|  Goodbye
%
Note that /var shows variables and their values. /help shows all the commands.

Also, at the jshell> prompt, try typing "".s<TAB>

Problem 5. (10 points) bench.txt

Your task in this problem is to devise a simple computation whose execution time depends on one parameter, then implement it in Java and Python, and then compare the performance of the Java and Python versions for three different cases of parameter values. In essence, you are to create a simple benchmark to explore the relative performance of Java and Python.

Background

In general, benchmarking is a complicated undertaking that's packed with pitfalls and has an interesting cast of characters. However, we're just going to have a little fun with benchmarking. I'll start by showing you my solution for this problem.

Here's the computation I devised for my solution/implementation of this problem:

For some number N, first build a string of the form "1,2,...,N,". Then print N, the total length of the string, the number of zeroes in the string and then the contents of the string, in single quotes.

If N is 10, here's the expected output:

N=10: 21 total chars, 1 zeroes, s: '1,2,3,4,5,6,7,8,9,10,'
If N is 7,000,000, here's the expected output, with about 55 million characters elided from the middle:
N=7000000: 54888896 total chars, 4088895 zeroes, s: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19...6999995,6999996,6999997,6999998,6999999,7000000,'

Here is my Java implementation of the computation:

public class bench {
    public static void main(String args[]) {
        //int Ns[] = {1, 3, 10}; // small Ns for testing
        int Ns[] = {1_000_000, 3_000_000, 7_000_000};

        String s = "";
        StringBuilder sb = new StringBuilder();

        int N = Ns[Integer.parseInt(args[0])-1];
        for (int i = 1; i <= N; i++) {
            //s += i + ",";
            sb.append(i);
            sb.append(',');
            }

        s = sb.toString();
        int count = 0;
        for (int i = 0; i < s.length(); i++)
            if (s.charAt(i) == '0')
                count++;

        System.out.format("N=%d: %d total chars, %d zeroes, s: '%s'\n",
            N, s.length(), count, s);
    }
}

Here is my Python implementation of the computation:

import sys

#Ns = [1, 3, 10] # small Ns for testing
Ns = [1000000, 3000000, 7000000]

s = ""
N = Ns[int(sys.argv[1])-1]
for i in range(1,N+1):
    s += str(i) + ","

count = 0
for i in range(len(s)):
    if s[i] == '0':
        count += 1

print(f"N={N}: {len(s)} total chars, {count} zeroes, s: '{s}'")

Let's time the Python version for the first case: N=1,000,000. (Examine the code to see how the command-line argument 1 is used to select the first element of the list Ns.)

% time python3 bench.py 1 >p

real    0m1.171s
user    0m1.121s
sys     0m0.025s

The first non-blank line of time output, with real 0m1.171s, is sometimes called the "wall-clock" time—it's the amount of time you'd have seen pass if you'd been watching a clock on the wall. The third line, sys 0m0.025s, is the amount of time the operating system spent doing work on your behalf, i.e., the time spent in "kernel mode". The middle line, user 0m1.121s is the line we're interested in. It shows how much time we spent in our Python code: 1.121 seconds. (User time does include compilation into Python bytecode, but it's pretty negligible.)

The redirection >p causes the Python program's output to be sent to a file named p (for Python).

Because the time command is built into shells (try type time), its output can vary from shell to shell. The above output is from Bash. Here's what we might see with zsh:

% time python3 bench.py 1 >p
python3 bench.py 1 > p  1.14s user 0.02s system 99% cpu 1.166 total

Let's time the Java program for the first case. Instead of just using java bench.java to compile and run, we'll compile it first with javac, since Java compilation time is somewhat significant. (Try time javac bench.java to see how long the compilation takes.)

% javac bench.java
% time java bench 1 >j

real    0m0.199s
user    0m0.212s
sys     0m0.060s

Comparing the user times from the two runs, we see that the Java version ran 5.29 (1.121/.212) times faster than Python.

We should also be sure that the Java and Python versions produced identical output. We'll use cmp for a yes/no test:

% cmp j p
%
A behavior common among UNIX tools is for them to produce no output if an operation was successful. (In 352 I say, "Silence is Golden".) Above we see no output from cmp j p, so j and p are identical.

To see what cmp shows if the files differ, let's compare bench.java and bench.py:

% cmp bench.java bench.py
bench.java bench.py differ: char 1, line 1

Let's look at the sizes, just for fun:

% ls -l j p
-rw-r--r-- 1 whm staff 6888949 Aug 24 23:50 j
-rw-r--r-- 1 whm staff 6888949 Aug 24 23:33 p

If a cmp fails, or I get a "diff" with diff, I'll often first look at the sizes of the output files. One of the files might be empty, for example.

Timing can vary from run to run. A good practice is to run a benchmark several times and see if there's any significant variation in the times reported. Let's time the Python version three times in a row:

% time python3 bench.py 1 >p

real    0m1.233s
user    0m0.981s
sys     0m0.030s
% time python3 bench.py 1 >p

real    0m1.168s
user    0m0.957s
sys     0m0.048s
% time python3 bench.py 1 >p

real    0m1.199s
user    0m0.993s
sys     0m0.028s

Sure enough, we do see some slight variation. But now we're thinking about maybe manually starting three runs each of both Java and Python. It's clearly time to automate, and because we're programmers, we can!

Let's have a shell script do all the work of running both Java and Python versions, extracting timings, ensuring identical output from both versions, and more. If this were 352, I'd have you write that script yourself but since this is 372, I'm supplying it for you. It's in a2/run-bench. Assuming you've made an a2 symlink, you can type cat a2/run-bench to read the code, and I hope you do!

Let's run our first case three times:

% a2/run-bench 1
Java:   0m0.198s
Python: 0m1.078s
% a2/run-bench 1
Java:   0m0.224s
Python: 0m1.042s
% a2/run-bench 1
Java:   0m0.219s
Python: 0m0.979s

Just so you know what it looks like, here's an example of what's shown if there's a run-time error in bench.java:

$ a2/run-bench 1
Java:   0m0.013s
Python: 0m1.059s
cmp: EOF on j which is empty
=====Java errors=====
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -9 out of bounds for length 5
    at bench.main(bench.java:9)
A Python error would be similar.

We could add a --runs COUNT option to run-bench to run it several times, but let's think in terms of orthogonal tools and use Bash's for-loop to run run-bench 2 (N=3,000,000) three times, and print a blank line after each:

% for i in $(seq 3); do a2/run-bench 2; echo; done
Java:   0m0.313s
Python: 0m3.223s

Java:   0m0.284s
Python: 0m3.207s

Java:   0m0.353s
Python: 0m3.192s

When developing this problem I used my Mac's Spotlight a number of times to calculate the relative performance, but I quickly tired of typing in the user times. (Programmers should have a certain degree of impatience, I believe.) I harnessed my impatience to write a program to calculate average relative performance: a2/bench-avg.py. Let's pipe the output of that for-loop into that program:

% for i in $(seq 3); do a2/run-bench 2; echo; done | python3 a2/bench-avg.py
Java:   0m0.317s
Python: 0m3.176s

Java:   0m0.269s
Python: 0m3.133s

Java:   0m0.289s
Python: 0m3.182s

Java was 10.85x faster than Python (0.29167s vs. 3.16367s for 3 runs)

For this case of N being three million we're seeing almost an 11x advantage for Java.

As an aside, years ago it occurred to me that being a programmer is both a blessing and a curse: The blessing is that you can automate things. The curse is that you might start feeling you should automate everything.

If you haven't had 352, maybe experiment a little with Bash's for loop. First try this:

for i in $(seq 10 | tac); do echo $i...; done
Then, remove "| tac" and try it again. Also try running seq 10 by itself.

Back to the benchmark, let's try seven million for N with case 3:

% for i in $(seq 3); do a2/run-bench 3; echo; done | python3 a2/bench-avg.py
Java:   0m0.465s
Python: 0m7.693s

Java:   0m0.535s
Python: 0m7.600s

Java:   0m0.570s
Python: 0m7.956s

Java was 14.81x faster than Python (0.52333s vs. 7.74967s for 3 runs)

We see that Java's advantage seems to increase as N grows. I can imagine an interview question involving this data and "big-O", but we won't dig into that here.

All the shell stuff above should be familiar if you've had 352. If you haven't had 352, you've had a little preview of its UNIX segment here.

Given your successful completion of 120 and 210, I'd estimate that you should be able to write bench-avg.py in maybe 15-45 minutes in either Python or Java. I'm thinking that bench-avg.py might be interesting to write in Haskell and/or Racket this semester.

Your Task

For this problem you are to:

When all of the above are done, run your benchmark for each of your three cases. A simple way to do that is to run these three commands, one at a time:

for i in $(seq 3); do a2/run-bench 1; echo; done | python3 a2/bench-avg.py
for i in $(seq 3); do a2/run-bench 2; echo; done | python3 a2/bench-avg.py
for i in $(seq 3); do a2/run-bench 3; echo; done | python3 a2/bench-avg.py

Yes, you could put those commands in a script, and/or use nested for-loops.

Then, put the output of those commands into bench.txt, which is one of the three deliverables for this problem.

You can "pretty-up" bench.txt if you'd like. You can see what I created for bench.txt in a2/whm-bench.txt.

Along with bench.txt, there are two other deliverables for this problem: bench.java and bench.py.

Hog-wild option!

If you want to go further for fun, maybe with graphs and more, or support for more than three cases or multiple parameters, or even additional languages, go for it! The sky's the limit! You'll earn no extra credit but you might impress some folks.

If you choose to go hog wild, create one more deliverable: hogwild.pdf. Its form and content are fully at your discretion.

I'll share all hogwild.pdf submissions with the class, unless you really don't want yours included, and then you'll need to visit me during office hours, explain your reticence, and be prepared for some pushback from me!

IMPORTANT: Even if you do take the hog wild option, you still need to submit a bench.txt, like everybody else.

Problem 6. (ZERO points) quickrepl.txt

After working with languages that have a REPL available, you may find yourself really wanting a REPL when you go to learn a language that doesn't have one. One approach to a REPL is to integrate it with a language's implementation, but that typically requires a good understanding of that implementation. Another approach is a completely standalone REPL, such as once provided by javarepl.com, but that can require a lot of code—wc showed about 8,000 lines in Java source files for that system.

For this problem your challenge is to come up with a quick approach to write a simple REPL for a language. Your quick REPL needn't be nearly as good as ghci but it should be able to do the sorts of things you do when learning a language, such as evaluating expressions and showing types. Think about an approach that provides a lot of functionality for relatively little effort—something you could implement in a day, and not be late for dinner.

I'm not asking you to write any code for this problem; just describe how you might quickly produce a simple REPL for a language of your choice.

Python, Ruby and other languages have an "eval(s)" function that executes the code contained in a string s. Do not rely on such a capability in your quick REPL.

Note: This is a hard problem that requires some creativity and cleverness, so I'm making it worth zero points, but I'm curious to see who'll do it anyway! For this problem you are free to work in any groups of any size, but for me to possibly be impressed with your answer you'll have to specifically state that you came up with whatever you did all by yourself/yourselves—no Googling.

If you work in a group, all members may submit identical copies of quickrepl.txt, but it should include a list of group members.

Problem 7. Extra Credit observations.txt

Submit a plain text file named observations.txt with...

(a) (1 point extra credit) An estimate of how many hours it took you to complete this assignment. Put that estimate on a line by itself, like this:

Hours: 3.5

Feedback and comments about the assignment are welcome, too. Was it too long, too hard, too detailed? Speak up! I appreciate all feedback, favorable or not.

(b) (1-3 points extra credit) Cite an interesting course-related observation (or observations) that you made while working on the assignment. The observation should have at least a little bit of depth. Think of me saying "Good!" as one point, "Excellent!" as two points, and "Wow!" as three points. I'm looking for quality, not quantity—over the years I've given three points to some one-liners, and single points to a lot of long essays in observations.txt.

Turning in your work

Here's the full list of deliverables for this assignment:
morefacts.txt
quote.txt
pyph.txt
jshell.txt
bench.java
bench.py
bench.txt
hogwild.pdf (optional!)
quickrepl.txt (zero points—it's optional!)
observations.txt (for extra credit)

Note that all characters in the file names are lowercase.

Do not include your name, NetID, etc. in your .txt files—I like reading answers without knowing who wrote them.

The bash script /cs/www/classes/cs372/fall22/a2/turnin (on lectura) should be used to turn in your work. It creates a time-stamped "tar" file (similar to a zip file) that contains the expected files and then uses the system-wide turnin program to actually turn in that tar file.

Assuming you've made that symbolic link, you should see something like the following when you run a2/turnin:

% a2/turnin
Warning: quickrepl.txt not found
======== contents of a2.20220825.215805.tz ========
-rw-r--r-- whm/whm         387 2022-08-25 16:53 morefacts.txt
-rw-r--r-- whm/whm         134 2022-08-25 16:53 quote.txt
-rw-rw---- whm/whm           0 2022-08-25 16:53 pyph.txt
-rw-rw---- whm/whm           0 2022-08-25 16:53 jshell.txt
-rw-r--r-- whm/whm         729 2022-08-25 16:53 bench.java
-rw-r--r-- whm/whm         320 2022-08-25 16:53 bench.py
-rw-rw---- whm/whm           0 2022-08-25 16:53 bench.txt
-rw------- whm/whm       25445 2022-08-25 21:57 hogwild.pdf
-rw-rw---- whm/whm           0 2022-08-25 16:53 observations.txt
======== running turnin ========
Turning in:
 a2.20220825.215805.tz -- ok
All done.
Notice the first line of output: "Warning: quickrepl.txt not found". It indicates that the deliverable quickrepl.txt was not found. We'll imagine that because that quickrepl.txt is a zero-point problem the student elected to not do it.

The contents of the "tar" file that was created, named a2.20220825.215805.tz, are shown after that warning. Finally, the system-wide turnin program is run and its output is shown—we can see that the tar file with the deliverables was turned in ok.

You can run a2/turnin as often as you want. We'll grade your final non-late submission. We can add a -ls option to a2/turnin to show what's been turned in:

% a2/turnin -ls
.:
total 33
-rw-rw---- 1 whm whm 22616 Aug 25 21:58 a2.20220825.215805.tz

If you're curious, and I hope you are, look at a2/turnin. Remember that late submissions for assignments are not accepted, and that there are no late days; but if circumstances beyond your control interfere with your work on this assignment, there may be grounds for an extension. See the syllabus for details.

If a submission is late, it will be ignored unless you mail to 372f22 and tell us what happened. If the situation involves a personal matter, mail only to me (whm).

Miscellaneous

Point values of problems correspond directly to assignment points in the syllabus. For example, a 10-point problem roughly corresponds to 1% of your final grade in the course.
Addendum

A couple more things for bench.txt

I sure hate to add anything to an assignment after it's been published, but I completely forgot to ask you to do a couple of simple but important things on bench.txt:

  1. Start your bench.txt with a brief description of the computation you're doing.
  2. End your bench.txt with a brief section with an interesting observation or two. If you have trouble finding something interesting, let us know!
I've added examples of both to a2/whm-bench.txt.

No need to comment!

Feel free to use comments to document your code as you see fit, but no comments are required in your code.