
4 Demonstration Application

This section demonstrates the use and benefits of paths with a simple, but realistic application implemented in Scout. The application consists of receiving, decoding, and displaying MPEG encoded video streams. MPEG encoding is able to reduce the size of a video by a factor of 10 to 100, but this compression ratio comes with a computationally expensive decompression algorithm. Workstations have only recently become fast enough to perform this task in realtime. Since MPEG decoding involves substantial computation, it is an application that demonstrates some of the advantages of paths related to resource management.

4.1 MPEG Router Graph

The Scout router graph for the demonstration application is shown in Figure 9. The topmost router, DISPLAY, manages the framebuffer. The bottom of the graph is formed by three routers implementing standard networking protocols: UDP, IP, and ETH. In the middle are the three interesting routers: MPEG, MFLOW, and SHELL.

Figure 9: Router graph for MPEG example.

The MPEG router accepts messages from MFLOW, applies the MPEG decompression algorithm to them, and sends the decoded images to the DISPLAY router. There, the images are queued for display at the appropriate time. The MPEG router uses application-level framing (ALF) [4] to minimize internal buffering. That is, the MPEG source sends Ethernet MTU-sized packets that contain an integral number of work-units (MPEG macroblocks). This ensures that the MPEG decoder does not have to maintain complex state across packet boundaries and obviates the need for undesirable queueing between MPEG and MFLOW.
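
On the sending side, this ALF discipline amounts to packing only whole macroblocks into each MTU-sized packet. The sketch below illustrates that sender-side framing; it is not code from Scout or from the paper's MPEG source, and the names (macroblock_t, frame_packet, ETH_MTU) are purely illustrative.

    #include <stddef.h>
    #include <string.h>

    #define ETH_MTU 1500            /* Ethernet payload size, in bytes (assumed) */

    typedef struct {
        const unsigned char *data;  /* encoded bytes of one macroblock */
        size_t len;                 /* size of this macroblock */
    } macroblock_t;

    /* Fill one packet with whole macroblocks, starting at mbs[*next].
     * Returns the number of bytes placed in the packet; *next is advanced
     * past the macroblocks that were packed. */
    size_t frame_packet(unsigned char *pkt, const macroblock_t *mbs,
                        size_t nmbs, size_t *next)
    {
        size_t used = 0;

        while (*next < nmbs && used + mbs[*next].len <= ETH_MTU) {
            memcpy(pkt + used, mbs[*next].data, mbs[*next].len);
            used += mbs[*next].len;
            (*next)++;
        }
        return used;    /* packet holds an integral number of macroblocks */
    }

Because no macroblock straddles a packet boundary, the decoder can treat each arriving packet as a self-contained work unit.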

The MFLOW router implements a simple flow-control protocol. MFLOW advertises the maximum sequence number that it is willing to receive based on the sequence number of the last processed packet and the input queue size. MFLOW uses sequence numbers to ensure ordered, but not reliable, delivery of packets to MPEG.
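
The advertisement itself is a one-line computation. The following sketch is illustrative only; the structure and function names are assumptions, not the actual Scout source.

    typedef unsigned int seq_t;

    struct mflow_state {
        seq_t  last_processed;  /* sequence number of last packet handed to MPEG */
        size_t queue_size;      /* capacity of the path's input queue, in packets */
    };

    /* Highest sequence number the source may send without overflowing the
     * input queue: everything up to last_processed has left the queue, so
     * queue_size further packets can be absorbed. */
    seq_t mflow_advertise(const struct mflow_state *s)
    {
        return s->last_processed + (seq_t)s->queue_size;
    }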

The SHELL router is used to create paths dynamically. It is configured on top of UDP so it can receive command requests via the network. SHELL is not unlike a UNIX shell in that it waits for a command request, which it then maps into a command ``invocation.'' In the context of Scout, this involves mapping the command name into an appropriate path create operation. To create a path, SHELL requires two pieces of information: the router on which the path create operation is to be invoked and a set of attributes (invariants). In the current implementation, an mpeg_decode command always results in a path create invocation on the DISPLAY router. In general, SHELL might consult an environment variable to select the graphics display to be used. SHELL creates MPEG paths with the following two attributes:

PA_NET_PARTICIPANTS=<ip-addr, udp-port>:
This attribute specifies the network address of the process that sent the mpeg_decode command request. SHELL assumes that the network address of the video source is the same as the address that originated the command request.

PA_PATHNAME=``MPEG'':
The value of this attribute is a string that, in its simplest form, is interpreted as a sequence of router names. It is used either to force a specific routing decision or to supply routing information when none is otherwise available. In the case of an MPEG path, SHELL sets this attribute to the string ``MPEG'' to force DISPLAY to forward path creation to the MPEG router.

Another attribute that is used during MPEG path creation is PA_PROTID. Unlike the other attributes, this one is not specified by the SHELL router. Instead, it is reset by each router that implements a networking protocol. The value of this attribute is the protocol id of the next-higher level networking protocol. This id is normally needed during packet classification. For example, IP packets with a protocol type of 6 are TCP packets and TCP packets with a port number of 21 are normally FTP packets. So when FTP forwards a path create operation to TCP, it sets PA_PROTID to 21. If TCP decides to forward path creation to IP, it resets the value of PA_PROTID to 6 to let IP know that it is dealing with a TCP path.
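
The sketch below shows how SHELL might turn an mpeg_decode request into a path create invocation with these attributes. It is a minimal illustration, not the Scout API: the helpers attr_set() and path_create(), the variable display_router, and the struct layouts are all assumptions.

    struct net_participant {
        unsigned int   ip_addr;     /* IPv4 address of the video source */
        unsigned short udp_port;    /* UDP port of the video source */
    };

    struct attrs;                   /* opaque attribute set (assumed) */
    struct router;
    struct path;

    /* Assumed helpers; the real Scout interfaces may differ. */
    extern void         attr_set(struct attrs *, const char *name, const void *value);
    extern struct path *path_create(struct router *, struct attrs *);
    extern struct router *display_router;

    struct path *create_mpeg_path(struct attrs *a,
                                  unsigned int src_ip, unsigned short src_port)
    {
        struct net_participant peer = { src_ip, src_port };

        /* The video source is assumed to be the host that sent the request. */
        attr_set(a, "PA_NET_PARTICIPANTS", &peer);

        /* Force DISPLAY to forward path creation toward the MPEG router. */
        attr_set(a, "PA_PATHNAME", "MPEG");

        /* mpeg_decode always starts path creation at the DISPLAY router;
         * networking routers such as UDP and IP reset PA_PROTID themselves
         * as the create operation propagates down the graph. */
        return path_create(display_router, a);
    }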

Figure 9 shows two video paths (from ETH to DISPLAY) and a shell path for receiving commands (from ETH to SHELL). Note that the video paths take their input from, and deposit their output into, a queue. These queues are serviced by interrupt handlers. In ETH, the queue is filled in response to a receive interrupt, and in DISPLAY, the queue is drained in response to the vertical synchronization impulse of the video display. Output to the display is synchronized to this impulse because there is no point in updating the display at a higher frequency.

There are three points worth emphasizing about this example. First, there are no queues other than the ones in ETH and DISPLAY. As mentioned above, this is due to MPEG's use of ALF. Second, ALF---along with explicit paths---enables integrated layer processing. Since MPEG reads the network packet data in units of 32 bits, it would be straightforward to integrate the (optional) UDP checksum with the reading of the MPEG data. This would require a path-transformation rule that matches when MPEG is run directly on top of UDP. If this pattern matches, the path can be transformed by replacing the UDP and MPEG receive processing functions with functions that implement the UDP checksum computation as part of MPEG's reading of the packet data. Third, without queueing in the middle of the path, scheduling is simplified---if the output queue is already full, there is little point in scheduling a thread to process a packet in the input queue. This implication would not hold in the presence of additional queues.
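
The integrated-layer-processing transformation can be pictured as swapping one read routine for another. The sketch below is a hedged illustration under that assumption; the cursor structure and function names are invented for exposition, and byte-order handling is omitted.

    #include <stdint.h>

    struct pkt_cursor {
        const uint32_t *next;    /* next 32-bit word of packet data */
        uint32_t        csum;    /* running Internet checksum accumulator */
    };

    /* Plain 32-bit read used when no checksum is needed. */
    static inline uint32_t read_word(struct pkt_cursor *c)
    {
        return *c->next++;
    }

    /* Fused read that a path-transformation rule could install when MPEG
     * sits directly on top of UDP: same interface, but it also folds each
     * word into the UDP checksum as the decoder consumes the data. */
    static inline uint32_t read_word_csum(struct pkt_cursor *c)
    {
        uint32_t w  = *c->next++;
        uint32_t hi = w >> 16, lo = w & 0xffff;

        c->csum += hi + lo;                              /* ones-complement sum */
        c->csum  = (c->csum & 0xffff) + (c->csum >> 16); /* fold carry back in */
        return w;
    }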

Table 1 gives measurements that indicate the performance a Scout MPEG kernel can achieve. The table lists the maximum decoding rate in frames per second for a selection of four video clips. To put these numbers in perspective, the table also gives the corresponding numbers for Linux. The numbers are comparable in the sense that both systems run on the same machine (a 300MHz 21064 Alpha), use essentially the same MPEG code, and receive the compressed video over the network. The dominant costs in this example are the decompression of the MPEG stream and the dithering and displaying of the video frames. That is, practically all time is spent in the MPEG and DISPLAY routers.

                   # of       max. rate [fps]
Video            frames       Scout      Linux
Flower              150        44.7       37.1
Neptune            1345        49.9       39.2
RedsNightmare      1210        67.1       55.5
Canyon             1758       245.9      183.3

Table 1: Coarse-Grain Comparison of Scout and Linux

While the playing field was as level as we could make it, it must be understood that this is an apples and oranges comparison---the two systems have a very different scope, level of functionality, and maturity. Still, the comparison is useful to establish that a path-based system such as Scout can easily achieve performance that is consistent with the machine on which it runs.

4.2 Queues

As Figure 9 shows, two queues exist at the ends of the MPEG path. These queues are in the ETH router (the input queue) and in DISPLAY (the output queue).

The input queue is required for two reasons: (1) for high-latency networks it may be necessary to have multiple network packets in transit, and (2) because of network jitter, these multiple packets may all arrive clustered together. Since the peak arrival rate at the Ethernet is much higher than the MPEG processing rate, the queue is needed to absorb such peaks.

Whereas the input queue absorbs bursts that are limited in size, the job of the output queue is to absorb jitter at a more global level---decompression itself introduces significant jitter. Depending on the spatial and temporal complexity of a video scene, the encoded size of any particular video frame may be orders of magnitude different from the size of the average frame in that stream. The network may also suffer from significant jitter, e.g., due to temporary congestion of a network link. Finally, the sender of the MPEG stream itself is likely to add jitter since the video may, for example, be read from a disk drive. Just how big should these queues be? Obviously, they should be ``just big enough,'' but is it possible to put some quantitative limits on their sizes?

First, consider the input queue. If processing a single packet requires more time than it takes to request a new packet from the source, then an input queue that can hold two packets is sufficient: one slot is occupied while the last received packet is being processed, and the second (free) slot is advertised to the source. If the round-trip time (RTT) is greater than the time to process a packet, then the input queue needs to be two times the RTT × bandwidth product of the network. MFLOW can measure the round-trip latency by putting a timestamp in its header. The important point from the perspective of this paper, however, is that accurate measurement of the peak processing rate is enabled by paths---it is a simple matter of specifying the appropriate transformation rule to ensure that the average time spent processing each packet is measured. For MPEG, this means that the initial function in the ETH-stage of the router is modified to measure processing time and to update the path attribute that keeps track of the average processing time.
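
The sizing rule can be written down directly. The sketch below follows the argument above; the function name, parameters, and units are illustrative assumptions rather than anything from the Scout implementation.

    #include <stddef.h>

    size_t input_queue_slots(double rtt_sec,        /* round-trip time, e.g. from MFLOW timestamps */
                             double proc_sec,       /* avg. per-packet processing time,
                                                       measured in the path's ETH stage */
                             double bandwidth_bps,  /* network bandwidth, bits/second */
                             size_t pkt_bits)       /* packet size, in bits */
    {
        if (proc_sec >= rtt_sec)
            return 2;   /* one packet being processed, one free slot advertised */

        /* Otherwise: two RTT × bandwidth products worth of packets, rounded up. */
        double bits = 2.0 * rtt_sec * bandwidth_bps;
        return (size_t)((bits + (double)pkt_bits - 1.0) / (double)pkt_bits);
    }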

In the case of the output queue, the factors influencing queue size are more varied and complex. A complete analysis is beyond the scope of this paper. In general, bounding the size of this queue requires cooperation with admission control and would typically employ a network reservation system, such as RSVP [3]. The current implementation leaves this parameter under user control to facilitate experimentation.

4.3 Scheduling

Since each video path has its own input queue and since the packet classifier is run at interrupt time, newly arriving packets are immediately placed in the correct queue. This means that once a packet is under control of the software, there is no danger of priority inversion due to low-priority packets being processed ahead of high-priority packets. This is one of the most significant advantages of paths. For example, the early separation makes it possible to run a video stream while flooding the network adapter with small Ethernet packets.
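
Conceptually, the receive interrupt handler does nothing more than classify and enqueue. The sketch below illustrates this early demultiplexing; the classify(), enqueue(), and drop() interfaces are assumptions, not the Scout API.

    struct packet;
    struct queue;
    struct path { struct queue *input; };

    /* Assumed interfaces; not the actual Scout classifier or queue API. */
    extern struct path *classify(const struct packet *);          /* packet classifier */
    extern int          enqueue(struct queue *, struct packet *); /* returns 0 if full */
    extern void         drop(struct packet *);

    /* Receive-interrupt handler: demultiplex immediately so packets for
     * different paths never share a queue. */
    void eth_rx_interrupt(struct packet *pkt)
    {
        struct path *p = classify(pkt);

        if (p == NULL || !enqueue(p->input, pkt))
            drop(pkt);   /* no matching path, or that path's input queue is full */
    }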

This is demonstrated in Table 2, which shows how the maximum decoding frame rate for the Neptune video drops when load is added to the Scout and Linux systems, respectively. The additional load consists of a flood of ICMP ECHO requests (generated with ping -f). In the Scout case, the video path is run at the default round-robin priority, whereas the path handling ICMP requests is run at the next lower priority. In contrast, Linux handles ICMP and video packets identically inside the kernel. As the table shows, adding the ICMP load has little effect on the frame rate for Scout, while the maximum frame rate for Linux drops by more than 42%. Clearly, the early separation afforded by paths can have significant benefits. This is not to say that paths are the only way to solve this particular problem (e.g., [22]), but it does support our claim that paths can be an effective solution to such problems.

               Frame rate [fps]
          unloaded     loaded         Δ
Scout         49.9       49.8     -0.2%
Linux         39.2       22.7    -42.1%

Table 2: Frame Rate Under Load

While the advantages of paths due to early separation are important, paths play an even more intimate role in scheduling. As explained in Section 3, a path can register a wakeup callback that can be used to adjust a thread's scheduling policy and priority according to the path's needs. The MPEG path uses this facility to ensure that any thread that is ready for execution in the path will be scheduled with the proper realtime constraints. In combination, separate input queues and proper scheduling guarantee that the MPEG Scout kernel has no difficulty in delivering and processing realtime MPEG packets even under severe background loads. For example, an arbitrary number of low-priority MPEG streams (or some other non-realtime background work) can be displayed without adversely affecting realtime streams running in the foreground.

The default Scout scheduler is a fixed-priority, round-robin scheduler. Since video is periodic, it seems reasonable to use rate-monotonic (RM) scheduling for MPEG paths. With RM scheduling, a (periodic) realtime thread receives a priority level that is proportional to the rate at which it executes. That is, the frame rate at which a video is displayed would control the priority of the corresponding path. However, several considerations make earliest-deadline-first (EDF) scheduling more attractive than RM scheduling here, not least because EDF deadlines can be derived directly from the state of a path's queues, as described below.

Scout therefore uses EDF scheduling for realtime MPEG paths. For example, this allows Scout to display 8 Canyon movies at a rate of 10 frames per second, together with a Neptune movie playing at 30 frames per second, all without missing a single deadline. In contrast, the same load with single-priority round-robin scheduling leads to a large number of missed deadlines if the output queues for the Canyon movies are large. For example, with a queue size of 128 frames, on the order of 850 out of 1345 deadlines are missed by the path displaying the Neptune movie. The reason for the poor performance of round-robin scheduling is that it keeps scheduling the 8 Canyon movies as long as their output queues are not full, even at times when the Neptune movie needs the CPU much more urgently.

One question that remains is how the deadline is computed. Here again, paths play a central role. If path execution is the bottleneck, then the output queue should be kept as full as possible. In this case, it is best to set the deadline to the display time of the next frame to be put in the output queue. In contrast, if network latency is the bottleneck, then the deadline should be based on the state of the input queue. Since at any given time some number of packets (n) should be in transit to keep the network pipe full, MFLOW must be able to advertise an open window of size n. This means that the deadline is the time at which the input queue would have fewer than n free slots. This time can be estimated based on the current length of the queue and the average packet arrival rate.

Since the path object provides direct access to both queues, the effective deadline can simply be computed as the minimum of the deadlines associated with each queue. Alternatively, the path can use the path execution time and network round-trip time to decide which queue is the bottleneck queue, and then schedule according to the bottleneck queue only. The latter approach is slightly more efficient, but requires a clear separation between path execution time and network round-trip time. The implemented MPEG decoder is currently optimized for the case where the output queue is the bottleneck, so scheduling is always driven by that queue.
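
The sketch below writes out the minimum-of-two-deadlines computation described above. The structure, field names, and time representation are illustrative assumptions; only the logic (output-queue deadline = display time of the next frame, input-queue deadline = time at which fewer than n free slots would remain) follows the text.

    #include <stddef.h>

    typedef double usec;   /* illustrative time representation, in microseconds */

    struct mpeg_sched_state {
        usec   now;                /* current time */
        usec   next_display_time;  /* display time of the next frame to be produced */
        size_t in_free_slots;      /* free slots currently in the input queue */
        size_t n_in_transit;       /* packets that must be kept in transit (n) */
        double arrival_rate;       /* average packet arrival rate, packets/usec */
    };

    static usec min_usec(usec a, usec b) { return a < b ? a : b; }

    usec mpeg_deadline(const struct mpeg_sched_state *s)
    {
        /* Output-queue deadline: keep the queue as full as possible, so the
         * next frame must be ready by its own display time. */
        usec out_deadline = s->next_display_time;

        /* Input-queue deadline: estimate when fewer than n free slots would
         * remain, from the current queue state and the arrival rate. */
        usec in_deadline = out_deadline;
        if (s->arrival_rate > 0.0) {
            size_t slack = s->in_free_slots > s->n_in_transit
                         ? s->in_free_slots - s->n_in_transit : 0;
            in_deadline = s->now + (usec)slack / s->arrival_rate;
        }

        /* The path gives access to both queues, so the effective deadline is
         * simply the earlier of the two. */
        return min_usec(out_deadline, in_deadline);
    }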

4.4 Admission Control

Finally, paths enable admission control. As all memory allocation requests are performed on behalf of a given path, it is a simple matter of accounting to decide whether a newly created path is admissible or not. Before starting path creation, the admission policy decides how much memory can be granted to a new path. As long as each router in the path lives within that constraint, the path creation process is allowed to continue. (Note that admission control has not yet been implemented in Scout.)
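
Since admission control is not yet implemented in Scout, the sketch below only illustrates the per-path bookkeeping it would need; the structure and allocator wrapper are hypothetical.

    #include <stdlib.h>

    struct path_acct {
        size_t limit;   /* bytes granted to this path by the admission policy */
        size_t used;    /* bytes allocated so far on behalf of this path */
    };

    /* Allocate memory on behalf of a path; refuse if the grant would be
     * exceeded, in which case path creation would be aborted. */
    void *path_malloc(struct path_acct *acct, size_t bytes)
    {
        if (acct->used + bytes > acct->limit)
            return NULL;
        acct->used += bytes;
        return malloc(bytes);
    }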

Paths are also useful in deciding admissibility with respect to CPU load. Again, this is due to the fact that it is easy to compute the execution time spent per path---our experiments show that there is a good correlation between the average size of a frame (in bits) and the average amount of CPU time it takes to decode a frame. Naturally, the model that translates average frame size into CPU processing time is parameterized by the speed of the CPU, the memory system, and the graphics card. Rather than determining these parameters manually, it is much easier to measure path execution time in the running system and use those measurements to derive the required parameters. That is, the path execution timings are used to derive the model parameters, which, in turn, are used for admission control.
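
One way to picture this calibration step is a simple linear model fitted to per-path measurements. The linear form and least-squares fit below are assumptions for illustration; the paper only states that frame size and decode time correlate well and that the parameters are derived from measured path execution times.

    #include <stddef.h>

    struct cpu_model { double a, b; };  /* decode_time ~= a + b * frame_bits */

    /* Fit the model by ordinary least squares from per-path measurements of
     * average frame size (bits) and decode time (seconds per frame). */
    void fit_cpu_model(const double *frame_bits, const double *decode_time,
                       size_t n, struct cpu_model *m)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        size_t i;

        for (i = 0; i < n; i++) {
            sx  += frame_bits[i];
            sy  += decode_time[i];
            sxx += frame_bits[i] * frame_bits[i];
            sxy += frame_bits[i] * decode_time[i];
        }
        m->b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        m->a = (sy - m->b * sx) / n;
    }

    /* Admit a new stream only if its predicted CPU demand fits in the budget. */
    int admit_stream(const struct cpu_model *m, double avg_frame_bits,
                     double frame_rate, double cpu_fraction_available)
    {
        double sec_per_frame = m->a + m->b * avg_frame_bits;
        return sec_per_frame * frame_rate <= cpu_fraction_available;
    }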

Finally, if admission control determines that a video cannot be displayed at the full rate, a user may choose to view the video with reduced quality. For example, the user may request that only every third image be displayed. Thanks to ALF and paths, it is possible to drop packets of skipped frames as soon as they arrive at the network adapter. This avoids wasting CPU cycles at a time when they are at a premium.

