Project: Filaments
Investigators: Vince Freeh, David Lowenthal and Greg Andrews

Filaments implements fine-grain parallelism both portably and efficiently on shared- and distributed-memory machines. It currently runs on an SGI, a Sequent, and a cluster of Sun IPCs; prototypes are running on a cluster of DECstations running Mach and on an Intel Paragon. Filaments provides a shared-memory programming model, implemented by a distributed shared memory on distributed-memory machines. It achieves efficiency by using stateless threads, overlapping communication and computation, and automatically balancing the load in fork/join computations.

Filaments strives to make languages and compilers simpler by pushing some of the complexity into a run-time system. To this end, we have validated Filaments by creating fsc, a modified SISAL compiler that generates Filaments code; Filaments obviates the need for the compiler to cluster small tasks into larger ones. Ongoing research focuses on augmenting Filaments to automatically place data on processors (the Adapt subsystem) and to automatically avoid thrashing. These subsystems will obviate the need for compilers to place data on the processors and to pad data to avoid thrashing, simplifying the job of both the programmer and the compiler. Further details on Filaments can be found in the following papers:

Dawson R. Engler, David K. Lowenthal, and Gregory R. Andrews. Shared Filaments: Efficient Fine-Grain Parallelism on Shared-Memory Multiprocessors. TR 93-13a, University of Arizona, April 1993 (187K bytes)

Vincent W. Freeh, David K. Lowenthal, and Gregory R. Andrews. Distributed Filaments: Efficient Fine-Grain Parallelism on a Cluster of Workstations. First Symposium on Operating Systems Design and Implementation, p. 201-212, Monterey, CA, November 14-17, 1994 (217K bytes)

David K. Lowenthal, Vincent W. Freeh, and Gregory R. Andrews. Using Fine-Grain Threads and Run-Time Decision Making in Parallel Computing. To appear, Journal of Parallel and Distributed Computing, Special Issue on Multithreading for Multiprocessors (244 bytes, 97K compressed)


The Filaments Runtime System

The Filaments package is a library of C code that currently runs on clusters of workstations (Suns running SunOS or Solaris, and DECstation 5000s running Mach), shared-memory multiprocessors (Sequent Symmetry and SGI Iris), and a distributed-memory multicomputer (Intel Paragon). It is designed to support parallel scientific applications and is intended to be a target for a compiler. The package is relatively small (fewer than 7,000 lines of C code), simple (about 20 subroutine calls), and efficient.

Filaments has two key abstractions: fine-grain threads and distributed shared memory. The fine-grain execution model can be thought of as the least common denominator for concurrency, because it supports coarse- and medium-grain tasks as well as fine-grain ones. The shared-memory programming model provides a simple and natural way to program.

A filament is a very lightweight thread. Each filament is an independent unit of work, and there may be thousands in an application. There are two kinds of filaments: iterative filaments, which are used in loop-based applications such as matrix multiplication and Jacobi iteration, and fork/join filaments, which are used in recursive applications such as adaptive quadrature. Filaments are executed concurrently by server threads.
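To make the iterative style concrete, the following C sketch expresses one Jacobi sweep as one filament per interior grid point. The calls fil_create_iterative() and fil_barrier() are hypothetical placeholder names, not the package's actual interface (which is documented in the papers above); the sketch only illustrates the shape of a Filaments program: thousands of tiny, independent units of work followed by a barrier.

    /* Sketch only: fil_create_iterative() and fil_barrier() are hypothetical
     * placeholders for the package's real calls. */
    void fil_create_iterative(void (*fn)(void *), void *arg);  /* hypothetical */
    void fil_barrier(void);                                    /* hypothetical */

    #define N 512
    static double a[N][N], b[N][N];

    struct point { int i, j; };
    static struct point pts[(N - 2) * (N - 2)];

    /* Each filament updates a single interior grid point. */
    static void jacobi_point(void *arg)
    {
        struct point *p = arg;
        b[p->i][p->j] = 0.25 * (a[p->i - 1][p->j] + a[p->i + 1][p->j] +
                                a[p->i][p->j - 1] + a[p->i][p->j + 1]);
    }

    /* One sweep: create (N-2)*(N-2) filaments, then wait for all of them. */
    static void jacobi_sweep(void)
    {
        int k = 0;
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) {
                pts[k].i = i;
                pts[k].j = j;
                fil_create_iterative(jacobi_point, &pts[k]);
                k++;
            }
        fil_barrier();  /* the sweep completes before the next one begins */
    }

A fork/join application such as adaptive quadrature would instead create two child filaments per interval and join on them, but the granularity is the same: each filament performs only a few arithmetic operations.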

The distributed shared memory (DSM) system provides a logically shared memory on distributed-memory machines. Hence, filaments can communicate by reading and writing shared variables; no explicit message passing is required. Filaments synchronize using barriers for iterative applications or the join primitive for recursive applications. The package also provides reduction operations, which are integrated with barrier synchronization.
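The idea of integrating a reduction with a barrier can be shown with a small, self-contained sketch. The code below is not the Filaments implementation; it is an ordinary POSIX-threads barrier, written here only to show how a global sum can be folded into the synchronization point so that every server thread leaves the barrier holding the reduced value.

    #include <pthread.h>

    /* A barrier that also accumulates a per-thread value; every thread
     * leaves with the global sum. */
    struct red_barrier {
        pthread_mutex_t lock;
        pthread_cond_t  all_here;
        int             nthreads, waiting, phase;
        double          partial, result;
    };

    void red_barrier_init(struct red_barrier *b, int nthreads)
    {
        pthread_mutex_init(&b->lock, NULL);
        pthread_cond_init(&b->all_here, NULL);
        b->nthreads = nthreads;
        b->waiting = b->phase = 0;
        b->partial = b->result = 0.0;
    }

    /* Contribute val, block until all threads have arrived, and return
     * the global sum; the reduction completes as part of the barrier. */
    double red_barrier_sum(struct red_barrier *b, double val)
    {
        pthread_mutex_lock(&b->lock);
        int my_phase = b->phase;
        b->partial += val;
        if (++b->waiting == b->nthreads) {
            b->result  = b->partial;
            b->partial = 0.0;
            b->waiting = 0;
            b->phase++;                        /* release this generation */
            pthread_cond_broadcast(&b->all_here);
        } else {
            while (b->phase == my_phase)
                pthread_cond_wait(&b->all_here, &b->lock);
        }
        double r = b->result;
        pthread_mutex_unlock(&b->lock);
        return r;
    }

A server thread that has computed a local residual, for example, would call red_barrier_sum(&b, local_residual) and receive the global residual once every thread has arrived; this is how a convergence test in an iterative application can piggyback on the end-of-iteration barrier.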

The Filaments package employs a number of techniques to make it efficient. First, filaments are stackless (reducing the cost of a context switch significantly) and independent; thus there may be thousands of them, and each can be executed on any node in any order. This gives the runtime system flexibility in selecting the next filament to run and facilitates load balancing. Second, the DSM is multi-threaded: when a filament faults on a non-local DSM page, it is suspended and a different filament is executed; the faulting filament is rescheduled when the page arrives. This overlaps communication and computation and therefore hides communication latency in most applications. Third, the package employs techniques such as inlining and pruning to reduce the overhead of executing filaments. Finally, a reliable datagram protocol is used to reduce buffering and copying of message data (pages) and to provide scalability. These and additional features are discussed in the above papers.
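The first two points, stackless filaments and a multi-threaded DSM, can be illustrated with a short, self-contained sketch. It is not the Filaments code; it only shows why stackless, run-to-completion threads make a "context switch" as cheap as an indirect function call, and how a filament that needs a non-resident page can simply be set aside while the server keeps executing other filaments, overlapping communication with computation.

    #include <stddef.h>

    /* A stackless thread: no saved registers and no private stack, just
     * the code to run and a pointer to its arguments. */
    typedef enum { FIL_DONE, FIL_WAIT_PAGE } fil_status;

    struct filament {
        fil_status     (*run)(void *arg);   /* runs to completion or yields */
        void            *arg;
        struct filament *next;
    };

    struct queue { struct filament *head, *tail; };

    static void enqueue(struct queue *q, struct filament *f)
    {
        f->next = NULL;
        if (q->tail) q->tail->next = f; else q->head = f;
        q->tail = f;
    }

    static struct filament *dequeue(struct queue *q)
    {
        struct filament *f = q->head;
        if (f) {
            q->head = f->next;
            if (!q->head) q->tail = NULL;
        }
        return f;
    }

    /* Server loop: dispatch each ready filament with a plain function call.
     * A filament that faults on a non-local DSM page returns FIL_WAIT_PAGE
     * and is parked on the waiting queue (to be made ready again when the
     * page arrives); the server keeps computing in the meantime. */
    static void server_loop(struct queue *ready, struct queue *waiting)
    {
        struct filament *f;
        while ((f = dequeue(ready)) != NULL) {
            if (f->run(f->arg) == FIL_WAIT_PAGE)
                enqueue(waiting, f);
        }
    }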