CSc 352: Reading Unix Directories

Conceptually, a directory can be thought of as a file containing a series of entries, one entry per file contained in that directory. However, these entries are not represented as simple text strings, and as a result we cannot use input routines such as scanf to process directories. Instead, we use library functions such as opendir, readdir, rewinddir, and closedir (declared in <dirent.h>) to handle directories.

Reading a directory consists of the following steps (see the man pages for further details):

  1. Open the directory using the library function opendir. This returns a directory stream, which is of type
    DIR *
    and which will subsequently be used by other code to read and process the contents of the directory. opendir returns NULL if the directory could not be opened, so always check its return value before using the stream.

  2. Once a directory stream has been opened, the library function readdir can be used to read directory entries corresponding to individual files. Each call to readdir returns (a pointer to) information corresponding to a single file in the directory; this is of type
    struct dirent *
    (this is discussed below in more detail). When there are no more entries to read, or if an error occurs, readdir returns NULL. readdir does its own bookkeeping internally and keeps track of how much of the directory it has read (and, therefore, what the next entry is), so you don't have to update the directory stream yourself in any way.

  3. If, at any point, we want to re-read the contents of the directory from the beginning, this can be done using the library function rewinddir.

  4. At the end, when we are finished processing the directory, we can close the directory stream using the library function closedir.

Your code might therefore look something like this:

    #include <dirent.h>
    ...
    DIR *dir_ptr;
    struct dirent *dirent_ptr;
    ...
    dir_ptr = opendir(filename);   /* returns NULL on failure */
    ...
    for ( ... ) {
        dirent_ptr = readdir(dir_ptr);   /* NULL when no entries remain */
        ... process dirent_ptr ...
    }
    ...
    closedir(dir_ptr);
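As a minimal sketch of the four steps on a POSIX system, the function below makes two passes over a directory: the first counts the entries, and after a call to rewinddir the second prints each name. The function name list_directory and its return convention (the entry count, or -1 on failure) are assumptions made for illustration only. It prints the d_name field of each entry, which is described in the next part of these notes.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper (not from the course code): counts the entries in
 * the directory `path`, then rewinds the stream and prints each name.
 * Returns the number of entries, or -1 if the directory cannot be opened. */
int list_directory(const char *path)
{
    assert(path != NULL);

    /* Step 1: open the directory and check the result. */
    DIR *dir_ptr = opendir(path);
    if (dir_ptr == NULL) {
        perror(path);
        return -1;
    }

    /* Step 2: read entries until readdir returns NULL. */
    int count = 0;
    struct dirent *dirent_ptr;
    while ((dirent_ptr = readdir(dir_ptr)) != NULL)
        count++;

    /* Step 3: go back to the start of the directory. */
    rewinddir(dir_ptr);

    /* Second pass: print the name stored in each entry. */
    while ((dirent_ptr = readdir(dir_ptr)) != NULL)
        printf("%s\n", dirent_ptr->d_name);

    /* Step 4: release the directory stream. */
    closedir(dir_ptr);
    return count;
}
```

A call such as list_directory(".") would list the current directory. On most systems the listing includes the special entries "." and "..", and they are included in the count.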
The second step above requires further explanation. The structure of a directory entry is as follows (see man dirent; the exact set of fields varies between systems):

    struct dirent {
       ino_t           d_ino;
       off_t           d_off;
       unsigned short  d_reclen;
       char            d_name[1];
    };

This is a variable-length structure (because the length of the file name is not fixed), which is not easy to define in C. This is handled by making d_name the last field of the structure, so that the file name, stored as a null-terminated string, begins at d_name and extends past the declared end of the structure; d_reclen gives the length of the whole entry (the record), not just of the file name. (The code that allocates these structures makes sure that enough memory is allocated at d_name to hold the file name string, not just a single byte as the declaration seems to suggest.) The name of the file corresponding to the directory entry can therefore be obtained using the field d_name.
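Because d_name holds an ordinary null-terminated C string, it can be passed directly to the standard string functions. The sketch below, which assumes a POSIX system, uses strcmp to check whether a directory contains an entry with a given name. The helper name dir_contains and its return values (1 if found, 0 if not, -1 if the directory cannot be opened) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the course code): returns 1 if the
 * directory `dir` has an entry called `name`, 0 if it does not, and
 * -1 if the directory cannot be opened. */
int dir_contains(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    DIR *dir_ptr = opendir(dir);
    if (dir_ptr == NULL) {
        perror(dir);
        return -1;
    }

    int found = 0;
    struct dirent *dirent_ptr;

    /* Stop at the first match or when readdir runs out of entries. */
    while (!found && (dirent_ptr = readdir(dir_ptr)) != NULL) {
        /* d_name is a null-terminated string, so strcmp works on it. */
        if (strcmp(dirent_ptr->d_name, name) == 0)
            found = 1;
    }

    closedir(dir_ptr);
    return found;
}
```

For example, dir_contains(".", "Makefile") would report whether the current directory has an entry named Makefile. Note that the pointer returned by readdir may be overwritten by a later call to readdir on the same stream, so a name that must outlive the loop should be copied first.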