One way to approach the challenge of managing large-scale, Internet-connected distributed systems is to understand user behavior and choose the tradeoffs between cost and performance accordingly. We apply this intuition to file-sharing environments typical of Grids and peer-to-peer systems. In this context, we propose a novel structure, the data-sharing graph, to help us study the similarity of user interests in files. In the data-sharing graph, nodes represent users and edges represent similarity of interest in files. Studies of three file-sharing communities---a high-energy physics collaboration, the Web, and the Kazaa peer-to-peer system---show that the data-sharing graph is a small world.
Two characteristics are specific to small-world graphs: a large clustering coefficient and a small average path length. The large clustering coefficient reveals the existence of groups of users interested in the same sets of files. The small average path length shows that the distance in interest between two users is, on average, small. We propose a file-location algorithm, FLASK, to exploit these small-world characteristics. We show that FLASK satisfies the requirements of scientific communities (such as efficient location of collections) while preserving the spirit of decentralized, self-adaptive peer-to-peer solutions.
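As an illustration of the two metrics named above, the following minimal sketch builds a data-sharing graph from a toy request log and measures its clustering coefficient and average path length. It rests on assumptions not stated in the text: two users are linked when they have requested at least `min_shared` files in common, the clustering coefficient is the average of per-node local coefficients, and path length is averaged over connected pairs only. The function names, the `min_shared` parameter, and the sample data are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_data_sharing_graph(requests, min_shared=1):
    """Link two users if they share at least `min_shared` files (assumed criterion).

    requests: dict mapping user -> set of requested files.
    Returns an adjacency dict mapping user -> set of neighboring users.
    """
    graph = {user: set() for user in requests}
    for u, v in combinations(requests, 2):
        if len(requests[u] & requests[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over connected pairs, using BFS from each node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs if pairs else 0.0


# Hypothetical request log: user -> files requested.
requests = {
    "alice": {"f1", "f2"},
    "bob": {"f2", "f3"},
    "carol": {"f1", "f3"},
    "dave": {"f3", "f4"},
}
graph = build_data_sharing_graph(requests)
print(clustering_coefficient(graph), average_path_length(graph))
```

For this toy input the clustering coefficient is 5/6 and the average path length is 7/6. On real traces, a graph is usually called a small world only when its clustering coefficient is much larger than that of a random graph with the same number of nodes and edges, while its average path length is comparable.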