Mitigating the Effects of Optimistic Replication in a Distributed File System

Puneet Kumar
December 1994

Abstract

Optimistic replication strategies can significantly increase the availability of data in distributed systems. However, such strategies cannot guarantee global consistency in the presence of partitioned updates. The danger of conflicting partitioned updates, combined with the fear that the machinery needed to cope with conflicts might be excessively complex, has prevented designers from using optimistic replication in real systems.

This dissertation puts these fears to rest by showing that it is practical and feasible to use optimistic replication in distributed file systems. It describes the design, implementation and evaluation of the mechanisms used to transparently resolve diverging replicas in the Coda file system. Files and directories are resolved using orthogonal mechanisms because they differ in structure and semantics. A server-based mechanism that uses operation logging resolves directories, while a client-based mechanism that uses application support resolves files. When automatic resolution fails, a repair tool, in conjunction with standard Unix utilities, aids the user in merging the diverging replicas. The combination of these mechanisms allows the system to provide high data availability with minimal impact on its usability, scalability, security and performance.

Coda has been in daily use by thirty-five users for more than three years. The system consists of ten servers, which store more than four gigabytes of data, and seventy-five clients, a mix of desktop and mobile hosts. Empirical measurements show that the system has maintained usability by automatically resolving partitioned updates in more than 99% of attempts. Furthermore, the automatic resolution facility has excellent performance and minimal overhead, and is rarely noticeable in normal operation. Usage experience with the repair facility has also been positive.

This dissertation makes four significant contributions: a design of simple yet novel automatic resolution techniques; a formalization of the Unix file system model and a proof of correctness of the resolution methods; an implementation of these methods in a system with a real user community; and measurements showing the efficacy of the approach.