This dissertation puts these fears to rest by showing that it is indeed practical and feasible to use optimistic replication in distributed file systems. It describes the design, implementation and evaluation of the mechanisms used to transparently resolve diverging replicas in the Coda file system. Files and directories are resolved using orthogonal mechanisms because they differ in structure and semantics. A server-based mechanism that uses operation logging resolves directories, while a client-based mechanism that uses application support resolves files. When automatic resolution fails, a repair tool, in conjunction with standard Unix utilities, aids the user in merging the diverging replicas. The combination of these mechanisms allows the system to provide high data availability with minimal impact on its usability, scalability, security and performance.
Coda has been in daily use by thirty-five users for more than three years. The system consists of seventy-five clients, both desktop and mobile hosts, and ten servers storing more than four gigabytes of data. Empirical measurements show that the system has maintained usability by automatically resolving partitioned updates in more than 99% of attempts. Furthermore, the automatic resolution facility has excellent performance and minimal overhead, and is rarely noticeable in normal operation. Usage experience with the repair facility has also been positive.
This dissertation makes four significant contributions: the design
of simple yet novel automatic resolution techniques; a formalization of
the Unix file system model and a proof of correctness of the resolution methods;
an implementation of these methods in a system with a real user community;
and measurements showing the efficacy of the approach.