Distributed Systems — laranevans.com

A distributed system runs one logical service across many machines joined by a network that drops and delays messages on its own schedule. Three conditions follow from that setup, and most of the field's hard problems trace back to them. The network can partition, so two machines that are both running cannot always reach each other. The machines fail independently, so part of the system can be down while the rest keeps serving. And no node holds a current view of the whole, so "what is the latest value of this record" stops being a question one machine can answer.

Those three conditions turn operations that stay trivial on a single machine into design problems with real tradeoffs. This cluster collects the models and results you use to reason about them. It opens with PACELC, which names what a replicated data store gives up during a partition and what it gives up the rest of the time.

The pages in this cluster

PACELC. The trade a replicated store makes during a network partition (availability or consistency) and the separate trade it still makes when the network is healthy (latency or consistency). PACELC is the frame the other results in this cluster hang on.
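As a rough illustration of the two trades, the sketch below encodes a store's PACELC choices as a pair of settings and routes a read accordingly. Every name in it (PacelcProfile, read, Unavailable, coordinated_read) is hypothetical and chosen for this example. It models the shape of the decision under simplified assumptions, not how any particular database implements it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PartitionChoice(Enum):
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class ElseChoice(Enum):
    LATENCY = "L"
    CONSISTENCY = "C"


class Unavailable(Exception):
    """Raised when a consistent answer cannot be given during a partition."""


@dataclass(frozen=True)
class PacelcProfile:
    on_partition: PartitionChoice  # the trade made while partitioned
    otherwise: ElseChoice          # the trade made on a healthy network

    def label(self) -> str:
        return f"P{self.on_partition.value}/E{self.otherwise.value}"


def read(profile: PacelcProfile, partitioned: bool, local_value: str,
         coordinated_read: Callable[[], str]) -> str:
    """Answer a read according to the profile (illustrative only)."""
    if partitioned:
        if profile.on_partition is PartitionChoice.AVAILABILITY:
            return local_value  # stay available, possibly stale
        raise Unavailable("cannot confirm the latest value during a partition")
    if profile.otherwise is ElseChoice.LATENCY:
        return local_value      # answer fast from the local replica
    return coordinated_read()   # pay coordination latency for consistency
```

A profile built as PacelcProfile(PartitionChoice.AVAILABILITY, ElseChoice.LATENCY) answers every read from the local replica, while one that picks consistency on both sides either coordinates or refuses to answer.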

More pages land as the cluster grows: the CAP theorem PACELC extends, consistency models, consensus, and replication strategies.

Where distributed systems meets the rest of systems

PACELC's trades are not unique to replicated databases. A cache that serves a value after its source has changed is making PACELC's healthy-network choice in miniature, picking latency over consistency because reaching the source on every read costs too much. A store that promises ACID guarantees buys the opposite, paying coordination latency to keep every read consistent. Seeing the same trade across a cache, a database, and a replicated store hands you one model to carry instead of three separate rules.
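The cache case can be sketched in a few lines. This is a minimal time-to-live cache written for illustration. TtlCache, load, and the injected clock are hypothetical names, and the 30-second TTL is an arbitrary value. Within the TTL window, a read skips the source and may return a value the source has already replaced, which is the latency-over-consistency choice described above.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TtlCache:
    """Serve cached values for ttl_seconds before going back to the source."""

    def __init__(self, load: Callable[[Hashable], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # the slow, authoritative read
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the demo is deterministic
        self._entries: Dict[Hashable, Tuple[object, float]] = {}

    def get(self, key: Hashable) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # fast path: possibly stale
        value = self._load(key)    # slow path: consult the source
        self._entries[key] = (value, now)
        return value


# Demo with a fake clock and a dict standing in for the source.
source = {"price": 10}
now = [0.0]
cache = TtlCache(load=source.__getitem__, ttl_seconds=30.0,
                 clock=lambda: now[0])

first = cache.get("price")   # 10, loaded from the source
source["price"] = 12         # the source changes
stale = cache.get("price")   # 10, still served from the cache
now[0] = 31.0                # the TTL expires
fresh = cache.get("price")   # 12, reloaded from the source
```

Shortening the TTL moves the cache toward consistency at the cost of more trips to the source. Setting it to zero makes every read go to the source, so each read pays the source's latency.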