When a group of related actions is performed on a data system there are several desirable properties:
- atomicity – either all the actions succeed or none succeed.
- consistency – the system remains in a consistent state with regard to any constraints.
- isolation – the temporary state consequent to one group of actions is not visible to another group of actions occurring concurrently.
- durability – the successful completion of the group of actions results in a permanent change of state of the system.
These properties were stated formally in the early 1980s in two papers: "The Transaction Concept: Virtues and Limitations" by Jim Gray, and "Principles of Transaction-Oriented Database Recovery" ($) by Theo Haerder and Andreas Reuter. The latter described these properties as the ACID test of a system’s quality.
The primary goal of a relational database is state consistency, which relies on the atomicity, isolation and durability of its transactions.
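As a minimal sketch of atomicity and consistency working together, the following uses Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the transfer helper are all assumptions made up for illustration. The transfer credits one account and then debits the other; if the debit would violate the CHECK constraint, the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit fails the CHECK constraint if src has too little money.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # credit is undone with the debit
```

After the second call the balances are the same as after the first, even though the credit to bob had succeeded before the debit failed: either all the actions take effect or none do.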
In a distributed data system some part of the system may temporarily lose contact with the rest of the system. In essence, the system separates into two or more network partitions. This means that changes made in one partition are not available in another, which leads to consistency problems across the data system.
In a PODC 2000 keynote speech, Eric Brewer conjectured that it is impossible for a distributed system to satisfy all three of:
- consistency
- availability
- partition tolerance
The conjecture was formally proved by Seth Gilbert and Nancy Lynch in a paper titled "Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" ($).
Just as the ACID properties are desired in a data system, so the CAP properties are desired in a distributed data system. The CAP theorem states that in any distributed data system a tradeoff must be made among the three CAP properties.
A system supporting ACID emphasizes consistency at the expense of partition tolerance and availability, since a partition that causes transactions to fail may leave the system unavailable. The growth of the web during the 1990s made it clear there was an important role for a distributed data system that prioritizes availability over consistency. When large numbers of people are accessing a system, it is often more important that the system is available than that the data it offers is absolutely up-to-date and consistent.
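One way to picture the consistency-first side of this tradeoff is a store that refuses writes whenever it cannot reach a majority of its replicas. The sketch below is only an illustration under that assumption; QuorumStore, Unavailable and the majority rule are hypothetical names and choices, not something prescribed by ACID or by the papers cited here.

```python
class Unavailable(Exception):
    """Raised when a write cannot reach a majority of replicas."""

class QuorumStore:
    """Toy consistency-first store: writes need a reachable majority."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.reachable = set(range(replica_count))

    def partition(self, lost):
        """Simulate losing contact with the replicas listed in `lost`."""
        self.reachable -= set(lost)

    def write(self, key, value):
        # Refuse the write rather than let the two sides diverge.
        if len(self.reachable) <= len(self.replicas) // 2:
            raise Unavailable("no majority reachable; refusing write")
        for i in self.reachable:
            self.replicas[i][key] = value

store = QuorumStore(3)
store.write("balance", 100)   # all three replicas reachable: accepted
store.partition([1, 2])       # this side now sees only replica 0
try:
    store.write("balance", 90)
    rejected = False
except Unavailable:
    rejected = True           # consistency kept, availability lost
```

The store never serves conflicting values, but during the partition the minority side cannot accept updates at all, which is exactly the cost described above.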
The name BASE was introduced in a 1997 symposium paper to describe such a service, with the name deriving from basically available, soft state, eventual consistency. The central idea is that a BASE system is always available, at the cost of possible inconsistency when a partition occurs. As a result, different users of a BASE system may be provided with different values for the same data. This arises, for example, when users access the data in different partitions and one of them is provided with a stale copy. The idea with BASE is that when the partition heals, the stale data will be updated to the fresh value and the data system will be consistent once again.
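The stale read and the later convergence can be sketched with two in-memory replicas. This is a simplified illustration, not how any particular BASE system works: the Replica class, the heal function and the use of an integer version with a "highest version wins" rule are all assumptions chosen to keep the example short.

```python
class Replica:
    """Toy replica that stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        # Always answers from local state, even if that state is stale.
        return self.data[key][1]

    def merge(self, other):
        """Adopt every entry from `other` that has a higher version."""
        for key, entry in other.data.items():
            if key not in self.data or entry[0] > self.data[key][0]:
                self.data[key] = entry

def heal(a, b):
    """Exchange state in both directions once the partition is over."""
    a.merge(b)
    b.merge(a)

a, b = Replica(), Replica()
for r in (a, b):
    r.write("price", 10, version=1)   # both replicas start in agreement

a.write("price", 12, version=2)       # partition: the update reaches only a
stale = b.read("price")               # b is still available, but out of date

heal(a, b)                            # partition heals, replicas reconcile
fresh = b.read("price")
```

During the partition a user reading from b sees the old price while a user reading from a sees the new one; after heal both replicas return the same value again.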
The ability of a BASE system to function in the event of a network partition also makes BASE useful for building massively scalable systems. BASE systems can be built out in a manner and at a cost that would be prohibitive for ACID systems. This makes BASE almost ideal for systems where massive scale is required but up-to-the-second data consistency is not, which is true of much of the data on the web.
ACID plus BASE
ACID and BASE are contrasting ways of viewing data systems, in that one prioritizes consistency while the other prioritizes availability. It is not the case, however, that one is better than the other. Indeed, it is very likely that any reasonably large web system would need to contain both ACID and BASE data systems. For example, the scalability and availability of BASE are useful for large-scale storage of things like user comments, while the data consistency of ACID is essential to accurate billing of users.
Dan Pritchett has a really good article, "BASE: An ACID Alternative", that provides a lot more detail on BASE and its motivation. Werner Vogels, the Amazon CTO, has an article, "Eventually Consistent", that goes into what being eventually consistent means in practice.