Introduction
Conflict-Free Replicated Data Types (CRDTs) are designed to enable concurrent updates and ensure eventual consistency across distributed systems without requiring coordination.
Where’re CRDTs used?
- Riak
- Redis (sets in Redis Enterprise)
Types of CRDTs:
1. Operational CRDTs:
Disadvantages:
Imagine we’ve a Set of DB-1 and DB-2:
2. State based CRDTs:
Let’s send the whole CRDT, instead of the operators on it! → Now, we can have duplicate messages. For this to work, merge fn should be:
- Commutative: f(a,b) = f(b,a)
- Associative: f(a, f(b,c)) = f(f(a,b), c)
- Idempotent: f(a, b) = f(f(a, b), b)
Since, order doesn’t matter, we can communicate via GOSSIP PROTOCOL (requires no extra messaging infrastructure)
How Gossip Protocol Works ??
The gossip protocol, inspired by the way rumors spread in social networks, works as follows:
- Periodic Communication:
- Each node periodically selects one or more random nodes to share its state.
- Information Exchange:
- Nodes exchange information about their local state, including updates they have received.
- State Merging:
- Nodes merge the received information with their local state, ensuring that all updates are eventually propagated to all nodes.
Disadvantages of GOSSIP Protocol:
- Network Overhead:
- The periodic communication can lead to increased network traffic, especially in very large systems.
- Convergence Time:
- The time it takes for all nodes to converge to the same state can vary and may be longer in systems with high latency or frequent partitions.
- Duplicate Messages:
- Nodes may receive the same update multiple times, leading to redundant processing.
Let’s go through some types of CRDTs:
Name | Operations | State |
---|---|---|
Counter | inc(leader) | incs: [0, 3, 2] → sum(incs) = 5 |
Decrease-able Counter | inc(leader), dec(leader) | incs: [0, 3, 2] & decs: [0, 1, 2] → sum: (incs - decs) or (0, 3, 2) - (0 + 1 + 0) = 4 |
3. Sequential CRDTs:
Build an eventual consistent list. Very hard because elements are ordered! Used in Google Docs and other collaborative text editors!
Implementing sequential CRDTs is particularly challenging due to the inherent complexity of maintaining a consistent order across distributed nodes.
Why Sequential CRDTs are Tough to Implement ??
- Maintaining Order Consistency: Unlike sets or counters, where the order of elements does not matter, sequences must preserve the relative ordering of elements across all replicas.
- Handling Concurrent Inserts: When multiple nodes insert elements at the same position concurrently, the system must determine a consistent order for these elements.
- Complexity of Merge Functions: The merge functions that combine different replicas’ states must ensure that all operations are applied in a manner that maintains the sequence’s consistency and order. Designing these merge functions to handle all possible states and operations efficiently is complex and error-prone.
- Latency and Performance Trade-offs: Ensuring that all operations are correctly ordered and merged can introduce significant latency, especially in systems with high network latency or frequent partitions.