Introduction
Rebalancing is the process of redistributing data and requests across the nodes in a cluster to handle changes such as increased query throughput (requiring more CPUs), increased dataset size (requiring more disks and RAM), and machine failures (requiring other machines to take over the failed machines' responsibilities). It keeps the load manageable by moving data and requests from one node to another.
Regardless of the partitioning scheme, rebalancing must meet the following minimum requirements:
- After rebalancing, the load (data storage, read, and write requests) should be evenly distributed across the nodes.
- The database must continue accepting reads and writes during the rebalancing process.
- Only the necessary amount of data should be moved between nodes to ensure fast rebalancing and minimize network and disk I/O load.
Strategies for Rebalancing:
1. How not to do it: hash mod N (= hash(key) % N)
The problem with the mod N approach is that if the number of nodes N changes, most of the keys have to be moved from one node to another. Moving that much data makes rebalancing excessively expensive!
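To see the scale of the problem, here is a quick Python sketch (purely illustrative) that counts how many keys change nodes when a cluster grows from 10 to 11 nodes:

```python
# Illustrative only: measure how many keys are reassigned when N changes.
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Stable hash (Python's built-in hash() is randomized per process).
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if node_for(k, 10) != node_for(k, 11))
print(f"{moved / len(keys):.0%} of keys move when going from 10 to 11 nodes")
# Prints roughly 91%: almost every key has to be shuffled across the network.
```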
2. Consistent Hashing
The goal of consistent hashing is to minimize the number of objects that need to be reassigned when the number of servers changes: almost all objects stay assigned to the same server, which keeps the load balanced and the disruption small. Watch this: https://youtu.be/UF9Iqmg94tk?t=148
Read here: 2. Hash-Based Partitioning (Hashing a Key) - Consistent Hashing
Consistent hashing is a technique used in distributed systems to efficiently distribute data across multiple nodes while minimizing the disruption caused by the addition or removal of nodes. It is particularly useful in scenarios where the number of nodes can change dynamically, such as in distributed caches, load balancers, and databases.
Here’s how consistent hashing works:
- Hash Space Representation: Consistent hashing organizes the hash space into a circle, often referred to as a ring. This ring represents the entire range of hash values.
- Node Placement: Each node (server) in the distributed system is assigned a position on the ring based on a hash of its identifier (e.g., IP address, hostname). This placement is typically done using a hash function that generates a hash value for each node.
- Data Placement: Each piece of data (e.g., a cache entry, database record) is also hashed to determine its position on the ring. The data is then assigned to the first node that comes after its position in the clockwise direction on the ring.
- Handling Node Changes:
  - Node Addition: When a new node is added, it is placed on the ring based on its hash value. Only the data that would now fall under the new node’s range (between the new node and its predecessor) needs to be moved to the new node. This minimizes the amount of data that needs to be redistributed.
  - Node Removal: When a node is removed, its data is reassigned to the next node in the clockwise direction. Again, only a small portion of the data needs to be redistributed.
- Virtual Nodes: To improve load balancing and reduce data movement even further, consistent hashing can use virtual nodes. Each physical node is assigned multiple virtual nodes, each with its own position on the ring. This helps distribute the data more evenly and ensures that the removal or addition of a physical node has an even smaller impact on the data distribution.
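To make the ring concrete, here is a minimal Python sketch of a consistent-hash ring with virtual nodes. The names (HashRing, add_node, get_node, the "#i" vnode suffix) are made up for illustration; real systems layer replication and membership protocols on top of this basic idea.

```python
# Minimal consistent-hash ring with virtual nodes (illustrative sketch).
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self._ring = []           # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place `vnodes` points for this node on the ring; only keys between each
        # new point and its predecessor have to move onto the new node.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        # Keys owned by the removed node fall to the next point clockwise.
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # Walk clockwise from the key's position to the first node point.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        if idx == len(self._ring):   # wrap around the circle
            idx = 0
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user:42"))      # one of node-a/b/c
ring.add_node("node-d")              # only ~1/4 of keys remap to the new node
```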
Cons of consistent hashing are:
- Non-Uniform Distribution: Hash functions might not evenly distribute data, causing load imbalance.
- Increased Complexity: Implementation, especially with virtual nodes, adds complexity.
- Coordination Overhead: Node additions/removals require coordination, introducing latency.
- Partition Hotspots: Uneven key distribution can create hotspots with excessive traffic.
3. Fixed number of partitions:
To address load imbalance in consistent hashing, a system can use more partitions than nodes and assign multiple partitions to each node. For example, in a cluster with 10 nodes, the data could be divided into 1,000 partitions, with each node managing around 100 partitions. When a new node is added, it takes some partitions from existing nodes, and if a node is removed, its partitions are redistributed among remaining nodes.
Key points:
- More partitions than nodes are created (e.g., 1,000 partitions for 10 nodes).
- Nodes manage multiple partitions.
- Adding/removing nodes redistributes entire partitions.
- The number of partitions and key-to-partition assignments remain fixed.
- Reassignment of partitions is gradual, maintaining the old assignment during data transfer.
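A minimal sketch of this scheme (names are illustrative): the key-to-partition mapping is fixed once and never changes; only the partition-to-node assignment changes when nodes join or leave, and it changes by moving whole partitions.

```python
# Fixed number of partitions: keys map to partitions forever; partitions move between nodes.
import hashlib

NUM_PARTITIONS = 1_000

def partition_of(key: str) -> int:
    # Fixed forever: adding or removing nodes never changes this mapping.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def assign_partitions(nodes: list[str]) -> dict[int, str]:
    # Naive round-robin assignment of whole partitions to nodes; a real system
    # would move as few partitions as possible instead of recomputing from scratch.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

nodes = [f"node-{i}" for i in range(10)]
assignment = assign_partitions(nodes)        # ~100 partitions per node

key = "user:42"
p = partition_of(key)
print(f"{key} -> partition {p} -> {assignment[p]}")
# When an 11th node joins, whole partitions are handed to it; individual keys never
# change partitions, so only whole-partition transfers cross the network.
```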
Cons:
You need to choose a good partition count up front. Too few: each partition becomes too big for the database to manage and move. Too many: there is extra overhead in storing and tracking the partition metadata, i.e., which range maps to which partition (for example: p0: 0-72, p1: 73-115, … pN: 800-999).
4. Dynamic Partitioning:
Should our DB automatically split big partitions and merge small partitions, i.e., manage its own partitioning? And what about rebalancing when a node goes down?
Maybe, but if we do it too often we send tons of data over the network!
And how do we even know that a node is down in the first place? Solution: Distributed Consensus
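As a rough illustration of the splitting half of dynamic partitioning (the size threshold, class, and helper below are invented, not taken from any particular database): a partition splits roughly in half once it grows past a size limit, and merging small partitions would be the mirror-image operation.

```python
# Toy sketch of dynamic partition splitting (all names and thresholds are invented).
SPLIT_THRESHOLD_BYTES = 10 * 1024 ** 3       # assumed 10 GiB limit per partition

class Partition:
    def __init__(self, start_key: str, end_key: str, size_bytes: int = 0):
        self.start_key = start_key
        self.end_key = end_key
        self.size_bytes = size_bytes

    def maybe_split(self) -> list["Partition"]:
        # Split around a midpoint key so each half holds roughly 50% of the data.
        if self.size_bytes <= SPLIT_THRESHOLD_BYTES:
            return [self]
        mid_key = self._midpoint_key()       # hypothetical helper
        half = self.size_bytes // 2
        return [Partition(self.start_key, mid_key, half),
                Partition(mid_key, self.end_key, self.size_bytes - half)]

    def _midpoint_key(self) -> str:
        # Placeholder: a real store would find this from its on-disk index.
        return self.start_key + "~mid"

big = Partition("a", "z", size_bytes=12 * 1024 ** 3)
print([(p.start_key, p.end_key) for p in big.maybe_split()])   # two halves
```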