Raft Basics
- Leader based:
- Pick one server to be leader
- It accepts client requests, manages replication
- Pick a new leader if the current one fails
- Server states:
- Leader
- Follower: totally passive
- Candidate: used for electing new leaders
- Time divided into terms:
- Terms have numbers that increase monotonically ().
- Each term starts with an election
- One or more candidates attempting to become leader
- Winning candidate (if any) serves as leader for the rest of the term
- Terms allow detection of stale information
- Each server stores current term
- Checked on every request
- Term numbers increase monotonically
- Different servers observe term transitions differently
- Request-response protocol between servers (remote procedure calls, or RPCs). 2 request types:
- RequestVote
- AppendEntries (also serves as heartbeat)
Leader Election
- All servers start as followers
- No heartbeat (AppendEntries)? Start election:
- Increment current term
- Vote for self
- Send RequestVote requests to all other servers
- Voting Process:
- Term Comparison: The candidate’s term must be at least as recent as the voter’s current term.
- Log Completeness: The candidate’s log must be at least as up-to-date as the voter’s log.
- A node can only vote for one candidate per term, and if it has already voted for another candidate or if the candidate’s log is not sufficiently up-to-date, it will reject the vote request.
- Election:
- The follower transitions to the candidate state.
- Increment current term
- Send RequestVote requests to all other servers
- Election outcomes:
- Majority (Quorums) Votes: The candidate receives votes from a majority (quorum) of the servers. It becomes the leader and begins sending heartbeats to prevent further elections.
- Receiving AppendEntries: The candidate receives AppendEntries from another server with a higher term. It reverts to the follower state.
- Election Timeout: The candidate does not receive a majority or a heartbeat within the election timeout, so it starts a new election cycle.
- Network Partition: If the network is partitioned, a minority partition may repeatedly try and fail to elect a leader. The servers in this partition will continuously time out and start new election cycles. However, a majority partition could still elect a leader and continue operations, while the minority partition remains in a perpetual election state until network connectivity is restored.
- Split Vote: If two or more candidates receive the same number of votes without reaching a majority, this is called a split vote. In this case, each candidate will eventually time out and start a new election. The randomness in election timeouts usually ensures that one candidate will eventually win a majority in the subsequent elections.
- Leader Crashes or Network Failures After Election: If a leader crashes or is isolated due to network failures after winning the election, the followers will detect the lack of heartbeats and start a new election cycle.
- Each server votes for at most one candidate in a given term
- Election Safety: only one server can be elected leader in a given term
- Availability: randomized election timeouts reduce split votes.
Safety
- Must ensure that the leader for new term always holds all of the log entries committed in previous terms (Leader Completeness Property).
- Step 1: restriction on elections: don’t vote for a candidate unless candidate’s log is at least as up-to-date as yours.
- Compare indexes and terms from last log entries.
- Step 2: be very careful about when an entry is considered committed
- As soon as replicated on majority? Unsafe!
- Committed ⇒ entry in current term replicated on majority
- All preceding entries also committed because of Log Matching Property
Credits:
https://web.stanford.edu/~ouster/cgi-bin/cs190-winter20/lecture.php?topic=raft