Raft Basics

  • Leader based:
    • Pick one server to be leader
    • It accepts client requests, manages replication
    • Pick a new leader if the current one fails
  • Server states:
    • Leader
    • Follower: totally passive
    • Candidate: used for electing new leaders
  • Time divided into terms:
    • Terms have numbers that increase monotonically ().
    • Each term starts with an election
      • One or more candidates attempting to become leader
    • Winning candidate (if any) serves as leader for the rest of the term
    • Terms allow detection of stale information
      • Each server stores current term
      • Checked on every request
      • Term numbers increase monotonically
    • Different servers observe term transitions differently
  • Request-response protocol between servers (remote procedure calls, or RPCs). 2 request types:
    • RequestVote
    • AppendEntries (also serves as heartbeat)

Leader Election

  • All servers start as followers
  • No heartbeat (AppendEntries)? Start election:
    • Increment current term
    • Vote for self
    • Send RequestVote requests to all other servers
  • Voting Process:
    1. Term Comparison: The candidate’s term must be at least as recent as the voter’s current term.
    2. Log Completeness: The candidate’s log must be at least as up-to-date as the voter’s log.
    3. A node can only vote for one candidate per term, and if it has already voted for another candidate or if the candidate’s log is not sufficiently up-to-date, it will reject the vote request.
  • Election:
    • The follower transitions to the candidate state.
    • Increment current term
    • Send RequestVote requests to all other servers
  • Election outcomes:
    1. Majority (Quorums) Votes: The candidate receives votes from a majority (quorum) of the servers. It becomes the leader and begins sending heartbeats to prevent further elections.
    2. Receiving AppendEntries: The candidate receives AppendEntries from another server with a higher term. It reverts to the follower state.
    3. Election Timeout: The candidate does not receive a majority or a heartbeat within the election timeout, so it starts a new election cycle.
    4. Network Partition: If the network is partitioned, a minority partition may repeatedly try and fail to elect a leader. The servers in this partition will continuously time out and start new election cycles. However, a majority partition could still elect a leader and continue operations, while the minority partition remains in a perpetual election state until network connectivity is restored.
    5. Split Vote: If two or more candidates receive the same number of votes without reaching a majority, this is called a split vote. In this case, each candidate will eventually time out and start a new election. The randomness in election timeouts usually ensures that one candidate will eventually win a majority in the subsequent elections.
    6. Leader Crashes or Network Failures After Election: If a leader crashes or is isolated due to network failures after winning the election, the followers will detect the lack of heartbeats and start a new election cycle.
  • Each server votes for at most one candidate in a given term
  • Election Safety: only one server can be elected leader in a given term
  • Availability: randomized election timeouts reduce split votes.

Safety

  • Must ensure that the leader for new term always holds all of the log entries committed in previous terms (Leader Completeness Property).
  • Step 1: restriction on elections: don’t vote for a candidate unless candidate’s log is at least as up-to-date as yours.
  • Compare indexes and terms from last log entries.
  • Step 2: be very careful about when an entry is considered committed
    • As soon as replicated on majority? Unsafe!
    • Committed ⇒ entry in current term replicated on majority
      • All preceding entries also committed because of Log Matching Property
Credits:

https://web.stanford.edu/~ouster/cgi-bin/cs190-winter20/lecture.php?topic=raft