
Recovery & view change implementation + bug fixes.

Michael Whittaker requested to merge ir_recovery_squashed into master


This PR implements the view change and recovery portion of IR. It also fixes a couple of important bugs and introduces a new mechanism with which to unit test IR.

What's New?

This PR includes the following, in roughly descending order of importance:

  • The main contribution of this PR is the implementation of view changes and recovery. The implementation required some of the following things:
    • I introduce DoViewChangeMessage and StartViewMessage protobufs as well as protobufs for serializing records.
    • IR replicas have logic to periodically initiate a view change or initiate a view change upon recovery. They also have logic to handle DoViewChangeMessage and StartViewMessage messages.
    • IR replicas persist their view information, which is needed for recovery, using PersistentRegisters, which persist data on disk.
    • When a client issues a ProposeInconsistentMessage or ProposeConsistentMessage, the reply indicates whether the request has already been finalized. If it has, the client responds appropriately.
    • For consensus requests, clients make sure that the majority of replies and the majority of confirms come from the same view.
  • I fixed a slow path/fast path bug in IR clients. Previously, after a timeout, clients would transition from the fast path to the slow path, but they would not wait for a majority of responses from replicas. Instead, they would call decide on whatever responses they had at the moment.
  • Previously, decide functions took in a set of replies, but they needed to take in a multiset of replies so that the decide function could do things like count the frequency of each reply.
  • I introduced a new simulated transport, ReplTransport. It's pretty neat. It launches a command line REPL that you can use to manually control the execution of the system. You decide which timers to trigger and which messages to deliver. Then, you can save your execution as a unit test to run later. I used the ReplTransport a lot to debug the view change and recovery algorithms, since a lot of the corner cases are hard to trigger with the existing transport layers. I added some ReplTransport unit tests for the lock server.
  • I implemented Sync and Merge for the lock server. The lock server is now fully functional.
  • I generalized timeout callbacks to error callbacks which are invoked whenever an error (not just a timeout) is encountered. This was necessary to nicely handle some of the failure scenarios introduced by view changes (e.g. a client gets majority replies and majority confirms in different views for a consensus request).
  • I performed some miscellaneous cleanup here and there, fixing whitespace, changing raw pointers to smart pointers, stuff like that.
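
To illustrate two of the points above (decide functions taking a multiset, and clients requiring a majority from a single view), here is a minimal C++ sketch. It is not the code in this PR: the Reply struct and the Decide and MajorityInSameView functions are hypothetical names made up for illustration, and the "most frequent result wins" policy is only an assumption about what a decide function might do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reply type (not from this PR): the view the replica was in
// when it answered, and the result it returned.
struct Reply {
    int view;
    std::string result;
};

// Hypothetical decide function: returns the most frequent result.
// A std::set would collapse duplicate results and lose the counts, which is
// why a multiset is needed. Ties go to the lexicographically smallest result;
// an empty input yields an empty string.
std::string Decide(const std::multiset<std::string>& results) {
    std::string best;
    std::size_t best_count = 0;
    // upper_bound skips to the next distinct result.
    for (auto it = results.begin(); it != results.end();
         it = results.upper_bound(*it)) {
        const std::size_t n = results.count(*it);
        if (n > best_count) {
            best = *it;
            best_count = n;
        }
    }
    return best;
}

// Hypothetical quorum check: returns true if a majority of the n_replicas
// replicas replied from one and the same view, and stores that view in
// *view_out. Replies spread across views do not count together.
bool MajorityInSameView(const std::vector<Reply>& replies,
                        std::size_t n_replicas, int* view_out) {
    assert(n_replicas > 0);
    std::map<int, std::size_t> per_view;
    for (const Reply& r : replies) {
        ++per_view[r.view];
    }
    const std::size_t majority = n_replicas / 2 + 1;
    for (const auto& kv : per_view) {
        if (kv.second >= majority) {
            *view_out = kv.first;
            return true;
        }
    }
    return false;
}
```

With three replicas, two replies from view 1 and one from view 2 pass the check for view 1, while one reply from each of two views does not. In the latter case a client following this sketch would invoke its error callback rather than call Decide.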

What's Left?

There are still some things left that this PR doesn't implement:

  • Currently during a view change, replicas send their entire records to one another. I'm guessing there are more efficient ways to transfer records between replicas, similar to how VR has some optimization tricks for log shipping.
  • I have not implemented Sync or Merge for TAPIR, only for the lock server.
  • Clients do not notify replicas of view changes when they receive replies from older views. Similarly, a replica never detects that it's in a stale view or requests a master record from a replica in a higher view. It just eventually does a view change to stay up-to-date.
  • None of the code has been profiled or optimized or anything like that. This PR focuses only on correctness, not performance.
  • There could be more unit tests.
