  1. Apr 04, 2018
  2. Mar 30, 2018
  3. Mar 29, 2018
  4. Mar 27, 2018
  5. Mar 26, 2018
      Recovery & view change implementation + bug fixes. · 0cd92291
      Michael Whittaker authored
      Overview
      ========
      This PR implements the view change and recovery portion of IR. It also
      fixes a couple of important bugs and introduces a new mechanism with
      which to unit test IR.
      
      What's New?
      ===========
      This PR includes the following, in roughly descending order of
      importance:
      
      - The main contribution of this PR is the implementation of view changes
        and recovery. The implementation required the following:
          - I introduce DoViewChangeMessage and StartViewMessage protobufs as
            well as protobufs for serializing records.
          - IR replicas have logic to periodically initiate a view change or
            initiate a view change upon recovery. They also have logic to
            handle DoViewChangeMessage and StartViewMessage messages.
          - IR replicas persist their view information, which is needed for
            recovery, using PersistentRegisters, which persist data on disk.
          - When a client issues a ProposeInconsistentMessage or
            ProposeConsistentMessage, the reply indicates whether the request
            has already been finalized. If it has, the client responds
            appropriately.
          - For consensus requests, clients make sure that the majority of
            replies and the majority of confirms come from the same view.
      - I fixed a slow path/fast path bug in IR clients. Previously, after a
        timeout, clients would transition from the fast path into the slow
        path, but they would not wait for a majority of responses from
        replicas. Instead, they would call decide on whatever responses they
        had at that moment.
      - Previously, decide functions took in a set of replies, but they needed
        to take in a multiset of replies so that the decide function could do
        things like count the frequency of each reply.
      - I introduced a new simulated transport ReplTransport. It's pretty
        neat. It launches a command line REPL that you can use to manually
        control the execution of the system. You decide what timers to trigger
        and what messages to deliver. Then, you can save your execution as a
        unit test to run later. I used the ReplTransport a lot to debug
        the view change and recovery algorithms since a lot of the corner
        cases are hard to trigger with the existing transport layers. I added
        some ReplTransport unit tests for the lock server.
      - I implemented Sync and Merge for the lock server. The lock server is
        now fully functional.
      - I generalized timeout callbacks to error callbacks which are invoked
        whenever an error (not just a timeout) is encountered. This was
        necessary to nicely handle some of the failure scenarios introduced by
        view changes (e.g. a client gets majority replies and majority
        confirms in different views for a consensus request).
      - I performed some miscellaneous cleanup here and there, fixing
        whitespace, changing raw pointers to smart pointers, stuff like that.
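      The slow-path fix, the multiset-based decide functions, and the
      same-view check for consensus requests fit together as sketched below.
      This is a minimal illustration, assuming hypothetical names
      (`SlowPathQuorum`, `DecideMostFrequent`, `ErrorCode`) that are not IR's
      actual classes: replies are grouped by view, decide runs only once a
      majority from a single view has arrived, the decide function receives a
      `std::multiset` so it can count duplicate replies, and a view mismatch
      between replies and confirms goes to a general error callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch; these names are illustrative, not IR's real API.
using Result = std::string;
using View = unsigned long;

// Errors are reported through one callback instead of a timeout-only one.
enum class ErrorCode { kTimeout, kMismatchedConsensusViews };

// A decide function over a multiset can count how often each reply occurs.
// Returns the most frequent result; ties go to the smallest result.
Result DecideMostFrequent(const std::multiset<Result>& results) {
    assert(!results.empty());
    Result best;
    std::size_t best_count = 0;
    for (auto it = results.begin(); it != results.end();
         it = results.upper_bound(*it)) {
        const std::size_t count = results.count(*it);
        if (count > best_count) {
            best = *it;
            best_count = count;
        }
    }
    return best;
}

class SlowPathQuorum {
 public:
    SlowPathQuorum(std::size_t num_replicas,
                   std::function<void(ErrorCode)> on_error)
        : majority_(num_replicas / 2 + 1), on_error_(std::move(on_error)) {}

    // Records one reply. Returns a decision only when a majority of replicas
    // have replied in the same view; otherwise the client keeps waiting.
    std::optional<Result> AddReply(int replica, View view, const Result& r) {
        auto& in_view = replies_[view];
        in_view[replica] = r;
        if (in_view.size() < majority_) return std::nullopt;
        std::multiset<Result> results;
        for (const auto& entry : in_view) results.insert(entry.second);
        decided_view_ = view;
        return DecideMostFrequent(results);
    }

    // Called once a majority of confirms has arrived in `confirm_view`.
    // The decision is only valid if both majorities came from one view.
    bool CheckConfirms(View confirm_view) {
        if (decided_view_.has_value() && *decided_view_ == confirm_view) {
            return true;
        }
        on_error_(ErrorCode::kMismatchedConsensusViews);
        return false;
    }

 private:
    std::size_t majority_;
    std::function<void(ErrorCode)> on_error_;
    std::map<View, std::map<int, Result>> replies_;  // view -> replica -> reply
    std::optional<View> decided_view_;
};
```

      With three replicas, two replies in the same view are needed before
      `AddReply` returns a decision; a reply from a different view does not
      count toward that majority.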
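      The ReplTransport idea can be illustrated with a small, manually driven
      transport. This is a hypothetical sketch (`ManualTransport`, `Step`,
      and `Apply` are invented names, not ReplTransport's interface): sends
      and timers only enqueue work, and nothing executes until the caller
      picks a specific pending message or timer by index. Recording those
      picks yields a script that can be replayed as a unit test.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a manually driven transport; not ReplTransport's API.
class ManualTransport {
 public:
    using Handler = std::function<void(const std::string&)>;

    // One user choice: fire pending timer #index or deliver message #index.
    struct Step {
        bool is_timer;
        std::size_t index;
    };

    void Register(int addr, Handler handler) {
        handlers_[addr] = std::move(handler);
    }

    // Sending and setting timers only enqueue work; nothing runs yet.
    void Send(int dst, std::string msg) {
        messages_.emplace_back(dst, std::move(msg));
    }
    void SetTimer(std::function<void()> callback) {
        timers_.push_back(std::move(callback));
    }

    // Executes one chosen step and records it so the run can be replayed.
    void Apply(const Step& step) {
        trace_.push_back(step);
        if (step.is_timer) {
            assert(step.index < timers_.size());
            std::function<void()> callback = std::move(timers_[step.index]);
            timers_.erase(timers_.begin() +
                          static_cast<std::ptrdiff_t>(step.index));
            callback();
        } else {
            assert(step.index < messages_.size());
            std::pair<int, std::string> m = std::move(messages_[step.index]);
            messages_.erase(messages_.begin() +
                            static_cast<std::ptrdiff_t>(step.index));
            handlers_.at(m.first)(m.second);
        }
    }

    // Replaying a saved trace against a fresh system reproduces the run.
    void Replay(const std::vector<Step>& steps) {
        for (const Step& step : steps) Apply(step);
    }

    const std::vector<Step>& trace() const { return trace_; }

 private:
    std::map<int, Handler> handlers_;
    std::vector<std::pair<int, std::string>> messages_;
    std::vector<std::function<void()>> timers_;
    std::vector<Step> trace_;
};
```

      Because pending messages can be delivered in any order, rare
      interleavings (the kind that are hard to trigger with the existing
      transports) become easy to reproduce. A saved trace should be replayed
      on a fresh transport, not on the one currently recording it.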
      
      What's Left?
      ============
      There are still some things left that this PR doesn't implement:
      
      - Currently during a view change, replicas send their entire records to
        one another. I'm guessing there are more efficient ways to transfer
        records between replicas, similar to how VR has some optimization
        tricks for log shipping.
      - I have not implemented Sync or Merge for TAPIR, only for the lock
        server.
      - Clients do not notify replicas of view changes when they receive
        replies from older views. Similarly, a replica never detects that it's
        in a stale view, nor does it request a master record from a replica in
        a higher view. It just eventually does a view change to stay up-to-date.
      - None of the code has been profiled or optimized or anything like that.
        This PR focuses only on correctness, not performance.
      - There could be more unit tests.
  6. Mar 21, 2018
  7. Mar 01, 2018
  8. Feb 28, 2018
  9. Feb 22, 2018
  10. Nov 17, 2017
      Merge pull request #8 from mwhittaker/lockserver_client_fixes · 46488d1b
      Irene Zhang authored
      A couple small lockserver client improvements.
      A couple small lockserver client improvements. · 679fcb9f
      Michael Whittaker authored
      1. Previously, typing ` `, `lock`, or `unlock` into a lockserver
         client's REPL would cause the client to crash because of some
         `strtok` calls that were returning `NULL`. Now, this bug is fixed:
         `strtok` return values are checked for `NULL`, and no input should
         crash the client.
      2. Previously, typing an unrecognized command into the lockserver
         client's REPL would print `Unknown command.. Try again!`. Now, it
         also prints out the set of legal commands `Usage: exit | q | lock
         <key> | unlock <key>`. This makes it a bit easier for someone
         tinkering around with TAPIR for the first time (like me) to know what
         to type.
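      A sketch of the defensive parsing described in item 1, combined with
      the usage string from item 2. The function and type names
      (`ParseCommand`, `Command`, `Kind`, `UnknownCommandMessage`) are
      hypothetical, and the real client's code may be organized differently;
      it also assumes that `lock` or `unlock` without a key is treated as an
      unknown command. The point is that every `strtok` result is checked for
      `NULL` before it is used.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch; the real lockserver client is organized differently.
enum class Kind { kExit, kLock, kUnlock, kUnknown };

struct Command {
    Kind kind;
    std::string key;  // empty unless kind is kLock or kUnlock
};

const char* const kUsage = "Usage: exit | q | lock <key> | unlock <key>";

Command ParseCommand(const std::string& line) {
    // strtok modifies its input, so tokenize a mutable, NUL-terminated copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    const char* delims = " \t";

    char* cmd = std::strtok(buf.data(), delims);
    if (cmd == nullptr) return {Kind::kUnknown, ""};  // e.g. input " "
    if (std::strcmp(cmd, "exit") == 0 || std::strcmp(cmd, "q") == 0) {
        return {Kind::kExit, ""};
    }
    const bool is_lock = std::strcmp(cmd, "lock") == 0;
    const bool is_unlock = std::strcmp(cmd, "unlock") == 0;
    if (!is_lock && !is_unlock) return {Kind::kUnknown, ""};

    char* key = std::strtok(nullptr, delims);
    if (key == nullptr) return {Kind::kUnknown, ""};  // e.g. "lock" alone
    return {is_lock ? Kind::kLock : Kind::kUnlock, key};
}

// Message shown for unrecognized input: the old text plus the legal commands.
std::string UnknownCommandMessage() {
    return std::string("Unknown command.. Try again!\n") + kUsage;
}
```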
  11. Jun 22, 2017
  12. Jun 08, 2017
  13. Feb 08, 2017
  14. Jan 18, 2017
  15. Jan 17, 2017
  16. Jan 04, 2017
  17. Aug 09, 2016
  18. May 13, 2016
  19. Feb 22, 2016