- Mar 26, 2018
-
-
Michael Whittaker authored
Overview
========
This PR implements the view change and recovery portion of IR. It also fixes a couple of important bugs and introduces a new mechanism with which to unit test IR.

What's New?
===========
This PR includes the following, in roughly descending order of importance:

- The main contribution of this PR is the implementation of view changes and recovery. The implementation required the following:
    - I introduce DoViewChangeMessage and StartViewMessage protobufs as well as protobufs for serializing records.
    - IR replicas have logic to initiate a view change periodically or upon recovery. They also have logic to handle DoViewChangeMessage and StartViewMessage messages.
    - IR replicas persist their view information, which is needed for recovery, using PersistentRegisters, which persist data on disk.
    - When a client issues a ProposeInconsistentMessage or ProposeConsistentMessage, the reply indicates whether the request has already been finalized. If it has, the client responds appropriately.
    - For consensus requests, clients make sure that the majority of replies and the majority of confirms come from the same view.
- I fixed a slow path/fast path bug in IR clients. Previously, after a timeout, clients would transition from the fast path into the slow path. However, they would not wait for a majority of responses from replicas. Instead, they would call decide on whatever responses they had at the moment.
- Previously, decide functions took in a set of replies, but they needed to take in a multiset of replies so that the decide function could do things like count the frequency of each reply.
- I introduced a new simulated transport, ReplTransport. It's pretty neat. It launches a command line REPL that you can use to manually control the execution of the system. You decide what timers to trigger and what messages to deliver. Then, you can save your execution as a unit test to run later. I used the ReplTransport a lot to debug the view change and recovery algorithms since a lot of the corner cases are hard to trigger with the existing transport layers. I added some ReplTransport unit tests for the lock server.
- I implemented Sync and Merge for the lock server. The lock server is now fully functional.
- I generalized timeout callbacks to error callbacks, which are invoked whenever an error (not just a timeout) is encountered. This was necessary to nicely handle some of the failure scenarios introduced by view changes (e.g. a client gets majority replies and majority confirms in different views for a consensus request).
- I performed some miscellaneous cleanup here and there: fixing whitespace, changing raw pointers to smart pointers, stuff like that.

What's Left?
============
There are still some things left that this PR doesn't implement:

- Currently during a view change, replicas send their entire records to one another. I'm guessing there are more efficient ways to transfer records between replicas, similar to how VR has some optimization tricks for log shipping.
- I have not implemented Sync or Merge for TAPIR, only for the lock server.
- Clients do not notify replicas of view changes when they receive replies from older views. Similarly, a replica never detects that it's in a stale view or requests a master record from a replica in a higher view. It just eventually does a view change to stay up-to-date.
- None of the code has been profiled or optimized or anything like that. This PR focuses only on correctness, not performance.
- There could be more unit tests.
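To illustrate the set-versus-multiset point and the majority wait mentioned in the list of changes, here is a minimal sketch. It is not the code from this PR: `Reply`, `HasMajority`, and `Decide` are hypothetical names, a reply is modeled as a plain string, and the "pick the most frequent reply" rule is just one example of a decide function that needs frequencies.

```c++
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical reply type: in this sketch a reply is just its result string.
using Reply = std::string;

// True once at least a majority of the n replicas have replied.
// A client on the slow path would keep waiting until this holds.
bool HasMajority(const std::multiset<Reply>& replies, std::size_t n) {
    return replies.size() >= n / 2 + 1;
}

// Toy decide function: returns the most frequent reply.
// Ties go to the lexicographically smallest reply.
Reply Decide(const std::multiset<Reply>& replies) {
    assert(!replies.empty());
    Reply best;
    std::size_t best_count = 0;
    // Visit each distinct reply once by jumping past its duplicates.
    for (auto it = replies.begin(); it != replies.end();
         it = replies.upper_bound(*it)) {
        const std::size_t count = replies.count(*it);
        if (count > best_count) {
            best = *it;
            best_count = count;
        }
    }
    return best;
}
```

With a `std::set`, two identical replies would collapse into one element and the frequency information would be lost, which is why the container type matters here.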
-
- Mar 21, 2018
-
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
- Mar 01, 2018
-
-
Irene Y Zhang authored
Added a gitignore to the project. See merge request !1
-
Irene Y Zhang authored
Added debugging rule to Makefile. See merge request syslab/tapir!2
-
Irene Y Zhang authored
Replaced `goto fail` code with `unique_ptr`. See merge request syslab/tapir!3
-
Michael Whittaker authored
Previously, `UDPTransport::SendMessageInternal` dynamically allocated a `char[]` and used a `goto fail` to make sure that it was properly deleted. Something like:

```c++
char *buf = new char[100];
if (...) {
    ...
    goto fail;
} else if (...) {
    ...
    goto fail;
} else {
    ...
}

fail:
    delete [] buf;
    return false;
```

Now, the array is stored in a `unique_ptr` so that it's properly deallocated when the function returns, without needing the `goto fail`.
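For comparison, the `unique_ptr` version might look roughly like the sketch below. This is not the actual `UDPTransport::SendMessageInternal`: the function name `SendMessageSketch`, its parameters, and the two failure conditions are assumptions made up for illustration. The point is only that every early `return` frees the buffer automatically.

```c++
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a send routine (not the real transport code).
bool SendMessageSketch(const char* payload, std::size_t len) {
    constexpr std::size_t kBufSize = 100;
    // The array specialization of unique_ptr calls delete[] on destruction.
    std::unique_ptr<char[]> buf(new char[kBufSize]);

    if (payload == nullptr) {
        return false;  // buf is released here, no goto needed
    }
    if (len > kBufSize) {
        return false;  // and here
    }

    std::memcpy(buf.get(), payload, len);
    // ... a real implementation would hand buf.get() to the socket here ...
    return true;  // and on the success path as well
}
```

Using `std::unique_ptr<char[]>` rather than `std::unique_ptr<char>` matters: only the array form releases the memory with `delete[]`.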
-
Michael Whittaker authored
Added a debugging Makefile rule stolen from [1]. With this rule, you can run make print-<VAR> to print out the contents of <VAR>. For example,

    make print-OBJS
    make print-PROTOOBJS
    make print-BINS

[1]: https://blog.melski.net/2010/11/30/makefile-hacks-print-the-value-of-any-variable/
-
Michael Whittaker authored
Previously, directories like `.obj/` and files like `libtapir.so` weren't being ignored by git. Now they are! This .gitignore ignores vim, c++, and project specific stuff.
-
- Feb 28, 2018
-
-
irene authored
-
- Feb 22, 2018
-
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
- Nov 17, 2017
-
-
Irene Zhang authored
A couple small lockserver client improvements.
-
Michael Whittaker authored
1. Previously, typing ` `, `lock`, or `unlock` into a lockserver client's REPL would cause the client to crash because of some `strtok` calls that were returning `NULL`. Now, this bug is fixed; `strtok` return values are checked for `NULL`, and no input should crash the client.
2. Previously, typing an unrecognized command into the lockserver client's REPL would print `Unknown command.. Try again!`. Now, it also prints out the set of legal commands: `Usage: exit | q | lock <key> | unlock <key>`. This makes it a bit easier for someone tinkering around with TAPIR for the first time (like me) to know what to type.
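The `NULL` checks described above can be sketched as follows. This is an illustration, not the lockserver client's code: `ParseCommand` and its string return value are hypothetical, and how the real client reacts to blank input or a missing key is an assumption (here it returns the usage line).

```c++
#include <cstring>
#include <string>

// Hypothetical parser for one line typed into the client REPL.
// Returns a description of the action instead of performing it.
std::string ParseCommand(std::string line) {
    const std::string usage = "Usage: exit | q | lock <key> | unlock <key>";
    const char* delims = " \t\n";

    // strtok returns NULL when the input holds no token (e.g. " ").
    char* cmd = std::strtok(&line[0], delims);
    if (cmd == nullptr) {
        return usage;
    }
    if (std::strcmp(cmd, "exit") == 0 || std::strcmp(cmd, "q") == 0) {
        return "exit";
    }
    if (std::strcmp(cmd, "lock") == 0 || std::strcmp(cmd, "unlock") == 0) {
        // A bare "lock" or "unlock" has no second token, so check again.
        char* key = std::strtok(nullptr, delims);
        if (key == nullptr) {
            return usage;
        }
        return std::string(cmd) + " " + key;
    }
    return "Unknown command.. Try again!\n" + usage;
}
```

Note that `strtok` keeps internal state and modifies its input, so this sketch takes `line` by value and is not safe to call from several threads at once.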
-
- Jun 22, 2017
-
-
Irene Zhang authored
fix building on Apple clang
-
Maxime Caron authored
-
- Jun 08, 2017
-
-
Irene Y Zhang authored
-
- Feb 08, 2017
-
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
- Jan 18, 2017
-
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
- Jan 17, 2017
-
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
- Jan 04, 2017
-
-
Naveen Kr. Sharma authored
-
- Aug 09, 2016
-
-
Irene Y Zhang authored
-
- May 13, 2016
-
-
Irene Y Zhang authored
-
- Feb 22, 2016
-
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
Naveen Kr. Sharma authored
-
Irene Y Zhang authored
This reverts commit 4791cfd3.
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
Irene Y Zhang authored
-
Irene Y Zhang authored
Fix one bug in TAPIR (abstain was not handled) and one in IR (the decide result was not passed back correctly)
-
- Feb 11, 2016
-
-
Naveen Kr. Sharma authored
-