... | ... | @@ -9,7 +9,6 @@ |
|
|
In this lab you'll add persistence to your key/value server. The overall goal is to be able to recover after the crash and restart of one or more key/value servers. It's this capability that makes fault-tolerance really valuable! The specific properties you'll need to ensure are:
|
|
|
|
|
|
* If a key/value server crashes (halts cleanly with disk intact), and is re-started, it should re-join its replica group. The effect on availability of one or more such crashed key/value servers should be no worse than if the same servers had been temporarily disconnected from the network rather than crashing. This ability to re-start requires that each replica save its key/value database, Paxos state, and any other needed state to disk where it can find it after the re-start.
|
|
|
* If a key/value server crashes (halts cleanly) and loses its disk contents, and is re-started, it should acquire a key/value database and other needed state from the other replicas and re-join its replica group. If a majority of a replica group simultaneously loses its disk contents, the group can never continue. If a minority of the servers simultaneously lose their disk contents and re-start, the group must recover so that it can tolerate future crashes.
|
|
|
|
|
|
You do not need to design a high-performance format for the on-disk data. It is sufficient for a server to store each key/value pair in a separate file, and to use a few more files to store its other state.
|
|
|
|
... | ... | @@ -46,8 +45,6 @@ After merging your Lab 4 code into diskv, you should be able to pass the tests t |
|
|
|
|
|
**Hint:** You can run the Lab 4 tests with: `go test -run Test4`
|
|
|
|
|
|
**Hint:** If a server crashes, loses its disk, and re-starts, a potential problem is that it could participate in Paxos instances that it had participated in before crashing. Since the server has lost its Paxos state, it won't participate correctly in these instances. So you must find a way to ensure that servers that re-join after disk loss only participate in new instances.
|
|
|
|
|
|
**Hint:** diskv/server.go includes some functions that may be helpful to you when reading and writing files containing key/value data.
|
|
|
|
|
|
**Hint:** You may want to use Go's gob package to format and parse saved state. Here's an example. As with RPC, if you want to encode structs with gob, you must capitalize the field names.
|
... | ... | @@ -89,7 +86,13 @@ func main() { |
|
|
|
|
|
The Lab 5 tester will kill key/value servers so that they stop executing at a random place, which was not the case in previous labs. One consequence is that, if your server is writing a file, the tester might kill it midway through writing the file (much as a real crash might occur while writing a file). A good way to cause replacement of a whole file to nevertheless be atomic is to write to a temporary file in the same directory, and then call `os.Rename(tempname, realname)`.
|
|
|
|
|
|
You'll probably have to modify your Paxos implementation, at least so that it's possible to save and restore a key/value server's Paxos state.
|
|
|
You'll need to modify your Paxos implementation so that it's possible to save and restore a key/value server's Paxos state. Think about which state is important to write to disk, and about the contract between the proposer and the acceptors in the Paxos protocol. You'll also need to modify your Paxos interface to accept a `restart` flag.
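One possible shape for the saved state, shown only as a sketch: in standard Paxos an acceptor must remember the highest proposal number it has promised and the highest-numbered proposal it has accepted, since forgetting either could break a promise made to a proposer. The `AcceptorState` struct, its field names, and the `encodeState`/`decodeState` helpers are hypothetical, and the value is a plain `string` here for simplicity (your Paxos values may be a different type, which gob may need to have registered).

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// AcceptorState is a hypothetical per-instance record of acceptor state.
// Field names are capitalized so that gob can encode them.
type AcceptorState struct {
	Seq       int    // Paxos instance (sequence) number
	PromisedN int    // highest proposal number promised in a prepare reply
	AcceptedN int    // highest proposal number accepted
	AcceptedV string // value accepted together with AcceptedN
	Decided   bool   // whether this instance is known to be decided
}

// encodeState serializes s with gob; the bytes could then be written to disk.
func encodeState(s AcceptorState) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeState parses bytes produced by encodeState.
func decodeState(b []byte) (AcceptorState, error) {
	var s AcceptorState
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&s)
	return s, err
}

func main() {
	saved := AcceptorState{Seq: 7, PromisedN: 12, AcceptedN: 10, AcceptedV: "put x=1"}
	b, err := encodeState(saved)
	if err != nil {
		panic(err)
	}
	restored, err := decodeState(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored == saved)
}
```

For the promise to survive a crash, the record would have to reach disk before the acceptor replies to the prepare or accept request that changed it.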
|
|
|
|
|
|
Some of the tests will make sure that your implementation is not using too much disk space. Be sure to clean up both your on-disk key/value server state and your on-disk Paxos state.
|
|
|
|
|
|
**Hint:** Make sure all of your key/value servers bring their Paxos log up-to-date periodically. Otherwise, your log may never be garbage collected. One way to do this is to submit an operation to the Paxos library on every `tick` event in the key/value server.
|
|
|
|
|
|
**Hint:** You might find that your tests run very slowly. This is fine. It may take up to 170 seconds to finish all of the tests for this lab.
|
|
|
|
|
|
Don't run multiple instances of the Lab 5 tests at the same time on the same machine. They will remove each other's files.
|
|
|
|
... | ... | |