In this lab you'll build a MapReduce library as a way to learn the Go
programming language and as a way to learn about fault tolerance in distributed
systems. In the first part you will write a simple MapReduce program. In the
second part you will write a Master that hands out jobs to workers, and handles
failures of workers. The interface to the library and the approach to fault
tolerance is similar to the one described in the original
You must write all the code you hand in for 452, except for code that we give
you as part of the assignment. You are not allowed to look at anyone else's
solution, and you are not allowed to look at code from previous years. You may
discuss the assignments with other students, but you may not look at or copy
each others' code. Please do not publish your code or make it available to
future 452 students -- for example, please do not make your code public on
Undergrads taking 452 may do the labs with a partner. Masters students should
complete the labs individually.
You'll implement this lab (and all the labs) in Go >=1.5. This version is
available as packages for Linux and Mac OSX through MacPorts. You
can also download binaries from the Go web site.
The Go web site contains lots of tutorial information which you may want to look
at. We supply you with a non-distributed MapReduce implementation, and a partial
implementation of a distributed implementation (just the boring bits).
You'll fetch the initial lab software with git (a version control system). To
learn more about git, take a look at the git user's manual, or, if
you are already familiar with other version control systems, you may find this
CS-oriented overview of git useful.
We'll be using our new department git server, which is hosted at
gitlab.cs.washington.edu. This is basically a local deployment of GitHub, so
if you have used GitHub before, you are all set! Before downloading the
software, you'll need to sign in and upload your ssh key. A handy tutorial on
how to generate and upload your ssh key is available from GitHub.
$ git clone firstname.lastname@example.org:dwoos/452-labs.git 452-labs$ cd 452-labs$ lssrc README.md
Git allows you to keep track of the changes you make to the code. For example,
if you want to checkpoint your progress, you can commit your changes by running:
$ git commit -am'partial solution to lab 1'
If you would like to host your code on gitlab as well, the easiest way is to
fork the 452-labs repo and make your edits to your fork. [Instructions on how to
fork a repo.][fork-rep] If you do fork the repo, do not forget to keep your fork
synced! We may be adding hints to the code throughout the quarter. Also, be sure
to make your repo private.
There is an input file kjv12.txt in ~/452-labs/src/main, which was
downloaded from here. Compile the initial software we provide you and run
it with the downloaded input file:
$ export GOPATH=$HOME/452-labs # Or, the location of the repo's root directory$ cd$GOPATH/src/main$ go run wc.go master kjv12.txt sequential# command-line-arguments./wc.go:11: missing return at end of function./wc.go:15: missing return at end of function
The compiler produces two errors, because the implementation of the Map and
Reduce functions is incomplete.
Part I: Word count
Modify Map and Reduce so that wc.go reports the number of occurrences of
each word in alphabetical order.
and it will report if your solution is correct or not.
Before you start coding read Section 2 of the MapReduce paper.
Your Map() and Reduce() functions will differ a bit from those in the
paper's Section 2.1. Your Map() will be passed some of the text from the file;
it should split it into words, and return a list.List of key/value pairs, of
typemapreduce.KeyValue. Your Reduce() will be called once for each key,
with a list of all the values generated by Map() for that key; it should
return a single output value.
It will help to read our code for mapreduce, which is in mapreduce.go in
package mapreduce. Look at RunSingle() and the functions it calls. This well
help you to understand what MapReduce does and to learn Go by example.
Once you understand this code, implement Map and Reduce in wc.go.
Hint: you can use strings.FieldsFunc to split a string into
components. The following lambda expression will be useful for checking whether
a character is a letter:
Hint: for the purposes of this exercise, you can consider a word to be any
contiguous sequence of letters, as determined by unicode.IsLetter. A
good read on what strings are in Go is the Go Blog on strings.
Hint: The solution to this part of the lab should take you about 10 lines of
code. If your code is much >100 LoC, you might want to discuss your design with
You can remove the output file and all intermediate files with:
$ rm mrtmp.*
Part II: Distributing MapReduce jobs
In this part you will complete a version of MapReduce that splits the work up
over a set of worker threads, in order to exploit multiple cores. A master
thread hands out work to the workers and waits for them to finish. The master
should communicate with the workers via RPC. We give you the worker code
(mapreduce/worker.go), the code that starts the workers, and code to deal with
RPC messages (mapreduce/common.go).
Your job is to complete master.go in the mapreduce package. In particular, you
should modify RunMaster() in master.go to hand out the map and reduce jobs to
workers, and return only when all the jobs have finished.
Look at Run() in mapreduce.go. It calls Split() to split the input into
per-map-job files, then calls your RunMaster() to run the map and reduce jobs,
then calls Merge() to assemble the per-reduce-job outputs into a single output
file. RunMaster only needs to tell the workers the name of the original input
file (mr.file) and the job number; each worker knows from which files to read
its input and to which files to write its output.
Each worker sends a Register RPC to the master when it starts. mapreduce.go
already implements the master's MapReduce.Register RPC handler for you, and
passes the new worker's information to mr.registerChannel. Your RunMaster
should process new worker registrations by reading from this channel.
Information about the MapReduce job is in the MapReduce struct, defined in
mapreduce.go. Modify the MapReduce struct to keep track of any additional
state (e.g., the set of available workers), and initialize this additional state
in the InitMapReduce() function. The master does not need to know which Map or
Reduce functions are being used for the job; the workers will take care of
executing the right code for Map or Reduce.
You should run your code using Go's unit test system. We supply you with a set
of tests in test_test.go. You run unit tests in a package directory (e.g., the
mapreduce directory) as follows:
$ cd mapreduce $ go test
You are done with Part II when your implementation passes the first test (the
"Basic mapreduce" test) in test_test.go in the mapreduce package. You don't
yet have worry about failures of workers.
The master should send RPCs to the workers in parallel so that the workers can
work on jobs concurrently. You will find the go statement useful for this
purpose and the Go RPC documentation.
The master may have to wait for a worker to finish before it can hand out more
jobs. You may find channels useful to synchronize threads that are waiting for
reply with the master once the reply arrives. Channels are explained in the
document on Concurrency in Go.
The easiest way to track down bugs is to insert log.Printf() statements,
collect the output in a file with go test > out, and then think about whether
the output matches your understanding of how your code should behave. The last
step (thinking) is the most important.
The code we give you runs the workers as threads within a single UNIX process,
and can exploit multiple cores on a single machine. Some modifications would be
needed in order to run the workers on multiple machines communicating over a
network. The RPCs would have to use TCP rather than UNIX-domain sockets; there
would need to be a way to start worker processes on all the machines; and all
the machines would have to share storage through some kind of network file
You will see some error messages that are safe to ignore. These will looks
something like this:
2016/01/04 11:44:52 method CleanupFiles has wrong number of ins: 12016/01/04 11:44:52 method CleanupRegistration has wrong number of ins: 12016/01/04 11:44:52 method KillWorkers has wrong number of ins: 12016/01/04 11:44:52 method Merge has wrong number of ins: 12016/01/04 11:44:52 method ProcessJobs has wrong number of ins: 12016/01/04 11:44:52 method Run has wrong number of ins: 12016/01/04 11:44:52 method RunMaster has wrong number of ins: 12016/01/04 11:44:52 method Split has wrong number of ins: 22016/01/04 11:44:52 method StartRegistrationServer has wrong number of ins: 1
The important thing to look for is a PASS at the end of your output indicating
that your implementation has passed all of the unit tests. If you would like to
not see these errors, we have provided a script (/bin/run-tests) which you can
run in place of go test.
Hint: Use a select to check for new worker registrations as well as
checking for finished workers that need new jobs.
Hint: Think about how you might handle jobs that do not finish and have to
be restarted. This will limit how much re-design you will have to do in the next
section when you have to handle worker failures.
Part III: Handling worker failures
In this part you will make the master handle failed workers. MapReduce makes
this relatively easy because workers don't have persistent state. If a worker
fails, any RPCs that the master issued to that worker will fail (e.g., due to a
timeout). Thus, if the master's RPC to the worker fails, the master should
re-assign the job given to the failed worker to another worker.
An RPC failure doesn't necessarily mean that the worker failed; the worker may
just be unreachable but still computing. Thus, it may happen that two workers
receive the same job and compute it. However, because jobs are idempotent, it
doesn't matter if the same job is computed twice---both times it will generate
the same output. So, you don't have to anything special for this case. (Our
tests never fail workers in the middle of job, so you don't even have to worry
about several workers writing to the same output file.)
You don't have to handle failures of the master; we will assume it won't fail.
Making the master fault-tolerant is more difficult because it keeps persistent
state that would have to be recovered in order to resume operations after a
master failure. Much of the later labs are devoted to this challenge.
Your implementation must pass the two remaining test cases in test_test.go. The
first case tests the failure of one worker. The second test case tests handling
of many failures of workers. Periodically, the test cases start new workers that
the master can use to make forward progress, but these workers fail after
handling a few jobs.
Hint: The solution to Parts II and III of the lab should be around 60 lines
Make sure that you have done the following:
COMMENT your code! We should be able to understand what your code is doing.
Make sure all of your code passes the test cases. Do not modify them, as we
will be replacing all test_test.go files before grading.
Add a README.lab1 in 452-labs/src with:
Your partner's name (if you had one)
How many hours you each spent on the lab.
A high-level description of your design.
Run make lab1.tar.gz from the repo's root directory to build the file you
should turn in.