notes.md 12.9 KB
Newer Older
James R. Wilcox's avatar
James R. Wilcox committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# Fuzzing Trefoil v2

As we've discussed before, the Trefoil v2 interpreter processes its input in
several phases:
- tokenization
- PST parsing
- PST->AST conversion/parsing
- interpretation

The idea with fuzzing is to generate random inputs in an attempt to discover
errors. We can generate random inputs at different levels of structure, for
example, generating a random string character by character, or generating a
random sequence of tokens, etc. These levels of structure roughly correspond to
the phases of the interpreter's processing. For example, if we generate the
input character by character, we are unlikely to be lucky enough to write a long
program. On the other hand, if we use a very structured fuzzer that only
generates syntactically correct programs, then we will not be testing the error
cases of the parser very deeply. These differences in coverage mean that "more
structured" fuzzers are not "better". Instead, it is useful to have multiple
fuzzers at different levels of structure, so that all the phases of our
interpreter are covered.

This week we will look at four fuzzers:
- character fuzzer
- token fuzzer
- token fuzzer with balanced parentheses
- grammar fuzzer

## Character fuzzer

Similar to one of our Trefoil v1 fuzzers, this fuzzer generates inputs character
by character. Running `test01CharFuzz` from the provided code for this week
results in a report whose last line looks like this (split across multiple lines
here for readability):

```
James R. Wilcox's avatar
James R. Wilcox committed
37
38
39
40
Paren:26172
Abstract:4571
Unbound Variables:957975
Unbound Functions:6793
James R. Wilcox's avatar
James R. Wilcox committed
41
42
Other Runtime:0
StackOverflows:0
James R. Wilcox's avatar
James R. Wilcox committed
43
Programs:4489
James R. Wilcox's avatar
James R. Wilcox committed
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
Total:1000000
```

The report indicates at the bottom that 1 million strings were generated. The
other lines describe what happened when those 1 million "programs" were
executed.
- 96% of the results are unbound variable errors ("`Unbound Variables`"). This
  happens any time the program starts with a character that is a valid variable
  name (not a parenthesis, semicolon, or integer), which happens a lot.
- 2.5% of the results are parenthesized syntax errors, due to unbalanced
  parentheses (either closing a paren that was never open, or reaching end of
  input without closing all parens).
- 0.5% of the results are unbound functions. This happens when the fuzzer
  generates an open parenthesis as the first character and then generates a
  non-keyword function name, which is in all likelihood not found.
- Another 0.5% of the results have abstract syntax errors ("`Abstract`").
  In this fuzzer, this primarily happens when the first character of the input
  is a built-in operator (e.g. `+`, `=`, etc.). The staff solution requires
  variable names not to conflict with built-in operators, so this is considered
  an abstract syntax error by us. (This is described in the spec, but was
  missing from the starter code, so your solution is free to not do this and may
  report these as unbound variables instead, which is totally fine.)
- The last 0.5% of results are valid programs that run to completion. Almost all
  of these succeed in large part to being mostly commented out.

All in all, this fuzzer does not get very deep into the interpreter.

James R. Wilcox's avatar
James R. Wilcox committed
71
72
73
74
75
76
### The generating procedure

This fuzzer's generation code is pretty simple. We just generate random strings
of a particular length by uniformly selecting each character from a set of
"good" characters.

James R. Wilcox's avatar
James R. Wilcox committed
77
## Token fuzzer
James R. Wilcox's avatar
James R. Wilcox committed
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126

This one is also similar in spirit to our Trefoil v1 token fuzzer. In Trefoil
v2, there are more tokens.

Let's take a look at the numbers.

```
Paren:356556
Abstract:464582
Unbound Variables:162126
Unbound Functions:11605
Other Runtime:844
StackOverflows:0
Programs:4287
Total:1000000
```

The distribution of outcomes is very different from before!

- 46% of results are abstract syntax errors. These primarily stem from using
  built-in keywords like `let` as variable names, or, if the fuzzer gets lucky
  and balances the parentheses, then it often passes the wrong number of
  arguments. (As before, if your code follows the starter code and does not
  check for keywords, you will instead see more "Unbound Variable" errors here,
  which is fine too.)
- 36% of results are parenthesized syntax errors. These are due to unbalanced
  parens. This outcome is more common with the token fuzzer than the character
  fuzzer because the character fuzzer struggles to even generate a paren
  character most of the time, so it doesn't even get to the point of not
  balancing them.
- 16% of results are unbound variables. This result is much less common now not
  because the fuzzer is smarter, but because the token distribution favors
  generating built-in symbols over random variable names.
- 1% are unbound functions.
- 0.08% are "other runtime" errors, such as applying `+` to non-integers. We are
  excited to see this error show up, as it indicates our fuzzer is starting to
  get deeper into the interpreter! The character fuzzer has almost no chance of
  generating these programs.
- 0.5% are valid programs that run to completion. Of these, 90% start with the
  comment character :) The other 10% mostly start with an integer literal,
  followed by a comment character. Not the most interesting programs in the
  world.

### The generating procedure

This fuzzer generates input by generating a random sequence of 100 tokens. Each
token is drawn from a custom distribution that we made up because it seemed
reasonable. See the code documentation for `genTokenStringInto` for more
information, and feel free to play around with tweaking the token distribution!
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187

## Token fuzzer with balanced parentheses

Now we start to diverge from our Trefoil v1 fuzzers as we add more structure. In
Trefoil v1, our next step was to introduce a fuzzer that never underflowed the
stack. That fuzzer's structure reflected the structure of the language being
fuzzed. In Trefoil v2, we don't have a stack, but we do have a similar kind of
common error that we can try to eliminate: unbalanced parentheses.

This fuzzer keeps track of how many open parentheses it has generated so that it
knows to close exactly the right amount.

Let's take a look at the numbers.

```
Paren:0
Abstract:695841
Unbound Variables:285036
Unbound Functions:18075
Other Runtime:1030
StackOverflows:0
Programs:18
Total:1000000
```

The first thing to notice is that there are 0 parenthesized symbol errors. We
succeeded in our goal of structurally eliminating this class of error from our
fuzzers output! This is a very typical workflow for fuzzing: refining the output
by adding structure to eliminate a class of error that we are no longer
interested in testing, so that we can go deeper in other parts of the input
space.

Since parenthesized syntax errors used to make up nearly 30% of the results,
generally we expect all other results to become more frequent.

- 70% of results are now abstract syntax error. In the staff solution, most of
  these are from keyword errors, but also a good number from generating the
  empty set of parens `()` (never valid in Trefoil v2) or from applying an
  operator to the wrong number of arguments.
- 30% of results are unbound variables or functions
- About the same number of results (0.1%) are other runtime errors.
- Perhaps surprisingly, there are many fewer valid programs (essentially zero --
  only 18 out of 1 million). This is because this fuzzer does not generate
  comment characters, so it is *much* harder for the fuzzer to generate 100
  tokens without making a mistake.

  Here is one lucky program from our fuzzer:

  ```
  -3 -8 -6 ( * 6 -1632566905 ) -190 -10
  ```

  Mostly just a bunch of literals, but one correct multiplication operation!
  This is pretty impressive, because each token of this input was generated
  randomly. So it had to get lucky to open the parentheses, close the
  parentheses, and generate the correct number of arguments (of the correct
  type!). Now you see why this almost never happens.

### The generating procedure

Similar to the previous fuzzer except that it tracks an `int` that counts the
thiarichey's avatar
thiarichey committed
188
189
190
number of open-but-not-yet closed parentheses. After generating our token string,
if this `int` is not equal to 0, then we need to add closed parentheses
onto the end of the string until it is! In order to make this tracking
191
192
easier, this fuzzer does not generate comments (since parentheses inside
comments wouldn't count).
James R. Wilcox's avatar
James R. Wilcox committed
193
194
195
196
197
198
199

## Grammar fuzzer

Our previous fuzzer structurally eliminated balanced parentheses errors. Yay!
But the majority of the resulting programs had abstract syntax errors. Let's now
structurally eliminate those.

thiarichey's avatar
thiarichey committed
200
201
202
203
204
205
The general approach here is similar to the one we took to generate balanced 
parentheses, but more complicated. Just as we know a valid Trefoil V2 node is 
parenthesized, we know that a valid Trefoil V2 program is composed of a sequence 
of bindings (whose arguments can be expressions!). So, let's think about how to 
generate expressions and bindings.

James R. Wilcox's avatar
James R. Wilcox committed
206
207
208
209
210
211
212
The strategy will be to think about generating *trees* (in particular, ASTs)
instead of sequences of tokens. At each step, the fuzzer will randomly decide
what kind of binding to produce, and then for that binding, will generate the
right number of arguments. Similarly, when generating expressions, it will first
select what kind of expression to generate, and then generate the right number
for arguments.

thiarichey's avatar
thiarichey committed
213
Let's take a look at the numbers.
James R. Wilcox's avatar
James R. Wilcox committed
214
215
216
217
218
219
220
221
222
223
224
225

```
Paren:0
Abstract:0
Unbound Variables:307831
Unbound Functions:195843
Other Runtime:443764
StackOverflows:0
Programs:52562
Total:1000000
```

thiarichey's avatar
thiarichey committed
226
227
228
We've made a *lot* of progress! Valid programs now account for about 5% of our
results, up from virtually none before. This is because we have eliminaated abstract 
syntax errors thanks to our AST-based approach to generating bindings; however, 
thiarichey's avatar
thiarichey committed
229
we now have many more runtime errors, as well as unbound variable and function binding errors.
thiarichey's avatar
thiarichey committed
230

James R. Wilcox's avatar
James R. Wilcox committed
231
232
### The generating procedure

thiarichey's avatar
thiarichey committed
233
Let's begin with expressions. Expressions can be a lot of things, and we won't review them
thiarichey's avatar
thiarichey committed
234
all here (although you can see LANGUAGE in HW3 for that!). Instead, let's consider the subset of
thiarichey's avatar
thiarichey committed
235
expressions consisting of integer literals, boolean literals, variable reference expressions, the `nil` literal, and `if` expressions. One of these is not like the others: whereas the first four expression types are composed of literals or symbols, an `if` expression takes other expressions as arguments! In other words, it is a *recursive* expression.
thiarichey's avatar
thiarichey committed
236

thiarichey's avatar
thiarichey committed
237
In this case, we need our generator to generate more expressions, some of which will be recursive expressions and some of which will be literals or symbols. You may be able to envision a problem here: stack overflow! We choose a reasonably small maximum depth for expressions to address this, after which point we guarantee that a nonrecursive expression will be generated. We can think of the four aforementioned expression types as our base cases.
thiarichey's avatar
thiarichey committed
238
239
240
241

Bindings are similar to expressions, but simpler in that their arguments are nodes, symbols, or 
expressions, as opposed to other (non-expression) bindings.

thiarichey's avatar
thiarichey committed
242
Our general approach, then, is this: we generate bindings, which generate expressions, which eventually terminate. This results in only valid Trefoil V2 ASTs! If this were Trefoil V1, we would only be generating valid programs right now; recall that this was in fact the case in the previous Trefoil V1 fuzzer, `stackFuzz`. However, our Trefoil V2 programs are more complicated, and we cannot eliminate runtime errors even in programs with valid ASTs. Consider what happens in the following example:
thiarichey's avatar
thiarichey committed
243
244
245
246
247

  ```
(define x (+ 1 y))
  ```

thiarichey's avatar
thiarichey committed
248
When we try to look up `y` in our dynamic environment, it's not there, because we have not defined it! This represents an unbound variable error. What are some other types of errors you can think
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
of coming up in Trefoil V2 programs generated using the grammar fuzzer?

## Static environment fuzzer

Let's go one step further and try to eliminate unbound variable and function errors (like those we just saw!). We'll keep the basic structure of the previous fuzzer, but now we want to keep track of the static environment. On a high level, this means that, whenever we create a variable or function binding, we add information about that binding to the static environment. Then, when generating expressions that involve a variable or function binding (such as a `f` call or a `let` expression), we choose from existing variable or function bindings in the static environment rather than generating random (and statistically, *almost certainly* unbound) ones. We'll talk about this in more detail momentarily, but first...

Let's take a look at the numbers.

```
Paren:0 
Abstract:0 
Unbound Variables:0 
Unbound Functions:0 
Other Runtime:810486 
StackOverflows:900 
Programs:188614 
Total:1000000
```

Wow! Valid programs now account for *nearly 20%* of our results, up from 5% using our already pretty "smart" grammar fuzzer. Variable and function binding errors have been eliminated; nearly all of our remaining errors are unspecified runtime errors, but, interestingly, we also have more stack overflow errors. Think about why this might be.

### The generating procedure