Commit 08a2c858 authored by James R. Wilcox

notes for token fuzzer

parent d3e3c531
@@ -68,4 +68,59 @@ executed.
All in all, this fuzzer does not get very deep into the interpreter.
### The generating procedure
This fuzzer's generation code is pretty simple. We just generate random strings
of a particular length by uniformly selecting each character from a set of
"good" characters.
## Token fuzzer
This one is also similar in spirit to our Trefoil v1 token fuzzer. In Trefoil
v2, there are more tokens.
Let's take a look at the numbers.
```
Paren:356556
Abstract:464582
Unbound Variables:162126
Unbound Functions:11605
Other Runtime:844
StackOverflows:0
Programs:4287
Total:1000000
```
The distribution of outcomes is very different from before!
- 46% of results are abstract syntax errors. These primarily stem from using
built-in keywords like `let` as variable names. When the fuzzer does get lucky
and balances the parentheses, it often passes the wrong number of arguments.
(As before, if your code follows the starter code and does not check for
keywords, you will instead see more "Unbound Variable" errors here, which is
fine too.)
- 36% of results are parenthesized syntax errors. These are due to unbalanced
parens. This outcome is more common with the token fuzzer than with the
character fuzzer because the character fuzzer struggles to generate a paren
character at all most of the time, so it rarely gets far enough to unbalance
them.
- 16% of results are unbound variables. This result is much less common now,
not because the fuzzer is smarter, but because the token distribution favors
generating built-in symbols over random variable names.
- 1% are unbound functions.
- 0.08% are "other runtime" errors, such as applying `+` to non-integers. We are
excited to see this error show up, as it indicates our fuzzer is starting to
get deeper into the interpreter! The character fuzzer has almost no chance of
generating these programs.
- 0.4% are valid programs that run to completion. Of these, 90% start with the
comment character :) The other 10% mostly start with an integer literal,
followed by a comment character. Not the most interesting programs in the
world.
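
The percentages in the list above are rounded from the raw counts. A small Python sketch of that arithmetic, using only the numbers printed by the fuzzer (the dictionary layout and the `percent` helper are assumptions for this example):

```python
# Outcome counts copied from the fuzzer output above.
counts = {
    "Paren": 356556,
    "Abstract": 464582,
    "Unbound Variables": 162126,
    "Unbound Functions": 11605,
    "Other Runtime": 844,
    "StackOverflows": 0,
    "Programs": 4287,
}
total = sum(counts.values())


def percent(name):
    """Share of all runs that ended in the given outcome, in percent."""
    return 100.0 * counts[name] / total


for name in counts:
    print(f"{name}: {percent(name):.2f}%")
```

If you change the fuzzer, rerunning this on your own counts is a quick way to see whether the outcome distribution has shifted.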
### The generating procedure
This fuzzer builds each input from a random sequence of 100 tokens. Each
token is drawn from a custom distribution that we made up because it seemed
reasonable. See the code documentation for `genTokenStringInto` for more
information, and feel free to play around with tweaking the token distribution!
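
To make the idea concrete, here is a hedged Python sketch of drawing tokens from a weighted distribution. It does not reproduce `genTokenStringInto`: the token kinds, the weights in `TOKEN_WEIGHTS`, the use of `;` as a comment character, and the helper names are all invented for illustration.

```python
import random

# Hypothetical token kinds and weights; the real distribution is documented
# in genTokenStringInto and may look quite different.
TOKEN_WEIGHTS = {
    "(": 20,
    ")": 20,
    "let": 10,
    "+": 10,
    ";": 2,      # assumed comment character
    "INT": 20,   # placeholder kind: expands to a random integer literal
    "VAR": 5,    # placeholder kind: expands to a random variable name
}


def gen_token(rng):
    """Draw one token according to TOKEN_WEIGHTS."""
    kinds = list(TOKEN_WEIGHTS)
    weights = list(TOKEN_WEIGHTS.values())
    kind = rng.choices(kinds, weights=weights, k=1)[0]
    if kind == "INT":
        return str(rng.randint(-10, 10))
    if kind == "VAR":
        return rng.choice("xyz")
    return kind


def gen_token_string(n_tokens, rng):
    """Join n_tokens independently drawn tokens with single spaces."""
    return " ".join(gen_token(rng) for _ in range(n_tokens))


program = gen_token_string(100, random.Random(0))
print(program)
```

Giving random variable names a low weight, as in this sketch, is one way a distribution can favor built-in symbols; adjusting the weights is the kind of tweaking the notes invite.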