
Commit eb07b87

Merge pull request #2691 from rust-lang/tshepang/sembr
sembr fuzzing.md
2 parents e273da0 + 2877c1f commit eb07b87

3 files changed: +51 -78 lines changed
.github/workflows/ci.yml

Lines changed: 1 addition & 11 deletions
@@ -7,7 +7,7 @@ on:
   pull_request:
   schedule:
     # Run multiple times a day as the successfull cached links are not checked every time.
-    - cron: '0 */8 * * *'
+    - cron: "0 */8 * * *"
 
 jobs:
   ci:
@@ -83,16 +83,6 @@ jobs:
           git commit -m "Deploy ${GITHUB_SHA} to gh-pages"
           git push --quiet -f "https://x-token:${{ secrets.GITHUB_TOKEN }}@github.com/${GITHUB_REPOSITORY}" HEAD:gh-pages
 
-      - name: Cache sembr build
-        uses: actions/cache@v4
-        with:
-          path: |
-            ~/.cargo/registry/index/
-            ~/.cargo/registry/cache/
-            ~/.cargo/git/db/
-            ci/sembr/target/
-          key: sembr-${{ hashFiles('ci/sembr/Cargo.lock') }}
-
       - name: Check if files comply with semantic line breaks
         continue-on-error: true
         run: |

ci/sembr/src/main.rs

Lines changed: 15 additions & 34 deletions
@@ -177,6 +177,9 @@ fn lengthen_lines(content: &str, limit: usize) -> String {
         let Some(next_line) = content.get(n + 1) else {
             continue;
         };
+        if next_line.trim_start().starts_with("```") {
+            continue;
+        }
         if ignore(next_line, in_code_block)
             || REGEX_LIST_ENTRY.is_match(next_line)
             || REGEX_IGNORE_END.is_match(line)
@@ -255,6 +258,12 @@ preserve next line
 
 preserve next line
 * three
+
+do not mess with code block chars
+```
+leave the
+text alone
+```
 ";
     let expected = "\
 do not split short sentences
@@ -269,6 +278,12 @@ preserve next line
 
 preserve next line
 * three
+
+do not mess with code block chars
+```
+leave the
+text alone
+```
 ";
     assert_eq!(expected, lengthen_lines(original, 50));
 }
@@ -294,40 +309,6 @@ fn test_prettify_ignore_link_targets() {
     assert_eq!(original, lengthen_lines(original, 100));
 }
 
-#[test]
-fn test_sembr_then_prettify() {
-    let original = "
-hi there. do
-not split
-short sentences.
-hi again.
-";
-    let expected = "
-hi there.
-do
-not split
-short sentences.
-hi again.
-";
-    let processed = comply(original);
-    assert_eq!(expected, processed);
-    let expected = "
-hi there.
-do not split
-short sentences.
-hi again.
-";
-    let processed = lengthen_lines(&processed, 50);
-    assert_eq!(expected, processed);
-    let expected = "
-hi there.
-do not split short sentences.
-hi again.
-";
-    let processed = lengthen_lines(&processed, 50);
-    assert_eq!(expected, processed);
-}
-
 #[test]
 fn test_sembr_question_mark() {
     let original = "
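
Reviewer note: the guard added to `lengthen_lines` in the first hunk can be looked at in isolation. The sketch below is not the code in `ci/sembr/src/main.rs`; `is_fence` and `join_short_lines` are hypothetical names, and the real function also consults `ignore`, `REGEX_LIST_ENTRY`, and `REGEX_IGNORE_END`, which are left out here. It only illustrates the idea that a line is never merged with a following line that opens or closes a fenced code block.

```rust
/// Same check as the guard added in this commit: does the line start a fence?
fn is_fence(line: &str) -> bool {
    line.trim_start().starts_with("```")
}

/// Hypothetical, simplified joiner: merge a line with its successor when the
/// result fits within `limit`, but leave fences and fenced text untouched.
fn join_short_lines(text: &str, limit: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let mut out: Vec<String> = Vec::new();
    let mut in_code_block = false;
    let mut i = 0;
    while i < lines.len() {
        let line = lines[i];
        if is_fence(line) {
            // A fence line toggles whether we are inside a code block.
            in_code_block = !in_code_block;
        }
        let mergeable = match lines.get(i + 1) {
            Some(next) => {
                !in_code_block
                    && !is_fence(line)
                    && !is_fence(next) // the guard: never pull a fence onto this line
                    && !line.is_empty()
                    && !next.is_empty()
                    && line.len() + 1 + next.len() <= limit
            }
            None => false,
        };
        if mergeable {
            out.push(format!("{} {}", line, lines[i + 1]));
            i += 2;
        } else {
            out.push(line.to_string());
            i += 1;
        }
    }
    out.join("\n")
}
```

Without the `!is_fence(next)` condition, the prose line in the new test case would be short enough to be joined with the fence that follows it, which is presumably the behaviour this commit prevents.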

src/fuzzing.md

Lines changed: 35 additions & 33 deletions
@@ -3,12 +3,13 @@
 <!-- date-check: Mar 2023 -->
 
 For the purposes of this guide, *fuzzing* is any testing methodology that
-involves compiling a wide variety of programs in an attempt to uncover bugs in
-rustc. Fuzzing is often used to find internal compiler errors (ICEs). Fuzzing
-can be beneficial, because it can find bugs before users run into them and
+involves compiling a wide variety of programs in an attempt to uncover bugs in rustc.
+Fuzzing is often used to find internal compiler errors (ICEs).
+Fuzzing can be beneficial, because it can find bugs before users run into them and
 provide small, self-contained programs that make the bug easier to track down.
 However, some common mistakes can reduce the helpfulness of fuzzing and end up
-making contributors' lives harder. To maximize your positive impact on the Rust
+making contributors' lives harder.
+To maximize your positive impact on the Rust
 project, please read this guide before reporting fuzzer-generated bugs!
 
 ## Guidelines
@@ -28,16 +29,14 @@ project, please read this guide before reporting fuzzer-generated bugs!
 
 - Don't report lots of bugs that use internal features, including but not
   limited to `custom_mir`, `lang_items`, `no_core`, and `rustc_attrs`.
-- Don't seed your fuzzer with inputs that are known to crash rustc (details
-  below).
+- Don't seed your fuzzer with inputs that are known to crash rustc (details below).
 
 ### Discussion
 
 If you're not sure whether or not an ICE is a duplicate of one that's already
-been reported, please go ahead and report it and link to issues you think might
-be related. In general, ICEs on the same line but with different *query stacks*
-are usually distinct bugs. For example, [#109020][#109020] and [#109129][#109129]
-had similar error messages:
+been reported, please go ahead and report it and link to issues you think might be related.
+In general, ICEs on the same line but with different *query stacks* are usually distinct bugs.
+For example, [#109020][#109020] and [#109129][#109129] had similar error messages:
 
 ```
 error: internal compiler error: compiler/rustc_middle/src/ty/normalize_erasing_regions.rs:195:90: Failed to normalize <[closure@src/main.rs:36:25: 36:28] as std::ops::FnOnce<(Emplacable<()>,)>>::Output, maybe try to call `try_normalize_erasing_regions` instead
@@ -63,10 +62,10 @@ end of query stack
 
 ## Building a corpus
 
-When building a corpus, be sure to avoid collecting tests that are already
-known to crash rustc. A fuzzer that is seeded with such tests is more likely to
-generate bugs with the same root cause, wasting everyone's time. The simplest
-way to avoid this is to loop over each file in the corpus, see if it causes an
+When building a corpus, be sure to avoid collecting tests that are already known to crash rustc.
+A fuzzer that is seeded with such tests is more likely to
+generate bugs with the same root cause, wasting everyone's time.
+The simplest way to avoid this is to loop over each file in the corpus, see if it causes an
 ICE, and remove it if so.
 
 To build a corpus, you may want to use:
@@ -84,16 +83,16 @@ To build a corpus, you may want to use:
 Here are a few things you can do to help the Rust project after filing an ICE.
 
 - [Bisect][bisect] the bug to figure out when it was introduced.
-  If you find the regressing PR / commit, you can mark the issue with the label
-  `S-has-bisection`. If not, consider applying `E-needs-bisection` instead.
+  If you find the regressing PR / commit, you can mark the issue with the label `S-has-bisection`.
+  If not, consider applying `E-needs-bisection` instead.
 - Fix "distractions": problems with the test case that don't contribute to
   triggering the ICE, such as syntax errors or borrow-checking errors
-- Minimize the test case (see below). If successful, you can label the
-  issue with `S-has-mcve`. Otherwise, you can apply `E-needs-mcve`.
+- Minimize the test case (see below).
+  If successful, you can label the issue with `S-has-mcve`.
+  Otherwise, you can apply `E-needs-mcve`.
 - Add the minimal test case to the rust-lang/rust repo as a [crash test].
   While you're at it, consider including other "untracked" crashes in your PR.
-  Please don't forget to mark all relevant issues with `S-bug-has-test` once
-  your PR is merged.
+  Please don't forget to mark all relevant issues with `S-bug-has-test` once your PR is merged.
 
 See also [applying and removing labels][labeling].
 
@@ -103,13 +102,14 @@ See also [applying and removing labels][labeling].
 
 ## Minimization
 
-It is helpful to carefully *minimize* the fuzzer-generated input. When
-minimizing, be careful to preserve the original error, and avoid introducing
+It is helpful to carefully *minimize* the fuzzer-generated input.
+When minimizing, be careful to preserve the original error, and avoid introducing
 distracting problems such as syntax, type-checking, or borrow-checking errors.
 
-There are some tools that can help with minimization. If you're not sure how
-to avoid introducing syntax, type-, and borrow-checking errors while using
-these tools, post both the complete and minimized test cases. Generally,
+There are some tools that can help with minimization.
+If you're not sure how to avoid introducing syntax, type-, and borrow-checking errors while using
+these tools, post both the complete and minimized test cases.
+Generally,
 *syntax-aware* tools give the best results in the least amount of time.
 [`treereduce-rust`][treereduce] and [picireny][picireny] are syntax-aware.
 [`halfempty`][halfempty] is not, but is generally a high-quality tool.
@@ -121,21 +121,23 @@ these tools, post both the complete and minimized test cases. Generally,
 ## Effective fuzzing
 
 When fuzzing rustc, you may want to avoid generating machine code, since this
-is mostly done by LLVM. Try `--emit=mir` instead.
+is mostly done by LLVM.
+Try `--emit=mir` instead.
 
-A variety of compiler flags can uncover different issues. `-Zmir-opt-level=4`
-will turn on MIR optimization passes that are not run by default, potentially
-uncovering interesting bugs. `-Zvalidate-mir` can help uncover such bugs.
+A variety of compiler flags can uncover different issues.
+`-Zmir-opt-level=4` will turn on MIR optimization passes that are not run by default, potentially
+uncovering interesting bugs.
+`-Zvalidate-mir` can help uncover such bugs.
 
 If you're fuzzing a compiler you built, you may want to build it with `-C
-target-cpu=native` or even PGO/BOLT to squeeze out a few more executions per
-second. Of course, it's best to try multiple build configurations and see
+target-cpu=native` or even PGO/BOLT to squeeze out a few more executions per second.
+Of course, it's best to try multiple build configurations and see
 what actually results in superior throughput.
 
 You may want to build rustc from source with debug assertions to find
 additional bugs, though this is a trade-off: it can slow down fuzzing by
-requiring extra work for every execution. To enable debug assertions, add this
-to `bootstrap.toml` when compiling rustc:
+requiring extra work for every execution.
+To enable debug assertions, add this to `bootstrap.toml` when compiling rustc:
 
 ```toml
 [rust]
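
Reviewer note: the "Building a corpus" text in this diff describes looping over each corpus file, checking whether it causes an ICE, and removing it if so. A minimal sketch of that loop follows, under stated assumptions: `is_ice`, `crashes_rustc`, and `prune_corpus` are hypothetical names, a `rustc` binary is assumed to be on `PATH`, and an ICE is detected by looking for the `internal compiler error` marker quoted in the guide's example output. It uses `--emit=mir`, as the guide suggests, so no machine code is generated.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Assumption: an ICE shows up as this marker in rustc's stderr.
fn is_ice(stderr: &str) -> bool {
    stderr.contains("internal compiler error")
}

/// Compile one file to MIR only and report whether rustc hit an ICE.
/// Output artifacts go to `out_dir` so the corpus directory stays clean.
fn crashes_rustc(file: &Path, out_dir: &Path) -> io::Result<bool> {
    let output = Command::new("rustc")
        .arg("--emit=mir")
        .arg("--out-dir")
        .arg(out_dir)
        .arg(file)
        .output()?;
    Ok(is_ice(&String::from_utf8_lossy(&output.stderr)))
}

/// Delete every `.rs` file in `corpus` that makes rustc ICE.
/// Returns the number of files removed.
fn prune_corpus(corpus: &Path, out_dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(corpus)? {
        let path = entry?.path();
        let is_rust = path.extension().map_or(false, |ext| ext == "rs");
        if is_rust && crashes_rustc(&path, out_dir)? {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

Calling `prune_corpus(Path::new("corpus"), Path::new("scratch"))` would run the check once per file; both directory names are placeholders. Ordinary compile errors do not match the marker, so files that merely fail to compile are kept.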
