Merge pull request #2691 from rust-lang/tshepang/sembr

tshepang · web-flow · commit eb07b8716e2e · 2025-12-13T17:30:23.000+02:00
sembr fuzzing.md
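
For context, the `lengthen_lines` hunk in this merge adds a guard so that a line is never joined with a following code-fence line. The block below is a minimal standalone sketch of that idea, not the code in `ci/sembr/src/main.rs`: the function name `join_short_lines` and its simplified joining rule are hypothetical, and only the `trim_start().starts_with("```")` check is taken from the diff.

```rust
/// Hypothetical helper (not the real `lengthen_lines`): joins a line with the
/// one after it when the result fits within `limit`, but never pulls a
/// code-fence line up into prose and never touches text inside a fence.
fn join_short_lines(content: &str, limit: usize) -> String {
    let lines: Vec<&str> = content.lines().collect();
    let mut out: Vec<String> = Vec::new();
    let mut in_code_block = false;
    let mut i = 0;
    while i < lines.len() {
        let line = lines[i];
        // A fence line toggles code-block state and is always kept verbatim.
        if line.trim_start().starts_with("```") {
            in_code_block = !in_code_block;
            out.push(line.to_string());
            i += 1;
            continue;
        }
        if !in_code_block && !line.trim().is_empty() {
            if let Some(next) = lines.get(i + 1) {
                // Same check as the guard added in the diff.
                let next_is_fence = next.trim_start().starts_with("```");
                let fits = line.len() + 1 + next.len() <= limit;
                if !next_is_fence && !next.trim().is_empty() && fits {
                    out.push(format!("{line} {next}"));
                    i += 2;
                    continue;
                }
            }
        }
        out.push(line.to_string());
        i += 1;
    }
    out.join("\n")
}
```

With a limit of 50, this sketch leaves the new test input ("do not mess with code block chars" followed by a fenced block) unchanged, while two short prose lines are still merged into one.
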
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -7,7 +7,7 @@ on:
   pull_request:
   schedule:
     # Run multiple times a day as the successful cached links are not checked every time.
-    - cron: '0 */8 * * *'
+    - cron: "0 */8 * * *"
 
 jobs:
   ci:
@@ -83,16 +83,6 @@ jobs:
           git commit -m "Deploy ${GITHUB_SHA} to gh-pages"
           git push --quiet -f "https://x-token:${{ secrets.GITHUB_TOKEN }}@github.com/${GITHUB_REPOSITORY}" HEAD:gh-pages
 
-      - name: Cache sembr build
-        uses: actions/cache@v4
-        with:
-          path: |
-            ~/.cargo/registry/index/
-            ~/.cargo/registry/cache/
-            ~/.cargo/git/db/
-            ci/sembr/target/
-          key: sembr-${{ hashFiles('ci/sembr/Cargo.lock') }}
-
       - name: Check if files comply with semantic line breaks
         continue-on-error: true
         run: |
diff --git a/ci/sembr/src/main.rs b/ci/sembr/src/main.rs
@@ -177,6 +177,9 @@ fn lengthen_lines(content: &str, limit: usize) -> String {
         let Some(next_line) = content.get(n + 1) else {
             continue;
         };
+        if next_line.trim_start().starts_with("```") {
+            continue;
+        }
         if ignore(next_line, in_code_block)
             || REGEX_LIST_ENTRY.is_match(next_line)
             || REGEX_IGNORE_END.is_match(line)
@@ -255,6 +258,12 @@ preserve next line
 
 preserve next line
 * three
+
+do not mess with code block chars
+```
+leave the
+text alone
+```
 ";
     let expected = "\
 do not split short sentences
@@ -269,6 +278,12 @@ preserve next line
 
 preserve next line
 * three
+
+do not mess with code block chars
+```
+leave the
+text alone
+```
 ";
     assert_eq!(expected, lengthen_lines(original, 50));
 }
@@ -294,40 +309,6 @@ fn test_prettify_ignore_link_targets() {
     assert_eq!(original, lengthen_lines(original, 100));
 }
 
-#[test]
-fn test_sembr_then_prettify() {
-    let original = "
-hi there. do
-not split
-short sentences.
-hi again.
-";
-    let expected = "
-hi there.
-do
-not split
-short sentences.
-hi again.
-";
-    let processed = comply(original);
-    assert_eq!(expected, processed);
-    let expected = "
-hi there.
-do not split
-short sentences.
-hi again.
-";
-    let processed = lengthen_lines(&processed, 50);
-    assert_eq!(expected, processed);
-    let expected = "
-hi there.
-do not split short sentences.
-hi again.
-";
-    let processed = lengthen_lines(&processed, 50);
-    assert_eq!(expected, processed);
-}
-
 #[test]
 fn test_sembr_question_mark() {
     let original = "
diff --git a/src/fuzzing.md b/src/fuzzing.md
@@ -3,12 +3,13 @@
 <!-- date-check: Mar 2023 -->
 
 For the purposes of this guide, *fuzzing* is any testing methodology that
-involves compiling a wide variety of programs in an attempt to uncover bugs in
-rustc. Fuzzing is often used to find internal compiler errors (ICEs). Fuzzing
-can be beneficial, because it can find bugs before users run into them and
+involves compiling a wide variety of programs in an attempt to uncover bugs in rustc.
+Fuzzing is often used to find internal compiler errors (ICEs).
+Fuzzing can be beneficial, because it can find bugs before users run into them and
 provide small, self-contained programs that make the bug easier to track down.
 However, some common mistakes can reduce the helpfulness of fuzzing and end up
-making contributors' lives harder. To maximize your positive impact on the Rust
+making contributors' lives harder.
+To maximize your positive impact on the Rust
 project, please read this guide before reporting fuzzer-generated bugs!
 
 ## Guidelines
@@ -28,16 +29,14 @@ project, please read this guide before reporting fuzzer-generated bugs!
 
 - Don't report lots of bugs that use internal features, including but not
   limited to `custom_mir`, `lang_items`, `no_core`, and `rustc_attrs`.
-- Don't seed your fuzzer with inputs that are known to crash rustc (details
-  below).
+- Don't seed your fuzzer with inputs that are known to crash rustc (details below).
 
 ### Discussion
 
 If you're not sure whether or not an ICE is a duplicate of one that's already
-been reported, please go ahead and report it and link to issues you think might
-be related. In general, ICEs on the same line but with different *query stacks*
-are usually distinct bugs. For example, [#109020][#109020] and [#109129][#109129]
-had similar error messages:
+been reported, please go ahead and report it and link to issues you think might be related.
+In general, ICEs on the same line but with different *query stacks* are usually distinct bugs.
+For example, [#109020][#109020] and [#109129][#109129] had similar error messages:
 
 ```
 error: internal compiler error: compiler/rustc_middle/src/ty/normalize_erasing_regions.rs:195:90: Failed to normalize <[closure@src/main.rs:36:25: 36:28] as std::ops::FnOnce<(Emplacable<()>,)>>::Output, maybe try to call `try_normalize_erasing_regions` instead
@@ -63,10 +62,10 @@ end of query stack
 
 ## Building a corpus
 
-When building a corpus, be sure to avoid collecting tests that are already
-known to crash rustc. A fuzzer that is seeded with such tests is more likely to
-generate bugs with the same root cause, wasting everyone's time. The simplest
-way to avoid this is to loop over each file in the corpus, see if it causes an
+When building a corpus, be sure to avoid collecting tests that are already known to crash rustc.
+A fuzzer that is seeded with such tests is more likely to
+generate bugs with the same root cause, wasting everyone's time.
+The simplest way to avoid this is to loop over each file in the corpus, see if it causes an
 ICE, and remove it if so.
 
 To build a corpus, you may want to use:
@@ -84,16 +83,16 @@ To build a corpus, you may want to use:
 Here are a few things you can do to help the Rust project after filing an ICE.
 
 - [Bisect][bisect] the bug to figure out when it was introduced.
-  If you find the regressing PR / commit, you can mark the issue with the label
-  `S-has-bisection`. If not, consider applying `E-needs-bisection` instead.
+  If you find the regressing PR / commit, you can mark the issue with the label `S-has-bisection`.
+  If not, consider applying `E-needs-bisection` instead.
 - Fix "distractions": problems with the test case that don't contribute to
   triggering the ICE, such as syntax errors or borrow-checking errors
-- Minimize the test case (see below). If successful, you can label the
-  issue with `S-has-mcve`. Otherwise, you can apply `E-needs-mcve`.
+- Minimize the test case (see below).
+  If successful, you can label the issue with `S-has-mcve`.
+  Otherwise, you can apply `E-needs-mcve`.
 - Add the minimal test case to the rust-lang/rust repo as a [crash test].
   While you're at it, consider including other "untracked" crashes in your PR.
-  Please don't forget to mark all relevant issues with `S-bug-has-test` once
-  your PR is merged.
+  Please don't forget to mark all relevant issues with `S-bug-has-test` once your PR is merged.
 
 See also [applying and removing labels][labeling].
 
@@ -103,13 +102,14 @@ See also [applying and removing labels][labeling].
 
 ## Minimization
 
-It is helpful to carefully *minimize* the fuzzer-generated input. When
-minimizing, be careful to preserve the original error, and avoid introducing
+It is helpful to carefully *minimize* the fuzzer-generated input.
+When minimizing, be careful to preserve the original error, and avoid introducing
 distracting problems such as syntax, type-checking, or borrow-checking errors.
 
-There are some tools that can help with minimization. If you're not sure how
-to avoid introducing syntax, type-, and borrow-checking errors while using
-these tools, post both the complete and minimized test cases. Generally,
+There are some tools that can help with minimization.
+If you're not sure how to avoid introducing syntax, type-, and borrow-checking errors while using
+these tools, post both the complete and minimized test cases.
+Generally,
 *syntax-aware* tools give the best results in the least amount of time.
 [`treereduce-rust`][treereduce] and [picireny][picireny] are syntax-aware.
 [`halfempty`][halfempty] is not, but is generally a high-quality tool.
@@ -121,21 +121,23 @@ these tools, post both the complete and minimized test cases. Generally,
 ## Effective fuzzing
 
 When fuzzing rustc, you may want to avoid generating machine code, since this
-is mostly done by LLVM. Try `--emit=mir` instead.
+is mostly done by LLVM.
+Try `--emit=mir` instead.
 
-A variety of compiler flags can uncover different issues. `-Zmir-opt-level=4`
-will turn on MIR optimization passes that are not run by default, potentially
-uncovering interesting bugs. `-Zvalidate-mir` can help uncover such bugs.
+A variety of compiler flags can uncover different issues.
+`-Zmir-opt-level=4` will turn on MIR optimization passes that are not run by default, potentially
+uncovering interesting bugs.
+`-Zvalidate-mir` can help uncover such bugs.
 
 If you're fuzzing a compiler you built, you may want to build it with `-C
-target-cpu=native` or even PGO/BOLT to squeeze out a few more executions per
-second. Of course, it's best to try multiple build configurations and see
+target-cpu=native` or even PGO/BOLT to squeeze out a few more executions per second.
+Of course, it's best to try multiple build configurations and see
 what actually results in superior throughput.
 
 You may want to build rustc from source with debug assertions to find
 additional bugs, though this is a trade-off: it can slow down fuzzing by
-requiring extra work for every execution. To enable debug assertions, add this
-to `bootstrap.toml` when compiling rustc:
+requiring extra work for every execution.
+To enable debug assertions, add this to `bootstrap.toml` when compiling rustc:
 
 ```toml
 [rust]