Draft
4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
@@ -14,7 +14,7 @@ Purpose: Help agents operate the benchmark harness quickly. Stay repo-specific;
3. Prompts load from `suites/<suite>/prompts/<scenario>/<tier>-*.md`; absence logs a warning and skips the agent.
4. Scenario `validation.commands` run in install→test→lint→typecheck order via `runValidationCommands`, capturing exit codes and logs.
5. `buildDiffArtifacts` compares workspace vs fixture to populate `diff_summary` and package deltas.
- 6. `runEvaluators` applies current metrics, then `computeWeightedTotals` rescales weighted averages to a 0–10 score before writing `results/summary.json`.
+ 6. `runEvaluators` applies current metrics, then `computeWeightedTotals` rescales weighted averages to a 0–10 score before saving to `benchmarks.db`.

### Evaluators & scoring (all in `packages/evaluators/src/evaluators/`)
- `InstallEvaluator`: expects an install command result with exit code 0.
@@ -43,7 +43,7 @@ npm -w packages/harness run build
node packages/harness/dist/cli.js run update-deps nx-pnpm-monorepo --tier L1 --agent claude-code --model sonnet --max-turns 15
```
- `npm -w packages/harness run dev` launches `tsc --watch` while editing the CLI.
- - Each `run` overwrites `results/summary.json`; archive outputs manually if you need history.
+ - All benchmark results are stored in `benchmarks.db` with full history.

### Guardrails & troubleshooting
- Work inside the generated workspace (`results/workspaces/...`), not the repository root; evaluators read only that directory.
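Step 6 of the instructions file above says `computeWeightedTotals` rescales weighted averages to a 0–10 score. A minimal sketch of that kind of rescaling follows, assuming each metric yields a score in the range 0..1 and a non-negative weight. The `MetricResult` shape and the `weightedTotalOutOfTen` name are hypothetical and are not taken from the harness, whose actual implementation may differ.

```typescript
// Hypothetical metric shape; the real harness types may differ.
interface MetricResult {
  name: string;
  score: number;  // assumed to be normalized to 0..1
  weight: number; // assumed to be non-negative
}

// Weighted average of the metric scores, rescaled from 0..1 to 0..10.
function weightedTotalOutOfTen(results: MetricResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  // With no weight at all there is nothing to average; report 0.
  if (totalWeight === 0) return 0;
  const weightedSum = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  return (weightedSum / totalWeight) * 10;
}

// Usage: an install metric that passed and a test metric at 50%.
const total = weightedTotalOutOfTen([
  { name: "install", score: 1, weight: 1 },
  { name: "test", score: 0.5, weight: 3 },
]);
console.log(total);
```

Dividing by the summed weights keeps the result in 0–10 even when the weights do not add up to 1.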
2 changes: 0 additions & 2 deletions .gitignore
@@ -104,10 +104,8 @@ dist

.idea

results/summary.json
results/**/*
results/workspaces/**/*
results/summary.json

# Snapshot files
snapshot.json5