feat: support passing multiple models to evaluate the same prompts across models #257

Open
danymarques wants to merge 4 commits into angular:main from danymarques:feat/support-mutiple-models-for-same-prompt

Conversation

@danymarques

The --model option now accepts multiple values, allowing users to
run the same evaluation against several models in a single command.
When multiple models are specified, each model's report name is
suffixed with the model name to avoid collisions.

Usage: pnpm run wcs eval --model=gemini-2.5-pro --model=claude-sonnet-4.5
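For readers skimming the description, the behavior above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `collectModelFlags`, `reportNameFor`, `planReports`, the `report` base name, and the `-` separator are all hypothetical, and the sketch only handles the `--model=value` form shown in the usage line.

```typescript
// Hypothetical helper: gather every value passed as --model=<name>.
function collectModelFlags(argv: string[]): string[] {
  const prefix = '--model=';
  const models: string[] = [];
  for (const arg of argv) {
    if (arg.indexOf(prefix) === 0) {
      models.push(arg.slice(prefix.length));
    }
  }
  // Drop repeated model names so each model is evaluated once.
  return models.filter((model, index) => models.indexOf(model) === index);
}

// Hypothetical naming rule (an assumption, not confirmed by the PR):
// keep the base name for a single model, suffix it when there are several.
function reportNameFor(baseName: string, model: string, modelCount: number): string {
  if (modelCount <= 1) {
    return baseName;
  }
  // Replace characters that are awkward in file names.
  const safeModel = model.replace(/[^a-zA-Z0-9._-]/g, '-');
  return `${baseName}-${safeModel}`;
}

// One planned run per model, each with its own report name.
function planReports(argv: string[], baseName: string): {model: string; reportName: string}[] {
  const models = collectModelFlags(argv);
  return models.map(model => ({
    model,
    reportName: reportNameFor(baseName, model, models.length),
  }));
}
```

Under these assumptions, the two-model command above would yield two runs whose report names differ only by the model suffix, while a single `--model` keeps the unsuffixed name.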

Integrate @github/copilot as a new code generation runner. The runner
supports multiple models (Claude, Gemini, GPT), configures permissions
via .copilot/settings.json, and uses COPILOT.md for instructions.

Also add debug logging to BaseCliAgentRunner, controllable via the
CLI_RUNNER_DEBUG environment variable, to aid in troubleshooting
agent process execution.
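As a rough illustration of environment-gated debug logging, here is a minimal sketch. It is not the `BaseCliAgentRunner` implementation: `isDebugEnabled`, `createDebugLogger`, the `[cli-runner debug]` prefix, and the rule that `0`, `false`, or an empty value disable logging are all assumptions. Only the `CLI_RUNNER_DEBUG` variable name comes from the description.

```typescript
// Environment shape kept generic so the sketch does not depend on Node typings.
type Env = {[key: string]: string | undefined};

// Assumption: any non-empty value other than "0" or "false" turns logging on.
function isDebugEnabled(env: Env): boolean {
  const raw = (env['CLI_RUNNER_DEBUG'] || '').trim().toLowerCase();
  return raw !== '' && raw !== '0' && raw !== 'false';
}

// Returns a logger that forwards to `sink` when debugging is enabled
// and does nothing otherwise, so call sites need no extra checks.
function createDebugLogger(env: Env, sink: (line: string) => void): (message: string) => void {
  if (!isDebugEnabled(env)) {
    return () => {};
  }
  return (message: string) => sink(`[cli-runner debug] ${message}`);
}
```

In a Node process this could be wired up as `createDebugLogger(process.env, line => console.error(line))`, so that agent process details only appear when the variable is set.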