Fix multiple bugs and code quality issues #20
Open
hobostay wants to merge 1 commit into alibaba:main from
## Summary

This PR fixes several bugs and improves code quality across multiple files.

## Changes

### 1. Fix pass@k calculation edge case (`pass_at_k_statistic.py`)

- **Bug**: When `c=0` (no successes), the function incorrectly returned `1.0`.
- **Fix**: Added an explicit check that returns `0.0` when there are no successful attempts.
- **Impact**: Prevents incorrect scoring when all test attempts fail.

### 2. Fix dependency configuration (`pyproject.toml`)

- **Bug**: `numpy` and `pandas` were listed as dev dependencies but are used in production code.
- **Fix**: Moved them to the main dependencies with appropriate version constraints.
- **Impact**: Fixes import errors at runtime for users who install only the main dependencies.

### 3. Improve exception handling and logging

- `llm_manager.py`: Log an error when closing an LLM instance fails.
- `testcase.py`: Add debug logging for XML parsing failures.
- `logger_utils.py`: Write handler exceptions to stderr.
- **Impact**: Failures become visible in logs, which makes debugging easier.

### 4. Code quality improvements

- `rate_limiter.py`: Use an explicit `return` instead of `pass` in `__aexit__`.
- `openai.py` and `openai_responses.py`: Replace the "TODO" placeholder with a professional default system prompt.
- **Impact**: More maintainable, professional code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
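For change 1, the widely used unbiased pass@k estimator (Chen et al., 2021, the HumanEval paper) is pass@k = 1 - C(n-c, k) / C(n, k), where `n` is the number of samples and `c` the number that passed. The PR does not show the code in `pass_at_k_statistic.py`, so the sketch below assumes it follows this estimator; the function name `pass_at_k` is hypothetical. In the common reference implementation, `c = 0` can only produce `1.0` through the `n - c < k` shortcut, that is, when fewer than `k` samples were drawn. Checking `c == 0` first closes that gap.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate (Chen et al., 2021).

    n: total samples generated for a problem
    c: number of samples that passed
    k: the k in pass@k
    Hypothetical helper; the real function in pass_at_k_statistic.py may differ.
    """
    if not 0 <= c <= n:
        raise ValueError("require 0 <= c <= n")
    if k < 1:
        raise ValueError("require k >= 1")
    # No successes: every draw of k samples fails, so pass@k is 0.
    # Checked first so that n < k cannot fall through to the 1.0 shortcut below.
    if c == 0:
        return 0.0
    # Fewer than k failing samples: every size-k draw contains a success.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With this guard, `pass_at_k(3, 0, 5)` returns `0.0`; without it, the `n - c < k` branch would return `1.0`.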
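For the `llm_manager.py` change in section 3, here is a minimal sketch of logging close failures instead of silently ignoring them. It assumes, hypothetically, that each LLM instance exposes an async `close()` method; `close_all` is an illustrative name, not the project's API.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def close_all(instances) -> int:
    """Close every instance, logging (not raising) individual failures.

    Hypothetical helper: assumes each instance has an async ``close()``.
    Returns the number of instances that failed to close.
    """
    failures = 0
    for inst in instances:
        try:
            await inst.close()
        except Exception:
            failures += 1
            # logger.exception logs at ERROR level and attaches the traceback,
            # whereas a bare `except: pass` would hide the failure entirely.
            logger.exception("Failed to close LLM instance %r", inst)
    return failures
```

The loop keeps going after a failure, so one broken instance does not prevent the others from being closed.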
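For the `testcase.py` change, a sketch of debug-level logging around XML parsing, using the standard library's `xml.etree.ElementTree`. Whether `testcase.py` uses this parser is an assumption, and `parse_testcase_xml` is a hypothetical name.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Optional

logger = logging.getLogger(__name__)


def parse_testcase_xml(text: str) -> Optional[ET.Element]:
    """Parse an XML snippet, returning None on malformed input.

    Hypothetical helper; the real parsing code in testcase.py may differ.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        # DEBUG level: if malformed input is expected occasionally, this keeps
        # it out of normal logs while making it visible when debugging.
        # Only the first 80 characters of the input are logged.
        logger.debug("XML parsing failed (%s); input starts with %r", exc, text[:80])
        return None
```

Logging at DEBUG rather than ERROR is a judgment call: it suits inputs where parse failures are routine and already handled by the `None` return.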
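For the `logger_utils.py` change, a sketch of a handler that reports its own failures on stderr. `SafeHandler` and its `sink` callable are hypothetical. Note that the standard library already offers similar behavior: `logging.Handler.handleError()` prints a traceback to `sys.stderr` when `logging.raiseExceptions` is true (the default), so calling `self.handleError(record)` from `emit` is an alternative to writing to stderr by hand.

```python
import logging
import sys
import traceback


class SafeHandler(logging.Handler):
    """Hypothetical handler that forwards formatted records to `sink`.

    `sink` is any callable taking a string (e.g. a file or network writer).
    If it raises, the failure is reported on stderr instead of being hidden.
    """

    def __init__(self, sink, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.sink(self.format(record))
        except Exception:
            # A logging failure must not crash the caller, but a bare `pass`
            # would hide it. sys.stderr is looked up at call time, so
            # redirection (e.g. in tests) still captures the report.
            sys.stderr.write(f"{type(self).__name__} failed to emit {record.msg!r}\n")
            traceback.print_exc(file=sys.stderr)
```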
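For the `rate_limiter.py` change in section 4: in an async context manager, `pass` and a bare `return` in `__aexit__` behave identically. Both make the method return `None`, which is falsy, so an exception raised inside the `async with` body propagates instead of being suppressed; only a truthy return value would swallow it. The change is therefore about readability, not behavior. The limiter below is a hypothetical minimum-interval design for illustration, not the class in `rate_limiter.py`.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical limiter allowing at most `rate` entries per second.

    Waiting happens in __aenter__; nothing needs releasing on exit.
    """

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._next = 0.0  # earliest monotonic time the next entry may proceed
        self._lock = asyncio.Lock()

    async def __aenter__(self):
        async with self._lock:
            now = time.monotonic()
            wait = self._next - now
            if wait > 0:
                await asyncio.sleep(wait)
            # Schedule the following slot one interval after this one.
            self._next = max(now, self._next) + self._interval
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Returning None (falsy) re-raises any exception from the `async with`
        # body. A bare `return` and `pass` both return None; `return` just
        # states the intent explicitly.
        return
```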