Fix multiple bugs and code quality issues #20

Open
hobostay wants to merge 1 commit into alibaba:main from
hobostay:fix-multiple-bugs

@hobostay
Summary

This PR addresses several bugs and improves code quality across multiple files.

Changes

1. Fix pass@k calculation edge case (pass_at_k_statistic.py)

  • Bug: When c=0 (no successes), the function incorrectly returned 1.0
  • Fix: Added explicit check to return 0.0 when there are no successful attempts
  • Impact: Prevents incorrect scoring when all test attempts fail
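
A minimal sketch of the guard described above, assuming the file uses the common unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of attempts and c the number of successes. The function name and signature are hypothetical and not taken from pass_at_k_statistic.py. One plausible way the bug arises is a shortcut such as "if n - c < k: return 1.0", which also fires when c=0 and n < k; checking c == 0 first avoids that.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts with c successes (hypothetical helper)."""
    # No successes at all: the probability of a pass is zero, whatever k is.
    if c == 0:
        return 0.0
    # Fewer than k failures: every sample of size k contains a success.
    if n - c < k:
        return 1.0
    # Standard estimator: 1 - C(n - c, k) / C(n, k).
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With this ordering, pass_at_k(5, 0, 10) returns 0.0 instead of falling through to the "fewer than k failures" shortcut.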

2. Fix dependency configuration (pyproject.toml)

  • Bug: numpy and pandas were in dev dependencies but used in production code
  • Fix: Moved them to main dependencies with appropriate version constraints
  • Impact: Fixes runtime errors for users who only install main dependencies

3. Improve exception handling and logging

  • llm_manager.py: Log an error when closing an LLM instance fails
  • testcase.py: Add debug logging for XML parsing failures
  • logger_utils.py: Add stderr output for handler exceptions
  • Impact: Better debugging capabilities when errors occur
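
The logging change for llm_manager.py might look roughly like the sketch below. It assumes the manager holds named instances that expose a close() method; FakeLLM, close_all, and the logger name are hypothetical stand-ins, not the project's actual API. The point is that a failure to close one instance is logged rather than silently swallowed, and the remaining instances are still closed.

```python
import logging

logger = logging.getLogger("llm_manager_sketch")  # hypothetical logger name


class FakeLLM:
    """Stand-in for an LLM client with a close() method (assumption)."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False

    def close(self) -> None:
        if self.fail:
            raise RuntimeError("close failed")
        self.closed = True


def close_all(instances: dict) -> list:
    """Close every instance, logging failures instead of ignoring them."""
    failed = []
    for name, instance in instances.items():
        try:
            instance.close()
        except Exception as exc:
            # Previously such errors would be dropped; now they are recorded.
            logger.error("Failed to close LLM instance %s: %s", name, exc)
            failed.append(name)
    return failed
```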

4. Code quality improvements

  • rate_limiter.py: Use explicit return instead of pass in __aexit__
  • openai.py & openai_responses.py: Replace the "TODO" placeholder with a professional default system prompt
  • Impact: More maintainable and professional code
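
For context on the rate_limiter.py change: the return value of __aexit__ decides whether an exception raised inside the "async with" block is suppressed. A body of "pass" implicitly returns None, which is falsy, so exceptions propagate either way; an explicit return only makes that intent visible. The class below is a hypothetical sketch built on asyncio.Semaphore, and the choice of "return False" is an assumption, since the PR text does not say which value is returned.

```python
import asyncio


class RateLimiterSketch:
    """Hypothetical async context manager limiting concurrent entries."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)

    async def __aenter__(self) -> "RateLimiterSketch":
        await self._semaphore.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self._semaphore.release()
        # Explicitly falsy: exceptions from the block are not suppressed.
        return False


async def demo() -> str:
    limiter = RateLimiterSketch(1)
    try:
        async with limiter:
            raise ValueError("boom")
    except ValueError:
        return "propagated"
    return "suppressed"
```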

Testing

  • All changes are backward compatible
  • The pass@k fix corrects a mathematical error in edge cases
  • Dependency fix ensures the package works with standard installation methods

Checklist

  • Code follows project style guidelines
  • Changes are backward compatible
  • No breaking changes introduced

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>