Fix multiple bugs and code quality issues by hobostay · Pull Request #20 · alibaba/sec-code-bench

hobostay · 2026-03-11T06:59:28Z

Summary

This PR addresses several bugs and improves code quality across multiple files.

Changes

1. Fix pass@k calculation edge case (`pass_at_k_statistic.py`)

Bug: When c=0 (no successful attempts), the function incorrectly returned 1.0
Fix: Added an explicit check that returns 0.0 when there are no successful attempts
Impact: Prevents inflated scores when every test attempt fails
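
For illustration, here is a minimal sketch of the guard described above, applied to the standard unbiased pass@k estimator. The function name `pass_at_k` and its signature are assumptions for this example, not the actual contents of `pass_at_k_statistic.py`. It shows one way the bug can arise: with the usual shortcut `n - c < k -> 1.0`, a run with `c = 0` and `n < k` would score 1.0 unless `c == 0` is checked first.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts of which c succeeded.

    Hypothetical helper: name and signature are illustrative only.
    """
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("require n > 0, k > 0 and 0 <= c <= n")
    # Explicit edge case: no successes can never yield a pass.
    if c == 0:
        return 0.0
    # Fewer than k failures: every size-k sample contains a success.
    if n - c < k:
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a running product.
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

Checking `c == 0` before the `n - c < k` shortcut is the design choice that matters here; the order of the two checks decides the result when `n < k`.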

2. Fix dependency configuration (`pyproject.toml`)

Bug: numpy and pandas were listed as dev dependencies but are used by production code
Fix: Moved both to the main dependencies with appropriate version constraints
Impact: Fixes runtime errors for users who install only the main dependencies
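
One way to catch this class of problem is to check, in an environment with only the main dependencies installed, that every module the production code imports can be resolved. The sketch below is an assumption-laden illustration: the helper name `missing_runtime_modules` is hypothetical and is not part of this PR.

```python
import importlib.util
from typing import Iterable, List


def missing_runtime_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported.

    Hypothetical helper for illustration; pass top-level names only
    (for example "numpy"), not dotted submodule paths.
    """
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

Running `missing_runtime_modules(["numpy", "pandas"])` after a main-only install should return an empty list once both packages are declared as main dependencies.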

3. Improve exception handling and logging

llm_manager.py: Log an error when closing an LLM instance fails
testcase.py: Add debug logging for XML parsing failures
logger_utils.py: Write handler exceptions to stderr
Impact: Errors in these paths are now reported, which makes them easier to debug
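
The first item can be sketched roughly as follows. This is not the code from `llm_manager.py`: the function `close_all`, the logger name, and the assumption that each instance exposes a synchronous `close()` method are all hypothetical.

```python
import logging
from typing import Any, Mapping

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("llm_manager_sketch")


def close_all(instances: Mapping[str, Any]) -> int:
    """Close every instance, logging failures instead of hiding them.

    Returns the number of instances whose close() raised.
    """
    failures = 0
    for name, instance in instances.items():
        try:
            instance.close()
        except Exception:
            # logger.exception records the message plus the traceback.
            logger.exception("Failed to close LLM instance %r", name)
            failures += 1
    return failures
```

Catching the exception per instance keeps one failing `close()` from preventing the remaining instances from being closed, while the log entry preserves the traceback for debugging.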

4. Code quality improvements

rate_limiter.py: Use an explicit return instead of pass in __aexit__
openai.py & openai_responses.py: Replace the "TODO" placeholder with a proper default system prompt
Impact: More maintainable code with no placeholder text
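
To illustrate the `__aexit__` change, here is a minimal sketch of an async context manager with an explicit return value. The class `SimpleRateLimiter` is hypothetical and is not the implementation in `rate_limiter.py`; the sketch also assumes the intent is to let exceptions propagate, which is what returning `False` expresses.

```python
import asyncio


class SimpleRateLimiter:
    """Hypothetical limiter that caps concurrent entries with a semaphore."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)

    async def __aenter__(self) -> "SimpleRateLimiter":
        await self._semaphore.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self._semaphore.release()
        # Explicit False: exceptions raised in the body are not suppressed.
        return False


async def _raise_inside_limiter() -> bool:
    limiter = SimpleRateLimiter(1)
    try:
        async with limiter:
            raise RuntimeError("boom")
    except RuntimeError:
        return True  # the exception reached the caller
    return False


def exception_propagates() -> bool:
    """Run the demo coroutine and report whether the error propagated."""
    return asyncio.run(_raise_inside_limiter())
```

A body of `pass` behaves the same at runtime, since it returns `None`, which is also falsy. The explicit return only makes the "do not suppress" decision visible to readers.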

Testing

All changes are backward compatible
The pass@k fix corrects a mathematical error in edge cases
Dependency fix ensures the package works with standard installation methods

Checklist

Code follows project style guidelines
Changes are backward compatible
No breaking changes introduced

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

hobostay wants to merge 1 commit into alibaba:main from hobostay:fix-multiple-bugs
