Conversation
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`:
- Around line 146-150: Thrust kernel launches in mip_scaling_strategy.cu (e.g.,
the thrust::transform that uses handle_ptr->get_thrust_policy() and
min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error
checks; after each Thrust call (all thrust::transform / thrust::copy /
thrust::fill invocations at the reported locations) add a CUDA
device-synchronization error check using the same pattern as CUB calls (e.g.,
wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call
followed by RAFT_CUDA_TRY or
CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any
launch or runtime error is caught and propagated—apply this change to each
affected symbol/site in the file (the transform using
min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill
invocations around lines reported) to ensure consistent error handling.
- Around line 300-309: Clamp the user-provided
row_scaling_k_min/row_scaling_k_max to a safe exponent range before calling
ldexp and before passing into the kernel: compute safe_k_min/safe_k_max by
clamping row_scaling_k_min/row_scaling_k_max to a predetermined safe exponent
window (e.g., the sensible defaults like -20..20 or the numeric limits for f_t's
exponent range), then use safe_k_min/safe_k_max when computing
min_scaling/max_scaling (instead of raw row_scaling_k_min/row_scaling_k_max),
include those safe values in the kernel lambda capture, and replace the per-row
k and required_k clamps with clamps against safe_k_min/safe_k_max so ldexp(1,k)
cannot underflow/overflow.
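The clamping described above can be sketched as follows. This is a minimal host-side illustration, not the project's actual code: the names `clamp_exponent`, `compute_scaling_bounds`, and the `-20..20` window are assumptions taken from the example defaults mentioned in the comment.

```cpp
#include <algorithm>
#include <cmath>

// Illustrative safe exponent window (the -20..20 default suggested above);
// real code would pick limits appropriate for f_t's exponent range.
constexpr int kSafeExpMin = -20;
constexpr int kSafeExpMax = 20;

// Clamp a user-provided exponent so ldexp(1.0, k) stays well inside the
// representable range and cannot underflow or overflow.
inline int clamp_exponent(int k)
{
  return std::min(std::max(k, kSafeExpMin), kSafeExpMax);
}

// Compute min/max scaling from the clamped exponents instead of the raw
// row_scaling_k_min/row_scaling_k_max values.
inline void compute_scaling_bounds(int row_scaling_k_min,
                                   int row_scaling_k_max,
                                   double& min_scaling,
                                   double& max_scaling)
{
  const int safe_k_min = clamp_exponent(row_scaling_k_min);
  const int safe_k_max = clamp_exponent(row_scaling_k_max);
  min_scaling          = std::ldexp(1.0, safe_k_min);  // 2^safe_k_min
  max_scaling          = std::ldexp(1.0, safe_k_max);  // 2^safe_k_max
}
```

The same `safe_k_min`/`safe_k_max` values would then be captured by the kernel lambda so the per-row `k` and `required_k` clamps use the safe window rather than the raw user inputs.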
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cpp/src/mip_heuristics/diversity/diversity_manager.cu
cpp/src/mip_heuristics/diversity/diversity_manager.cuh
cpp/src/mip_heuristics/mip_scaling_strategy.cu
cpp/src/mip_heuristics/mip_scaling_strategy.cuh
thrust::transform(handle_ptr->get_thrust_policy(),
                  row_bounds_begin,
                  row_bounds_end,
                  row_min_nonzero_bounds.begin(),
                  min_nonzero_bounds_abs_transform_t<f_t>{});
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file structure and the specific lines mentioned
wc -l cpp/src/mip_heuristics/mip_scaling_strategy.cu
Repository: NVIDIA/cuopt
Length of output: 106
🏁 Script executed:
# Read the section around lines 146-150 to see the context
sed -n '140,160p' cpp/src/mip_heuristics/mip_scaling_strategy.cu
Repository: NVIDIA/cuopt
Length of output: 1229
🏁 Script executed:
# Search for all thrust:: launches in the file and check if they are followed by RAFT_CUDA_TRY
cat cpp/src/mip_heuristics/mip_scaling_strategy.cu | grep -n "thrust::" | head -20
Repository: NVIDIA/cuopt
Length of output: 1476
🏁 Script executed:
# More detailed check: look for the pattern of thrust launches and nearby error checks
python3 <<'EOF'
import re
with open('cpp/src/mip_heuristics/mip_scaling_strategy.cu', 'r') as f:
    lines = f.readlines()
# Find all thrust:: calls with context
results = []
for i, line in enumerate(lines):
    if 'thrust::' in line and not line.strip().startswith('//'):
        # Look at the next few lines to see if there's a RAFT_CUDA_TRY
        context_start = max(0, i - 1)
        context_end = min(len(lines), i + 5)
        context = ''.join(lines[context_start:context_end])
        # Check if RAFT_CUDA_TRY appears in the next few lines
        has_check = 'RAFT_CUDA_TRY' in context
        results.append({
            'line': i + 1,
            'code': line.strip(),
            'has_check': has_check,
            'context': context.strip()
        })
# Print summary
print("Thrust launches found:")
print("=" * 80)
for r in results:
    print(f"Line {r['line']}: {r['code'][:70]}")
    print(f"  Has error check: {r['has_check']}")
    print()
EOF
Repository: NVIDIA/cuopt
Length of output: 4849
🏁 Script executed:
# Also check what RAFT_CUDA_TRY pattern looks like in this file
grep -n "RAFT_CUDA_TRY" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -10
Repository: NVIDIA/cuopt
Length of output: 626
🏁 Script executed:
# Check the specific ranges mentioned in the comment
echo "=== Line 146-150 ==="
sed -n '146,150p' cpp/src/mip_heuristics/mip_scaling_strategy.cu
echo ""
echo "=== Line 207-225 ==="
sed -n '207,225p' cpp/src/mip_heuristics/mip_scaling_strategy.cu
echo ""
echo "=== Line 320-321 ==="
sed -n '320,321p' cpp/src/mip_heuristics/mip_scaling_strategy.cu
echo ""
echo "=== Line 325-336 ==="
sed -n '325,336p' cpp/src/mip_heuristics/mip_scaling_strategy.cu
Repository: NVIDIA/cuopt
Length of output: 1835
🏁 Script executed:
# Check whether this file uses a specific pattern - perhaps thrust operations are wrapped at a higher level
grep -B2 -A2 "thrust::transform\|thrust::upper_bound\|thrust::count\|thrust::fill" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -50
Repository: NVIDIA/cuopt
Length of output: 2085
🏁 Script executed:
# Check if there's any wrapper or special handling in thrust_policy
find . -type f -name "*.hpp" -o -name "*.h" | xargs grep -l "thrust_policy\|get_thrust_policy" | head -5
Repository: NVIDIA/cuopt
Length of output: 222
🏁 Script executed:
# Check the RAFT handle implementation to see if get_thrust_policy includes error checking
rg "get_thrust_policy|thrust_policy" -A 3 -B 1 --max-count=10
Repository: NVIDIA/cuopt
Length of output: 50369
🏁 Script executed:
# Look for any error checking patterns after thrust operations in the broader codebase
rg "thrust::(transform|fill|upper_bound|count|transform_reduce)" -A 2 | grep -E "CUDA|cuda|error|check" | head -20
Repository: NVIDIA/cuopt
Length of output: 1439
🏁 Script executed:
# Specifically check other scaling/heuristics files in the same directory for comparison
ls -la cpp/src/mip_heuristics/
Repository: NVIDIA/cuopt
Length of output: 1531
🏁 Script executed:
# Check how other files in the same directory handle Thrust launches
find cpp/src/mip_heuristics -name "*.cu" -o -name "*.cuh" | xargs grep -l "thrust::" | head -3
Repository: NVIDIA/cuopt
Length of output: 210
🏁 Script executed:
# Sample one other file to see if this is a systemic pattern
head -250 cpp/src/mip_heuristics/mip_scaling_strategy.cu | tail -50 | rg "thrust::" -A 1
Repository: NVIDIA/cuopt
Length of output: 857
🏁 Script executed:
# More comprehensive check: do ANY thrust calls in this repo have post-call error checks?
rg "thrust::(transform|fill|upper_bound|count)" -A 1 | grep -B1 "RAFT_CUDA_TRY\|cudaPeekAtLastError" | head -20
Repository: NVIDIA/cuopt
Length of output: 38
Add explicit CUDA error checks after Thrust launches.
CUB calls are checked with RAFT_CUDA_TRY, but Thrust kernel launches lack explicit error verification. This inconsistency can hide launch and runtime failures, weakening error propagation.
🔧 Minimal fix pattern
thrust::transform(handle_ptr->get_thrust_policy(),
row_bounds_begin,
row_bounds_end,
row_min_nonzero_bounds.begin(),
min_nonzero_bounds_abs_transform_t<f_t>{});
+ RAFT_CUDA_TRY(cudaPeekAtLastError());

Apply the same pattern after each Thrust launch.
Affected locations: 146–150, 207–225, 320–321, 325–336, 331–336, 352–353, 374–387, 396–402, 462–468, 470–475, 476–481.
Per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification" and "Verify error propagation from CUDA to user-facing APIs is complete."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
thrust::transform(handle_ptr->get_thrust_policy(),
                  row_bounds_begin,
                  row_bounds_end,
                  row_min_nonzero_bounds.begin(),
                  min_nonzero_bounds_abs_transform_t<f_t>{});
RAFT_CUDA_TRY(cudaPeekAtLastError());
373d374 to cfcf0ab (Compare)
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
This PR adds row scaling to MIP and uses PDLP scaling for both barrier and PDLP.
Summary by CodeRabbit
New Features
Bug Fixes
Chores