Skip to content

Scaling test#906

Open
akifcorduk wants to merge 65 commits intoNVIDIA:release/26.04from
akifcorduk:scaling_test
Open

Scaling test#906
akifcorduk wants to merge 65 commits intoNVIDIA:release/26.04from
akifcorduk:scaling_test

Conversation

@akifcorduk
Copy link
Contributor

@akifcorduk akifcorduk commented Feb 25, 2026

This PR adds row scaling to MIP and uses PDLP scaling for both barrier and PDLP.

Summary by CodeRabbit

  • New Features

    • MIP row-scaling enabled by default for improved numeric robustness.
    • Solver returns partial results early when time limits are reached; time-limit enforcement is more proactive.
    • New selectable MIP scaling configurations with logging of the selected scaling setup.
  • Bug Fixes

    • Fixed inconsistent unscaling and feasibility handling to prevent mismatched solution states.
  • Chores

    • Removed a deprecated MIP configuration option.
    • Consolidated and replaced the scaling strategy; tests and build updated accordingly.

@anandhkb anandhkb added this to the 26.04 milestone Feb 25, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`:
- Around line 146-150: Thrust kernel launches in mip_scaling_strategy.cu (e.g.,
the thrust::transform that uses handle_ptr->get_thrust_policy() and
min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error
checks; after each Thrust call (all thrust::transform / thrust::copy /
thrust::fill invocations at the reported locations) add a CUDA
device-synchronization error check using the same pattern as CUB calls (e.g.,
wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call
followed by RAFT_CUDA_TRY or
CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any
launch or runtime error is caught and propagated—apply this change to each
affected symbol/site in the file (the transform using
min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill
invocations around lines reported) to ensure consistent error handling.
- Around line 300-309: Clamp the user-provided
row_scaling_k_min/row_scaling_k_max to a safe exponent range before calling
ldexp and before passing into the kernel: compute safe_k_min/safe_k_max by
clamping row_scaling_k_min/row_scaling_k_max to a predetermined safe exponent
window (e.g., the sensible defaults like -20..20 or the numeric limits for f_t's
exponent range), then use safe_k_min/safe_k_max when computing
min_scaling/max_scaling (instead of raw row_scaling_k_min/row_scaling_k_max),
include those safe values in the kernel lambda capture, and replace the per-row
k and required_k clamps with clamps against safe_k_min/safe_k_max so ldexp(1,k)
cannot underflow/overflow.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b2a631 and e2b3aea.

📒 Files selected for processing (4)
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/diversity_manager.cuh
  • cpp/src/mip_heuristics/mip_scaling_strategy.cu
  • cpp/src/mip_heuristics/mip_scaling_strategy.cuh

Comment on lines +146 to +150
thrust::transform(handle_ptr->get_thrust_policy(),
row_bounds_begin,
row_bounds_end,
row_min_nonzero_bounds.begin(),
min_nonzero_bounds_abs_transform_t<f_t>{});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and the specific lines mentioned
wc -l cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 106


🏁 Script executed:

# Read the section around lines 146-150 to see the context
sed -n '140,160p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 1229


🏁 Script executed:

# Search for all thrust:: launches in the file and check if they are followed by RAFT_CUDA_TRY
cat cpp/src/mip_heuristics/mip_scaling_strategy.cu | grep -n "thrust::" | head -20

Repository: NVIDIA/cuopt

Length of output: 1476


🏁 Script executed:

# More detailed check: look for the pattern of thrust launches and nearby error checks
python3 <<'EOF'
import re

with open('cpp/src/mip_heuristics/mip_scaling_strategy.cu', 'r') as f:
    lines = f.readlines()

# Find all thrust:: calls with context
results = []
for i, line in enumerate(lines):
    if 'thrust::' in line and not line.strip().startswith('//'):
        # Look at the next few lines to see if there's a RAFT_CUDA_TRY
        context_start = max(0, i - 1)
        context_end = min(len(lines), i + 5)
        context = ''.join(lines[context_start:context_end])
        
        # Check if RAFT_CUDA_TRY appears in the next few lines
        has_check = 'RAFT_CUDA_TRY' in context
        results.append({
            'line': i + 1,
            'code': line.strip(),
            'has_check': has_check,
            'context': context.strip()
        })

# Print summary
print("Thrust launches found:")
print("=" * 80)
for r in results:
    print(f"Line {r['line']}: {r['code'][:70]}")
    print(f"  Has error check: {r['has_check']}")
    print()
EOF

Repository: NVIDIA/cuopt

Length of output: 4849


🏁 Script executed:

# Also check what RAFT_CUDA_TRY pattern looks like in this file
grep -n "RAFT_CUDA_TRY" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -10

Repository: NVIDIA/cuopt

Length of output: 626


🏁 Script executed:

# Check the specific ranges mentioned in the comment
echo "=== Line 146-150 ==="
sed -n '146,150p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 207-225 ==="
sed -n '207,225p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 320-321 ==="
sed -n '320,321p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 325-336 ==="
sed -n '325,336p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 1835


🏁 Script executed:

# Check whether this file uses a specific pattern - perhaps thrust operations are wrapped at a higher level
grep -B2 -A2 "thrust::transform\|thrust::upper_bound\|thrust::count\|thrust::fill" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -50

Repository: NVIDIA/cuopt

Length of output: 2085


🏁 Script executed:

# Check if there's any wrapper or special handling in thrust_policy
find . -type f -name "*.hpp" -o -name "*.h" | xargs grep -l "thrust_policy\|get_thrust_policy" | head -5

Repository: NVIDIA/cuopt

Length of output: 222


🏁 Script executed:

# Check the RAFT handle implementation to see if get_thrust_policy includes error checking
rg "get_thrust_policy|thrust_policy" -A 3 -B 1 --max-count=10

Repository: NVIDIA/cuopt

Length of output: 50369


🏁 Script executed:

# Look for any error checking patterns after thrust operations in the broader codebase
rg "thrust::(transform|fill|upper_bound|count|transform_reduce)" -A 2 | grep -E "CUDA|cuda|error|check" | head -20

Repository: NVIDIA/cuopt

Length of output: 1439


🏁 Script executed:

# Specifically check other scaling/heuristics files in the same directory for comparison
ls -la cpp/src/mip_heuristics/

Repository: NVIDIA/cuopt

Length of output: 1531


🏁 Script executed:

# Check how other files in the same directory handle Thrust launches
find cpp/src/mip_heuristics -name "*.cu" -o -name "*.cuh" | xargs grep -l "thrust::" | head -3

Repository: NVIDIA/cuopt

Length of output: 210


🏁 Script executed:

# Sample one other file to see if this is a systemic pattern
head -250 cpp/src/mip_heuristics/mip_scaling_strategy.cu | tail -50 | rg "thrust::" -A 1

Repository: NVIDIA/cuopt

Length of output: 857


🏁 Script executed:

# More comprehensive check: do ANY thrust calls in this repo have post-call error checks?
rg "thrust::(transform|fill|upper_bound|count)" -A 1 | grep -B1 "RAFT_CUDA_TRY\|cudaPeekAtLastError" | head -20

Repository: NVIDIA/cuopt

Length of output: 38


Add explicit CUDA error checks after Thrust launches.

CUB calls are checked with RAFT_CUDA_TRY, but Thrust kernel launches lack explicit error verification. This inconsistency can hide launch and runtime failures, weakening error propagation.

🔧 Minimal fix pattern
   thrust::transform(handle_ptr->get_thrust_policy(),
                     row_bounds_begin,
                     row_bounds_end,
                     row_min_nonzero_bounds.begin(),
                     min_nonzero_bounds_abs_transform_t<f_t>{});
+  RAFT_CUDA_TRY(cudaPeekAtLastError());

Apply the same pattern after each Thrust launch.

Affected locations: 146–150, 207–225, 320–321, 325–336, 331–336, 352–353, 374–387, 396–402, 462–468, 470–475, 476–481.

Per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification" and "Verify error propagation from CUDA to user-facing APIs is complete."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
thrust::transform(handle_ptr->get_thrust_policy(),
row_bounds_begin,
row_bounds_end,
row_min_nonzero_bounds.begin(),
min_nonzero_bounds_abs_transform_t<f_t>{});
thrust::transform(handle_ptr->get_thrust_policy(),
row_bounds_begin,
row_bounds_end,
row_min_nonzero_bounds.begin(),
min_nonzero_bounds_abs_transform_t<f_t>{});
RAFT_CUDA_TRY(cudaPeekAtLastError());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu` around lines 146 - 150,
Thrust kernel launches in mip_scaling_strategy.cu (e.g., the thrust::transform
that uses handle_ptr->get_thrust_policy() and
min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error
checks; after each Thrust call (all thrust::transform / thrust::copy /
thrust::fill invocations at the reported locations) add a CUDA
device-synchronization error check using the same pattern as CUB calls (e.g.,
wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call
followed by RAFT_CUDA_TRY or
CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any
launch or runtime error is caught and propagated—apply this change to each
affected symbol/site in the file (the transform using
min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill
invocations around lines reported) to ensure consistent error handling.

@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/26.04 March 19, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants