Scaling test by akifcorduk · Pull Request #906 · NVIDIA/cuopt

akifcorduk · 2026-02-25T07:33:43Z

This PR adds row scaling to MIP and uses PDLP scaling for both barrier and PDLP.

Summary by CodeRabbit

New Features
- MIP row-scaling enabled by default for improved numeric robustness.
- Solver returns partial results early when time limits are reached; time-limit enforcement is more proactive.
- New selectable MIP scaling configurations with logging of the selected scaling setup.
Bug Fixes
- Fixed inconsistent unscaling and feasibility handling to prevent mismatched solution states.
Chores
- Removed a deprecated MIP configuration option.
- Consolidated and replaced the scaling strategy; tests and build updated accordingly.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`:
- Around line 146-150: Thrust kernel launches in mip_scaling_strategy.cu (e.g.,
the thrust::transform that uses handle_ptr->get_thrust_policy() and
min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error
checks; after each Thrust call (all thrust::transform / thrust::copy /
thrust::fill invocations at the reported locations) add a CUDA
device-synchronization error check using the same pattern as CUB calls (e.g.,
wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call
followed by RAFT_CUDA_TRY or
CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any
launch or runtime error is caught and propagated—apply this change to each
affected symbol/site in the file (the transform using
min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill
invocations around lines reported) to ensure consistent error handling.
- Around line 300-309: Clamp the user-provided
row_scaling_k_min/row_scaling_k_max to a safe exponent range before calling
ldexp and before passing into the kernel: compute safe_k_min/safe_k_max by
clamping row_scaling_k_min/row_scaling_k_max to a predetermined safe exponent
window (e.g., the sensible defaults like -20..20 or the numeric limits for f_t's
exponent range), then use safe_k_min/safe_k_max when computing
min_scaling/max_scaling (instead of raw row_scaling_k_min/row_scaling_k_max),
include those safe values in the kernel lambda capture, and replace the per-row
k and required_k clamps with clamps against safe_k_min/safe_k_max so ldexp(1,k)
cannot underflow/overflow.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b2a631 and e2b3aea.

📒 Files selected for processing (4)

cpp/src/mip_heuristics/diversity/diversity_manager.cu
cpp/src/mip_heuristics/diversity/diversity_manager.cuh
cpp/src/mip_heuristics/mip_scaling_strategy.cu
cpp/src/mip_heuristics/mip_scaling_strategy.cuh

coderabbitai · 2026-02-26T08:08:10Z

cpp/src/mip_heuristics/mip_scaling_strategy.cu

+  thrust::transform(handle_ptr->get_thrust_policy(),
+                    row_bounds_begin,
+                    row_bounds_end,
+                    row_min_nonzero_bounds.begin(),
+                    min_nonzero_bounds_abs_transform_t<f_t>{});


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and the specific lines mentioned wc -l cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 106

🏁 Script executed:

# Read the section around lines 146-150 to see the context sed -n '140,160p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 1229

🏁 Script executed:

# Search for all thrust:: launches in the file and check if they are followed by RAFT_CUDA_TRY cat cpp/src/mip_heuristics/mip_scaling_strategy.cu | grep -n "thrust::" | head -20

Repository: NVIDIA/cuopt

Length of output: 1476

🏁 Script executed:

# More detailed check: look for the pattern of thrust launches and nearby error checks python3 <<'EOF' import re with open('cpp/src/mip_heuristics/mip_scaling_strategy.cu', 'r') as f: lines = f.readlines() # Find all thrust:: calls with context results = [] for i, line in enumerate(lines): if 'thrust::' in line and not line.strip().startswith('//'): # Look at the next few lines to see if there's a RAFT_CUDA_TRY context_start = max(0, i - 1) context_end = min(len(lines), i + 5) context = ''.join(lines[context_start:context_end]) # Check if RAFT_CUDA_TRY appears in the next few lines has_check = 'RAFT_CUDA_TRY' in context results.append({ 'line': i + 1, 'code': line.strip(), 'has_check': has_check, 'context': context.strip() }) # Print summary print("Thrust launches found:") print("=" * 80) for r in results: print(f"Line {r['line']}: {r['code'][:70]}") print(f" Has error check: {r['has_check']}") print() EOF

Repository: NVIDIA/cuopt

Length of output: 4849

🏁 Script executed:

# Also check what RAFT_CUDA_TRY pattern looks like in this file grep -n "RAFT_CUDA_TRY" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -10

Repository: NVIDIA/cuopt

Length of output: 626

🏁 Script executed:

# Check the specific ranges mentioned in the comment echo "=== Line 146-150 ===" sed -n '146,150p' cpp/src/mip_heuristics/mip_scaling_strategy.cu echo "" echo "=== Line 207-225 ===" sed -n '207,225p' cpp/src/mip_heuristics/mip_scaling_strategy.cu echo "" echo "=== Line 320-321 ===" sed -n '320,321p' cpp/src/mip_heuristics/mip_scaling_strategy.cu echo "" echo "=== Line 325-336 ===" sed -n '325,336p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt

Length of output: 1835

🏁 Script executed:

# Check whether this file uses a specific pattern - perhaps thrust operations are wrapped at a higher level grep -B2 -A2 "thrust::transform\|thrust::upper_bound\|thrust::count\|thrust::fill" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -50

Repository: NVIDIA/cuopt

Length of output: 2085

🏁 Script executed:

# Check if there's any wrapper or special handling in thrust_policy find . -type f -name "*.hpp" -o -name "*.h" | xargs grep -l "thrust_policy\|get_thrust_policy" | head -5

Repository: NVIDIA/cuopt

Length of output: 222

🏁 Script executed:

# Check the RAFT handle implementation to see if get_thrust_policy includes error checking rg "get_thrust_policy|thrust_policy" -A 3 -B 1 --max-count=10

Repository: NVIDIA/cuopt

Length of output: 50369

🏁 Script executed:

# Look for any error checking patterns after thrust operations in the broader codebase rg "thrust::(transform|fill|upper_bound|count|transform_reduce)" -A 2 | grep -E "CUDA|cuda|error|check" | head -20

Repository: NVIDIA/cuopt

Length of output: 1439

🏁 Script executed:

# Specifically check other scaling/heuristics files in the same directory for comparison ls -la cpp/src/mip_heuristics/

Repository: NVIDIA/cuopt

Length of output: 1531

🏁 Script executed:

# Check how other files in the same directory handle Thrust launches find cpp/src/mip_heuristics -name "*.cu" -o -name "*.cuh" | xargs grep -l "thrust::" | head -3

Repository: NVIDIA/cuopt

Length of output: 210

🏁 Script executed:

# Sample one other file to see if this is a systemic pattern head -250 cpp/src/mip_heuristics/mip_scaling_strategy.cu | tail -50 | rg "thrust::" -A 1

Repository: NVIDIA/cuopt

Length of output: 857

🏁 Script executed:

# More comprehensive check: do ANY thrust calls in this repo have post-call error checks? rg "thrust::(transform|fill|upper_bound|count)" -A 1 | grep -B1 "RAFT_CUDA_TRY\|cudaPeekAtLastError" | head -20

Repository: NVIDIA/cuopt

Length of output: 38

Add explicit CUDA error checks after Thrust launches.

CUB calls are checked with RAFT_CUDA_TRY, but Thrust kernel launches lack explicit error verification. This inconsistency can hide launch and runtime failures, weakening error propagation.

🔧 Minimal fix pattern

thrust::transform(handle_ptr->get_thrust_policy(), row_bounds_begin, row_bounds_end, row_min_nonzero_bounds.begin(), min_nonzero_bounds_abs_transform_t<f_t>{}); + RAFT_CUDA_TRY(cudaPeekAtLastError());

Apply the same pattern after each Thrust launch.

Affected locations: 146–150, 207–225, 320–321, 325–336, 331–336, 352–353, 374–387, 396–402, 462–468, 470–475, 476–481.

Per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification" and "Verify error propagation from CUDA to user-facing APIs is complete."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

thrust::transform(handle_ptr->get_thrust_policy(),

row_bounds_begin,

row_bounds_end,

row_min_nonzero_bounds.begin(),

min_nonzero_bounds_abs_transform_t<f_t>{});

thrust::transform(handle_ptr->get_thrust_policy(),

row_bounds_begin,

row_bounds_end,

row_min_nonzero_bounds.begin(),

min_nonzero_bounds_abs_transform_t<f_t>{});

RAFT_CUDA_TRY(cudaPeekAtLastError());

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu` around lines 146 - 150, Thrust kernel launches in mip_scaling_strategy.cu (e.g., the thrust::transform that uses handle_ptr->get_thrust_policy() and min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error checks; after each Thrust call (all thrust::transform / thrust::copy / thrust::fill invocations at the reported locations) add a CUDA device-synchronization error check using the same pattern as CUB calls (e.g., wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call followed by RAFT_CUDA_TRY or CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any launch or runtime error is caught and propagated—apply this change to each affected symbol/site in the file (the transform using min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill invocations around lines reported) to ensure consistent error handling.

cpp/src/mip_heuristics/mip_scaling_strategy.cu

github-actions · 2026-03-18T09:23:59Z

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

akifcorduk and others added 30 commits February 11, 2026 04:12

fix timer issues

b758b9f

disable jobserver flag when not actually using jobserver

0392a62

disable jobserver unless explicitely requested

ee54477

better workaround fix build

f876fc0

scaling test

4bbf743

Merge branch 'main' of github.com:NVIDIA/cuopt into fix_timer

1c02baa

add timers to right_looking_lu and refactoring the basis

7de08d2

remove timers from cuts

942de9c

convert lambda to function and remove unnecessary checks

0b944d1

fix thrust changes

71b7f2f

handle review comments

82b2d64

handle review comments

0c81173

revert scaling

8f93926

initial row scaling for mip

80e2650

fix pdlp issues and finalize lp scaling

f021ba3

move timer with inout parameters

609c578

fix merge conflicts

3d32acd

fix merge conflicts

0c54ecf

revert cmake comment

fc414e7

Merge branch 'fix_timer' into scaling_test

41f5c3e

root node scaling

aa0b795

sscaling off

1c03794

Merge branch 'main' of github.com:NVIDIA/cuopt into scaling_test

74ec555

correct mip gap computation

e849c70

wip

ea09141

with scaling

adfbf7d

make pdlp/barrier scaling default

92b165e

fix compule error

2b080cc

skip mip scaling

7a61d13

without skipping big M

0ee8344

without big M

6b2a631

anandhkb added this to the 26.04 milestone Feb 25, 2026

coderabbitai bot reviewed Feb 26, 2026

View reviewed changes

try with mip scaling

cfcf0ab

akifcorduk force-pushed the scaling_test branch from 373d374 to cfcf0ab Compare February 27, 2026 10:38

akifcorduk added 9 commits February 27, 2026 02:41

with assertS

1db603c

try without any lp scaling

0f39c06

Merge branch 'main' of github.com:NVIDIA/cuopt into scaling_test

9e2d2d1

Merge branch 'main' of github.com:NVIDIA/cuopt into scaling_test

4ecf8f9

scaling before presolve

ca5b07e

fix objective issues and pdlp solver mode

2f58450

with stable 3

071ec5d

fix issues solver mode 2

a1030a7

without scaling

5811ebb

akifcorduk force-pushed the scaling_test branch from 24ab358 to 5811ebb Compare March 5, 2026 13:59

akifcorduk added 11 commits March 5, 2026 07:46

lp only relative tolerance

8f8fb84

lp with abs tolerance

7ec1ef7

only relative tolerance

aaa6ef9

per constraint residual with relative tolerance

68f6ac3

per constraint, absolute tolerance

f536037

only relative tolerance with submip scaling

77c4d7a

with absolute tolerance of 1e-6 and per constraint

89bd164

mip scaling on

2d54c28

gcd scaling and less agressive

95e14aa

fix scaling changes

d52074e

test fixes

4915897

akifcorduk force-pushed the scaling_test branch from 856aace to 4915897 Compare March 8, 2026 13:12

Merge branch 'main' of github.com:NVIDIA/cuopt into scaling_test

701fd70

rgsl888prabhu changed the base branch from main to release/26.04 March 19, 2026 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scaling test#906

Scaling test#906
akifcorduk wants to merge 65 commits intoNVIDIA:release/26.04from
akifcorduk:scaling_test

akifcorduk commented Feb 25, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 26, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

akifcorduk commented Feb 25, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

akifcorduk commented Feb 25, 2026 •

edited by coderabbitai bot

Loading