
[DONOTMERGE] Refactor root relaxation#976

Draft
rg20 wants to merge 20 commits into NVIDIA:release/26.04 from rg20:root_relaxation

Conversation

@rg20 (Contributor) commented Mar 19, 2026

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@rg20 requested review from a team as code owners March 19, 2026 19:58

copy-pr-bot bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 marked this pull request as draft March 19, 2026 19:58

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough


This pull request updates the cuOpt project from version 26.04 to 26.06 across metadata, build configurations, and dependencies. It introduces concurrent root LP solving infrastructure in the branch-and-bound solver with new threading and callback mechanisms.

Changes

Cohort / File(s) Summary
Version Metadata
.claude-plugin/marketplace.json, .cursor-plugin/plugin.json, VERSION, gemini-extension.json, skills/**/SKILL.md (16 files)
Version strings updated from 26.04.00 to 26.06.00 across plugin manifests, version files, and skill documentation.
Helm Chart Configuration
helmchart/cuopt-server/Chart.yaml, helmchart/cuopt-server/values.yaml
Chart and app versions bumped from 26.4.0 to 26.6.0; container image tag updated to 26.6.0.
GitHub Workflows
.github/workflows/build.yaml, .github/workflows/build_test_publish_images.yaml, .github/workflows/pr.yaml, .github/workflows/test.yaml, .github/workflows/trigger-breaking-change-alert.yaml
Updated shared workflow references from @release/26.04 to @main; container images updated from rapidsai/ci-conda:26.04-latest to 26.06-latest; RAPIDS_BRANCH updated to main.
Documentation & Installation
README.md, docs/cuopt/source/..., docs/cuopt/source/faq.rst, docs/cuopt/source/cuopt-python/routing/routing-example.ipynb
Installation instructions and version references updated from 26.04.* to 26.06.*; GitHub documentation URLs changed from release/26.04 to main branch.
Conda Environment Dependencies
conda/environments/all_cuda-129_arch-aarch64.yaml, conda/environments/all_cuda-129_arch-x86_64.yaml, conda/environments/all_cuda-131_arch-aarch64.yaml, conda/environments/all_cuda-131_arch-x86_64.yaml
RAPIDS component version pins (cudf, libraft-headers, librmm, pylibraft, rmm) updated from 26.4.* to 26.6.*.
Build & Dependency Manifests
dependencies.yaml, python/cuopt/pyproject.toml, python/libcuopt/pyproject.toml, python/cuopt_server/pyproject.toml, python/cuopt_self_hosted/pyproject.toml
RAPIDS dependency version constraints bumped from 26.4.* to 26.6.* across conda and pip specifications.
PDLP Solver Core
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp, cpp/src/pdlp/pdlp.cu, cpp/src/pdlp/solve.cu, cpp/src/pdlp/solver_solution.cu
Added halt_set_by_caller flag to solver settings; added const overload for get_reduced_cost(); updated concurrent-halt signaling logic to respect caller-set flags.
Dual Simplex Root LP Infrastructure
cpp/src/dual_simplex/crossover.cpp, cpp/src/dual_simplex/phase2.cpp, cpp/src/dual_simplex/presolve.cpp, cpp/src/dual_simplex/simplex_solver_settings.hpp, cpp/src/dual_simplex/types.hpp, cpp/src/dual_simplex/solve.cpp
Added on_first_lp_solution_available callback; introduced root_relaxation_first_solution_t type for root solution communication; modified dual_push to accept precomputed CSR matrix; added float template instantiations; removed concurrent halt side-effect from Phase 2.
Branch-and-Bound Concurrency
cpp/src/branch_and_bound/branch_and_bound.hpp, cpp/src/branch_and_bound/branch_and_bound.cpp
Refactored constructor to accept mip_problem_ptr and pdlp_root_num_gpus; implemented run_concurrent_pdlp_and_barrier_with_crossover() for threaded root-LP solving; reworked root relaxation solution selection logic; changed original_problem_ from reference to value type.
MIP Heuristics Root LP
cpp/src/mip_heuristics/root_lp.cuh, cpp/src/mip_heuristics/root_lp.cu, cpp/src/mip_heuristics/CMakeLists.txt
New file pair implementing concurrent PDLP/Barrier root-LP solving with crush+crossover post-processing, supporting optional first-solution callbacks and winner-race semantics.
Diversity Manager & Problem Integration
cpp/src/mip_heuristics/diversity/diversity_manager.cuh, cpp/src/mip_heuristics/diversity/diversity_manager.cu, cpp/src/mip_heuristics/diversity/lns/rins.cu, cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh, cpp/src/mip_heuristics/problem/problem.cuh, cpp/src/mip_heuristics/problem/problem.cu
Added on_first_lp_solution() callback and synchronization primitives (first_solution_cv_, first_solution_mutex_, first_solution_ready_) to diversity manager; removed set_root_relaxation_solution_callback infrastructure; updated B&B construction to pass problem pointer and GPU count; modified sub-MIP setup to reconstruct user problems from fixed problems.
Main Solver Orchestration
cpp/src/mip_heuristics/solver.cu
Reworked B&B initialization to construct fresh user problems from problem pointers; updated callback wiring to invoke dm.on_first_lp_solution(...) via solver settings; removed old set_root_relaxation_solution_callback assignment.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Description check: ⚠️ Warning. The pull request description is essentially a blank template: no actual description content is provided, only empty sections and unchecked checklist items. Resolution: provide a substantive description explaining the refactoring changes, the motivation for moving root solves into B&B, the new concurrent solver coordination mechanisms, and any breaking API changes or migration guidance.

Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (1 passed)
Title check: ✅ Passed. The title '[DONOTMERGE] Refactor root relaxation' is directly related to the changeset's main objective of refactoring the root relaxation solving mechanism within the branch-and-bound flow, as evidenced by the substantial changes to root LP solving infrastructure, solver coordination, and concurrent execution patterns.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.6)
docs/cuopt/source/cuopt-python/routing/routing-example.ipynb

Unexpected end of JSON input

🔧 Trivy (0.69.3)

Failed to read Trivy output file: ENOENT: no such file or directory, open '/inmem/1258/nsjail-81b28a4d-21e8-4578-8d0c-cca266ee2205/merged/.trivy-output.json'


Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)

419-436: Consider hoisting allocation outside the loop for performance.

The delta_expanded vector is allocated and zero-initialized on every iteration inside the while (superbasic_list.size() > 0) loop. For problems with many superbasic variables and large n, this repeated O(n) allocation overhead could be significant.

The algorithm itself is correct and effectively exploits hyper-sparsity in delta_y_sparse.

♻️ Proposed optimization to avoid repeated allocation
+  std::vector<f_t> delta_expanded(n, 0.);
+  std::vector<i_t> touched_cols;  // track columns to clear
+  touched_cols.reserve(n);
+
   while (superbasic_list.size() > 0) {
     // ... existing code ...

     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
-    
+    touched_cols.clear();
+
     // Iterate directly over sparse delta_y instead of checking zeros
     for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
       const i_t row = delta_y_sparse.i[nnz_idx];
       const f_t val = delta_y_sparse.x[nnz_idx];
       
       // Accumulate contributions from this row to all columns
       const i_t row_start = Arow.row_start[row];
       const i_t row_end   = Arow.row_start[row + 1];
       for (i_t p = row_start; p < row_end; ++p) {
         const i_t col = Arow.j[p];
+        if (delta_expanded[col] == 0.) touched_cols.push_back(col);
         delta_expanded[col] += Arow.x[p] * val;
       }
     }
     for (i_t k = 0; k < n - m; ++k) {
       delta_zN[k] = -delta_expanded[nonbasic_list[k]];
     }
+    // Clear only touched elements for next iteration
+    for (i_t col : touched_cols) {
+      delta_expanded[col] = 0.;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 419 - 436, The
delta_expanded vector is being allocated and zero-initialized inside the `while
(superbasic_list.size() > 0)` loop causing O(n) allocation each iteration; move
the allocation of `delta_expanded` outside the loop (e.g., create it once before
the loop using its size `n`) and then reuse it each iteration by resetting its
contents (use `std::fill` or equivalent) before accumulating contributions from
`delta_y_sparse`. Keep the existing accumulation logic that iterates
`delta_y_sparse.i` and updates `delta_expanded[col]`, and then assign
`delta_zN[k] = -delta_expanded[nonbasic_list[k]]` as before; this preserves
behavior while removing repeated allocations.
cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (1)

238-238: Optional: align Doxygen return type with the actual accessor type.

The get_reduced_cost() docs still say rmm::device_uvector<i_t>, but the method returns rmm::device_uvector<f_t> (both const and non-const overloads).

Suggested doc tweak
- * `@return` rmm::device_uvector<i_t> The device memory container for the reduced cost.
+ * `@return` rmm::device_uvector<f_t> The device memory container for the reduced cost.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp` at line 238,
The Doxygen return type for get_reduced_cost() is incorrect—update its comment
to match the actual accessor signature(s) which return rmm::device_uvector<f_t>
(both const and non-const overloads); locate get_reduced_cost() in
solver_solution.hpp and change any mention of rmm::device_uvector<i_t> to
rmm::device_uvector<f_t> so the docs and function signature are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 3970-3974: The preprocessor guard name is misspelled: replace
occurrences of MIP_INSTANTIATION_FLOAT with the correct macro
MIP_INSTANTIATE_FLOAT so the float template instantiations are emitted;
specifically update the conditional around the explicit instantiation "template
class branch_and_bound_t<int, float>;" (and the analogous guard in the file
containing root LP CUDA instantiations) to use MIP_INSTANTIATE_FLOAT so float
specializations link correctly.

In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 1576-1632: The float template instantiations (templates for
convert_user_problem, convert_user_lp_with_guess, presolve, crush_/uncrush_* and
uncrush_solution) rely on hardcoded double-precision tolerances (e.g. 1e-13 and
1e-6 used in dependent row detection and multiple checks) and are untested
because CUOPT_INSTANTIATE_FLOAT is disabled; fix by making tolerances
precision-aware (use a scaled std::numeric_limits<f_t>::epsilon() or a
f_t-specific constant derived from epsilon for checks inside presolve and
dependent-row detection code paths) or explicitly document/keep float
instantiation disabled by setting CUOPT_INSTANTIATE_FLOAT=0 and adding a clear
TODO explaining why float paths are unsupported; update the instantiation block
and any tolerance uses referenced by presolve, dependent-row detection, and the
functions listed above to use the new precision-aware tolerance mechanism or
leave the instantiation guarded with a comment explaining the limitation.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu`:
- Around line 839-865: The callback unconditionally overwrites shared LP state;
guard against stale PDLP/Barrier callbacks by checking and honoring the
simplex-won flag before updating: inside on_first_lp_solution(), while holding
relaxed_solution_mutex, early-return if ls.lp_optimal_exists is already true (or
if lp_optimal exists and the incoming result.user_objective is not strictly
better), otherwise perform the raft::copy into lp_optimal_solution and
lp_dual_optimal_solution, call set_new_user_bound(result.user_objective) and
then set ls.lp_optimal_exists = true; keep the existing
first_solution_ready_/first_solution_cv_ behavior but ensure the
relaxed_solution_mutex check prevents races between on_first_lp_solution, other
LP callbacks, and simplex completion.
- Around line 417-425: Remove the resetting of the first-solution latch inside
the waiting branch (do not call first_solution_ready_.store(false, ...) in the
run_solver() waiting path) so an earlier notification isn't erased; leave the
wait as just acquiring first_solution_mutex_ and calling
first_solution_cv_.wait(...). Additionally, ensure set_simplex_solution(...)
publishes the latch after stream sync by setting
first_solution_ready_.store(true, std::memory_order_release) while holding
first_solution_mutex_ (use std::lock_guard<std::mutex> on first_solution_mutex_)
and then calling first_solution_cv_.notify_all() so the heuristics thread is
reliably woken.

In `@cpp/src/mip_heuristics/root_lp.cu`:
- Around line 25-45: copy_lp_result_to_root_solution currently discards the LP
termination status; update it to also capture and propagate the solver status
from optimization_problem_solution_t (e.g. lp_result.get_status() or equivalent)
into root_relaxation_first_solution_t so callers (like
run_crush_crossover_and_maybe_win and on_first_lp_solution) can gate further
work. Specifically: add a status field to root_relaxation_first_solution_t (if
missing), set it from lp_result, and only materialize/copy
primal/dual/reduced_costs (or mark them invalid/empty) when the status indicates
a usable solution; otherwise leave vectors empty or flagged and still sync the
stream and set objective/iterations as appropriate. Ensure error/status
propagation follows CUDA-to-API mapping guidelines so downstream logic can
reject infeasible or errored LP results instead of proceeding.

In `@RAPIDS_BRANCH`:
- Line 1: The RAPIDS_BRANCH file currently points at the mutable "main" branch
which risks pulling moving-head changes into the release flow; update the branch
marker to a stable release branch or tag (for example use "release/26.04" or an
explicit immutable ref) by replacing the "main" entry in RAPIDS_BRANCH with the
chosen release branch or tag so builds remain reproducible.

In `@skills/cuopt-installation-api-python/SKILL.md`:
- Line 3: The documented skill version ("version: \"26.06.00\"") and the CUDA12
pip package pin ("cuopt-cu12==26.2.*") in SKILL.md are out of sync; update the
pip install pin to match the new skill version by replacing the
"cuopt-cu12==26.2.*" specifier with a matching 26.06.* (or 26.06.00-compatible)
pin so the CUDA 12 install command aligns with "26.06.00".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b263ed17-c270-4dc4-8917-5c584619d2c1

📥 Commits

Reviewing files that changed from the base of the PR and between 4d5f5e5 and c9e39d3.

📒 Files selected for processing (66)
  • .claude-plugin/marketplace.json
  • .cursor-plugin/plugin.json
  • .github/workflows/build.yaml
  • .github/workflows/build_test_publish_images.yaml
  • .github/workflows/pr.yaml
  • .github/workflows/test.yaml
  • .github/workflows/trigger-breaking-change-alert.yaml
  • RAPIDS_BRANCH
  • README.md
  • VERSION
  • conda/environments/all_cuda-129_arch-aarch64.yaml
  • conda/environments/all_cuda-129_arch-x86_64.yaml
  • conda/environments/all_cuda-131_arch-aarch64.yaml
  • conda/environments/all_cuda-131_arch-x86_64.yaml
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/branch_and_bound.hpp
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/dual_simplex/solve.cpp
  • cpp/src/dual_simplex/types.hpp
  • cpp/src/mip_heuristics/CMakeLists.txt
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/diversity_manager.cuh
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
  • cpp/src/mip_heuristics/problem/problem.cu
  • cpp/src/mip_heuristics/problem/problem.cuh
  • cpp/src/mip_heuristics/root_lp.cu
  • cpp/src/mip_heuristics/root_lp.cuh
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_solution.cu
  • dependencies.yaml
  • docs/cuopt/source/cuopt-python/routing/routing-example.ipynb
  • docs/cuopt/source/faq.rst
  • gemini-extension.json
  • helmchart/cuopt-server/Chart.yaml
  • helmchart/cuopt-server/values.yaml
  • python/cuopt/pyproject.toml
  • python/cuopt_self_hosted/pyproject.toml
  • python/cuopt_server/pyproject.toml
  • python/libcuopt/pyproject.toml
  • skills/cuopt-developer/SKILL.md
  • skills/cuopt-installation-api-c/SKILL.md
  • skills/cuopt-installation-api-python/SKILL.md
  • skills/cuopt-installation-common/SKILL.md
  • skills/cuopt-installation-developer/SKILL.md
  • skills/cuopt-lp-milp-api-c/SKILL.md
  • skills/cuopt-lp-milp-api-cli/SKILL.md
  • skills/cuopt-lp-milp-api-python/SKILL.md
  • skills/cuopt-qp-api-c/SKILL.md
  • skills/cuopt-qp-api-cli/SKILL.md
  • skills/cuopt-qp-api-python/SKILL.md
  • skills/cuopt-routing-api-python/SKILL.md
  • skills/cuopt-server-api-python/SKILL.md
  • skills/cuopt-server-common/SKILL.md
  • skills/cuopt-user-rules/SKILL.md
  • skills/lp-milp-formulation/SKILL.md
  • skills/qp-formulation/SKILL.md
  • skills/routing-formulation/SKILL.md
  • skills/skill-evolution/SKILL.md
💤 Files with no reviewable changes (2)
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/mip_heuristics/problem/problem.cuh
👮 Files not reviewed due to content moderation or server errors (8)
  • cpp/src/mip_heuristics/CMakeLists.txt
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/pdlp/solve.cu
  • .github/workflows/trigger-breaking-change-alert.yaml
  • cpp/src/pdlp/pdlp.cu
  • .github/workflows/build.yaml
  • .github/workflows/test.yaml
  • cpp/src/dual_simplex/simplex_solver_settings.hpp

Comment on lines +3970 to +3974
#ifdef MIP_INSTANTIATION_FLOAT

template class branch_and_bound_t<int, float>;

#endif

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'MIP_INSTANTIAT(E|ION)_FLOAT|MIP_INSTANTIAT(E|ION)_DOUBLE|DUAL_SIMPLEX_INSTANTIATE_(FLOAT|DOUBLE)' cpp

Repository: NVIDIA/cuopt

Length of output: 13609


Fix the float instantiation guard name.

The file uses MIP_INSTANTIATION_FLOAT, but the correct macro name throughout the codebase is MIP_INSTANTIATE_FLOAT. This typo appears in both cpp/src/branch_and_bound/branch_and_bound.cpp and cpp/src/mip_heuristics/root_lp.cu; without the fix, float template specializations won't be emitted and float builds will fail at link time.


Comment on lines +1576 to +1632
#if CUOPT_INSTANTIATE_FLOAT

template void convert_user_problem<int, float>(
  const user_problem_t<int, float>& user_problem,
  const simplex_solver_settings_t<int, float>& settings,
  lp_problem_t<int, float>& problem,
  std::vector<int>& new_slacks,
  dualize_info_t<int, float>& dualize_info);

template void convert_user_lp_with_guess<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_solution_t<int, float>& initial_solution,
  const std::vector<float>& initial_slack,
  lp_problem_t<int, float>& lp,
  lp_solution_t<int, float>& converted_solution);

template int presolve<int, float>(
  const lp_problem_t<int, float>& original,
  const simplex_solver_settings_t<int, float>& settings,
  lp_problem_t<int, float>& presolved,
  presolve_info_t<int, float>& presolve_info);

template void crush_primal_solution<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_problem_t<int, float>& problem,
  const std::vector<float>& user_solution,
  const std::vector<int>& new_slacks,
  std::vector<float>& solution);

template float crush_dual_solution<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_problem_t<int, float>& problem,
  const std::vector<int>& new_slacks,
  const std::vector<float>& user_y,
  const std::vector<float>& user_z,
  std::vector<float>& y,
  std::vector<float>& z);

template void uncrush_primal_solution<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_problem_t<int, float>& problem,
  const std::vector<float>& solution,
  std::vector<float>& user_solution);

template void uncrush_dual_solution<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_problem_t<int, float>& problem,
  const std::vector<float>& y,
  const std::vector<float>& z,
  std::vector<float>& user_y,
  std::vector<float>& user_z);

template void uncrush_solution<int, float>(
  const presolve_info_t<int, float>& presolve_info,
  const simplex_solver_settings_t<int, float>& settings,
  const std::vector<float>& crushed_x,
  const std::vector<float>& crushed_y,
  const std::vector<float>& crushed_z,
  std::vector<float>& uncrushed_x,
  std::vector<float>& uncrushed_y,
  std::vector<float>& uncrushed_z);

#endif

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find where float instantiations might be called and check tolerance handling
rg -n --type=cpp -C3 'presolve<int,\s*float>|convert_user_problem<int,\s*float>' cpp/src/

Repository: NVIDIA/cuopt

Length of output: 1224


🏁 Script executed:

#!/bin/bash
# Find where these float instantiated functions are called
echo "=== Searching for calls to float-instantiated functions ==="
rg -n --type=cpp 'presolve<int,\s*float>|convert_user_problem<int,\s*float>|crush_primal_solution<int,\s*float>|crush_dual_solution<int,\s*float>' --no-filename cpp/src/ | grep -v "^template"

echo -e "\n=== Checking tolerance constants in presolve.cpp ==="
rg -n '1e-13|1e-6|1e-7|tolerance' cpp/src/dual_simplex/presolve.cpp | head -20

Repository: NVIDIA/cuopt

Length of output: 1231


🏁 Script executed:

#!/bin/bash
# Check if there's any conditional logic for float precision handling
echo "=== Looking for float-specific handling or guards ==="
rg -n --type=cpp -B2 -A2 'CUOPT_INSTANTIATE_FLOAT|float.*tolerance|precision' cpp/src/dual_simplex/presolve.cpp | head -40

echo -e "\n=== Check if float instantiations are guarded or conditionally used ==="
rg -n --type=cpp 'if.*float|INSTANTIATE.*FLOAT' cpp/src/ | head -20

Repository: NVIDIA/cuopt

Length of output: 1400


🏁 Script executed:

#!/bin/bash
# Find the actual usage of presolve in the codebase to understand calling patterns
echo "=== Finding where presolve function is called ==="
rg -n 'presolve\(' cpp/src/ --type=cpp | grep -v template | grep -v 'presolve\[' | head -20

echo -e "\n=== Check headers for documentation about float vs double ==="
find cpp/include -name "*.h" -o -name "*.hpp" | xargs rg -l 'float.*precision|tolerance' | head -5

Repository: NVIDIA/cuopt

Length of output: 1357


🏁 Script executed:

#!/bin/bash
# Find where CUOPT_INSTANTIATE_FLOAT is defined
echo "=== Where is CUOPT_INSTANTIATE_FLOAT defined? ==="
rg -n 'define.*CUOPT_INSTANTIATE_FLOAT|CUOPT_INSTANTIATE_FLOAT' cpp/ | head -20

echo -e "\n=== Check build configuration ==="
find . -name "CMakeLists.txt" -o -name "*.cmake" | xargs rg -l 'CUOPT_INSTANTIATE_FLOAT' 2>/dev/null

Repository: NVIDIA/cuopt

Length of output: 712


🏁 Script executed:

#!/bin/bash
# Look at solve.cpp context where presolve is called
echo "=== Context around presolve calls in solve.cpp ==="
sed -n '140,170p' cpp/src/dual_simplex/solve.cpp
echo "---"
sed -n '340,360p' cpp/src/dual_simplex/solve.cpp

Repository: NVIDIA/cuopt

Length of output: 2563


🏁 Script executed:

#!/bin/bash
# Check the actual template definition of presolve to understand how it handles types
echo "=== Presolve template definition signature ==="
sed -n '819,830p' cpp/src/dual_simplex/presolve.cpp

echo -e "\n=== Check if there are any float-specific tolerance adjustments ==="
rg -n 'std::is_same.*float|if constexpr.*float|enable_if.*float' cpp/src/dual_simplex/presolve.cpp

Repository: NVIDIA/cuopt

Length of output: 617


🏁 Script executed:

#!/bin/bash
# Check tolerance usage more carefully - are they derived from precision?
echo "=== Check if tolerances are derived from numeric traits ==="
rg -n 'numeric_limits|epsilon|std::numeric' cpp/src/dual_simplex/presolve.cpp

echo -e "\n=== Look at the actual values and how they're used ==="
sed -n '440,460p' cpp/src/dual_simplex/presolve.cpp
echo "---"
sed -n '490,505p' cpp/src/dual_simplex/presolve.cpp

Repository: NVIDIA/cuopt

Length of output: 1396


Float template instantiations use hardcoded tolerances unsuitable for single-precision arithmetic.

The float-instantiated functions (lines 1576–1632) use tolerances hardcoded for double precision: 1e-13 in dependent row detection (line 448) and 1e-6 in multiple checks (lines 496, 1233, 1243, 1299). These values were tuned assuming ~15 decimal digits of precision; float provides only ~7 decimal digits.

More critically, CUOPT_INSTANTIATE_FLOAT is currently set to 0 in cpp/include/cuopt/linear_programming/constants.h, meaning these instantiations are not compiled in default builds—this is untested aspirational code. If enabled in the future without addressing precision-aware tolerance tuning, float LP solves will fail on valid problems.

Either:

  1. Adjust tolerances using std::numeric_limits<f_t>::epsilon() or precision-aware constants (preferred for general float support), or
  2. Explicitly document and mark float support as disabled with a clear reason (if float paths are not intended).

Comment on lines 417 to +425
bool bb_thread_solution_exists = simplex_solution_exists.load();
if (bb_thread_solution_exists) {
ls.lp_optimal_exists = true;
} else if (branch_and_bound_ptr != nullptr &&
branch_and_bound_ptr->enable_concurrent_lp_root_solve()) {
// B&B drives root relaxation; wait for first solution (PDLP/Barrier or dual simplex)
first_solution_ready_.store(false, std::memory_order_release);
std::unique_lock<std::mutex> lock(first_solution_mutex_);
first_solution_cv_.wait(lock, [this]() { return first_solution_ready_.load(); });

⚠️ Potential issue | 🔴 Critical

Don't reset the first-solution latch inside the waiting branch.

Line 423 can erase a notification that arrived before run_solver() reached this block, and this wait is never woken when dual simplex publishes the root solution via set_simplex_solution(). That leaves the heuristics thread blocked even though B&B already has a usable root LP solution.

Suggested fix
   } else if (branch_and_bound_ptr != nullptr &&
              branch_and_bound_ptr->enable_concurrent_lp_root_solve()) {
     // B&B drives root relaxation; wait for first solution (PDLP/Barrier or dual simplex)
-    first_solution_ready_.store(false, std::memory_order_release);
     std::unique_lock<std::mutex> lock(first_solution_mutex_);
     first_solution_cv_.wait(lock, [this]() { return first_solution_ready_.load(); });
     lock.unlock();
     clamp_within_var_bounds(lp_optimal_solution, problem_ptr, problem_ptr->handle_ptr);

Also publish the same latch from set_simplex_solution(...) after the stream sync:

{
  std::lock_guard<std::mutex> lock(first_solution_mutex_);
  first_solution_ready_.store(true, std::memory_order_release);
}
first_solution_cv_.notify_all();

As per coding guidelines, "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 417 -
425, Remove the resetting of the first-solution latch inside the waiting branch
(do not call first_solution_ready_.store(false, ...) in the run_solver() waiting
path) so an earlier notification isn't erased; leave the wait as just acquiring
first_solution_mutex_ and calling first_solution_cv_.wait(...). Additionally,
ensure set_simplex_solution(...) publishes the latch after stream sync by
setting first_solution_ready_.store(true, std::memory_order_release) while
holding first_solution_mutex_ (use std::lock_guard<std::mutex> on
first_solution_mutex_) and then calling first_solution_cv_.notify_all() so the
heuristics thread is reliably woken.

Comment on lines +839 to +865
template <typename i_t, typename f_t>
void diversity_manager_t<i_t, f_t>::on_first_lp_solution(
cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> const& result)
{
{
std::lock_guard<std::mutex> lock(relaxed_solution_mutex);
cuopt_assert(result.primal.size() == lp_optimal_solution.size(),
"First LP solution primal size mismatch");
cuopt_assert(result.dual.size() == lp_dual_optimal_solution.size(),
"First LP solution dual size mismatch");
raft::copy(lp_optimal_solution.data(),
result.primal.data(),
result.primal.size(),
problem_ptr->handle_ptr->get_stream());
raft::copy(lp_dual_optimal_solution.data(),
result.dual.data(),
result.dual.size(),
problem_ptr->handle_ptr->get_stream());
problem_ptr->handle_ptr->sync_stream();
ls.lp_optimal_exists = true;
set_new_user_bound(result.user_objective);
}
{
std::lock_guard<std::mutex> lock(first_solution_mutex_);
first_solution_ready_.store(true, std::memory_order_release);
first_solution_cv_.notify_all();
}

⚠️ Potential issue | 🟠 Major

Ignore late PDLP/Barrier callbacks after simplex has already won.

on_first_lp_solution() unconditionally overwrites lp_optimal_solution and the reported bound. If dual simplex wins first, the first PDLP/Barrier thread to stop can still replace that state with a weaker or partially converged result.

Guard against stale callbacks
 void diversity_manager_t<i_t, f_t>::on_first_lp_solution(
   cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> const& result)
 {
   {
     std::lock_guard<std::mutex> lock(relaxed_solution_mutex);
+    if (simplex_solution_exists.load(std::memory_order_acquire)) { return; }
     cuopt_assert(result.primal.size() == lp_optimal_solution.size(),
                  "First LP solution primal size mismatch");

As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results" and "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 839 -
865, The callback unconditionally overwrites shared LP state; guard against
stale PDLP/Barrier callbacks by checking and honoring the simplex-won flag
before updating: inside on_first_lp_solution(), while holding
relaxed_solution_mutex, early-return if ls.lp_optimal_exists is already true (or
if lp_optimal exists and the incoming result.user_objective is not strictly
better), otherwise perform the raft::copy into lp_optimal_solution and
lp_dual_optimal_solution, call set_new_user_bound(result.user_objective) and
then set ls.lp_optimal_exists = true; keep the existing
first_solution_ready_/first_solution_cv_ behavior but ensure the
relaxed_solution_mutex check prevents races between on_first_lp_solution, other
LP callbacks, and simplex completion.

Comment on lines +25 to +45
cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t>
copy_lp_result_to_root_solution(problem_t<i_t, f_t>* problem,
const optimization_problem_solution_t<i_t, f_t>& lp_result)
{
cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> result;
auto stream = problem->handle_ptr->get_stream();
result.primal.resize(lp_result.get_primal_solution().size());
result.dual.resize(lp_result.get_dual_solution().size());
result.reduced_costs.resize(lp_result.get_reduced_cost().size());
raft::copy(
result.primal.data(), lp_result.get_primal_solution().data(), result.primal.size(), stream);
raft::copy(result.dual.data(), lp_result.get_dual_solution().data(), result.dual.size(), stream);
raft::copy(result.reduced_costs.data(),
lp_result.get_reduced_cost().data(),
result.reduced_costs.size(),
stream);
problem->handle_ptr->sync_stream();
result.objective = problem->get_solver_obj_from_user_obj(lp_result.get_objective_value());
result.user_objective = lp_result.get_objective_value();
result.iterations = lp_result.get_additional_termination_information().number_of_steps_taken;
return result;

⚠️ Potential issue | 🟠 Major

Keep the LP termination status in this handoff.

solve_lp_with_method() returns a status, but this helper drops it and always materializes primal/dual/reduced_costs. That prevents callers from reproducing the old logic in diversity_manager.cu that rejected infeasible outputs, and it means run_crush_crossover_and_maybe_win() may drive crossover or callbacks from a non-usable result.

Return the termination status alongside the vectors and gate on_first_lp_solution / crossover on the statuses that actually guarantee a usable root solution. As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results" and "Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes."

Also applies to: 78-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/root_lp.cu` around lines 25 - 45,
copy_lp_result_to_root_solution currently discards the LP termination status;
update it to also capture and propagate the solver status from
optimization_problem_solution_t (e.g. lp_result.get_status() or equivalent) into
root_relaxation_first_solution_t so callers (like
run_crush_crossover_and_maybe_win and on_first_lp_solution) can gate further
work. Specifically: add a status field to root_relaxation_first_solution_t (if
missing), set it from lp_result, and only materialize/copy
primal/dual/reduced_costs (or mark them invalid/empty) when the status indicates
a usable solution; otherwise leave vectors empty or flagged and still sync the
stream and set objective/iterations as appropriate. Ensure error/status
propagation follows CUDA-to-API mapping guidelines so downstream logic can
reject infeasible or errored LP results instead of proceeding.

@@ -1 +1 @@
-release/26.04
+main

⚠️ Potential issue | 🟠 Major

Avoid pointing release flow to main for this branch marker.
Using main here can pull moving-head workflow/config changes into a release line and make builds non-reproducible. Prefer pinning to the release branch/tag used for this train (e.g., release/26.04 or an explicit immutable ref).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@RAPIDS_BRANCH` at line 1, The RAPIDS_BRANCH file currently points at the
mutable "main" branch which risks pulling moving-head changes into the release
flow; update the branch marker to a stable release branch or tag (for example
use "release/26.04" or an explicit immutable ref) by replacing the "main" entry
in RAPIDS_BRANCH with the chosen release branch or tag so builds remain
reproducible.

 ---
 name: cuopt-installation-api-python
-version: "26.04.00"
+version: "26.06.00"

⚠️ Potential issue | 🟠 Major

Sync the documented package pin with the new skill version.

Line 3 declares 26.06.00, but the CUDA 12 pip command in this file still pins cuopt-cu12==26.2.* (Line 26). That mismatch can route users to an older release stream.

🛠️ Proposed doc fix
-  pip install --extra-index-url=https://pypi.nvidia.com 'cuopt-cu12==26.2.*'
+  pip install --extra-index-url=https://pypi.nvidia.com 'cuopt-cu12==26.6.*'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/cuopt-installation-api-python/SKILL.md` at line 3, The documented
skill version ("version: \"26.06.00\"") and the CUDA12 pip package pin
("cuopt-cu12==26.2.*") in SKILL.md are out of sync; update the pip install pin
to match the new skill version by replacing the "cuopt-cu12==26.2.*" specifier
with a matching 26.06.* (or 26.06.00-compatible) pin so the CUDA 12 install
command aligns with "26.06.00".

rg20 added 5 commits March 19, 2026 13:18
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20 rg20 added the non-breaking Introduces a non-breaking change label Mar 20, 2026
@rg20 rg20 added the improvement Improves an existing functionality label Mar 20, 2026
@rg20 rg20 added this to the 26.04 milestone Mar 20, 2026

// First LP solution from either PDLP/Barrier or dual simplex; used to notify diversity manager
// without B&B depending on PDLP types.
template <typename i_t, typename f_t>
Contributor

This does not belong in types.hpp. types.hpp is super low-level and meant to include only very common definitions used across files.

Please move this to another file.

std::function<void()> heuristic_preemption_callback;
std::function<void(std::vector<f_t>&, std::vector<f_t>&, f_t)> set_simplex_solution_callback;
// Called by B&B when first LP solution is available (PDLP/Barrier or dual simplex).
std::function<void(root_relaxation_first_solution_t<i_t, f_t> const&)>
Contributor

How does this differ from the existing set_simplex_solution_callback? Do we need both callbacks?

@@ -30,7 +30,7 @@ struct element_t {
f_t x; // coefficient value
Contributor

Let's not merge in changes from another PR. That makes things difficult to review.


#endif

#if CUOPT_INSTANTIATE_FLOAT
Contributor

Can we add the FLOAT instantiation in another PR? I think there is a ton of work to get the code to compile and work with float. I'm not sure we should do that in the same PR where we are changing the concurrent root solve.

@@ -331,6 +331,7 @@ void compute_dual_solution_from_basis(const lp_problem_t<i_t, f_t>& lp,

Contributor

Let's not include the dual push changes in this PR. This makes the PR difficult to review.

Contributor

@chris-maes chris-maes left a comment

Can you separate out the other PRs?


Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change


3 participants