[DONOTMERGE] Refactor root relaxation #976
rg20 wants to merge 20 commits into NVIDIA:release/26.04 from
Conversation
…or Barrier+crossover
This reverts commit 4d5f5e5.
This PR updates the repository to version 26.06. This is part of the 26.04 release burndown process.
📝 Walkthrough

This pull request updates the cuOpt project from version 26.04 to 26.06 across metadata, build configurations, and dependencies. It introduces concurrent root LP solving infrastructure in the branch-and-bound solver with new threading and callback mechanisms.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
⚠️ Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.6): docs/cuopt/source/cuopt-python/routing/routing-example.ipynb — Unexpected end of JSON input

🔧 Trivy (0.69.3): Failed to read Trivy output file: ENOENT: no such file or directory, open '/inmem/1258/nsjail-81b28a4d-21e8-4578-8d0c-cca266ee2205/merged/.trivy-output.json'
Actionable comments posted: 7
🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)
419-436: Consider hoisting allocation outside the loop for performance.

The `delta_expanded` vector is allocated and zero-initialized on every iteration inside the `while (superbasic_list.size() > 0)` loop. For problems with many superbasic variables and large `n`, this repeated O(n) allocation overhead could be significant. The algorithm itself is correct and effectively exploits hyper-sparsity in `delta_y_sparse`.

♻️ Proposed optimization to avoid repeated allocation
```diff
+  std::vector<f_t> delta_expanded(n, 0.);
+  std::vector<i_t> touched_cols;  // track columns to clear
+  touched_cols.reserve(n);
+
   while (superbasic_list.size() > 0) {
     // ... existing code ...

     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
-
+    touched_cols.clear();
+
+    // Iterate directly over sparse delta_y instead of checking zeros
     for (i_t nnz_idx = 0; nnz_idx < delta_y_sparse.i.size(); ++nnz_idx) {
       const i_t row = delta_y_sparse.i[nnz_idx];
       const f_t val = delta_y_sparse.x[nnz_idx];

       // Accumulate contributions from this row to all columns
       const i_t row_start = Arow.row_start[row];
       const i_t row_end   = Arow.row_start[row + 1];
       for (i_t p = row_start; p < row_end; ++p) {
         const i_t col = Arow.j[p];
+        if (delta_expanded[col] == 0.) touched_cols.push_back(col);
         delta_expanded[col] += Arow.x[p] * val;
       }
     }

     for (i_t k = 0; k < n - m; ++k) {
       delta_zN[k] = -delta_expanded[nonbasic_list[k]];
     }
+
+    // Clear only touched elements for next iteration
+    for (i_t col : touched_cols) {
+      delta_expanded[col] = 0.;
+    }
```

🤖 Prompt for AI Agents
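The reuse-plus-touched-list idea above can be demonstrated standalone. The sketch below is illustrative only; the types (`SparseVec`, plain `double`/`int`) and the function name `gather_and_clear` are stand-ins for cuOpt's actual `i_t`/`f_t` templates and sparse structures, not code from the repository:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sparse vector: parallel index/value arrays.
struct SparseVec {
  std::vector<int> i;     // indices
  std::vector<double> x;  // values
};

// Accumulates the sparse vector `s` into the reusable dense scratch `dense`,
// records which slots were touched, gathers the values at `cols`, and then
// clears only the touched slots -- an O(nnz) reset instead of an O(n)
// reallocation on every call.
std::vector<double> gather_and_clear(const SparseVec& s,
                                     const std::vector<int>& cols,
                                     std::vector<double>& dense,
                                     std::vector<int>& touched) {
  touched.clear();
  for (std::size_t k = 0; k < s.i.size(); ++k) {
    if (dense[s.i[k]] == 0.0) touched.push_back(s.i[k]);
    dense[s.i[k]] += s.x[k];
  }
  std::vector<double> out;
  out.reserve(cols.size());
  for (int c : cols) out.push_back(dense[c]);
  for (int t : touched) dense[t] = 0.0;  // leave scratch all-zero for reuse
  return out;
}
```

The caller allocates `dense` and `touched` once, outside the loop, and passes them into every iteration, which is exactly the hoisting the review comment asks for.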
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cpp` around lines 419-436: the `delta_expanded` vector is being allocated and zero-initialized inside the `while (superbasic_list.size() > 0)` loop, causing an O(n) allocation each iteration; move the allocation of `delta_expanded` outside the loop (e.g., create it once before the loop using its size `n`) and then reuse it each iteration by resetting its contents (use `std::fill` or equivalent) before accumulating contributions from `delta_y_sparse`. Keep the existing accumulation logic that iterates `delta_y_sparse.i` and updates `delta_expanded[col]`, and then assign `delta_zN[k] = -delta_expanded[nonbasic_list[k]]` as before; this preserves behavior while removing repeated allocations.

cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (1)
238-238: Optional: align Doxygen return type with the actual accessor type.

The `get_reduced_cost()` docs still say `rmm::device_uvector<i_t>`, but the method returns `rmm::device_uvector<f_t>` (both const and non-const overloads).

Suggested doc tweak

```diff
- * @return rmm::device_uvector<i_t> The device memory container for the reduced cost.
+ * @return rmm::device_uvector<f_t> The device memory container for the reduced cost.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp` at line 238, The Doxygen return type for get_reduced_cost() is incorrect—update its comment to match the actual accessor signature(s) which return rmm::device_uvector<f_t> (both const and non-const overloads); locate get_reduced_cost() in solver_solution.hpp and change any mention of rmm::device_uvector<i_t> to rmm::device_uvector<f_t> so the docs and function signature are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 3970-3974: The preprocessor guard name is misspelled: replace
occurrences of MIP_INSTANTIATION_FLOAT with the correct macro
MIP_INSTANTIATE_FLOAT so the float template instantiations are emitted;
specifically update the conditional around the explicit instantiation "template
class branch_and_bound_t<int, float>;" (and the analogous guard in the file
containing root LP CUDA instantiations) to use MIP_INSTANTIATE_FLOAT so float
specializations link correctly.
In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 1576-1632: The float template instantiations (templates for
convert_user_problem, convert_user_lp_with_guess, presolve, crush_/uncrush_* and
uncrush_solution) rely on hardcoded double-precision tolerances (e.g. 1e-13 and
1e-6 used in dependent row detection and multiple checks) and are untested
because CUOPT_INSTANTIATE_FLOAT is disabled; fix by making tolerances
precision-aware (use a scaled std::numeric_limits<f_t>::epsilon() or a
f_t-specific constant derived from epsilon for checks inside presolve and
dependent-row detection code paths) or explicitly document/keep float
instantiation disabled by setting CUOPT_INSTANTIATE_FLOAT=0 and adding a clear
TODO explaining why float paths are unsupported; update the instantiation block
and any tolerance uses referenced by presolve, dependent-row detection, and the
functions listed above to use the new precision-aware tolerance mechanism or
leave the instantiation guarded with a comment explaining the limitation.
In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu`:
- Around line 839-865: The callback unconditionally overwrites shared LP state;
guard against stale PDLP/Barrier callbacks by checking and honoring the
simplex-won flag before updating: inside on_first_lp_solution(), while holding
relaxed_solution_mutex, early-return if ls.lp_optimal_exists is already true (or
if lp_optimal exists and the incoming result.user_objective is not strictly
better), otherwise perform the raft::copy into lp_optimal_solution and
lp_dual_optimal_solution, call set_new_user_bound(result.user_objective) and
then set ls.lp_optimal_exists = true; keep the existing
first_solution_ready_/first_solution_cv_ behavior but ensure the
relaxed_solution_mutex check prevents races between on_first_lp_solution, other
LP callbacks, and simplex completion.
- Around line 417-425: Remove the resetting of the first-solution latch inside
the waiting branch (do not call first_solution_ready_.store(false, ...) in the
run_solver() waiting path) so an earlier notification isn't erased; leave the
wait as just acquiring first_solution_mutex_ and calling
first_solution_cv_.wait(...). Additionally, ensure set_simplex_solution(...)
publishes the latch after stream sync by setting
first_solution_ready_.store(true, std::memory_order_release) while holding
first_solution_mutex_ (use std::lock_guard<std::mutex> on first_solution_mutex_)
and then calling first_solution_cv_.notify_all() so the heuristics thread is
reliably woken.
In `@cpp/src/mip_heuristics/root_lp.cu`:
- Around line 25-45: copy_lp_result_to_root_solution currently discards the LP
termination status; update it to also capture and propagate the solver status
from optimization_problem_solution_t (e.g. lp_result.get_status() or equivalent)
into root_relaxation_first_solution_t so callers (like
run_crush_crossover_and_maybe_win and on_first_lp_solution) can gate further
work. Specifically: add a status field to root_relaxation_first_solution_t (if
missing), set it from lp_result, and only materialize/copy
primal/dual/reduced_costs (or mark them invalid/empty) when the status indicates
a usable solution; otherwise leave vectors empty or flagged and still sync the
stream and set objective/iterations as appropriate. Ensure error/status
propagation follows CUDA-to-API mapping guidelines so downstream logic can
reject infeasible or errored LP results instead of proceeding.
In `@RAPIDS_BRANCH`:
- Line 1: The RAPIDS_BRANCH file currently points at the mutable "main" branch
which risks pulling moving-head changes into the release flow; update the branch
marker to a stable release branch or tag (for example use "release/26.04" or an
explicit immutable ref) by replacing the "main" entry in RAPIDS_BRANCH with the
chosen release branch or tag so builds remain reproducible.
In `@skills/cuopt-installation-api-python/SKILL.md`:
- Line 3: The documented skill version ("version: \"26.06.00\"") and the CUDA12
pip package pin ("cuopt-cu12==26.2.*") in SKILL.md are out of sync; update the
pip install pin to match the new skill version by replacing the
"cuopt-cu12==26.2.*" specifier with a matching 26.06.* (or 26.06.00-compatible)
pin so the CUDA 12 install command aligns with "26.06.00".
---
Nitpick comments:
In `@cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp`:
- Line 238: The Doxygen return type for get_reduced_cost() is incorrect—update
its comment to match the actual accessor signature(s) which return
rmm::device_uvector<f_t> (both const and non-const overloads); locate
get_reduced_cost() in solver_solution.hpp and change any mention of
rmm::device_uvector<i_t> to rmm::device_uvector<f_t> so the docs and function
signature are consistent.
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 419-436: The delta_expanded vector is being allocated and
zero-initialized inside the `while (superbasic_list.size() > 0)` loop causing
O(n) allocation each iteration; move the allocation of `delta_expanded` outside
the loop (e.g., create it once before the loop using its size `n`) and then
reuse it each iteration by resetting its contents (use `std::fill` or
equivalent) before accumulating contributions from `delta_y_sparse`. Keep the
existing accumulation logic that iterates `delta_y_sparse.i` and updates
`delta_expanded[col]`, and then assign `delta_zN[k] =
-delta_expanded[nonbasic_list[k]]` as before; this preserves behavior while
removing repeated allocations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b263ed17-c270-4dc4-8917-5c584619d2c1
📒 Files selected for processing (66)
- .claude-plugin/marketplace.json
- .cursor-plugin/plugin.json
- .github/workflows/build.yaml
- .github/workflows/build_test_publish_images.yaml
- .github/workflows/pr.yaml
- .github/workflows/test.yaml
- .github/workflows/trigger-breaking-change-alert.yaml
- RAPIDS_BRANCH
- README.md
- VERSION
- conda/environments/all_cuda-129_arch-aarch64.yaml
- conda/environments/all_cuda-129_arch-x86_64.yaml
- conda/environments/all_cuda-131_arch-aarch64.yaml
- conda/environments/all_cuda-131_arch-x86_64.yaml
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp
- cpp/src/branch_and_bound/branch_and_bound.cpp
- cpp/src/branch_and_bound/branch_and_bound.hpp
- cpp/src/dual_simplex/crossover.cpp
- cpp/src/dual_simplex/phase2.cpp
- cpp/src/dual_simplex/presolve.cpp
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/dual_simplex/solve.cpp
- cpp/src/dual_simplex/types.hpp
- cpp/src/mip_heuristics/CMakeLists.txt
- cpp/src/mip_heuristics/diversity/diversity_manager.cu
- cpp/src/mip_heuristics/diversity/diversity_manager.cuh
- cpp/src/mip_heuristics/diversity/lns/rins.cu
- cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/problem/problem.cuh
- cpp/src/mip_heuristics/root_lp.cu
- cpp/src/mip_heuristics/root_lp.cuh
- cpp/src/mip_heuristics/solver.cu
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_solution.cu
- dependencies.yaml
- docs/cuopt/source/cuopt-python/routing/routing-example.ipynb
- docs/cuopt/source/faq.rst
- gemini-extension.json
- helmchart/cuopt-server/Chart.yaml
- helmchart/cuopt-server/values.yaml
- python/cuopt/pyproject.toml
- python/cuopt_self_hosted/pyproject.toml
- python/cuopt_server/pyproject.toml
- python/libcuopt/pyproject.toml
- skills/cuopt-developer/SKILL.md
- skills/cuopt-installation-api-c/SKILL.md
- skills/cuopt-installation-api-python/SKILL.md
- skills/cuopt-installation-common/SKILL.md
- skills/cuopt-installation-developer/SKILL.md
- skills/cuopt-lp-milp-api-c/SKILL.md
- skills/cuopt-lp-milp-api-cli/SKILL.md
- skills/cuopt-lp-milp-api-python/SKILL.md
- skills/cuopt-qp-api-c/SKILL.md
- skills/cuopt-qp-api-cli/SKILL.md
- skills/cuopt-qp-api-python/SKILL.md
- skills/cuopt-routing-api-python/SKILL.md
- skills/cuopt-server-api-python/SKILL.md
- skills/cuopt-server-common/SKILL.md
- skills/cuopt-user-rules/SKILL.md
- skills/lp-milp-formulation/SKILL.md
- skills/qp-formulation/SKILL.md
- skills/routing-formulation/SKILL.md
- skills/skill-evolution/SKILL.md
💤 Files with no reviewable changes (2)
- cpp/src/dual_simplex/phase2.cpp
- cpp/src/mip_heuristics/problem/problem.cuh
👮 Files not reviewed due to content moderation or server errors (8)
- cpp/src/mip_heuristics/CMakeLists.txt
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/pdlp/solve.cu
- .github/workflows/trigger-breaking-change-alert.yaml
- cpp/src/pdlp/pdlp.cu
- .github/workflows/build.yaml
- .github/workflows/test.yaml
- cpp/src/dual_simplex/simplex_solver_settings.hpp
```cpp
#ifdef MIP_INSTANTIATION_FLOAT

template class branch_and_bound_t<int, float>;

#endif
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
rg -n 'MIP_INSTANTIAT(E|ION)_FLOAT|MIP_INSTANTIAT(E|ION)_DOUBLE|DUAL_SIMPLEX_INSTANTIATE_(FLOAT|DOUBLE)' cpp
```

Repository: NVIDIA/cuopt
Length of output: 13609
Fix the float instantiation guard name.
The file uses MIP_INSTANTIATION_FLOAT, but the correct macro name throughout the codebase is MIP_INSTANTIATE_FLOAT. This typo appears in both cpp/src/branch_and_bound/branch_and_bound.cpp and cpp/src/mip_heuristics/root_lp.cu; without the fix, float template specializations won't be emitted and float builds will fail at link time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 3970 - 3974, The
preprocessor guard name is misspelled: replace occurrences of
MIP_INSTANTIATION_FLOAT with the correct macro MIP_INSTANTIATE_FLOAT so the
float template instantiations are emitted; specifically update the conditional
around the explicit instantiation "template class branch_and_bound_t<int,
float>;" (and the analogous guard in the file containing root LP CUDA
instantiations) to use MIP_INSTANTIATE_FLOAT so float specializations link
correctly.
```cpp
#if CUOPT_INSTANTIATE_FLOAT

template void convert_user_problem<int, float>(
  const user_problem_t<int, float>& user_problem,
  const simplex_solver_settings_t<int, float>& settings,
  lp_problem_t<int, float>& problem,
  std::vector<int>& new_slacks,
  dualize_info_t<int, float>& dualize_info);

template void convert_user_lp_with_guess<int, float>(
  const user_problem_t<int, float>& user_problem,
  const lp_solution_t<int, float>& initial_solution,
  const std::vector<float>& initial_slack,
  lp_problem_t<int, float>& lp,
  lp_solution_t<int, float>& converted_solution);

template int presolve<int, float>(const lp_problem_t<int, float>& original,
                                  const simplex_solver_settings_t<int, float>& settings,
                                  lp_problem_t<int, float>& presolved,
                                  presolve_info_t<int, float>& presolve_info);

template void crush_primal_solution<int, float>(const user_problem_t<int, float>& user_problem,
                                                const lp_problem_t<int, float>& problem,
                                                const std::vector<float>& user_solution,
                                                const std::vector<int>& new_slacks,
                                                std::vector<float>& solution);

template float crush_dual_solution<int, float>(const user_problem_t<int, float>& user_problem,
                                               const lp_problem_t<int, float>& problem,
                                               const std::vector<int>& new_slacks,
                                               const std::vector<float>& user_y,
                                               const std::vector<float>& user_z,
                                               std::vector<float>& y,
                                               std::vector<float>& z);

template void uncrush_primal_solution<int, float>(const user_problem_t<int, float>& user_problem,
                                                  const lp_problem_t<int, float>& problem,
                                                  const std::vector<float>& solution,
                                                  std::vector<float>& user_solution);

template void uncrush_dual_solution<int, float>(const user_problem_t<int, float>& user_problem,
                                                const lp_problem_t<int, float>& problem,
                                                const std::vector<float>& y,
                                                const std::vector<float>& z,
                                                std::vector<float>& user_y,
                                                std::vector<float>& user_z);

template void uncrush_solution<int, float>(const presolve_info_t<int, float>& presolve_info,
                                           const simplex_solver_settings_t<int, float>& settings,
                                           const std::vector<float>& crushed_x,
                                           const std::vector<float>& crushed_y,
                                           const std::vector<float>& crushed_z,
                                           std::vector<float>& uncrushed_x,
                                           std::vector<float>& uncrushed_y,
                                           std::vector<float>& uncrushed_z);

#endif
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find where float instantiations might be called and check tolerance handling
rg -n --type=cpp -C3 'presolve<int,\s*float>|convert_user_problem<int,\s*float>' cpp/src/
```

Repository: NVIDIA/cuopt
Length of output: 1224
🏁 Script executed:
```bash
#!/bin/bash
# Find where these float instantiated functions are called
echo "=== Searching for calls to float-instantiated functions ==="
rg -n --type=cpp 'presolve<int,\s*float>|convert_user_problem<int,\s*float>|crush_primal_solution<int,\s*float>|crush_dual_solution<int,\s*float>' --no-filename cpp/src/ | grep -v "^template"
echo -e "\n=== Checking tolerance constants in presolve.cpp ==="
rg -n '1e-13|1e-6|1e-7|tolerance' cpp/src/dual_simplex/presolve.cpp | head -20
```

Repository: NVIDIA/cuopt
Length of output: 1231
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's any conditional logic for float precision handling
echo "=== Looking for float-specific handling or guards ==="
rg -n --type=cpp -B2 -A2 'CUOPT_INSTANTIATE_FLOAT|float.*tolerance|precision' cpp/src/dual_simplex/presolve.cpp | head -40
echo -e "\n=== Check if float instantiations are guarded or conditionally used ==="
rg -n --type=cpp 'if.*float|INSTANTIATE.*FLOAT' cpp/src/ | head -20
```

Repository: NVIDIA/cuopt
Length of output: 1400
🏁 Script executed:
```bash
#!/bin/bash
# Find the actual usage of presolve in the codebase to understand calling patterns
echo "=== Finding where presolve function is called ==="
rg -n 'presolve\(' cpp/src/ --type=cpp | grep -v template | grep -v 'presolve\[' | head -20
echo -e "\n=== Check headers for documentation about float vs double ==="
find cpp/include -name "*.h" -o -name "*.hpp" | xargs rg -l 'float.*precision|tolerance' | head -5
```

Repository: NVIDIA/cuopt
Length of output: 1357
🏁 Script executed:
```bash
#!/bin/bash
# Find where CUOPT_INSTANTIATE_FLOAT is defined
echo "=== Where is CUOPT_INSTANTIATE_FLOAT defined? ==="
rg -n 'define.*CUOPT_INSTANTIATE_FLOAT|CUOPT_INSTANTIATE_FLOAT' cpp/ | head -20
echo -e "\n=== Check build configuration ==="
find . -name "CMakeLists.txt" -o -name "*.cmake" | xargs rg -l 'CUOPT_INSTANTIATE_FLOAT' 2>/dev/null
```

Repository: NVIDIA/cuopt
Length of output: 712
🏁 Script executed:
```bash
#!/bin/bash
# Look at solve.cpp context where presolve is called
echo "=== Context around presolve calls in solve.cpp ==="
sed -n '140,170p' cpp/src/dual_simplex/solve.cpp
echo "---"
sed -n '340,360p' cpp/src/dual_simplex/solve.cpp
```

Repository: NVIDIA/cuopt
Length of output: 2563
🏁 Script executed:
```bash
#!/bin/bash
# Check the actual template definition of presolve to understand how it handles types
echo "=== Presolve template definition signature ==="
sed -n '819,830p' cpp/src/dual_simplex/presolve.cpp
echo -e "\n=== Check if there are any float-specific tolerance adjustments ==="
rg -n 'std::is_same.*float|if constexpr.*float|enable_if.*float' cpp/src/dual_simplex/presolve.cpp
```

Repository: NVIDIA/cuopt
Length of output: 617
🏁 Script executed:
```bash
#!/bin/bash
# Check tolerance usage more carefully - are they derived from precision?
echo "=== Check if tolerances are derived from numeric traits ==="
rg -n 'numeric_limits|epsilon|std::numeric' cpp/src/dual_simplex/presolve.cpp
echo -e "\n=== Look at the actual values and how they're used ==="
sed -n '440,460p' cpp/src/dual_simplex/presolve.cpp
echo "---"
sed -n '490,505p' cpp/src/dual_simplex/presolve.cpp
```

Repository: NVIDIA/cuopt
Length of output: 1396
Float template instantiations use hardcoded tolerances unsuitable for single-precision arithmetic.
The float-instantiated functions (lines 1576–1632) use tolerances hardcoded for double precision: 1e-13 in dependent row detection (line 448) and 1e-6 in multiple checks (lines 496, 1233, 1243, 1299). These values were tuned assuming ~15 decimal digits of precision; float provides only ~7 decimal digits.
More critically, CUOPT_INSTANTIATE_FLOAT is currently set to 0 in cpp/include/cuopt/linear_programming/constants.h, meaning these instantiations are not compiled in default builds—this is untested aspirational code. If enabled in the future without addressing precision-aware tolerance tuning, float LP solves will fail on valid problems.
Either:
- Adjust tolerances using `std::numeric_limits<f_t>::epsilon()` or precision-aware constants (preferred for general float support), or
- Explicitly document and mark float support as disabled with a clear reason (if float paths are not intended).
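If the precision-aware route is taken, one possible shape is to derive each tolerance from the floating type's epsilon. This is an illustrative sketch only: the helper name and the scale factor 450 are assumptions chosen so the double case lands near the existing 1e-13 value, not constants from cuOpt.

```cpp
#include <cassert>
#include <limits>

// Precision-aware tolerance: scale the type's machine epsilon rather than
// hardcoding a double-tuned constant like 1e-13.
// For double (eps ~ 2.22e-16) this gives ~1.0e-13; for float (eps ~ 1.19e-7)
// it gives ~5.4e-5, which respects float's ~7 decimal digits of precision.
template <typename f_t>
constexpr f_t dependent_row_tol() {
  return f_t(450) * std::numeric_limits<f_t>::epsilon();
}
```

The same pattern would replace the other hardcoded checks (the 1e-6 thresholds) with their own epsilon-derived constants, each with a scale factor chosen to reproduce the current double behavior.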
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/presolve.cpp` around lines 1576 - 1632, The float
template instantiations (templates for convert_user_problem,
convert_user_lp_with_guess, presolve, crush_/uncrush_* and uncrush_solution)
rely on hardcoded double-precision tolerances (e.g. 1e-13 and 1e-6 used in
dependent row detection and multiple checks) and are untested because
CUOPT_INSTANTIATE_FLOAT is disabled; fix by making tolerances precision-aware
(use a scaled std::numeric_limits<f_t>::epsilon() or a f_t-specific constant
derived from epsilon for checks inside presolve and dependent-row detection code
paths) or explicitly document/keep float instantiation disabled by setting
CUOPT_INSTANTIATE_FLOAT=0 and adding a clear TODO explaining why float paths are
unsupported; update the instantiation block and any tolerance uses referenced by
presolve, dependent-row detection, and the functions listed above to use the new
precision-aware tolerance mechanism or leave the instantiation guarded with a
comment explaining the limitation.
```cpp
bool bb_thread_solution_exists = simplex_solution_exists.load();
if (bb_thread_solution_exists) {
  ls.lp_optimal_exists = true;
} else if (branch_and_bound_ptr != nullptr &&
           branch_and_bound_ptr->enable_concurrent_lp_root_solve()) {
  // B&B drives root relaxation; wait for first solution (PDLP/Barrier or dual simplex)
  first_solution_ready_.store(false, std::memory_order_release);
  std::unique_lock<std::mutex> lock(first_solution_mutex_);
  first_solution_cv_.wait(lock, [this]() { return first_solution_ready_.load(); });
```
Don't reset the first-solution latch inside the waiting branch.
Line 423 can erase a notification that arrived before run_solver() reached this block, and this wait is never woken when dual simplex publishes the root solution via set_simplex_solution(). That leaves the heuristics thread blocked even though B&B already has a usable root LP solution.
Suggested fix
```diff
 } else if (branch_and_bound_ptr != nullptr &&
            branch_and_bound_ptr->enable_concurrent_lp_root_solve()) {
   // B&B drives root relaxation; wait for first solution (PDLP/Barrier or dual simplex)
-  first_solution_ready_.store(false, std::memory_order_release);
   std::unique_lock<std::mutex> lock(first_solution_mutex_);
   first_solution_cv_.wait(lock, [this]() { return first_solution_ready_.load(); });
   lock.unlock();
   clamp_within_var_bounds(lp_optimal_solution, problem_ptr, problem_ptr->handle_ptr);
```

Also publish the same latch from set_simplex_solution(...) after the stream sync:
```cpp
{
  std::lock_guard<std::mutex> lock(first_solution_mutex_);
  first_solution_ready_.store(true, std::memory_order_release);
}
first_solution_cv_.notify_all();
```

As per coding guidelines, "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."
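For reference, the intended one-shot latch behavior can be shown in isolation. This is a minimal sketch using only standard C++ primitives; `FirstSolutionLatch` and its member names are illustrative, not cuOpt's actual fields:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// One-shot "first solution" latch: the publisher sets the flag under the
// mutex before notifying, and the waiter never resets the flag, so a
// notification that arrives before wait() starts is not lost.
struct FirstSolutionLatch {
  std::mutex m;
  std::condition_variable cv;
  std::atomic<bool> ready{false};

  void publish() {
    {
      std::lock_guard<std::mutex> lock(m);  // set under the mutex...
      ready.store(true, std::memory_order_release);
    }
    cv.notify_all();  // ...then wake any waiters
  }

  void wait() {
    std::unique_lock<std::mutex> lock(m);
    // No ready.store(false) here: the predicate handles early publishes.
    cv.wait(lock, [this] { return ready.load(std::memory_order_acquire); });
  }
};

// Publishes before the waiter starts; the waiter must still return promptly.
bool early_publish_is_not_lost() {
  FirstSolutionLatch latch;
  latch.publish();  // notification arrives "too early"
  latch.wait();     // would deadlock if wait() reset the flag first
  return true;
}
```

The bug the review flags is exactly the missing-predicate failure mode: resetting the flag inside the waiting branch turns an already-delivered publish into a lost wakeup.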
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 417 -
425, Remove the resetting of the first-solution latch inside the waiting branch
(do not call first_solution_ready_.store(false, ...) in the run_solver() waiting
path) so an earlier notification isn't erased; leave the wait as just acquiring
first_solution_mutex_ and calling first_solution_cv_.wait(...). Additionally,
ensure set_simplex_solution(...) publishes the latch after stream sync by
setting first_solution_ready_.store(true, std::memory_order_release) while
holding first_solution_mutex_ (use std::lock_guard<std::mutex> on
first_solution_mutex_) and then calling first_solution_cv_.notify_all() so the
heuristics thread is reliably woken.
```cpp
template <typename i_t, typename f_t>
void diversity_manager_t<i_t, f_t>::on_first_lp_solution(
  cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> const& result)
{
  {
    std::lock_guard<std::mutex> lock(relaxed_solution_mutex);
    cuopt_assert(result.primal.size() == lp_optimal_solution.size(),
                 "First LP solution primal size mismatch");
    cuopt_assert(result.dual.size() == lp_dual_optimal_solution.size(),
                 "First LP solution dual size mismatch");
    raft::copy(lp_optimal_solution.data(),
               result.primal.data(),
               result.primal.size(),
               problem_ptr->handle_ptr->get_stream());
    raft::copy(lp_dual_optimal_solution.data(),
               result.dual.data(),
               result.dual.size(),
               problem_ptr->handle_ptr->get_stream());
    problem_ptr->handle_ptr->sync_stream();
    ls.lp_optimal_exists = true;
    set_new_user_bound(result.user_objective);
  }
  {
    std::lock_guard<std::mutex> lock(first_solution_mutex_);
    first_solution_ready_.store(true, std::memory_order_release);
    first_solution_cv_.notify_all();
  }
```
Ignore late PDLP/Barrier callbacks after simplex has already won.
on_first_lp_solution() unconditionally overwrites lp_optimal_solution and the reported bound. If dual simplex wins first, the first PDLP/Barrier thread to stop can still replace that state with a weaker or partially converged result.
Guard against stale callbacks
```diff
 void diversity_manager_t<i_t, f_t>::on_first_lp_solution(
   cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> const& result)
 {
   {
     std::lock_guard<std::mutex> lock(relaxed_solution_mutex);
+    if (simplex_solution_exists.load(std::memory_order_acquire)) { return; }
     cuopt_assert(result.primal.size() == lp_optimal_solution.size(),
                  "First LP solution primal size mismatch");
```

As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results" and "Ensure race conditions are absent in multi-threaded server implementations; verify proper synchronization of shared state."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 839 -
865, The callback unconditionally overwrites shared LP state; guard against
stale PDLP/Barrier callbacks by checking and honoring the simplex-won flag
before updating: inside on_first_lp_solution(), while holding
relaxed_solution_mutex, early-return if ls.lp_optimal_exists is already true (or
if lp_optimal exists and the incoming result.user_objective is not strictly
better), otherwise perform the raft::copy into lp_optimal_solution and
lp_dual_optimal_solution, call set_new_user_bound(result.user_objective) and
then set ls.lp_optimal_exists = true; keep the existing
first_solution_ready_/first_solution_cv_ behavior but ensure the
relaxed_solution_mutex check prevents races between on_first_lp_solution, other
LP callbacks, and simplex completion.
```cpp
cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t>
copy_lp_result_to_root_solution(problem_t<i_t, f_t>* problem,
                                const optimization_problem_solution_t<i_t, f_t>& lp_result)
{
  cuopt::linear_programming::dual_simplex::root_relaxation_first_solution_t<i_t, f_t> result;
  auto stream = problem->handle_ptr->get_stream();
  result.primal.resize(lp_result.get_primal_solution().size());
  result.dual.resize(lp_result.get_dual_solution().size());
  result.reduced_costs.resize(lp_result.get_reduced_cost().size());
  raft::copy(
    result.primal.data(), lp_result.get_primal_solution().data(), result.primal.size(), stream);
  raft::copy(result.dual.data(), lp_result.get_dual_solution().data(), result.dual.size(), stream);
  raft::copy(result.reduced_costs.data(),
             lp_result.get_reduced_cost().data(),
             result.reduced_costs.size(),
             stream);
  problem->handle_ptr->sync_stream();
  result.objective = problem->get_solver_obj_from_user_obj(lp_result.get_objective_value());
  result.user_objective = lp_result.get_objective_value();
  result.iterations = lp_result.get_additional_termination_information().number_of_steps_taken;
  return result;
```
Keep the LP termination status in this handoff.
solve_lp_with_method() returns a status, but this helper drops it and always materializes primal/dual/reduced_costs. That prevents callers from reproducing the old logic in diversity_manager.cu that rejected infeasible outputs, and it means run_crush_crossover_and_maybe_win() may drive crossover or callbacks from a non-usable result.
Return the termination status alongside the vectors and gate on_first_lp_solution / crossover on the statuses that actually guarantee a usable root solution. As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results" and "Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes."
Also applies to: 78-80
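A minimal sketch of the status-gated handoff could look like the following. The enum values, struct fields, and function name are placeholders, since this review does not fix the exact cuOpt signatures; the point is only that the status is always propagated while the vectors are materialized only for usable results:

```cpp
#include <cassert>
#include <vector>

// Placeholder status enum standing in for the LP solver's termination status.
enum class lp_status_t { optimal, infeasible, error };

// Placeholder result struct: carries the status alongside the solution data.
struct root_solution_t {
  lp_status_t status{lp_status_t::error};
  std::vector<double> primal, dual;
  double objective{0.0};
};

// Always record the status; copy the vectors only when the status guarantees
// a usable solution, so downstream callers can gate crossover/callbacks on it.
root_solution_t make_root_solution(lp_status_t status,
                                   const std::vector<double>& primal,
                                   const std::vector<double>& dual,
                                   double objective) {
  root_solution_t r;
  r.status    = status;  // propagate unconditionally
  r.objective = objective;
  if (status == lp_status_t::optimal) {  // gate the copies on usability
    r.primal = primal;
    r.dual   = dual;
  }  // otherwise leave vectors empty so callers must check the status
  return r;
}
```

Callers such as the crossover/callback paths would then reject anything whose status is not in the usable set instead of proceeding with garbage vectors.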
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/root_lp.cu` around lines 25 - 45,
copy_lp_result_to_root_solution currently discards the LP termination status;
update it to also capture and propagate the solver status from
optimization_problem_solution_t (e.g. lp_result.get_status() or equivalent) into
root_relaxation_first_solution_t so callers (like
run_crush_crossover_and_maybe_win and on_first_lp_solution) can gate further
work. Specifically: add a status field to root_relaxation_first_solution_t (if
missing), set it from lp_result, and only materialize/copy
primal/dual/reduced_costs (or mark them invalid/empty) when the status indicates
a usable solution; otherwise leave vectors empty or flagged and still sync the
stream and set objective/iterations as appropriate. Ensure error/status
propagation follows CUDA-to-API mapping guidelines so downstream logic can
reject infeasible or errored LP results instead of proceeding.
```diff
@@ -1 +1 @@
-release/26.04
+main
```
Avoid pointing release flow to main for this branch marker.
Using main here can pull moving-head workflow/config changes into a release line and make builds non-reproducible. Prefer pinning to the release branch/tag used for this train (e.g., release/26.04 or an explicit immutable ref).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@RAPIDS_BRANCH` at line 1, The RAPIDS_BRANCH file currently points at the
mutable "main" branch which risks pulling moving-head changes into the release
flow; update the branch marker to a stable release branch or tag (for example
use "release/26.04" or an explicit immutable ref) by replacing the "main" entry
in RAPIDS_BRANCH with the chosen release branch or tag so builds remain
reproducible.
```diff
 ---
 name: cuopt-installation-api-python
-version: "26.04.00"
+version: "26.06.00"
```
Sync the documented package pin with the new skill version.
Line 3 declares 26.06.00, but the CUDA 12 pip command in this file still pins cuopt-cu12==26.2.* (Line 26). That mismatch can route users to an older release stream.
🛠️ Proposed doc fix
```diff
-pip install --extra-index-url=https://pypi.nvidia.com 'cuopt-cu12==26.2.*'
+pip install --extra-index-url=https://pypi.nvidia.com 'cuopt-cu12==26.6.*'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/cuopt-installation-api-python/SKILL.md` at line 3, The documented
skill version ("version: \"26.06.00\"") and the CUDA12 pip package pin
("cuopt-cu12==26.2.*") in SKILL.md are out of sync; update the pip install pin
to match the new skill version by replacing the "cuopt-cu12==26.2.*" specifier
with a matching 26.06.* (or 26.06.00-compatible) pin so the CUDA 12 install
command aligns with "26.06.00".
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
Allocate buffers once before the superbasic loop and reset with std::fill each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review). Made-with: Cursor
This reverts commit e2ea687.
```cpp
// First LP solution from either PDLP/Barrier or dual simplex; used to notify diversity manager
// without B&B depending on PDLP types.
template <typename i_t, typename f_t>
```
This does not belong in types.hpp. types.hpp is super low-level and meant to include only very common definitions used across files.
Please move this to another file.
```cpp
std::function<void()> heuristic_preemption_callback;
std::function<void(std::vector<f_t>&, std::vector<f_t>&, f_t)> set_simplex_solution_callback;
// Called by B&B when first LP solution is available (PDLP/Barrier or dual simplex).
std::function<void(root_relaxation_first_solution_t<i_t, f_t> const&)>
```
How does this differ from the existing set_simplex_solution_callback? Do we need both callbacks?
```cpp
@@ -30,7 +30,7 @@ struct element_t {
  f_t x;  // coefficient value
```
Let's not merge in changes from another PR. That makes things difficult to review.
```cpp
#endif

#if CUOPT_INSTANTIATE_FLOAT
```
Can we add FLOAT instantiation in another PR? I think there is a ton of work to get the code to compile and work with float. I'm not sure we should do that in the same PR where we are changing the concurrent root solve.
```cpp
@@ -331,6 +331,7 @@ void compute_dual_solution_from_basis(const lp_problem_t<i_t, f_t>& lp,
```
Let's not include the dual push changes in this PR. This makes the PR difficult to review.
chris-maes left a comment
Can you separate out the other PRs?
Description
Issue
Checklist