Main merge release/26.02 4 (#830)

Merged
rgsl888prabhu merged 6 commits into NVIDIA:main from rgsl888prabhu:main-merge-release/26.02_4
Feb 6, 2026

Conversation

@rgsl888prabhu (Collaborator) commented Feb 5, 2026

For #827

Summary by CodeRabbit

  • New Features

    • Cutting‑plane framework for MIP (Gomory, MIR, Knapsack, Strong CG) with cut pool, generation, scoring and integration.
  • New Configuration

    • New solver options for cut control, node limits, GPU/thread tuning and cut‑tuning thresholds.
  • Improvements

    • More robust concurrent‑halt, presolve, basis update and diagnostics; better logging and timing.
  • API Changes

    • Several public interfaces extended (bounds strengthening, basis repair, branch‑and‑bound, sparse matrix/vector helpers, timer).
  • Tests / Regression

    • New unit tests for cuts and new regression benchmarking utilities and scripts.

Iroy30 and others added 3 commits February 4, 2026 20:05
This PR adds cuts to the MIP solver. This includes the following:

1. Add constraints of the form C*x <= d to an LP that has already been solved to optimality (and has basis information).
    - The constraints must be violated at the current relaxation solution x^star, that is, C*x^star > d.
    - The constraint matrix, rhs, basis, and basis factorization are all updated to include the additional constraints.
    - Dual simplex is started in phase 2 from a dual feasible solution.
2. Remove constraints from an LP that has already been solved to optimality.
    - The constraints must have slacks in the basis.
    - The basis is refactored from scratch.
3. Add a cut pass loop after solving the root relaxation.
4. Add a cut pool to store and select cuts.
    - Cuts are currently scored based on distance and orthogonality.
5. Add Mixed Integer Gomory Cuts.
    - These are computed via a MIR cut on a row of the simplex tableau.
6. Add Mixed Integer Rounding (MIR) Cuts.
    - These are constructed by aggregating rows of the constraint matrix.
7. Add Strong Chvatal-Gomory Cuts.
    - These are constructed from a row of the tableau matrix and from rows of the constraint matrix.
8. Fix handling of steepest edge norms in dual simplex.
    - Ensure that all basic variables have positive steepest edge norms.
9. Add reduced cost fixing at the root node.
    - This is applied after each cut pass and after strong branching, if a heuristic solution is available.
10. Fix an issue in crossover when solving the dual problem.
    - We were not correctly populating slack variables when solving the dual. This issue appeared on graph20-80-1rand.
11. Fix an issue in crossover when the basis became rank-deficient in the dual push.
12. Fix issues across the code with handling and propagating concurrent halt.
13. New solver options: mip-cut-passes, mip-mixed-integer-gomory-cuts, mip-mir-cuts, mip-strong-chvatal-gomory-cuts, mip-knapsack-cuts, mip-cut-change-threshold, mip-cut-min-orthogonality.
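The distance and orthogonality scoring used by the cut pool can be sketched as follows. This is a simplified Python illustration of the concept, not the PR's C++ `cut_pool_t` API; all names here are hypothetical. A cut c·x <= d is only useful if it is violated at the relaxation solution x^star, its "distance" is the violation normalized by the cut's norm, and orthogonality between two cuts penalizes near-parallel candidates.

```python
import math

def cut_violation(c, d, x_star):
    # Violation of the cut c.x <= d at the relaxation solution x_star.
    return sum(ci * xi for ci, xi in zip(c, x_star)) - d

def cut_distance(c, d, x_star):
    # Normalized violation: Euclidean distance from x_star to the cut hyperplane.
    # Guard against a zero-norm cut to avoid dividing by zero.
    norm = math.sqrt(sum(ci * ci for ci in c))
    return cut_violation(c, d, x_star) / norm if norm > 1e-12 else 0.0

def cut_orthogonality(ci, cj):
    # 1 - |cos(angle)| between two cut rows; 1.0 means fully orthogonal,
    # 0.0 means parallel (or degenerate, when a norm is near zero).
    norm_i = math.sqrt(sum(a * a for a in ci))
    norm_j = math.sqrt(sum(b * b for b in cj))
    if norm_i < 1e-12 or norm_j < 1e-12:
        return 0.0
    dot = sum(a * b for a, b in zip(ci, cj))
    return 1.0 - abs(dot) / (norm_i * norm_j)
```

For example, the cut x1 + x2 <= 1 at x^star = (1, 1) has violation 1 and distance 1/sqrt(2); two parallel cuts score orthogonality 0 and would be filtered by a minimum-orthogonality threshold such as mip-cut-min-orthogonality.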

Closes NVIDIA#698, NVIDIA#205 
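The mixed integer rounding construction mentioned above can be sketched with the textbook MIR inequality. This is a simplified illustration under the assumption of nonnegative integer variables, not the PR's cuts.cpp implementation: for a row sum_j a_j x_j <= b with fractional rhs (f0 = b - floor(b) > 0), the MIR cut is sum_j (floor(a_j) + max(f_j - f0, 0)/(1 - f0)) x_j <= floor(b), where f_j = a_j - floor(a_j).

```python
import math

def mir_cut(a, b):
    # Textbook MIR inequality for sum_j a_j * x_j <= b over nonnegative
    # integer x. Returns (coeffs, rhs) of the cut sum_j c_j * x_j <= floor(b).
    # Only meaningful when the right-hand side is fractional (f0 > 0).
    f0 = b - math.floor(b)
    assert f0 > 1e-9, "MIR needs a fractional right-hand side"
    coeffs = []
    for aj in a:
        fj = aj - math.floor(aj)
        coeffs.append(math.floor(aj) + max(fj - f0, 0.0) / (1.0 - f0))
    return coeffs, math.floor(b)
```

For instance, 2.5*x1 + 1.3*x2 <= 6.7 yields the cut 2*x1 + x2 <= 6, which every integer-feasible point satisfies but which tightens the relaxation.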

Results from a GH200 A/B test with 64 threads and a 300-second time limit. Further runs are needed with a larger time limit, and further work is needed to realize the full benefit of cuts.

A: cuts PR with --mip-cut-passes=10
B: cuts PR with --mip-cut-passes=0

Geomean MIP GAP A / (B = baseline): 0.97
Geomean Time to Optimal A/B: 0.96
A optimal 45
B optimal 37
A problems with feasible solutions 225
B problems with feasible solutions 224

A wins (problem, A time in s, B time in s)
drayage-100-23 1.14 8.61
gfd-schedulen180f7d50m30k18 87.85 300.0
neos-827175 12.28 25.86
neos-1171448 28.85 98.2
n5-3 180.62 300.0
neos859080 0.78 1.38
seymour1 38.78 300.0
neos-860300 71.98 90.91
neos-3083819-nubu 30.48 300.0
neos-933966 248.26 300.0
neos-957323 46.36 300.0
neos-960392 50.12 115.68
netdiversion 121.59 300.0
ns1208400 162.15 300.0
nw04 39.89 45.98
piperout-27 8.22 14.09
supportcase7 30.48 300.0
supportcase6 70.22 300.0
uccase12 26.95 41.87
A wins: 19

A losses (problem, A time in s, B time in s)
app1-1 2.3 0.74
cbs-cta 2.21 1.83
irp 17.59 12.94
istanbul-no-cutoff 18.26 13.56
mas76 300.0 20.98
neos-1122047 8.3 6.46
neos-1445765 1.86 1.52
neos-1582420 134.3 93.44
neos-3004026-krka 6.56 1.31
neos8 0.88 0.68
ns1952667 99.84 64.92
piperout-08 10.75 8.23
pk1 224.32 91.84
qap10 134.76 62.82
swath1 60.04 50.3
trento1 300.0 172.28
triptim1 246.65 130.92
A losses: 17

Authors:
  - Chris Maes (https://github.com/chris-maes)

Approvers:
  - Ramakrishnap (https://github.com/rgsl888prabhu)
  - Akif ÇÖRDÜK (https://github.com/akifcorduk)
  - Alice Boucher (https://github.com/aliceb-nv)

URL: NVIDIA#814
@rgsl888prabhu rgsl888prabhu requested review from a team as code owners February 5, 2026 17:28
@rgsl888prabhu rgsl888prabhu added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Feb 5, 2026
@coderabbitai (bot) commented Feb 5, 2026

📝 Walkthrough

Adds a cuts subsystem (Gomory, MIR, Knapsack, Strong CG), integrates it into dual-simplex MIP flows, extends MIP/cut solver settings, threads basis/edge-norms and timing through branch-and-bound, standardizes concurrent-halt sentinels, updates sparse/CSR helpers, and adds regression benchmarking tooling and tests.

Changes

Cohort / File(s) Summary
MIP Constants & Settings
cpp/include/cuopt/linear_programming/constants.h, cpp/include/cuopt/linear_programming/mip/solver_settings.hpp, cpp/src/dual_simplex/simplex_solver_settings.hpp, cpp/src/math_optimization/solver_settings.cu
Added MIP/cut string constants and solver settings fields (cut controls, max_cut_passes, node_limit, num_gpus, sub_mip, reliability_branching, cut tuning params) and registered new parameters.
Cutting‑Plane Implementation
cpp/src/dual_simplex/cuts.hpp, cpp/src/dual_simplex/cuts.cpp, cpp/src/dual_simplex/CMakeLists.txt
New cuts framework: cut_pool_t, cut_generation_t, MIR/Gomory/CG/Knapsack generators, add/remove cuts, verification I/O and test integration; large new implementation and header added to build.
Branch & B&B Integration
cpp/src/dual_simplex/branch_and_bound.hpp, cpp/src/dual_simplex/branch_and_bound.cpp, cpp/src/dual_simplex/mip_node.hpp, cpp/src/mip/diversity/...
Extended branch_and_bound API (start_time), threaded edge_norms, basis_update, basic/nonbasic lists, added set_solution_at_root/find_reduced_cost_fixings/initialize_diving_heuristics_settings; added integer_infeasible in nodes; updated heuristics to pass tic().
Basis Updates & Matrix Ops
cpp/src/dual_simplex/basis_updates.hpp, cpp/src/dual_simplex/basis_updates.cpp, cpp/src/dual_simplex/dense_matrix.hpp
Added basis_update_mpf_t::append_cuts and refactored refactor flow to use superbasic_list/status-driven retry; added dense_matrix ctor(value).
Sparse Matrix/Vector Utilities
cpp/src/dual_simplex/sparse_matrix.hpp, cpp/src/dual_simplex/sparse_matrix.cpp, cpp/src/dual_simplex/sparse_vector.hpp, cpp/src/dual_simplex/sparse_vector.cpp
Added csr_matrix_t::append_rows/append_row, changed csr::check_matrix to return i_t with bounds checks; added sparse_vector ctor-from-row, dot, squeeze, sort, norm2_squared.
Error/Status & Sentinel Standardization
cpp/src/dual_simplex/types.hpp, cpp/src/dual_simplex/barrier.cu, cpp/src/dual_simplex/basis_solves.cpp, cpp/src/dual_simplex/bound_flipping_ratio_test.*, cpp/src/dual_simplex/crossover.cpp, cpp/src/dual_simplex/right_looking_lu.cpp, cpp/src/dual_simplex/sparse_cholesky.cuh, cpp/src/dual_simplex/primal.*, cpp/src/dual_simplex/phase2.cpp
Introduced CONCURRENT_HALT_RETURN and ratio-test named constants; replaced numerous numeric sentinel returns (-1/-2/-3/-4) with named constants; added CONCURRENT_LIMIT to primal status enum and propagated handling.
Bounds Strengthening API
cpp/src/dual_simplex/bounds_strengthening.hpp, cpp/src/dual_simplex/bounds_strengthening.cpp, cpp/src/dual_simplex/presolve.cpp
Changed bounds_strengthening signature to accept settings first and a std::vector<bool> bounds_changed input; removed internal bounds_changed member and updated callers and column-wise logic.
Phase2 / Crossover / Diagnostics
cpp/src/dual_simplex/phase2.cpp, cpp/src/dual_simplex/crossover.cpp, cpp/src/dual_simplex/solve.cpp, cpp/src/dual_simplex/folding.cpp, cpp/src/dual_simplex/pseudo_costs.cpp, cpp/src/dual_simplex/solution.hpp
Added primal_infeasibility_breakdown and extended prepare_optimality; folded coloring_status_t for folding; updated timing/logging for strong branching; tightened concurrent-halt propagation and added extra diagnostics and slack handling during solve/crossover.
Sparse Cholesky / LU concurrent paths
cpp/src/dual_simplex/sparse_cholesky.cuh, cpp/src/dual_simplex/right_looking_lu.cpp
Replaced hard-coded -2/-1 returns with CONCURRENT_HALT_RETURN in multiple GPU/cuDSS and LU paths.
Solver / Presolve / Timer
cpp/src/mip/solver.cu, cpp/src/mip/solve.cu, cpp/src/mip/presolve/third_party_presolve.cpp, cpp/src/utilities/timer.hpp
Gated presolve by settings.presolve; passed cut/timing/node_limit settings into branch_and_bound; switched timer calls to get_time_limit/get_tic_start; added timer_t::get_tic_start().
CMake / Tests
cpp/src/dual_simplex/CMakeLists.txt, cpp/tests/mip/CMakeLists.txt, cpp/tests/mip/cuts_test.cu, cpp/tests/mip/termination_test.cu, python/.../test_python_API.py
Added cuts.cpp to build, added CUTS_TEST and new cuts_test.cu; tests updated to use tolerances (EXPECT_NEAR) for floating comparisons.
Regression Benchmarking Suite
regression/* (scripts, configs, Python tools)
Added full regression infrastructure: benchmark runner, utils, dataset downloader, result aggregation, ASV DB updater, HTML/Slack reporting, shell orchestration scripts, and configs for LP/MIP/routing.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Batch PDLP #805: touches PDLP/strong-branching configuration and related solver-setting propagation; likely overlaps with this PR's strong-branching / settings changes.

Suggested labels

mip

Suggested reviewers

  • aliceb-nv
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Main merge release/26.02 4' is vague and generic, using non-descriptive phrasing that does not clearly convey the main changes in the changeset. Revise the title to be more specific and descriptive of the primary change, such as 'Add superbasic variable support in basis repair and improve root solve after cuts' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


@coderabbitai (bot) left a comment

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cpp/src/dual_simplex/mip_node.hpp (2)

232-241: ⚠️ Potential issue | 🟠 Major

Carry integer_infeasible through detach_copy().
The detached node should preserve the integer infeasibility flag; otherwise, downstream logic may observe a stale/default value.

🐛 Suggested fix
     copy.branch_var         = branch_var;
     copy.branch_dir         = branch_dir;
     copy.branch_var_lower   = branch_var_lower;
     copy.branch_var_upper   = branch_var_upper;
     copy.fractional_val     = fractional_val;
+    copy.integer_infeasible = integer_infeasible;
     copy.objective_estimate = objective_estimate;
     copy.node_id            = node_id;

37-49: ⚠️ Potential issue | 🟠 Major

Initialize integer_infeasible in the default constructor.

The default constructor at lines 37-49 fails to initialize integer_infeasible, leaving it with an indeterminate value. This member is accessed during branching logic and reporting (e.g., branch_and_bound.cpp references node->integer_infeasible), which can cause nondeterministic behavior. Initialize to -1 to match the second constructor's convention.

Additionally, the detach_copy() method (lines 232-243) does not copy integer_infeasible when creating a detached node, causing loss of this state during diving operations.

🐛 Suggested fix for default constructor
   mip_node_t()
     : status(node_status_t::PENDING),
       lower_bound(-std::numeric_limits<f_t>::infinity()),
       depth(0),
       parent(nullptr),
       node_id(0),
       branch_var(-1),
       branch_dir(rounding_direction_t::NONE),
       branch_var_lower(-std::numeric_limits<f_t>::infinity()),
       branch_var_upper(std::numeric_limits<f_t>::infinity()),
       fractional_val(std::numeric_limits<f_t>::infinity()),
+      integer_infeasible(-1),
       objective_estimate(std::numeric_limits<f_t>::infinity()),
       vstatus(0)
cpp/src/dual_simplex/sparse_matrix.cpp (1)

572-580: ⚠️ Potential issue | 🟡 Minor

Fix the col_start size guard to prevent OOB in debug checks.

The guard uses j >= col_start.size(), but the code reads col_start[j + 1]; when col_start.size() == n the check never fires and the read is still out of bounds. Guard on j + 1 (or pre-check col_start.size() < n + 1).

✅ Safer bound check
-    if (j >= col_start.size()) {
+    if (j + 1 >= col_start.size()) {
       printf("Col start too small size %ld n %d\n", col_start.size(), this->n);
       return -1;
     }
cpp/src/dual_simplex/branch_and_bound.cpp (1)

1556-1573: ⚠️ Potential issue | 🟠 Major

Break loop when dual simplex task completes, not just when signals arrive.

The loop at lines 1569–1573 spins indefinitely if neither root_crossover_solution_set_ nor root_concurrent_halt() changes. If dual simplex finishes (e.g., TIME_LIMIT) without an external solver providing a solution via set_root_relaxation_solution(), both signals stay false and the loop never exits. Add a check for the future's completion to guarantee the loop terminates:

Suggested guard
-  while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
-         *get_root_concurrent_halt() == 0) {
+  while (!root_crossover_solution_set_.load(std::memory_order_acquire) &&
+         *get_root_concurrent_halt() == 0 &&
+         root_status_future.wait_for(std::chrono::milliseconds(0)) != std::future_status::ready) {
     std::this_thread::sleep_for(std::chrono::milliseconds(1));
     continue;
   }
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/bounds_strengthening.cpp`:
- Around line 105-116: The loop indexes bounds_changed by j assuming it has
length n; add validation before the loop to avoid out-of-bounds access by
checking that bounds_changed.size() == static_cast<size_t>(n) (or
bounds_changed.size() >= n) and handle the mismatch (e.g., assert,
log+return/error, or resize/initialize bounds_changed) so the subsequent loop
over j and access to bounds_changed[j] is safe; apply this check near the start
of the block that references bounds_changed (the code surrounding the for (i_t j
= 0; j < n; ++j) loop that also touches A.col_start and constraint_changed) and
ensure constraint_changed is still reset only when the size check passes.

In `@cpp/src/dual_simplex/cuts.cpp`:
- Around line 44-95: Guard against near-zero norms in cut_distance and
cut_orthogonality: compute an epsilon (e.g., std::numeric_limits<f_t>::epsilon()
* max(1.0, cut_norm) or a small constant like 1e-12), then if cut_norm (used in
cut_distance and stored in cut_norms_) or either norm_i/norm_j is below that
epsilon, set cut_violation and cut_norm consistently and return a neutral score
(for cut_distance return 0.0 and for cut_orthogonality return 1.0) instead of
performing the division; reference the functions cut_pool_t::cut_distance,
cut_pool_t::cut_orthogonality and the field cut_norms_ when adding these guards.
- Around line 8-9: The cut scoring functions cut_distance() and
cut_orthogonality() currently divide by norms that can be zero; add a small
epsilon check before dividing to avoid NaN/Inf propagation. In cut_distance(),
compute cut_norm as before but only divide cut_violation by cut_norm when
cut_norm > epsilon (otherwise set distance to 0.0); in cut_orthogonality(),
check both norm_i and norm_j against epsilon and only compute the division when
(norm_i > epsilon && norm_j > epsilon) (otherwise set orthogonality to 0.0). Use
a consistent small epsilon (e.g. std::numeric_limits<f_t>::epsilon() or a tiny
constant) and ensure the functions return a well-defined 0.0 when norms are too
small.

In `@regression/benchmark_scripts/benchmark.py`:
- Around line 253-260: The code currently hard-codes n_files = 1 which prevents
iterating all discovered files and can cause IndexError when data_files is
empty; change n_files to use the actual number of discovered files (n_files =
len(data_files)) and short‑circuit when len(data_files) == 0 (e.g.,
return/continue or raise a clear error) before indexing into data_files; update
the logic around variables data_files, d_type, dataset_file_path, gpu_id/idx to
rely on n_files = len(data_files) so workers process all files and avoid
out‑of‑range access.
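The fix above can be sketched as follows. The helper name is hypothetical; benchmark.py's actual symbols (data_files, n_files, gpu_id) are only approximated here. The point is to derive the count from the discovered files and fail fast when none exist, instead of hard-coding n_files = 1:

```python
def assign_files_to_workers(data_files, n_gpus):
    # Use the real count of discovered files, not a hard-coded 1.
    n_files = len(data_files)
    if n_files == 0:
        raise ValueError("no dataset files discovered")
    # Round-robin the files across GPU workers so every file is processed.
    assignments = {gpu_id: [] for gpu_id in range(n_gpus)}
    for idx, path in enumerate(data_files):
        assignments[idx % n_gpus].append(path)
    return assignments
```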
- Around line 237-333: The worker function currently has multiple processes
appending to test_status_file without synchronization (race/interleaving); fix
by creating a multiprocessing.Lock (or Manager().Lock) in run, pass that lock
into worker (add lock to worker signature and Process args), and in worker
surround the append to test_status_file with lock acquisition (use
lock.acquire()/release or a context manager) so only one process writes at a
time; alternatively implement per-process temp status files (named by gpu_id)
and merge them into test_status_file in run after all processes join.

In `@regression/benchmark_scripts/configs/README.md`:
- Around line 10-12: The fenced code block containing the command "aws s3 cp
/path/to/files s3://cuopt-datasets/regression_datasets/" in README.md lacks a
language tag; update the opening fence from ``` to ```bash so the block is
annotated (e.g., change the triple-backtick line immediately before that command
to include "bash") to satisfy markdownlint rule MD040.

In `@regression/config.sh`:
- Around line 30-34: Replace hard-coded placeholders by making
CUOPT_SLACK_APP_ID, S3_FILE_PREFIX and S3_URL_PREFIX required environment
variables and fail fast with a clear error if unset: update the logic that
defines CUOPT_SLACK_APP_ID, WEBHOOK_URL, S3_FILE_PREFIX and S3_URL_PREFIX to
check for empty values and print an explicit error message and exit when any
required var is missing; alternatively, if you prefer optional defaults, add
documentation comments above these vars explaining they MUST be overridden and
show example export commands for CUOPT_SLACK_APP_ID and S3_FILE_PREFIX to avoid
silent failures.

In `@regression/create-html-reports.sh`:
- Around line 173-180: The current snippet builds plot and log links using the
loop variable f which may be unset or only hold the last value; change the logic
in create-html-reports.sh to derive fixed relative paths from RESULTS_DIR
instead of using f: compute plot_rel_path and log_rel_path from RESULTS_DIR
(e.g. remove "$RESULTS_DIR/" and "$RESULTS_DIR/benchmarks/results/") and then
set plot_path and log_path from those fixed rel paths before echoing into
$report; update references to the variables plot_rel_path, plot_path,
log_rel_path, log_path and stop relying on f so links are always correct even
when the ALL_REPORTS loop has finished.

In `@regression/cronjob.sh`:
- Line 137: The cp invocation using glob patterns (cp
$ROUTING_CONFIGS_PATH/*config.json $LP_DATASETS_PATH/*config.json
$MIP_DATASETS_PATH/*config.json $ALL_CONFIGS_PATH/) can fail under set -e when
any glob doesn't match; update the script to either enable bash nullglob (shopt
-s nullglob) before this line and restore it after, or perform explicit
existence checks for each source pattern (e.g., test -e
"$ROUTING_CONFIGS_PATH/"*config.json) and only include patterns that match when
calling cp; reference the cp call and the variables ROUTING_CONFIGS_PATH,
LP_DATASETS_PATH, MIP_DATASETS_PATH, and ALL_CONFIGS_PATH when applying the fix.

In `@regression/functions.sh`:
- Around line 108-123: In cloneRepo, the function currently calls git clone
without the repo_name and uses an undefined error function; fix it by invoking
git clone with the explicit target directory (use repo_url and repo_name) and
handle failures locally (replace the undefined error call with a local non-zero
return or exit, e.g., return 1), ensure all variable expansions are quoted
(repo_url, repo_name, dest_dir) and check the rm -rf removal result by testing
if the directory still exists and returning an error code if so; keep pushd/popd
logic but make error handling self-contained inside cloneRepo.

In `@regression/get_datasets.py`:
- Around line 875-904: Both download_lp_dataset and download_mip_dataset may
attempt to write files into a non-existent directory causing FileNotFoundError;
fix by ensuring the target directory exists at the start of each function (e.g.,
call os.makedirs(dir, exist_ok=True) at the top of download_lp_dataset and
download_mip_dataset before using download() or constructing file paths) so
subsequent open/write in download() and extract() succeed.
- Around line 840-872: In extract(...) avoid shell=True and string interpolation
to prevent command injection: replace each subprocess.run(...) that currently
uses f-strings with subprocess.run([...], cwd=dir, check=True) using argument
lists (e.g., ["bzip2","-d", basefile] or ["gunzip","-c", basefile]) and for the
gunzip case capture stdout to a file via subprocess.run(..., stdout=opened_file)
instead of using ">" redirection; remove uses of "rm -rf ..." and instead delete
files with os.remove() or shutil.rmtree() (e.g., remove unzippedfile and emps
binaries), and ensure download(...) and the gcc/emps invocations also use
argument lists and cwd plus check=True; keep references to
MittelmannInstances["emps"], download, and the local filenames (basefile,
unzippedfile, outfile) when making these changes.
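A sketch of the argument-list style the comment asks for (function names are hypothetical; the real extract() in get_datasets.py handles more formats): pass a list instead of a shell string, use cwd= instead of interpolating paths, and redirect stdout through a file handle instead of shell ">".

```python
import os
import subprocess

def decompress_bz2(basefile, workdir):
    # Argument list + cwd instead of shell=True string interpolation.
    subprocess.run(["bzip2", "-d", basefile], cwd=workdir, check=True)

def gunzip_to(basefile, outfile, workdir):
    # Capture stdout to a file handle instead of shell ">" redirection.
    with open(os.path.join(workdir, outfile), "wb") as out:
        subprocess.run(["gunzip", "-c", basefile], cwd=workdir,
                       stdout=out, check=True)
```

Cleanup of intermediate files would then use os.remove() or shutil.rmtree() rather than spawning "rm -rf".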
- Around line 823-835: The download function currently calls
urllib.request.urlopen on any scheme and doesn't close the response; update
download to validate the URL scheme (allow only "http" and "https", raising an
exception for others) and use a context manager to ensure the response is
closed: replace the direct urllib.request.urlopen(...) usage with a "with
urllib.request.urlopen(...)" block (passing the ssl context when "plato.asu.edu"
is in the URL), then read data inside that block and return or write it as
before; keep the scheme check and the special SSL context logic but ensure the
response variable is only used inside the with-statement so resources are
cleaned up.
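The hardened download could look roughly like this (a sketch that omits the real function's plato.asu.edu SSL-context special case): reject non-HTTP(S) schemes up front and read the response inside a with-block so it is always closed.

```python
import urllib.request
from urllib.parse import urlparse

def download(url, dest_path, timeout=30):
    # Reject non-HTTP(S) schemes (e.g. file://) before opening anything.
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # Context manager guarantees the response is closed even on error.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(dest_path, "wb") as f:
        f.write(data)
```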

In `@regression/run_regression.sh`:
- Around line 1-16: Enable strict Bash mode and guard against an empty
SCRATCH_DIR before any destructive operations: add shell strict flags (e.g., set
-euo pipefail with emphasis on set -u per repo preference) at the top of
run_regression.sh, check that SCRATCH_DIR is non-empty (fail early if
unbound/empty) before calling rm -rf $SCRATCH_DIR/routing_configs/*, and quote
all expansions of SCRATCH_DIR in commands (rm -rf
"$SCRATCH_DIR"/routing_configs/*, aws s3 cp
"s3://cuopt-datasets/regression_datasets/" "$SCRATCH_DIR"/routing_configs/
--recursive, python "$SCRATCH_DIR"/cuopt/regression/get_datasets.py
"$SCRATCH_DIR"/lp_datasets lp, etc.) so destructive ops never run when
SCRATCH_DIR is unset or contains spaces.

In `@regression/save_benchmark_results.py`:
- Line 10: The function name create_update_benchamrk_db contains a typo; rename
it to create_update_benchmark_db and update every reference (calls, imports,
tests, docs) to use the corrected name so the signature (benchmark_path,
output_path, commit_hash) remains the same; ensure any exported symbols or
__all__ entries are updated accordingly and run tests to confirm no remaining
references to create_update_benchamrk_db.
- Around line 21-32: The loop shadows the outer DataFrame named data by
assigning data = pd.read_csv(...) inside the for index, rows in data.iterrows()
loop; rename the inner variable (e.g., existing or existing_data) and update
subsequent uses (pd.concat and to_csv) to reference that new name so the outer
data and the loop variable rows remain unchanged and all rows are processed
correctly.
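The non-shadowing pattern can be sketched like this (append_existing is a hypothetical name standing in for the loop in save_benchmark_results.py): the frame read inside the loop gets its own name, so the outer frame and its iterator are never clobbered mid-iteration.

```python
import pandas as pd

def append_existing(data: pd.DataFrame, csv_paths) -> pd.DataFrame:
    # Concatenate existing per-test CSVs onto `data` without shadowing it.
    frames = [data]
    for path in csv_paths:
        existing = pd.read_csv(path)  # distinct name: do NOT reassign `data`
        frames.append(existing)
    return pd.concat(frames, ignore_index=True)
```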

In `@regression/save_benchmarks.sh`:
- Around line 36-38: In save_benchmarks.sh ensure the script validates the
required argument ($1) before calling python
${CUOPT_SCRIPTS_DIR}/save_benchmark_results.py and aborts with a non‑zero exit
code if missing; after changing to ${CUOPT_SCRIPTS_DIR} check for
staged/unstaged changes (e.g., git status --porcelain) before running git
add/commit to avoid failing on "nothing to commit", only run git commit when
there are changes and handle git commit failure by exiting with an error, and
perform git push with basic retry logic and error handling so push failures are
detected and cause the script to fail instead of silently continuing. Use the
symbols save_benchmarks.sh, ${CUOPT_SCRIPTS_DIR}, ${RESULTS_DIR}, and
save_benchmark_results.py to locate where to add these checks.

In `@regression/test-container.sh`:
- Around line 10-18: The script calls logger (the function defined in
functions.sh) but only sources config.sh, so add sourcing of the functions file
after PROJECT_DIR is set; for example, after export PROJECT_DIR line source the
functions file that defines logger (e.g., source "${PROJECT_DIR}/functions.sh"
or the appropriate functions file in regression) so logger is available before
it's invoked and the script won't fail under set -e.

In `@regression/update_asv_database.py`:
- Around line 12-33: The function update_asv_db uses inputs like results_dir and
commitTime without validation (e.g., Path(results_dir) and commitTime * 1000),
so add an early required-arguments guard at the top of update_asv_db that checks
for presence/valid types of results_dir and either commitHash or commitTime (and
that commitTime is numeric if used); if validation fails, raise a clear
exception (ValueError) or return early with a clear message. Ensure the checks
reference the same parameter names (results_dir, commitTime, commitHash) so
callers get a deterministic error before any Path(...) or arithmetic is
attempted.
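An early-guard sketch for the validation described above (parameter names only approximate those in update_asv_database.py): check the required arguments and types before any Path construction or arithmetic, raising ValueError with a deterministic message.

```python
from pathlib import Path

def update_asv_db(results_dir, commit_hash=None, commit_time=None):
    # Fail fast with clear errors before touching paths or doing arithmetic.
    if not results_dir:
        raise ValueError("results_dir is required")
    if commit_hash is None and commit_time is None:
        raise ValueError("either commit_hash or commit_time is required")
    if commit_time is not None and not isinstance(commit_time, (int, float)):
        raise ValueError("commit_time must be numeric")
    results_path = Path(results_dir)
    # ASV stores timestamps in milliseconds; only computed once inputs are valid.
    timestamp_ms = int(commit_time * 1000) if commit_time is not None else None
    return results_path, timestamp_ms
```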
🧹 Nitpick comments (24)
cpp/src/dual_simplex/folding.cpp (1)

212-220: Consider propagating errors instead of using exit(1) in helper functions.

The new coloring_status_t pattern improves error handling in color_graph, but helper functions like find_colors_to_split (line 216) and split_colors (lines 262, 286, 328) still use exit(1) for error conditions. This can terminate the entire process unexpectedly in a library context.

Consider refactoring these helpers in a follow-up to return error status that can be propagated up through color_graph, ensuring consistent error handling throughout the folding module. As per coding guidelines, error propagation should be complete to user-facing APIs.

regression/lp_regression_test.sh (1)

5-10: Add set -u for unbound variable safety.

🔧 Suggested change
 # Abort script on first error
 set -e
+set -u

Based on learnings: In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables and rely on the default unbound-variable error messages rather than implementing explicit guards with custom error messages. This approach keeps scripts concise and leverages Bash's built-in error handling.

regression/routing_regression_test.sh (1)

5-10: Add set -u for unbound variable safety.

🔧 Suggested change
 # Abort script on first error
 set -e
+set -u


regression/mip_regression_test.sh (1)

5-10: Add set -u for unbound variable safety.

🔧 Suggested change
 set -e
+set -u


cpp/src/dual_simplex/types.hpp (1)

22-23: Replace the macro with a typed constexpr constant.

The CONCURRENT_HALT_RETURN macro is used exclusively as a sentinel return value (37+ usages across the codebase) and never in preprocessor directives. Converting to inline constexpr int is safe and improves type safety and scoping without any functional impact.

♻️ Suggested refactor
-#define CONCURRENT_HALT_RETURN -2
+inline constexpr int CONCURRENT_HALT_RETURN = -2;
cpp/src/utilities/timer.hpp (1)

58-85: Potential precision loss in epoch conversion.

The function converts microseconds since epoch to double for tv_sec (Line 81). Since us_since_epoch.count() returns a large integer (~1.7×10¹⁵ for current timestamps), and double has only ~15-16 significant digits, the integer division result stored in double may lose precision for the fractional part calculation.

Consider keeping intermediate values as integers:

♻️ Suggested improvement
-    double tv_sec  = us_since_epoch.count() / 1000000;
-    double tv_usec = us_since_epoch.count() % 1000000;
-
-    return tv_sec + 1e-6 * tv_usec;
+    auto total_us = us_since_epoch.count();
+    return static_cast<double>(total_us) / 1000000.0;
cpp/src/dual_simplex/bound_flipping_ratio_test.hpp (1)

17-21: Consider using constexpr instead of #define macros.

The symbolic constants improve readability. For better type safety and debugging, consider using constexpr:

♻️ Suggested improvement
-#define RATIO_TEST_NO_ENTERING_VARIABLE -1
-#define RATIO_TEST_CONCURRENT_LIMIT     CONCURRENT_HALT_RETURN  // -2
-#define RATIO_TEST_TIME_LIMIT           -3
-#define RATIO_TEST_NUMERICAL_ISSUES     -4
+inline constexpr int RATIO_TEST_NO_ENTERING_VARIABLE = -1;
+inline constexpr int RATIO_TEST_CONCURRENT_LIMIT     = CONCURRENT_HALT_RETURN;  // -2
+inline constexpr int RATIO_TEST_TIME_LIMIT           = -3;
+inline constexpr int RATIO_TEST_NUMERICAL_ISSUES     = -4;

Note: The static analysis error about initial_basis.hpp not found is a false positive—the analyzer lacks full project context.

cpp/tests/mip/CMakeLists.txt (1)

29-31: LGTM! The CUTS_TEST addition is correctly configured and the referenced cuts_test.cu file exists and validates numerical correctness with EXPECT_NEAR assertions on objective values.

Consider expanding test coverage in cuts_test.cu to include degenerate cases (infeasible, unbounded, empty problems) and free variables to align with broader optimization test best practices.

regression/benchmark_scripts/configs/test_name_config.json (1)

1-23: Replace placeholder names/details before real use.

Keeping "test_name" and "Add details about you test" may leak into reports; consider filling these with real identifiers.

regression/write-meta-data.sh (1)

5-6: Enable set -u for unbound-variable detection.
This script relies on env vars like PROJECT_DIR and METADATA_FILE; set -u makes missing vars fail fast.

♻️ Proposed change
-# Abort script on first error
-set -e
+# Abort script on first error
+set -eu
Based on learnings: In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables and rely on the default unbound-variable error messages rather than implementing explicit guards with custom error messages.
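For illustration, a minimal standalone sketch of what `set -u` changes (the `run` helper and `TYPO_VAR` are made up for the demo, not part of this script):

```shell
#!/bin/bash
# Compare how a typo'd (unset) variable behaves with and without set -u.
run() { bash -c "$1" >/dev/null 2>&1; echo $?; }

# Without -u the unset variable silently expands to empty and the command succeeds.
no_u=$(run 'echo "$TYPO_VAR"')

# With -u the same reference aborts the child shell with a nonzero status.
with_u=$(run 'set -u; echo "$TYPO_VAR"')

echo "exit without -u: $no_u"
echo "exit with -u:    $with_u"
```

The default unbound-variable message names the offending variable, which is why explicit guards are unnecessary.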
regression/test-container.sh (1)

7-8: Consider set -u to catch missing env vars early.

♻️ Suggested change
-# Abort script on first error
-set -e
+# Abort script on first error
+set -eu
Based on learnings: In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables and rely on the default unbound-variable error messages rather than implementing explicit guards with custom error messages.
regression/benchmark_scripts/configs/example_test_creation.py (1)

17-23: Avoid hard-coded absolute dataset paths in the example script.
The current path is developer-specific; consider an env/CLI parameter or a repo-relative default so others can run it without edits.

♻️ Proposed change
-from cuopt.routing import utils
-import json
+from cuopt.routing import utils
+import json
+import os
@@
-base_file_name = (
-    "/home/nfs/rgandham/git-repos/reopt/datasets/cvrptw/RC2_10_5.TXT"
-)
+base_file_name = os.environ.get(
+    "CUOPT_DATASET",
+    "datasets/cvrptw/RC2_10_5.TXT",
+)
cpp/src/dual_simplex/sparse_vector.cpp (1)

236-256: Consider a single-pass squeeze with an upper-bound reserve.

The current reserve → clear order is unidiomatic but harmless: clear() does not release capacity, so no reallocation occurs. Still, clearing first and then reserving reads more naturally, and the two-pass approach (count nonzeros, then fill) can be collapsed into a single pass by reserving an upper bound and shrinking afterwards.

♻️ Optional: Single-pass with upper-bound reserve
 template <typename i_t, typename f_t>
 void sparse_vector_t<i_t, f_t>::squeeze(sparse_vector_t<i_t, f_t>& y) const
 {
   y.n = n;
-
-  i_t nz       = 0;
   const i_t nx = x.size();
-  for (i_t k = 0; k < nx; k++) {
-    if (x[k] != 0.0) { nz++; }
-  }
-  y.i.reserve(nz);
-  y.x.reserve(nz);
   y.i.clear();
   y.x.clear();
+  y.i.reserve(nx);
+  y.x.reserve(nx);
   for (i_t k = 0; k < nx; k++) {
     if (x[k] != 0.0) {
       y.i.push_back(i[k]);
       y.x.push_back(x[k]);
     }
   }
+  y.i.shrink_to_fit();
+  y.x.shrink_to_fit();
 }
regression/config.sh (1)

1-7: Consider enabling strict mode for unbound variable detection.

Adding set -u would help catch undefined variables early. The # shellcheck disable=all disables useful linting; consider enabling shellcheck and fixing issues instead.

♻️ Proposed fix
 #!/bin/bash
-# shellcheck disable=all
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+set -u
 
-THIS_DIR=$(cd $(dirname ${BASH_SOURCE[0]}) && pwd)
+THIS_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

Based on learnings: "In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables."

regression/save_benchmark_results.py (1)

35-51: Use required=True in argparse instead of manual validation.

This provides better error messages and follows argparse conventions. Also, consider guarding module-level execution with if __name__ == "__main__":.

♻️ Proposed refactor
-parser = argparse.ArgumentParser()
-parser.add_argument(
-    "-b", "--benchmark_path", help="Path to new sets of results"
-)
-parser.add_argument("-o", "--output_path", help="Path to save results")
-parser.add_argument("-c", "--commit_hash", help="Git commit hash for the run")
-
-# Read arguments from command line
-args = parser.parse_args()
-
-if args.benchmark_path and args.output_path and args.commit_hash:
-    create_update_benchamrk_db(
-        args.benchmark_path, args.output_path, args.commit_hash
-    )
-else:
-    raise ValueError(
-        "Missing mandatory options, please provide all the options"
-    )
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "-b", "--benchmark_path", required=True, help="Path to new sets of results"
+    )
+    parser.add_argument("-o", "--output_path", required=True, help="Path to save results")
+    parser.add_argument("-c", "--commit_hash", required=True, help="Git commit hash for the run")
+
+    args = parser.parse_args()
+    create_update_benchmark_db(
+        args.benchmark_path, args.output_path, args.commit_hash
+    )
cpp/tests/mip/cuts_test.cu (1)

28-166: Consider adding edge case tests for cut robustness.

The current tests cover basic functionality. Per coding guidelines, consider adding tests for:

  • Infeasible problems (cuts should not change infeasibility)
  • Problems where cuts are insufficient (requires branching)
  • Empty or singleton problems

Would you like me to generate additional test cases for edge case handling?

regression/save_benchmarks.sh (1)

6-10: Add set -u and quote variable expansions.

The script uses set -e but not set -u. The positional parameter $1 on line 36 will fail silently if not provided. Also, quote the dirname expression.

♻️ Proposed fix
 # Abort script on first error
 set -e
+set -u
 
 # Must ensure PROJECT_DIR is exported first then load rapids-mg-tools env
-export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname ${BASH_SOURCE[0]})" && pwd)}
+export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}

Based on learnings: "In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables."

regression/cronjob.sh (1)

7-7: Add set -u for unbound variable detection.

Based on learnings from this repository, scripts should use set -u to detect unbound variables and rely on Bash's default error messages. This helps catch typos and missing variable definitions early.

Suggested change
-set -e
+set -eu
regression/send-slack-report.sh (2)

7-7: Add set -u for unbound variable detection.

Consistent with repository conventions, add set -u to catch unbound variables. Several variables like WEBHOOK_URL, S3_FILE_PREFIX, and S3_URL_PREFIX are used without explicit validation.

Suggested change
-set -e
+set -eu

115-118: Consider validating WEBHOOK_URL before posting.

If WEBHOOK_URL is empty or unset, the curl command will fail. Adding a guard would provide a clearer error message.

Suggested improvement
+    if [[ -z "${WEBHOOK_URL:-}" ]]; then
+        logger "Error: WEBHOOK_URL is not set. Cannot post to Slack."
+        exit 1
+    fi
     echo "$(envsubst < ${PROJECT_DIR}/slack_msg.json)"
     curl -X POST \
          -H 'Content-type: application/json' \
          --data "$(envsubst < ${PROJECT_DIR}/slack_msg.json)" \
          ${WEBHOOK_URL}
regression/create-html-reports.sh (1)

1-9: Add set -u to catch unset variables early.
This script relies on many exported variables; enabling set -u surfaces missing exports immediately and aligns with repo Bash conventions.

🔧 Suggested change
#!/bin/bash
+set -u
# shellcheck disable=SC1090
Based on learnings: In this repository, prefer using 'set -u' in Bash scripts to detect unbound variables and rely on the default unbound-variable error messages rather than implementing explicit guards with custom error messages.
regression/benchmark_scripts/utils.py (1)

55-67: Prefer os.path.join for path assembly.
This avoids edge cases with trailing slashes and keeps path handling consistent.

♻️ Suggested change
-        with open(data_file_path + "/" + d_type + "_config.json") as f:
+        with open(os.path.join(data_file_path, f"{d_type}_config.json")) as f:
@@
-        data_model, solver_settings = build_routing_datamodel_from_json(
-            data_file_path + "/" + data["file_name"]
-        )
+        data_model, solver_settings = build_routing_datamodel_from_json(
+            os.path.join(data_file_path, data["file_name"])
+        )
cpp/src/dual_simplex/phase2.cpp (1)

1982-2005: Use solver tolerances instead of hard-coded 1e-10

A fixed 1e-10 can be too strict or too loose for scaled problems. Prefer a settings-derived tolerance (e.g., settings.zero_tol or settings.dual_tol) to keep behavior consistent.

🔧 Example adjustment
-  f_t tol     = 1e-10;
+  f_t tol     = settings.zero_tol;
As per coding guidelines: Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks.
cpp/src/dual_simplex/cuts.cpp (1)

193-195: drop_cuts() is a stub—either implement or guard usage.

If this is called, the pool won’t be pruned and can grow without bound. Consider implementing it or adding a clear assertion/log to prevent accidental use.

If you want, I can help draft a pruning policy and implementation.

Comment on lines 105 to 116
if (!bounds_changed.empty()) {
std::fill(constraint_changed.begin(), constraint_changed.end(), false);
for (i_t j = 0; j < n; ++j) {
if (bounds_changed[j]) {
const i_t col_start = A.col_start[j];
const i_t col_end = A.col_start[j + 1];
for (i_t p = col_start; p < col_end; ++p) {
const i_t i = A.i[p];
constraint_changed[i] = true;
}
}
}

⚠️ Potential issue | 🟠 Major

Validate bounds_changed size before indexing

If bounds_changed is non-empty but not n long, the loop reads out of bounds and corrupts memory. Add a size check (or assert) before indexing.

🛡️ Suggested guard
   if (!bounds_changed.empty()) {
+    if (bounds_changed.size() != static_cast<size_t>(n)) {
+      settings.log.printf("bounds_changed size mismatch: %zu vs %d\n",
+                          bounds_changed.size(),
+                          n);
+      return false;
+    }
     std::fill(constraint_changed.begin(), constraint_changed.end(), false);
     for (i_t j = 0; j < n; ++j) {
As per coding guidelines: Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover).
🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/bounds_strengthening.cpp` around lines 105 - 116, The
loop indexes bounds_changed by j assuming it has length n; add validation before
the loop to avoid out-of-bounds access by checking that bounds_changed.size() ==
static_cast<size_t>(n) (or bounds_changed.size() >= n) and handle the mismatch
(e.g., assert, log+return/error, or resize/initialize bounds_changed) so the
subsequent loop over j and access to bounds_changed[j] is safe; apply this check
near the start of the block that references bounds_changed (the code surrounding
the for (i_t j = 0; j < n; ++j) loop that also touches A.col_start and
constraint_changed) and ensure constraint_changed is still reset only when the
size check passes.

Comment on lines +8 to +9
#include <dual_simplex/basis_solves.hpp>
#include <dual_simplex/cuts.hpp>

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verification scripts run against the repository (NVIDIA/cuopt):

  • fd -a "basis_solves.hpp" and find . -type d -name "dual_simplex" to locate the header and directory
  • find . -path "*/dual_simplex/*" -type f | grep -E "\.(hpp|h|cpp|cu|cuh)$" to list the dual_simplex sources
  • head -20 cpp/src/dual_simplex/cuts.cpp, wc -l, and cat -n to inspect the includes and the full file
  • rg "basis_solves" -l and rg "drop_cuts" to find references across the codebase


Guard against division by zero in cut scoring functions.

Lines 61 and 94 divide by norms that can be zero or near-zero:

  • cut_distance() divides by cut_norm = std::sqrt(dot) without checking if the norm is zero
  • cut_orthogonality() divides by norm_i * norm_j without checking for zero norms

Add epsilon checks to prevent NaN/Inf values from propagating into cut selection:

Example guard for cut_distance()
const f_t distance = (cut_norm > epsilon) ? (cut_violation / cut_norm) : 0.0;
🧰 Tools
🪛 Clang (14.0.6)

[error] 8-8: 'dual_simplex/basis_solves.hpp' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/cuts.cpp` around lines 8 - 9, The cut scoring functions
cut_distance() and cut_orthogonality() currently divide by norms that can be
zero; add a small epsilon check before dividing to avoid NaN/Inf propagation. In
cut_distance(), compute cut_norm as before but only divide cut_violation by
cut_norm when cut_norm > epsilon (otherwise set distance to 0.0); in
cut_orthogonality(), check both norm_i and norm_j against epsilon and only
compute the division when (norm_i > epsilon && norm_j > epsilon) (otherwise set
orthogonality to 0.0). Use a consistent small epsilon (e.g.
std::numeric_limits<f_t>::epsilon() or a tiny constant) and ensure the functions
return a well-defined 0.0 when norms are too small.

Comment on lines +44 to +95
f_t cut_pool_t<i_t, f_t>::cut_distance(i_t row,
const std::vector<f_t>& x,
f_t& cut_violation,
f_t& cut_norm)
{
const i_t row_start = cut_storage_.row_start[row];
const i_t row_end = cut_storage_.row_start[row + 1];
f_t cut_x = 0.0;
f_t dot = 0.0;
for (i_t p = row_start; p < row_end; p++) {
const i_t j = cut_storage_.j[p];
const f_t cut_coeff = cut_storage_.x[p];
cut_x += cut_coeff * x[j];
dot += cut_coeff * cut_coeff;
}
cut_violation = rhs_storage_[row] - cut_x;
cut_norm = std::sqrt(dot);
const f_t distance = cut_violation / cut_norm;
return distance;
}

template <typename i_t, typename f_t>
f_t cut_pool_t<i_t, f_t>::cut_density(i_t row)
{
const i_t row_start = cut_storage_.row_start[row];
const i_t row_end = cut_storage_.row_start[row + 1];
const i_t cut_nz = row_end - row_start;
const i_t original_vars = original_vars_;
return static_cast<f_t>(cut_nz) / original_vars;
}

template <typename i_t, typename f_t>
f_t cut_pool_t<i_t, f_t>::cut_orthogonality(i_t i, i_t j)
{
const i_t i_start = cut_storage_.row_start[i];
const i_t i_end = cut_storage_.row_start[i + 1];
const i_t i_nz = i_end - i_start;
const i_t j_start = cut_storage_.row_start[j];
const i_t j_end = cut_storage_.row_start[j + 1];
const i_t j_nz = j_end - j_start;

f_t dot = sparse_dot(cut_storage_.j.data() + i_start,
cut_storage_.x.data() + i_start,
i_nz,
cut_storage_.j.data() + j_start,
cut_storage_.x.data() + j_start,
j_nz);

f_t norm_i = cut_norms_[i];
f_t norm_j = cut_norms_[j];
return 1.0 - std::abs(dot) / (norm_i * norm_j);
}

⚠️ Potential issue | 🟠 Major

Guard cut-norm usage to prevent divide-by-zero/NaNs.

cut_distance and cut_orthogonality divide by a norm that can be zero or near-zero, producing NaN/Inf and destabilizing cut scoring/selection. Add an epsilon guard to return a neutral score when norms are too small.

🛡️ Proposed fix
 f_t cut_pool_t<i_t, f_t>::cut_distance(i_t row,
                                        const std::vector<f_t>& x,
                                        f_t& cut_violation,
                                        f_t& cut_norm)
 {
@@
   cut_violation      = rhs_storage_[row] - cut_x;
   cut_norm           = std::sqrt(dot);
-  const f_t distance = cut_violation / cut_norm;
-  return distance;
+  const f_t eps = 1e-12;
+  if (cut_norm <= eps) { return 0.0; }
+  const f_t distance = cut_violation / cut_norm;
+  return distance;
 }
@@
   f_t norm_i = cut_norms_[i];
   f_t norm_j = cut_norms_[j];
-  return 1.0 - std::abs(dot) / (norm_i * norm_j);
+  const f_t eps   = 1e-12;
+  const f_t denom = norm_i * norm_j;
+  if (denom <= eps) { return 0.0; }
+  return 1.0 - std::abs(dot) / denom;
 }

Based on learnings: Guard numerical stability by guarding near-zero norms in cut-related computations (e.g., cut_orthogonality) to avoid instability.

🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/cuts.cpp` around lines 44 - 95, Guard against near-zero
norms in cut_distance and cut_orthogonality: compute an epsilon (e.g.,
std::numeric_limits<f_t>::epsilon() * max(1.0, cut_norm) or a small constant
like 1e-12), then if cut_norm (used in cut_distance and stored in cut_norms_) or
either norm_i/norm_j is below that epsilon, set cut_violation and cut_norm
consistently and return a neutral score (for cut_distance return 0.0 and for
cut_orthogonality return 1.0) instead of performing the division; reference the
functions cut_pool_t::cut_distance, cut_pool_t::cut_orthogonality and the field
cut_norms_ when adding these guards.

Comment on lines +237 to +333
def worker(
gpu_id,
dataset_file_path,
csv_path,
git_commit,
log_path,
test_status_file,
n_gpus,
d_type="routing",
):
import os

os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id

from utils import get_configuration

data_files = []
if d_type == "lp" or d_type == "mip":
data_files = glob.glob(dataset_file_path + "/*.mps")
else:
data_files = glob.glob(dataset_file_path + "/*_config.json")
idx = int(gpu_id)
n_files = 1 # len(data_files)

while idx < n_files:
mr, stats_mr = reinitialize_rmm()

data_file = data_files[idx]
test_name = str(data_file)
status = "FAILED"
try:
test_name, data_model, solver_settings, requested_metrics = (
get_configuration(data_file, dataset_file_path, d_type)
)
log.basicConfig(
level=log.INFO,
filename=log_path + "/" + test_name + "_log.txt",
filemode="a+",
format="%(asctime)-15s %(levelname)-8s %(message)s",
)
log.getLogger().setLevel(log.INFO)
log.info(
f"------------- Test Start : {test_name} gpu id : {gpu_id} -------------------"
)
status = run_benchmark(
test_name,
data_model,
solver_settings,
requested_metrics,
csv_path,
git_commit,
test_status_file,
d_type,
)
except Exception as e:
log.error(str(e))

with open(test_status_file, "a") as f:
f.write("\n")
f.write(test_name + ": " + status)

# Delete instance of rmm
del mr
del stats_mr

log.info(
f"------------- Test End : {test_name} gpu id : {gpu_id} -------------------"
)
idx = idx + n_gpus


def run(
dataset_file_path,
csv_path,
git_commit,
log_path,
test_status_file,
n_gpus,
d_type,
):
# Restricting n_gpus to one to avoid resource sharing
# n_gpus = 1
procs = []
for gpu_id in range(int(n_gpus)):
p = Process(
target=worker,
args=(
str(gpu_id),
dataset_file_path,
csv_path,
git_commit,
log_path,
test_status_file,
int(n_gpus),
d_type,
),
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verification scripts run against the repository (NVIDIA/cuopt):

  • fd -p "benchmark\.py" regression/ and wc -l to locate and size regression/benchmark_scripts/benchmark.py
  • sed -n '230,350p' and sed -n '310,340p' to read the worker/run functions and the process creation
  • rg -n "(Lock|Queue|Manager|Semaphore)" and rg -n "test_status_file" to check for existing synchronization and all uses of test_status_file


Guard test_status_file writes across processes

Multiple processes append to the same file without synchronization; lines can interleave or be lost. Use a shared lock (or per-process files + merge).

🔒 Suggested locking
-from multiprocessing import Process
+from multiprocessing import Process, Lock
@@
 def worker(
     gpu_id,
     dataset_file_path,
     csv_path,
     git_commit,
     log_path,
     test_status_file,
+    status_lock,
     n_gpus,
     d_type="routing",
 ):
@@
-        with open(test_status_file, "a") as f:
-            f.write("\n")
-            f.write(test_name + ": " + status)
+        with status_lock:
+            with open(test_status_file, "a") as f:
+                f.write(f"\n{test_name}: {status}")
@@
 def run(
     dataset_file_path,
     csv_path,
     git_commit,
     log_path,
     test_status_file,
     n_gpus,
     d_type,
 ):
@@
-    procs = []
+    procs = []
+    status_lock = Lock()
     for gpu_id in range(int(n_gpus)):
         p = Process(
             target=worker,
             args=(
                 str(gpu_id),
                 dataset_file_path,
                 csv_path,
                 git_commit,
                 log_path,
                 test_status_file,
+                status_lock,
                 int(n_gpus),
                 d_type,
             ),
         )
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 291-291: Do not catch blind exception: Exception

(BLE001)


[warning] 292-292: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In `@regression/benchmark_scripts/benchmark.py` around lines 237 - 333, The worker
function currently has multiple processes appending to test_status_file without
synchronization (race/interleaving); fix by creating a multiprocessing.Lock (or
Manager().Lock) in run, pass that lock into worker (add lock to worker signature
and Process args), and in worker surround the append to test_status_file with
lock acquisition (use lock.acquire()/release or a context manager) so only one
process writes at a time; alternatively implement per-process temp status files
(named by gpu_id) and merge them into test_status_file in run after all
processes join.

Comment on lines +253 to +260
data_files = []
if d_type == "lp" or d_type == "mip":
data_files = glob.glob(dataset_file_path + "/*.mps")
else:
data_files = glob.glob(dataset_file_path + "/*_config.json")
idx = int(gpu_id)
n_files = 1 # len(data_files)


⚠️ Potential issue | 🟠 Major

Process all dataset files (n_files is hard-coded to 1)

This limits each worker to a single file and can raise IndexError when data_files is empty. Use len(data_files) and short‑circuit on empty input.

✅ Proposed fix
     if d_type == "lp" or d_type == "mip":
         data_files = glob.glob(dataset_file_path + "/*.mps")
     else:
         data_files = glob.glob(dataset_file_path + "/*_config.json")
     idx = int(gpu_id)
-    n_files = 1  # len(data_files)
+    n_files = len(data_files)
+    if n_files == 0:
+        log.warning("No dataset files found under %s", dataset_file_path)
+        return
🤖 Prompt for AI Agents
In `@regression/benchmark_scripts/benchmark.py` around lines 253 - 260, The code
currently hard-codes n_files = 1 which prevents iterating all discovered files
and can cause IndexError when data_files is empty; change n_files to use the
actual number of discovered files (n_files = len(data_files)) and short‑circuit
when len(data_files) == 0 (e.g., return/continue or raise a clear error) before
indexing into data_files; update the logic around variables data_files, d_type,
dataset_file_path, gpu_id/idx to rely on n_files = len(data_files) so workers
process all files and avoid out‑of‑range access.

import argparse


def create_update_benchamrk_db(benchmark_path, output_path, commit_hash):

⚠️ Potential issue | 🟡 Minor

Typo in function name: benchamrk → benchmark.

✏️ Proposed fix
-def create_update_benchamrk_db(benchmark_path, output_path, commit_hash):
+def create_update_benchmark_db(benchmark_path, output_path, commit_hash):
🤖 Prompt for AI Agents
In `@regression/save_benchmark_results.py` at line 10, The function name
create_update_benchamrk_db contains a typo; rename it to
create_update_benchmark_db and update every reference (calls, imports, tests,
docs) to use the corrected name so the signature (benchmark_path, output_path,
commit_hash) remains the same; ensure any exported symbols or __all__ entries
are updated accordingly and run tests to confirm no remaining references to
create_update_benchamrk_db.

Comment on lines +21 to +32
for index, rows in data.iterrows():
out_file = index.split(".")[0] + ".csv"
out_file_path = out_path + "/" + out_file

if os.path.exists(out_file_path):
data = pd.read_csv(out_file_path)
data = pd.concat(
[data, rows.to_frame().T], ignore_index=True
)
data.to_csv(out_file_path, index=False)
else:
rows.to_frame().T.to_csv(out_file_path, index=False)

⚠️ Potential issue | 🟠 Major

Variable data is shadowed inside the loop, causing a logic error.

The outer data (line 19) is rebound when reading the existing file (line 26). The iterrows() iterator already holds a reference to the original frame, so iteration itself continues, but the name data no longer refers to the input CSV; any later use of data operates on whichever file was read last. Each row iteration should use a distinct variable name.

🐛 Proposed fix
             for index, rows in data.iterrows():
                 out_file = index.split(".")[0] + ".csv"
                 out_file_path = out_path + "/" + out_file
 
                 if os.path.exists(out_file_path):
-                    data = pd.read_csv(out_file_path)
-                    data = pd.concat(
-                        [data, rows.to_frame().T], ignore_index=True
+                    existing_data = pd.read_csv(out_file_path)
+                    updated_data = pd.concat(
+                        [existing_data, rows.to_frame().T], ignore_index=True
                     )
-                    data.to_csv(out_file_path, index=False)
+                    updated_data.to_csv(out_file_path, index=False)
                 else:
                     rows.to_frame().T.to_csv(out_file_path, index=False)
🤖 Prompt for AI Agents
In `@regression/save_benchmark_results.py` around lines 21 - 32, The loop shadows
the outer DataFrame named data by assigning data = pd.read_csv(...) inside the
for index, rows in data.iterrows() loop; rename the inner variable (e.g.,
existing or existing_data) and update subsequent uses (pd.concat and to_csv) to
reference that new name so the outer data and the loop variable rows remain
unchanged and all rows are processed correctly.

Comment on lines +36 to +38
python ${CUOPT_SCRIPTS_DIR}/save_benchmark_results.py -b ${RESULTS_DIR} -o ${CUOPT_SCRIPTS_DIR} -c $1

cd ${CUOPT_SCRIPTS_DIR}; git add benchmarks/*; git commit -m "update benchmarks"; git push; cd -

⚠️ Potential issue | 🟡 Minor

Git push in CI without error handling or validation.

The chained git commands could fail partially (e.g., nothing to commit, push rejected). Consider:

  1. Validating that $1 is provided before use
  2. Handling the case where there are no changes to commit
  3. Using git push with appropriate retry or error handling
🛡️ Suggested improvement
-python ${CUOPT_SCRIPTS_DIR}/save_benchmark_results.py -b ${RESULTS_DIR} -o ${CUOPT_SCRIPTS_DIR} -c $1
+if [ -z "${1:-}" ]; then
+    echo "Error: commit hash argument required"
+    exit 1
+fi
+
+python "${CUOPT_SCRIPTS_DIR}/save_benchmark_results.py" -b "${RESULTS_DIR}" -o "${CUOPT_SCRIPTS_DIR}" -c "$1"
 
-cd ${CUOPT_SCRIPTS_DIR}; git add benchmarks/*; git commit -m "update benchmarks"; git push; cd -
+cd "${CUOPT_SCRIPTS_DIR}"
+git add benchmarks/* || true
+if ! git diff --cached --quiet; then
+    git commit -m "update benchmarks"
+    git push
+fi
+cd -
🤖 Prompt for AI Agents
In `@regression/save_benchmarks.sh` around lines 36 - 38, In save_benchmarks.sh
ensure the script validates the required argument ($1) before calling python
${CUOPT_SCRIPTS_DIR}/save_benchmark_results.py and aborts with a non‑zero exit
code if missing; after changing to ${CUOPT_SCRIPTS_DIR} check for
staged/unstaged changes (e.g., git status --porcelain) before running git
add/commit to avoid failing on "nothing to commit", only run git commit when
there are changes and handle git commit failure by exiting with an error, and
perform git push with basic retry logic and error handling so push failures are
detected and cause the script to fail instead of silently continuing. Use the
symbols save_benchmarks.sh, ${CUOPT_SCRIPTS_DIR}, ${RESULTS_DIR}, and
save_benchmark_results.py to locate where to add these checks.
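The retry logic the prompt asks for can be factored into a tiny helper. This is an illustrative sketch only; the `retry` function name and the one-second pause are assumptions, not part of the script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, retrying up to N times with a pause.
retry() {
    local attempts=$1; shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        echo "attempt ${i}/${attempts} failed: $*" >&2
        sleep 1
    done
    return 1
}

# Intended use in save_benchmarks.sh (shown as comments, untested here):
#   if ! git diff --cached --quiet; then
#       git commit -m "update benchmarks"
#       retry 3 git push
#   fi
retry 3 true && echo "retry helper ok"
```

Because the command runs inside an `if` condition, a failing attempt does not trip `set -e`; only the final return value propagates to the caller.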

Comment on lines +10 to +18
# Must ensure PROJECT_DIR is exported first then load rapids-mg-tools env
export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname ${BASH_SOURCE[0]})" && pwd)}
source ${PROJECT_DIR}/config.sh

################################################################################

# Test
logger "Testing container image $IMAGE"
python -c "import cuopt; print(cuopt)"

⚠️ Potential issue | 🟠 Major

logger will be undefined unless functions.sh is sourced.
This script only sources config.sh but calls logger, which is defined in regression/functions.sh. With set -e, this will abort.

🐛 Proposed fix
 export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname ${BASH_SOURCE[0]})" && pwd)}
 source ${PROJECT_DIR}/config.sh
+source ${PROJECT_DIR}/functions.sh
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Must ensure PROJECT_DIR is exported first then load rapids-mg-tools env
export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname ${BASH_SOURCE[0]})" && pwd)}
source ${PROJECT_DIR}/config.sh
################################################################################
# Test
logger "Testing container image $IMAGE"
python -c "import cuopt; print(cuopt)"
# Must ensure PROJECT_DIR is exported first then load rapids-mg-tools env
export PROJECT_DIR=${PROJECT_DIR:-$(cd "$(dirname ${BASH_SOURCE[0]})" && pwd)}
source ${PROJECT_DIR}/config.sh
source ${PROJECT_DIR}/functions.sh
################################################################################
# Test
logger "Testing container image $IMAGE"
python -c "import cuopt; print(cuopt)"
🤖 Prompt for AI Agents
In `@regression/test-container.sh` around lines 10 - 18, The script calls logger
(the function defined in functions.sh) but only sources config.sh, so add
sourcing of the functions file after PROJECT_DIR is set; for example, after
export PROJECT_DIR line source the functions file that defines logger (e.g.,
source "${PROJECT_DIR}/functions.sh" or the appropriate functions file in
regression) so logger is available before it's invoked and the script won't fail
under set -e.
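A complementary defensive pattern is to verify that required functions exist right after sourcing, so a missing `source` fails with a clear message instead of an opaque `command not found` under `set -e`. The `require_fn` guard below is a hypothetical sketch, not code from the repository:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: fail fast if a required shell function is missing.
require_fn() {
    if ! declare -F "$1" >/dev/null; then
        echo "Error: $1() not defined; did you source functions.sh?" >&2
        return 1
    fi
}

# Intended placement after the source lines in test-container.sh:
#   source ${PROJECT_DIR}/config.sh
#   source ${PROJECT_DIR}/functions.sh
#   require_fn logger

logger() { echo "[log] $*"; }  # stand-in for the real logger from functions.sh
require_fn logger && echo "guard ok"
```

`declare -F name` exits nonzero when `name` is not a defined function, which makes the guard cheap and portable across Bash versions.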

Comment on lines +12 to +33
def update_asv_db(
commitHash=None,
commitTime=None,
branch=None,
repo_url=None,
results_dir=None,
machine_name=None,
gpu_type=None,
configs=None,
):
"""
Read the benchmark_result* files in results_dir/benchmarks and
update an existing asv benchmark database or create one if one
does not exist in results_dir/benchmarks/asv. If no
benchmark_result* files are present, return without updating or
creating.
"""

# commitHash = commitHash + str(int(time.time()))
benchmark_dir_path = Path(results_dir) / "benchmarks" / "results" / "csvs"
asv_dir_path = Path(results_dir) / "benchmarks" / "results" / "asv"


⚠️ Potential issue | 🟠 Major

Validate required inputs before use.

update_asv_db exposes optional parameters but uses them unconditionally (e.g., Path(results_dir) and commitTime * 1000). A programmatic call with defaults will crash with unclear errors. Add an explicit required-args check early.

🛡️ Proposed validation guard
 def update_asv_db(
     commitHash=None,
     commitTime=None,
     branch=None,
     repo_url=None,
     results_dir=None,
     machine_name=None,
     gpu_type=None,
     configs=None,
 ):
@@
-    # commitHash = commitHash + str(int(time.time()))
+    required = {
+        "commitHash": commitHash,
+        "commitTime": commitTime,
+        "branch": branch,
+        "repo_url": repo_url,
+        "results_dir": results_dir,
+        "machine_name": machine_name,
+        "gpu_type": gpu_type,
+        "configs": configs,
+    }
+    missing = [k for k, v in required.items() if v in (None, "")]
+    if missing:
+        raise ValueError(f"Missing required args: {', '.join(missing)}")
+
+    # commitHash = commitHash + str(int(time.time()))
     benchmark_dir_path = Path(results_dir) / "benchmarks" / "results" / "csvs"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def update_asv_db(
    commitHash=None,
    commitTime=None,
    branch=None,
    repo_url=None,
    results_dir=None,
    machine_name=None,
    gpu_type=None,
    configs=None,
):
    """
    Read the benchmark_result* files in results_dir/benchmarks and
    update an existing asv benchmark database or create one if one
    does not exist in results_dir/benchmarks/asv. If no
    benchmark_result* files are present, return without updating or
    creating.
    """
    # commitHash = commitHash + str(int(time.time()))
    benchmark_dir_path = Path(results_dir) / "benchmarks" / "results" / "csvs"
    asv_dir_path = Path(results_dir) / "benchmarks" / "results" / "asv"

def update_asv_db(
    commitHash=None,
    commitTime=None,
    branch=None,
    repo_url=None,
    results_dir=None,
    machine_name=None,
    gpu_type=None,
    configs=None,
):
    """
    Read the benchmark_result* files in results_dir/benchmarks and
    update an existing asv benchmark database or create one if one
    does not exist in results_dir/benchmarks/asv. If no
    benchmark_result* files are present, return without updating or
    creating.
    """
    required = {
        "commitHash": commitHash,
        "commitTime": commitTime,
        "branch": branch,
        "repo_url": repo_url,
        "results_dir": results_dir,
        "machine_name": machine_name,
        "gpu_type": gpu_type,
        "configs": configs,
    }
    missing = [k for k, v in required.items() if v in (None, "")]
    if missing:
        raise ValueError(f"Missing required args: {', '.join(missing)}")

    # commitHash = commitHash + str(int(time.time()))
    benchmark_dir_path = Path(results_dir) / "benchmarks" / "results" / "csvs"
    asv_dir_path = Path(results_dir) / "benchmarks" / "results" / "asv"
🤖 Prompt for AI Agents
In `@regression/update_asv_database.py` around lines 12 - 33, The function
update_asv_db uses inputs like results_dir and commitTime without validation
(e.g., Path(results_dir) and commitTime * 1000), so add an early
required-arguments guard at the top of update_asv_db that checks for
presence/valid types of results_dir and either commitHash or commitTime (and
that commitTime is numeric if used); if validation fails, raise a clear
exception (ValueError) or return early with a clear message. Ensure the checks
reference the same parameter names (results_dir, commitTime, commitHash) so
callers get a deterministic error before any Path(...) or arithmetic is
attempted.
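Stripped of the diff noise, the guard reduces to a few lines. The sketch below trims the parameter list for brevity; the `check_required` helper and the return value are illustrative, not the actual file's code:

```python
def check_required(**kwargs):
    # Raise a deterministic error naming every missing or empty argument.
    missing = [name for name, value in kwargs.items() if value in (None, "")]
    if missing:
        raise ValueError(f"Missing required args: {', '.join(sorted(missing))}")

def update_asv_db(commitHash=None, commitTime=None, results_dir=None):
    # Validate before any Path(...) construction or arithmetic on commitTime,
    # so a bad programmatic call fails with a clear message up front.
    check_required(
        commitHash=commitHash, commitTime=commitTime, results_dir=results_dir
    )
    return f"{results_dir}/benchmarks/results/csvs @ {commitTime * 1000}"
```

Centralizing the check in one helper keeps each caller's signature readable while guaranteeing the error message lists every missing name at once.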

rgsl888prabhu and others added 3 commits February 5, 2026 14:41
…uts make it infeasible (NVIDIA#831)

Fixes an issue on neos-4413714-turia where basis repair was called during dual push, and the slack needed was superbasic. 

Fixes an issue where cbs-cta was incorrectly classified as infeasible after cuts were added.

Authors:
  - Chris Maes (https://github.com/chris-maes)

Approvers:
  - Alice Boucher (https://github.com/aliceb-nv)

URL: NVIDIA#831
@rgsl888prabhu rgsl888prabhu merged commit a24b3c6 into NVIDIA:main Feb 6, 2026
30 of 31 checks passed
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/dual_simplex/branch_and_bound.cpp (1)

408-455: ⚠️ Potential issue | 🟠 Major

Recompute obj after re-crushing to avoid stale comparisons.

If original_lp_ changed since the first crush, crushed_solution is rebuilt but obj still reflects the old LP, so acceptance/repair checks can run against a stale objective.

🛠️ Suggested fix
     if (crushed_solution.size() != original_lp_.num_cols) {
       // original problem has been modified since the solution was crushed
       // we need to re-crush the solution
       crush_primal_solution<i_t, f_t>(
         original_problem_, original_lp_, solution, new_slacks_, crushed_solution);
+      obj = compute_objective(original_lp_, crushed_solution);
     }
cpp/src/dual_simplex/primal.cpp (1)

299-321: ⚠️ Potential issue | 🟠 Major

Guard against superbasic output from basis_repair in primal_phase2.

basis_repair can now populate superbasic_list. If that happens, nonbasic_list may end up smaller than n - m, but subsequent loops still index it as n - m, risking OOB access and incorrect pricing. Rebuild the lists after repair and either merge superbasics or fail fast.

🛠️ Suggested fix
     basis_repair(lp.A,
                  settings,
                  lp.lower,
                  lp.upper,
                  deficient,
                  slacks_needed,
                  basic_list,
                  nonbasic_list,
                  superbasic_list,
                  vstatus);
+    get_basis_from_vstatus(m, vstatus, basic_list, nonbasic_list, superbasic_list);
+    if (!superbasic_list.empty()) {
+      settings.log.printf("Superbasic variables are not supported in primal_phase2\n");
+      return primal::status_t::NUMERICAL;
+    }
     rank = factorize_basis(lp.A, settings, basic_list, L, U, p, pinv, q, deficient, slacks_needed);
As per coding guidelines: “Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results.”
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/basis_solves.cpp`:
- Around line 667-684: The branch that assigns vstatus for bad_j (inside the if
checking nonbasic_map[replace_j]) misses the case where lower[bad_j] ==
upper[bad_j] (fixed variable) and currently falls through to NONBASIC_LOWER;
update the condition order in that block to check for fixed bounds first and
assign variable_status_t::NONBASIC_FIXED when lower[bad_j] == upper[bad_j],
keeping the existing checks for NONBASIC_FREE, NONBASIC_LOWER and NONBASIC_UPPER
for the other cases; also replace the final assert(1 == 0) with assert(false &&
"unexpected bound configuration") to provide a clearer failure message.

Comment on lines +667 to 684
if (nonbasic_map[replace_j] != -1) {
  nonbasic_list[nonbasic_map[replace_j]] = bad_j;
  // This is the main issue. What value should bad_j take on.
  if (lower[bad_j] == -inf && upper[bad_j] == inf) {
    vstatus[bad_j] = variable_status_t::NONBASIC_FREE;
  } else if (lower[bad_j] > -inf) {
    vstatus[bad_j] = variable_status_t::NONBASIC_LOWER;
  } else if (upper[bad_j] < inf) {
    vstatus[bad_j] = variable_status_t::NONBASIC_UPPER;
  } else {
    assert(1 == 0);
  }
} else if (superbasic_map[replace_j] != -1) {
  superbasic_list[superbasic_map[replace_j]] = bad_j;
  vstatus[bad_j] = variable_status_t::SUPERBASIC;
} else {
  assert(1 == 0);
  assert(nonbasic_map[replace_j] != -1 || superbasic_map[replace_j] != -1);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the variable_status_t enum to see if NONBASIC_FIXED exists
rg -n 'variable_status_t|NONBASIC_FIXED' --type=cpp --type=hpp -A2 -B2 | head -80

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

# Examine the basis_solves.cpp file structure around lines 667-684
sed -n '650,700p' cpp/src/dual_simplex/basis_solves.cpp

Repository: NVIDIA/cuopt

Length of output: 1947


🏁 Script executed:

# Check how nonbasic_map and superbasic_map are used after line 684 in this function
rg -n 'nonbasic_map|superbasic_map' cpp/src/dual_simplex/basis_solves.cpp | tail -20

Repository: NVIDIA/cuopt

Length of output: 766


🏁 Script executed:

# Find the variable_status_t enum definition
rg -n 'enum.*variable_status_t|NONBASIC_FIXED' -t c -t h -C3

Repository: NVIDIA/cuopt

Length of output: 789


🏁 Script executed:

# Check the function signature and full context
sed -n '620,690p' cpp/src/dual_simplex/basis_solves.cpp

Repository: NVIDIA/cuopt

Length of output: 2643


🏁 Script executed:

# Find where this function is defined and used
rg -n 'fix_basis|deficient_basis' cpp/src/dual_simplex/basis_solves.cpp -A1 -B1 | head -30

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check how NONBASIC_FIXED is used in the codebase
rg -n 'NONBASIC_FIXED' --type-list

Repository: NVIDIA/cuopt

Length of output: 445


🏁 Script executed:

# Look for uses of NONBASIC_FIXED to understand if it's important for correctness
rg -n 'NONBASIC_FIXED' -C2

Repository: NVIDIA/cuopt

Length of output: 11626


🏁 Script executed:

# Check if the bounds checking logic elsewhere handles NONBASIC_FIXED
rg -n 'variable_status_t::NONBASIC' -C2 | head -100

Repository: NVIDIA/cuopt

Length of output: 7606


Missing NONBASIC_FIXED status assignment for fixed variables (lower == upper).

The bound-checking logic (lines 670–678) omits the case where lower[bad_j] == upper[bad_j]. This matters: elsewhere in the dual simplex solver (primal.cpp:465, phase2.cpp:1999, crossover.cpp:460 and beyond), variables with equal bounds are explicitly marked NONBASIC_FIXED and handled differently from NONBASIC_LOWER or NONBASIC_UPPER. The current code assigns NONBASIC_LOWER to such variables, which is corrected later but indicates an initial basis inconsistency.

Add a check before the lower > -inf condition:

if (lower[bad_j] == -inf && upper[bad_j] == inf) {
  vstatus[bad_j] = variable_status_t::NONBASIC_FREE;
} else if (lower[bad_j] > -inf && lower[bad_j] == upper[bad_j]) {
  vstatus[bad_j] = variable_status_t::NONBASIC_FIXED;
} else if (lower[bad_j] > -inf) {
  vstatus[bad_j] = variable_status_t::NONBASIC_LOWER;
} else if (upper[bad_j] < inf) {
  vstatus[bad_j] = variable_status_t::NONBASIC_UPPER;
} else {
  assert(false && "unexpected bound configuration");
}

Also prefer assert(false) with a descriptive message over assert(1 == 0) (lines 677, 683).

🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/basis_solves.cpp` around lines 667 - 684, The branch
that assigns vstatus for bad_j (inside the if checking nonbasic_map[replace_j])
misses the case where lower[bad_j] == upper[bad_j] (fixed variable) and
currently falls through to NONBASIC_LOWER; update the condition order in that
block to check for fixed bounds first and assign
variable_status_t::NONBASIC_FIXED when lower[bad_j] == upper[bad_j], keeping the
existing checks for NONBASIC_FREE, NONBASIC_LOWER and NONBASIC_UPPER for the
other cases; also replace the final assert(1 == 0) with assert(false &&
"unexpected bound configuration") to provide a clearer failure message.
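The ordering argument is easy to check in isolation. The sketch below mirrors the classification with a standalone enum and function; the names are illustrative stand-ins, not cuOpt's actual declarations:

```cpp
#include <cassert>
#include <limits>

// Illustrative stand-ins for the solver's types; not the real declarations.
enum class variable_status_t { NONBASIC_FREE, NONBASIC_FIXED, NONBASIC_LOWER, NONBASIC_UPPER };

// Classify a nonbasic variable from its bounds, testing the fixed case
// (lower == upper) before the generic finite-lower-bound case so fixed
// variables never fall through to NONBASIC_LOWER.
variable_status_t classify_nonbasic(double lower, double upper)
{
  const double inf = std::numeric_limits<double>::infinity();
  if (lower == -inf && upper == inf) { return variable_status_t::NONBASIC_FREE; }
  if (lower > -inf && lower == upper) { return variable_status_t::NONBASIC_FIXED; }
  if (lower > -inf) { return variable_status_t::NONBASIC_LOWER; }
  if (upper < inf) { return variable_status_t::NONBASIC_UPPER; }
  assert(false && "unexpected bound configuration");
  return variable_status_t::NONBASIC_FREE;  // unreachable
}
```

Putting the equality test first is the whole fix: once `lower > -inf` has matched, a fixed variable can no longer be distinguished from one with only a finite lower bound.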

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change
