Warm start barrier by taking PDLP solution, moving it to the interior, and doing PDHG to find a point on the central path #955

Open
chris-maes wants to merge 3 commits into NVIDIA:release/26.04 from chris-maes:pdhg_central_path

Conversation

@chris-maes
Contributor

Uses parts of Algorithm 1 from Rothberg's https://arxiv.org/pdf/2603.03150

@chris-maes chris-maes requested review from a team as code owners March 12, 2026 20:17
@copy-pr-bot

copy-pr-bot bot commented Mar 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

The PR introduces a central_path solver function for barrier-based optimization using PDHG (Primal-Dual Hybrid Gradient) methods. It adds precomputed PDHG state variables to the barrier solver, implements presolve handling for barrier workflow, extends utility functions for matrix norm estimation and random value generation, converts crossover source from C++ to CUDA, and refactors dual variable initialization in PDLP translation.

Changes

  • Barrier solver PDHG variables (cpp/src/barrier/barrier.hpp, cpp/src/barrier/barrier.cu): Added five public PDHG state member variables (x_pdhg, w_pdhg, y_pdhg, z_pdhg, v_pdhg) to barrier_solver_t, plus conditional logic to override the initial point data when the PDHG variables are populated.
  • Central path solver function (cpp/src/dual_simplex/crossover.hpp, cpp/src/dual_simplex/crossover.cu): Declared and implemented a new central_path template function for barrier-based solving via PDHG, including barrier presolve configuration, a GPU-accelerated PDHG loop with convergence checks, and mapping of the solution back to barrier variables.
  • Presolve enhancements (cpp/src/dual_simplex/presolve.hpp, cpp/src/dual_simplex/presolve.cpp): Added a negated_variables member to presolve_info_t, introduced barrier-presolve handling for variables with no lower bound, and implemented crush_solution_to_presolve_space to map solutions from problem space back to presolve space.
  • Utility functions (cpp/src/dual_simplex/random.hpp, cpp/src/dual_simplex/sparse_matrix.hpp, cpp/src/dual_simplex/sparse_matrix.cpp): Added a random_value<T> template method for generic real-valued range generation, and a norm2_estimate method implementing power-method-style iterative estimation of the L2 norm with a randomization fallback.
  • Build and integration (cpp/src/dual_simplex/CMakeLists.txt, cpp/src/pdlp/solve.cu, cpp/src/pdlp/translate.hpp): Changed the crossover source file from .cpp to .cu, replaced the crossover call with central_path in the PDLP crossover path, and refactored dual variable initialization to compute z via an explicit dual feasibility expression instead of copying reduced costs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title directly and clearly describes the main change (warm-starting barrier from a PDLP solution, moving it to the interior, and running PDHG to find a central-path point), which aligns with the key changes in the changeset.
  • Description check — ✅ Passed: the description references Algorithm 1 from Rothberg's paper, which is directly related to the PDHG-based central-path implementation introduced throughout the changeset.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (2)
cpp/src/barrier/barrier.hpp (1)

79-84: Bundle the PDHG warm start into a single validated object.

These five vectors are one logical state, but exposing them independently makes partial initialization easy and forces initial_point() to infer validity from x_pdhg alone. A small warm-start struct plus setter/clearer would reduce the coupling between PDHG and barrier and give you one place to validate dimensions and interiority.

Based on learnings: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/barrier.hpp` around lines 79 - 84, Replace the five separate
PDHG warm-start vectors (x_pdhg, w_pdhg, y_pdhg, z_pdhg, v_pdhg) with a single
PDHGWarmStart struct that owns those vectors and provides validate(), set(const
PDHGWarmStart&), clear(), and isSet() methods; update initial_point() and any
code that referenced the individual vectors to accept/ask the PDHGWarmStart
object and call validate() (checking dimensions and interiority) before using
its contents; ensure the setter performs dimension checks against the problem
sizes and that clear() resets the isSet flag so partial initialization cannot
occur.
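To make the suggestion concrete, here is a minimal sketch of a bundled warm-start object along the lines the comment proposes. The type and member names (pdhg_warm_start_t, validate, set, clear, is_set) and the dimension/interiority rules are illustrative assumptions, not the repo's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bundled PDHG warm start. One object owns all five vectors,
// so partial initialization cannot occur and validity is checked in one place.
template <typename f_t>
struct pdhg_warm_start_t {
  std::vector<f_t> x, w, y, z, v;
  bool is_set = false;

  // All five vectors must match the presolved problem dimensions, and the
  // barrier variables (x, z, w, v) must be strictly interior (positive).
  bool validate(std::size_t n, std::size_t n_upper, std::size_t m) const
  {
    if (x.size() != n || z.size() != n || w.size() != n_upper ||
        v.size() != n_upper || y.size() != m) {
      return false;
    }
    auto strictly_positive = [](const std::vector<f_t>& vec) {
      for (f_t val : vec) {
        if (!(val > f_t(0))) { return false; }
      }
      return true;
    };
    return strictly_positive(x) && strictly_positive(z) &&
           strictly_positive(w) && strictly_positive(v);
  }

  void set(std::vector<f_t> x_, std::vector<f_t> w_, std::vector<f_t> y_,
           std::vector<f_t> z_, std::vector<f_t> v_)
  {
    x = std::move(x_); w = std::move(w_); y = std::move(y_);
    z = std::move(z_); v = std::move(v_);
    is_set = true;
  }

  void clear()
  {
    x.clear(); w.clear(); y.clear(); z.clear(); v.clear();
    is_set = false;
  }
};
```

With this shape, initial_point() can ask one object whether a complete, interior warm start exists instead of inferring validity from x_pdhg alone.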
cpp/src/dual_simplex/crossover.cu (1)

1884-1906: Remove the per-iteration stream synchronizations from this hot path.

These kernels are already issued on the same stream, so the sync_stream() calls serialize every PDHG iteration and will dominate runtime long before the math does. Synchronize only where host-visible data is actually needed.

As per coding guidelines, "Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cu` around lines 1884 - 1906, Remove the
per-iteration host synchronizations: delete the lp.handle_ptr->sync_stream()
calls surrounding the cusparse_view.transpose_spmv and the two
cub::DeviceTransform::Transform invocations (the sync before transpose_spmv and
the two syncs after the transforms). These kernels are launched on the same
stream so the sync_stream() calls serialize the hot path; rely on the stream for
ordering and only synchronize later at points where host-visible data is
actually required (or replace with stream/event-based sync there), leaving the
cusparse_view.transpose_spmv, both cub::DeviceTransform::Transform calls, and
the RAFT_CHECK_CUDA checks intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 1974-1980: Validate that all warm-start vectors x_pdhg, w_pdhg,
y_pdhg, z_pdhg, v_pdhg are present and exactly match the current presolved
problem dimensions (match data.x.size(), data.w.size(), data.y.size(),
data.z.size(), data.v.size() or the problem n/m counts) before assigning them to
data.x/data.w/data.y/data.z/data.v; if any size mismatches occur, skip the warm
start and leave existing data or reset to a safe default. After a successful
full-size overwrite, re-establish strict interiority by pushing any non‑positive
entries in data.x/data.z and data.w/data.v to a small positive tolerance (e.g.,
eps) so subsequent divisions in gpu_compute_search_direction() cannot divide by
zero or negative values. Ensure you reference and check the five symbols x_pdhg,
w_pdhg, y_pdhg, z_pdhg, v_pdhg and update data.x, data.w, data.y, data.z, data.v
atomically only when all validations pass.

In `@cpp/src/dual_simplex/crossover.cu`:
- Around line 1974-2013: The branch that handles barrier_status after
barrier_solver.solve currently returns based on barrier_status but never
copies/postsolves barrier_solution into the caller-facing solution nor
initializes vstatus, and it collapses TIME_LIMIT/CONCURRENT_LIMIT into
NUMERICAL_ISSUES; update the code after barrier_solver.solve to (1) when
barrier_status == OPTIMAL, postsolve/copy barrier_solution into solution and set
vstatus appropriately (initialize any variable-status arrays the caller
expects), and return crossover_status_t::OPTIMAL; (2) when barrier_status ==
TIME_LIMIT return the corresponding crossover_status_t TIME_LIMIT (preserving
the termination reason), when barrier_status == CONCURRENT_LIMIT return
CONCURRENT_LIMIT, and for any other non-OPTIMAL status return NUMERICAL_ISSUES;
use the existing symbols barrier_solution, barrier_solver.solve, barrier_status,
solution, and vstatus to locate where to perform the postsolve/copy and status
propagation.
- Around line 1865-1967: The PDHG main for-loop (for (i_t iter = 0; iter <
iter_max; iter++)) never checks time_limit or the concurrent_halt flag, so
long-running/cancelled solves keep using the GPU; add termination checks inside
the loop (e.g., at the top or immediately after the periodic sync/printf block)
that compute elapsed time via toc(start_time) and break if elapsed >=
time_limit, and also check a shared concurrent_halt (or similar cancellation
token) and break if set; update the same cleanup/raft::copy-to-host and
converged handling (x,y,xbar) when breaking due to timeout/cancel to ensure
consistent state.
- Around line 1817-1823: norm_c or barrier_lp.A.norm2_estimate can be zero
causing primal_weight, eta, tau or sigma to become non-finite; update the
initialization around norm_b, norm_c, primal_weight, eta, tau, sigma (and use
mu_target) to guard against zero/near-zero values by introducing a small epsilon
(e.g. eps = numeric_limits<f_t>::epsilon() or a configurable tiny constant),
clamp norm_b/norm_c and the norm2_estimate denominator to >= eps, compute
primal_weight = (norm_b > eps) ? norm_c / norm_b : 1.0, compute eta = 0.998 /
max(barrier_lp.A.norm2_estimate(1e-4), eps), then set tau = eta /
max(primal_weight, eps) and sigma = eta * max(primal_weight, eps); finally
ensure tau/sigma are finite (fallback to safe defaults if not) so PDHG never
starts with NaN/Inf.

In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 838-855: The presolve negation branch (when
settings.barrier_presolve && no_lower_bound > 0) flips variable bounds and A(:,
j) but fails to negate the linear objective and any quadratic/postsolve data;
update problem.objective[j] to -problem.objective[j] (not *= 1) and
record/negate any quadratic terms and postsolve mappings associated with column
j so that presolve_info.negated_variables and uncrush_solution can restore signs
correctly; ensure all places that consume negated_variables (e.g.,
uncrush_solution, postsolve reconstruction, and any quadratic Hessian or local
quadratic coefficients) apply the same sign flip.
- Around line 1155-1205: crush_solution_to_presolve_space currently hard-aborts
on unsupported presolve mappings (removed_constraints, removed_variables,
free_variable_pairs) and assumes removed_lower_bounds/negated_variables align
with original_x, which can kill the process or produce mis-sized outputs; change
the function signature to return a status (bool or enum) instead of void, remove
printf/exit, and return false when encountering unsupported transforms (preserve
existing behavior for negated_variables/removed_lower_bounds only if all index
checks pass); add bounds checks before indexing (verify
presolve_info.negated_variables[k] < crushed_x.size() and
removed_lower_bounds.size() == crushed_x.size() before subtracting) and document
that callers must short-circuit warm-starting when this function returns false.

In `@cpp/src/dual_simplex/random.hpp`:
- Around line 33-37: The template function random_value<T> should be constrained
to floating-point types and use a distribution parameterized by T: add a
static_assert(std::is_floating_point<T>::value, "...") in random_value to
prevent integral misuse, change the distribution type to
std::uniform_real_distribution<T> so the generated type matches T, and ensure
<type_traits> is included; update the call sites if any rely on implicit double
behavior.
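A sketch of the constrained random_value<T> the prompt describes, written as a free function for illustration (the repo's version is a method on its random type, so the exact signature here is an assumption):

```cpp
#include <cassert>
#include <random>
#include <type_traits>

// Floating-point-only random value in [lo, hi), with the distribution
// parameterized by T so float callers do not silently get double behavior.
template <typename T>
T random_value(std::mt19937& gen, T lo, T hi)
{
  static_assert(std::is_floating_point<T>::value,
                "random_value requires a floating-point type");
  std::uniform_real_distribution<T> dist(lo, hi);
  return dist(gen);
}
```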

In `@cpp/src/dual_simplex/sparse_matrix.cpp`:
- Around line 867-894: The loop currently divides by Sx_norm and norm_x using
exact-zero checks which can still divide by tiny values; guard these
normalizations with a relative tolerance instead. In the power-iteration section
(symbols: matrix_vector_multiply, matrix_transpose_vector_multiply,
vector_norm2, random_t and variables Sx_norm, norm_x, e, x) replace the
Sx_norm==0.0 and norm_x==0.0 checks with tests like Sx_norm <= max(tol,
std::numeric_limits<f_t>::epsilon())*reference_norm (or tol * e0) and similarly
for norm_x, and if the norm is below that threshold regenerate Sx (or
reinitialize x) and skip the division (or scale safely); ensure divisions x[j]
/= norm_x and e = norm_x / Sx_norm only occur when both norms exceed the chosen
tolerance to avoid inf/NaN contamination.
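For reference, this dense stand-in sketches the power-method 2-norm estimate under discussion, including the guard against normalizing by a (near-)zero norm. The real routine operates on the sparse matrix type with a randomization fallback; this simplified dense version only illustrates the iteration and the tolerance check:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Power iteration on A'A: the dominant eigenvalue of A'A is ||A||_2^2.
double norm2_estimate_dense(const std::vector<std::vector<double>>& A,
                            double tol = 1e-4, int max_iter = 100)
{
  const std::size_t m = A.size(), n = A[0].size();
  std::vector<double> x(n, 1.0), Ax(m), AtAx(n);
  double e = 0.0, e_prev = 0.0;
  for (int it = 0; it < max_iter; ++it) {
    // Ax = A * x, then AtAx = A' * Ax (one step of power iteration on A'A)
    for (std::size_t i = 0; i < m; ++i) {
      Ax[i] = 0.0;
      for (std::size_t j = 0; j < n; ++j) Ax[i] += A[i][j] * x[j];
    }
    for (std::size_t j = 0; j < n; ++j) {
      AtAx[j] = 0.0;
      for (std::size_t i = 0; i < m; ++i) AtAx[j] += A[i][j] * Ax[i];
    }
    double norm = 0.0;
    for (double val : AtAx) norm += val * val;
    norm = std::sqrt(norm);
    // Guard: never divide by a (near-)zero norm; return the last estimate.
    if (norm <= std::numeric_limits<double>::epsilon()) break;
    for (std::size_t j = 0; j < n; ++j) x[j] = AtAx[j] / norm;
    e = std::sqrt(norm);  // since x is unit, ||A'A x|| -> ||A||_2^2
    if (it > 0 && std::fabs(e - e_prev) <= tol * e) break;  // relative tol
    e_prev = e;
  }
  return e;
}
```

The convergence test is relative (tol * e) rather than an exact-zero comparison, which is the kind of guard the comment asks for on Sx_norm and norm_x.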

In `@cpp/src/pdlp/solve.cu`:
- Around line 623-624: The change replaces the classical crossover with
dual_simplex::central_path which returns an interior-point solution (via
barrier_solver.solve()) and does not produce a basis nor correctly-populated
vstatus; either revert to the original classical crossover when
settings.crossover is set or add an explicit crossover step after central_path
to convert the interior solution into a full basic solution and populate
vstatus. Specifically, restore or call the previous crossover routine (e.g.,
classical_crossover / simplex_crossover) instead of relying on central_path
alone, or after central_path invoke a crossover routine that takes
vertex_solution and fills vstatus and returns a proper
dual_simplex::crossover_status_t; also update the switch/handling of
dual_feasible/other statuses to map all possible central_path/crossover outputs
to existing error enums and add a short comment documenting why
central_path+explicit crossover is safe if you choose to keep it.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c7ebf4a-64ad-424b-ba57-13c2def561c6

📥 Commits

Reviewing files that changed from the base of the PR and between d531ad1 and 317a7af.

📒 Files selected for processing (12)
  • cpp/src/barrier/barrier.cu
  • cpp/src/barrier/barrier.hpp
  • cpp/src/dual_simplex/CMakeLists.txt
  • cpp/src/dual_simplex/crossover.cu
  • cpp/src/dual_simplex/crossover.hpp
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/dual_simplex/presolve.hpp
  • cpp/src/dual_simplex/random.hpp
  • cpp/src/dual_simplex/sparse_matrix.cpp
  • cpp/src/dual_simplex/sparse_matrix.hpp
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/translate.hpp

Comment on lines +1974 to +1980
if (x_pdhg.size() > 0) {
data.x = x_pdhg;
data.w = w_pdhg;
data.y = y_pdhg;
data.z = z_pdhg;
data.v = v_pdhg;
}

⚠️ Potential issue | 🔴 Critical

Validate the PDHG warm start before overwriting the barrier point.

x_pdhg.size() > 0 is the only gate here, so a partially populated or mis-sized warm start can shrink data.w/y/z/v to the wrong dimensions. Also, because this overwrite happens after the positivity adjustments above, zeros or negatives can flow into the later z / x and v / w divisions in gpu_compute_search_direction(). Please validate all five vector sizes against the current presolved problem and re-establish strict interiority after the overwrite.

Suggested guard
-  if (x_pdhg.size() > 0) {
+  if (!x_pdhg.empty()) {
+    if (x_pdhg.size() != data.x.size() || w_pdhg.size() != data.w.size() ||
+        y_pdhg.size() != data.y.size() || z_pdhg.size() != data.z.size() ||
+        v_pdhg.size() != data.v.size()) {
+      settings.log.printf("Invalid PDHG warm-start dimensions\n");
+      return -1;
+    }
     data.x = x_pdhg;
     data.w = w_pdhg;
     data.y = y_pdhg;
     data.z = z_pdhg;
     data.v = v_pdhg;
+    data.w.ensure_positive(epsilon_adjust);
+    data.x.ensure_positive(epsilon_adjust);
+    data.v.ensure_positive(epsilon_adjust);
+    data.z.ensure_positive(epsilon_adjust);
   }

Based on learnings: Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases, and ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations.


Comment on lines +1817 to +1823
f_t norm_b = vector_norm2<i_t, f_t>(barrier_lp.rhs);
f_t norm_c = vector_norm2<i_t, f_t>(barrier_lp.objective);
f_t primal_weight = norm_b > 0.0 ? norm_c / norm_b : 1.0;
f_t eta = 0.998 / barrier_lp.A.norm2_estimate(1e-4);
f_t tau = eta / primal_weight;
f_t sigma = eta * primal_weight;
f_t mu = mu_target;

⚠️ Potential issue | 🔴 Critical

Guard PDHG step-size initialization against zero norms.

norm_c == 0 makes primal_weight zero, and a zero/empty A makes norm2_estimate() zero; either case sends tau or sigma non-finite before the first iteration. Zero-objective or fully-presolved LPs are valid inputs here.

🛡️ Suggested guard
-    f_t norm_b        = vector_norm2<i_t, f_t>(barrier_lp.rhs);
-    f_t norm_c        = vector_norm2<i_t, f_t>(barrier_lp.objective);
-    f_t primal_weight = norm_b > 0.0 ? norm_c / norm_b : 1.0;
-    f_t eta           = 0.998 / barrier_lp.A.norm2_estimate(1e-4);
+    f_t norm_b = vector_norm2<i_t, f_t>(barrier_lp.rhs);
+    f_t norm_c = vector_norm2<i_t, f_t>(barrier_lp.objective);
+    f_t primal_weight = (norm_b > 0.0 && norm_c > 0.0) ? norm_c / norm_b : 1.0;
+    f_t norm_A = barrier_lp.A.norm2_estimate(1e-4);
+    if (!(norm_A > 0.0)) { return crossover_status_t::NUMERICAL_ISSUES; }
+    f_t eta = 0.998 / norm_A;
As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks".

Comment on lines +1865 to +1967
for (i_t iter = 0; iter < iter_max; iter++) {
// Dual update
// delta_y = -sigma * A * xbar + sigma * b = sigma * (b - A * xbar)
raft::copy(d_delta_y.data(), d_b.data(), m, lp.handle_ptr->get_stream());
cusparse_view.spmv(-1, d_xbar, 1, d_delta_y);
cub::DeviceTransform::Transform(
cuda::std::make_tuple(d_delta_y.data(), d_y.data()),
d_y.data(),
d_delta_y.size(),
[sigma] HD(f_t delta_y_i, f_t dy_i) -> f_t { return dy_i + delta_y_i * sigma; },
lp.handle_ptr->get_stream().value());
RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());

raft::copy(d_xold.data(), d_x.data(), n, lp.handle_ptr->get_stream());

// xold = x
// Primal gradient step
// v = x + tau * A' * y

lp.handle_ptr->sync_stream();
cusparse_view.transpose_spmv(1.0, d_y, 0.0, d_v);
cub::DeviceTransform::Transform(
cuda::std::make_tuple(d_x.data(), d_v.data()),
d_v.data(),
d_v.size(),
[tau] HD(f_t x_i, f_t v_i) -> f_t { return x_i + tau * v_i; },
lp.handle_ptr->get_stream().value());
RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
lp.handle_ptr->sync_stream();

cub::DeviceTransform::Transform(
cuda::std::make_tuple(d_v.data(), d_c.data(), d_x.data(), d_xold.data()),
thrust::make_zip_iterator(d_x.data(), d_xbar.data()),
d_x.size(),
[tau, mu] HD(f_t v_j, f_t c_j, f_t x_j, f_t xold_j) -> thrust::tuple<f_t, f_t> {
f_t w_j = v_j - tau * c_j;
f_t new_x_j = (w_j + std::sqrt(w_j * w_j + 4.0 * tau * mu)) / 2.0;
return {new_x_j, 2.0 * new_x_j - xold_j};
},
lp.handle_ptr->get_stream().value());
RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
lp.handle_ptr->sync_stream();

if (iter % 1000 == 0) {
raft::copy(x.data(), d_x.data(), n, lp.handle_ptr->get_stream());
raft::copy(d_primal_residual.data(), d_b.data(), m, lp.handle_ptr->get_stream());
cusparse_view.spmv(1.0, d_x, -1.0, d_primal_residual);

cub::DeviceTransform::Transform(
cuda::std::make_tuple(d_xbar.data(), d_c.data()),
d_dual_residual.data(),
d_xbar.size(),
[mu_target] HD(f_t xbar_j, f_t c_j) -> f_t { return mu_target / xbar_j - c_j; },
lp.handle_ptr->get_stream().value());
RAFT_CHECK_CUDA(lp.handle_ptr->get_stream());
lp.handle_ptr->sync_stream();

cusparse_view.transpose_spmv(1.0, d_y, 1.0, d_dual_residual);

f_t vector_norm_2_primal =
std::sqrt(thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
d_primal_residual.data(),
d_primal_residual.data() + m,
d_primal_residual.data(),
0.0));
f_t vector_norm_2_dual =
std::sqrt(thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
d_dual_residual.data(),
d_dual_residual.data() + n,
d_dual_residual.data(),
0.0));

f_t primal_relative = vector_norm_2_primal / (1.0 + norm_b);
f_t dual_relative = vector_norm_2_dual / (1.0 + norm_c);

f_t primal_obj = thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
d_c.data(),
d_c.data() + n,
d_xbar.data(),
0.0);
f_t dual_obj = thrust::inner_product(rmm::exec_policy(lp.handle_ptr->get_stream()),
d_b.data(),
d_b.data() + m,
d_y.data(),
0.0);
f_t time = toc(start_time);
printf("%8d %+12.6e %+12.6e %8.2e %8.2e %8.1fs\n",
iter,
primal_obj,
dual_obj,
primal_relative,
dual_relative,
time);

if (primal_relative < 1e-5 && dual_relative < 1e-5) {
converged = true;
raft::copy(x.data(), d_x.data(), n, lp.handle_ptr->get_stream());
raft::copy(y.data(), d_y.data(), m, lp.handle_ptr->get_stream());
raft::copy(xbar.data(), d_xbar.data(), n, lp.handle_ptr->get_stream());
break;
}
}
}

⚠️ Potential issue | 🟠 Major

Honor time_limit and concurrent_halt inside the PDHG loop.

This loop can run up to 1,000,000 iterations without checking either termination condition, so a timed-out or cancelled solve keeps burning GPU time until PDHG finishes or fails.

⏱️ Suggested termination check
     for (i_t iter = 0; iter < iter_max; iter++) {
+      if ((iter % 256) == 0) {
+        if (toc(start_time) > settings.time_limit) { return crossover_status_t::TIME_LIMIT; }
+        if (settings.concurrent_halt != nullptr && *settings.concurrent_halt == 1) {
+          return crossover_status_t::CONCURRENT_LIMIT;
+        }
+      }
       // Dual update
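As a reader's check (not part of the PR), the closed-form primal update in the quoted kernel — new_x_j = (w_j + sqrt(w_j² + 4·tau·mu)) / 2 — is the exact proximal step of the log-barrier objective under the PDHG splitting:

```latex
% Primal step: proximal operator of c^T x - \mu \sum_j \ln x_j at the point v
x^{+} = \operatorname*{arg\,min}_{x > 0}\;
        c^{\top} x - \mu \sum_j \ln x_j + \frac{1}{2\tau}\,\lVert x - v \rVert_2^2
% Per-coordinate optimality:  c_j - \mu / x_j + (x_j - v_j)/\tau = 0.
% Multiply by \tau x_j and substitute w_j = v_j - \tau c_j (as in the kernel):
%   x_j^2 - w_j x_j - \tau \mu = 0,
% whose positive root is the update applied per coordinate:
x_j^{+} = \frac{w_j + \sqrt{w_j^2 + 4\,\tau\mu}}{2}
```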

Comment on lines +1974 to +2013
if (converged) {
printf("PDHG found point on central path\n\n\n");
// Solve using barrier
lp_solution_t<i_t, f_t> barrier_solution(presolved_lp.num_rows, presolved_lp.num_cols);

barrier_solver_t<i_t, f_t> barrier_solver(presolved_lp, presolve_info, barrier_settings);

// Convert the converged solution to (x, w, y, z, v)
std::vector<f_t> x_pdhg(n - n_upper_bounds);
std::vector<f_t> w_pdhg(n_upper_bounds);
std::vector<f_t> y_pdhg(m - n_upper_bounds);
std::vector<f_t> z_pdhg(n - n_upper_bounds);
std::vector<f_t> v_pdhg(n_upper_bounds);

for (i_t j = 0; j < n - n_upper_bounds; j++) {
x_pdhg[j] = x[j];
z_pdhg[j] = mu_target / x[j];
}

for (i_t j = 0; j < n_upper_bounds; j++) {
w_pdhg[j] = x[n - n_upper_bounds + j];
v_pdhg[j] = mu_target / x[n - n_upper_bounds + j];
}

for (i_t i = 0; i < m - n_upper_bounds; i++) {
y_pdhg[i] = y[i];
}

barrier_solver.x_pdhg = x_pdhg;
barrier_solver.w_pdhg = w_pdhg;
barrier_solver.y_pdhg = y_pdhg;
barrier_solver.z_pdhg = z_pdhg;
barrier_solver.v_pdhg = v_pdhg;

barrier_solver_settings_t<i_t, f_t> barrier_solver_settings;
lp_status_t barrier_status =
barrier_solver.solve(start_time, barrier_solver_settings, barrier_solution);

return barrier_status == lp_status_t::OPTIMAL ? crossover_status_t::OPTIMAL
: crossover_status_t::NUMERICAL_ISSUES;

⚠️ Potential issue | 🔴 Critical

Propagate the barrier result and preserve the barrier termination code.

This branch returns based on barrier_status, but it never copies/postsolves barrier_solution into solution or initializes vstatus, so callers can observe stale outputs after an OPTIMAL return. It also collapses TIME_LIMIT and CONCURRENT_LIMIT into NUMERICAL_ISSUES, which loses the actual termination reason.

Based on learnings, in cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.


Comment on lines +838 to +855
if (settings.barrier_presolve && no_lower_bound > 0) {
presolve_info.negated_variables.reserve(no_lower_bound);
for (i_t j = 0; j < problem.num_cols; j++) {
if (problem.lower[j] == -inf && problem.upper[j] < inf) {
presolve_info.negated_variables.push_back(j);

problem.lower[j] = -problem.upper[j];
problem.upper[j] = inf;
problem.objective[j] *= 1;

const i_t col_start = problem.A.col_start[j];
const i_t col_end = problem.A.col_start[j + 1];
for (i_t p = col_start; p < col_end; p++) {
problem.A.x[p] *= -1.0;
}
}
}
}

⚠️ Potential issue | 🔴 Critical

Finish the x'_j = -x_j substitution.

This block updates bounds and flips A(:, j), but the linear term is left unchanged (problem.objective[j] *= 1) and the quadratic/postsolve pieces are not updated to match. For any -inf < x_j <= u_j column, barrier presolve will otherwise solve a different objective and uncrush_solution() will return the wrong sign for x/z.

Minimum linear-term fix
-        problem.objective[j] *= 1;
+        problem.objective[j] *= -1.0;

Based on learnings: Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results, and ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/presolve.cpp` around lines 838 - 855, The presolve
negation branch (when settings.barrier_presolve && no_lower_bound > 0) flips
variable bounds and A(:, j) but fails to negate the linear objective and any
quadratic/postsolve data; update problem.objective[j] to -problem.objective[j]
(not *= 1) and record/negate any quadratic terms and postsolve mappings
associated with column j so that presolve_info.negated_variables and
uncrush_solution can restore signs correctly; ensure all places that consume
negated_variables (e.g., uncrush_solution, postsolve reconstruction, and any
quadratic Hessian or local quadratic coefficients) apply the same sign flip.
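The fix above can be sanity-checked numerically. Below is a small Python sketch (not the repo's C++; the matrix, costs, and bounds are invented for illustration) of the x'_j = -x_j substitution, showing that constraint activities and the objective value are preserved only when the linear term is negated along with the column:

```python
# Toy check of the x'_j = -x_j substitution from the review above.
# The matrix, costs, and bounds are made up; only the algebra matters.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
c = np.array([4.0, -5.0])
lower = np.array([-np.inf, 0.0])
upper = np.array([3.0, np.inf])

j = 0  # column with -inf lower bound and finite upper bound
A2, c2 = A.copy(), c.copy()
A2[:, j] *= -1.0   # flip the column, as the presolve block already does
c2[j] *= -1.0      # flip the linear term: the step the diff says is missing
lo2, up2 = lower.copy(), upper.copy()
lo2[j], up2[j] = -upper[j], np.inf

x = np.array([1.5, 2.0])       # any point satisfying the original bounds
x2 = x.copy()
x2[j] *= -1.0                  # the mapped point x'_j = -x_j

assert np.allclose(A @ x, A2 @ x2)   # constraint activities preserved
assert np.isclose(c @ x, c2 @ x2)    # objective preserved with c_j negated
assert lo2[j] <= x2[j] <= up2[j]     # mapped point satisfies the new bounds

# With the bug (objective[j] *= 1, i.e. unchanged), the objective diverges:
assert not np.isclose(c @ x, c @ x2)
```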

Comment on lines +1155 to +1205
template <typename i_t, typename f_t>
void crush_solution_to_presolve_space(const lp_problem_t<i_t, f_t>& original_lp,
presolve_info_t<i_t, f_t>& presolve_info,
const std::vector<f_t>& original_x,
const std::vector<f_t>& original_y,
const std::vector<f_t>& original_z,
std::vector<f_t>& crushed_x,
std::vector<f_t>& crushed_y,
std::vector<f_t>& crushed_z)
{
crushed_x = original_x;
crushed_y = original_y;
crushed_z = original_z;

if (presolve_info.negated_variables.size() > 0) {
for (i_t j = 0; j < presolve_info.negated_variables.size(); j++) {
crushed_x[presolve_info.negated_variables[j]] *= -1.0;
crushed_z[presolve_info.negated_variables[j]] *= -1.0;
}
}

// Repeat all presolve steps to get the crushed solution
if (presolve_info.removed_lower_bounds.size() > 0) {
// We had l_j <= x_j <= u_j
// And we transformed it into
// 0 <= v_j = x_j - l_j
// So we need to subtract the removed lower bounds from the original solution
for (i_t j = 0; j < original_x.size(); j++) {
crushed_x[j] -= presolve_info.removed_lower_bounds[j];
}
}

// Handle empty rows
if (presolve_info.removed_constraints.size() > 0) {
printf("Can't handle removed constraints %ld\n", presolve_info.removed_constraints.size());
exit(1);
}

// Handle empty cols
if (presolve_info.removed_variables.size() > 0) {
printf("Can't handle removed variables %ld\n", presolve_info.removed_variables.size());
exit(1);
}

// Handle free variables
if (presolve_info.free_variable_pairs.size() > 0) {
printf("Can't handle free variable pairs %ld\n", presolve_info.free_variable_pairs.size());
exit(1);
}

}

⚠️ Potential issue | 🟠 Major

Do not hard-abort on unsupported presolve mappings.

presolve() above can legitimately remove empty rows/cols, expand free variables, and fold the system. This helper either exits the process when those states appear or silently ignores them (for example, folding), so a valid model can turn the warm-start path into a process kill or a mis-sized crushed solution. Please return a status/fallback and short-circuit warm-starting when the presolve transform set is not fully supported.

Based on learnings: Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations, and reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/presolve.cpp` around lines 1155 - 1205,
crush_solution_to_presolve_space currently hard-aborts on unsupported presolve
mappings (removed_constraints, removed_variables, free_variable_pairs) and
assumes removed_lower_bounds/negated_variables align with original_x, which can
kill the process or produce mis-sized outputs; change the function signature to
return a status (bool or enum) instead of void, remove printf/exit, and return
false when encountering unsupported transforms (preserve existing behavior for
negated_variables/removed_lower_bounds only if all index checks pass); add
bounds checks before indexing (verify presolve_info.negated_variables[k] <
crushed_x.size() and removed_lower_bounds.size() == crushed_x.size() before
subtracting) and document that callers must short-circuit warm-starting when
this function returns false.
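The refactor the prompt describes might look like the following Python sketch (the function name and the dict-based presolve_info are hypothetical stand-ins for the C++ presolve_info_t): the helper returns a status instead of calling exit(), and bounds-checks its index mappings before touching the crushed vectors:

```python
# Hypothetical status-returning crush helper; dict keys stand in for the
# fields of the C++ presolve_info_t. Returns (ok, x, y, z) and never exits.
def crush_solution(presolve_info, x, y, z):
    if (presolve_info.get("removed_constraints")
            or presolve_info.get("removed_variables")
            or presolve_info.get("free_variable_pairs")):
        return False, None, None, None   # unsupported transform: skip warm start

    cx, cy, cz = list(x), list(y), list(z)
    for j in presolve_info.get("negated_variables", []):
        if not 0 <= j < len(cx):
            return False, None, None, None   # index mapping out of range
        cx[j], cz[j] = -cx[j], -cz[j]

    shifts = presolve_info.get("removed_lower_bounds", [])
    if shifts:
        if len(shifts) != len(cx):
            return False, None, None, None   # size mismatch: bail out safely
        cx = [v - l for v, l in zip(cx, shifts)]
    return True, cx, cy, cz

ok, cx, _, cz = crush_solution(
    {"negated_variables": [0], "removed_lower_bounds": [1.0, 0.0]},
    [2.0, 3.0], [0.5], [1.0, -1.0])
assert ok and cx == [-3.0, 3.0] and cz == [-1.0, -1.0]

ok, *_ = crush_solution({"removed_variables": [5]}, [1.0], [1.0], [1.0])
assert not ok   # caller falls back to a cold start instead of a process kill
```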

Comment on lines +33 to +37
template <typename T>
T random_value(T min, T max)
{
std::uniform_real_distribution<> distrib(min, max);
return distrib(gen);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file
find . -path "*cpp/src/dual_simplex/random.hpp" -type f

Repository: NVIDIA/cuopt

Length of output: 89


🏁 Script executed:

# Let's check the full content of the random.hpp file
cat -n cpp/src/dual_simplex/random.hpp

Repository: NVIDIA/cuopt

Length of output: 1281


🏁 Script executed:

# Search for calls to random_value() to see what types are being used
rg "random_value" --type cpp --type hpp -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

# Search for random_value calls without file type filter
rg "random_value" -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 1496


🏁 Script executed:

# Also search for usage of the random_t class to understand context
rg "random_t<" -B1 -A1

Repository: NVIDIA/cuopt

Length of output: 443


🏁 Script executed:

# Check if there are explicit template instantiations or usage patterns
rg "random_value\s*\(" --no-heading -A1

Repository: NVIDIA/cuopt

Length of output: 287


Constrain random_value() to real types and parameterize the distribution.

std::uniform_real_distribution<> defaults to double regardless of template argument T. This means if integral types are passed, the function still compiles but returns truncated double-to-integral conversions instead of uniformly distributed integers. The current call site (sparse_matrix.cpp) only uses floating-point types, but the API should enforce this constraint to prevent future misuse.

Use std::uniform_real_distribution<T> with a static_assert to restrict T to floating-point types:

🔧 Proposed fix
 #include <random>
+#include <type_traits>
@@
   template <typename T>
   T random_value(T min, T max)
   {
-    std::uniform_real_distribution<> distrib(min, max);
+    static_assert(std::is_floating_point_v<T>,
+                  "random_value requires a floating-point type");
+    std::uniform_real_distribution<T> distrib(min, max);
     return distrib(gen);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/random.hpp` around lines 33 - 37, The template function
random_value<T> should be constrained to floating-point types and use a
distribution parameterized by T: add a
static_assert(std::is_floating_point<T>::value, "...") in random_value to
prevent integral misuse, change the distribution type to
std::uniform_real_distribution<T> so the generated type matches T, and ensure
<type_traits> is included; update the call sites if any rely on implicit double
behavior.
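The pitfall is easy to reproduce. Here is a Python analog (random.uniform plays the role of std::uniform_real_distribution<> feeding an integral T) showing that truncating a real-valued draw does not give a uniform integer draw:

```python
# Python analog of the C++ pitfall: a real-valued uniform draw truncated to
# an integer is not a uniform integer draw over the requested range.
import random

random.seed(0)
draws = [int(random.uniform(1, 6)) for _ in range(10000)]

# The upper endpoint essentially never appears after truncation...
assert draws.count(6) <= 1
# ...and the mass collapses onto 1..5 instead of a uniform draw over 1..6.
assert all(draws.count(k) > 1500 for k in (1, 2, 3, 4, 5))
```

A proper integer RNG would use std::uniform_int_distribution<T> instead; the static_assert in the proposed fix makes the mismatch a compile-time error rather than a silent truncation.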

Comment on lines +867 to +894
f_t e = vector_norm2<i_t, f_t>(x);
if (e == 0.0) { return 0.0; }

for (i_t j = 0; j < n; ++j) {
x[j] /= e;
}

f_t e0 = 0.0;

i_t iter = 0;
const i_t max_iter = 100;
while (std::abs(e - e0) > tol * e) {
e0 = e;
matrix_vector_multiply(*this, 1.0, x, 0.0, Sx);
f_t Sx_norm = vector_norm2<i_t, f_t>(Sx);
if (Sx_norm == 0.0) {
random_t<i_t, f_t> rng(0);
for (i_t i = 0; i < m; ++i) {
Sx[i] = rng.random_value(0.0, 1.0);
}
Sx_norm = vector_norm2<i_t, f_t>(Sx);
}
matrix_transpose_vector_multiply(*this, 1.0, Sx, 0.0, x);
f_t norm_x = vector_norm2<i_t, f_t>(x);
e = norm_x / Sx_norm;
for (i_t j = 0; j < n; ++j) {
x[j] /= norm_x;
}

⚠️ Potential issue | 🟠 Major

Guard the norm normalizations with a tolerance, not == 0.0.

Sx_norm and norm_x can legitimately become tiny on rank-deficient or very small matrices. With the current exact-zero checks, e = norm_x / Sx_norm and x[j] /= norm_x can produce inf/NaN values before either norm is exactly zero, which will corrupt the estimate used downstream.

🛡️ Proposed fix
 #include <cstdio>
+#include <limits>
@@
-  f_t e = vector_norm2<i_t, f_t>(x);
-  if (e == 0.0) { return 0.0; }
+  const f_t min_norm = std::max(std::abs(tol), std::numeric_limits<f_t>::epsilon());
+  f_t e              = vector_norm2<i_t, f_t>(x);
+  if (e <= min_norm) { return 0.0; }
@@
     matrix_vector_multiply(*this, 1.0, x, 0.0, Sx);
     f_t Sx_norm = vector_norm2<i_t, f_t>(Sx);
-    if (Sx_norm == 0.0) {
+    if (Sx_norm <= min_norm) {
       random_t<i_t, f_t> rng(0);
       for (i_t i = 0; i < m; ++i) {
-        Sx[i] = rng.random_value(0.0, 1.0);
+        Sx[i] = rng.random_value(f_t{0}, f_t{1});
       }
       Sx_norm = vector_norm2<i_t, f_t>(Sx);
+      if (Sx_norm <= min_norm) { return 0.0; }
     }
     matrix_transpose_vector_multiply(*this, 1.0, Sx, 0.0, x);
     f_t norm_x = vector_norm2<i_t, f_t>(x);
+    if (norm_x <= min_norm) { return 0.0; }
     e          = norm_x / Sx_norm;
As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero..."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/sparse_matrix.cpp` around lines 867 - 894, The loop
currently divides by Sx_norm and norm_x using exact-zero checks which can still
divide by tiny values; guard these normalizations with a relative tolerance
instead. In the power-iteration section (symbols: matrix_vector_multiply,
matrix_transpose_vector_multiply, vector_norm2, random_t and variables Sx_norm,
norm_x, e, x) replace the Sx_norm==0.0 and norm_x==0.0 checks with tests like
Sx_norm <= max(tol, std::numeric_limits<f_t>::epsilon())*reference_norm (or tol
* e0) and similarly for norm_x, and if the norm is below that threshold
regenerate Sx (or reinitialize x) and skip the division (or scale safely);
ensure divisions x[j] /= norm_x and e = norm_x / Sx_norm only occur when both
norms exceed the chosen tolerance to avoid inf/NaN contamination.
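For reference, the guarded power iteration the diff describes can be sketched in Python (numpy stands in for the repo's sparse-matrix kernels; the min_norm threshold and re-seeding step follow the proposed fix). It estimates ||A||_2 as the dominant singular value via repeated A^T A applications, bailing out instead of dividing by a near-zero norm:

```python
# Guarded 2-norm (largest singular value) power iteration: every division
# is gated on min_norm instead of an exact-zero test.
import numpy as np

def norm2_estimate(A, tol=1e-10, max_iter=100):
    m, n = A.shape
    rng = np.random.default_rng(0)
    min_norm = max(tol, np.finfo(float).eps)
    x = rng.random(n)
    e = np.linalg.norm(x)
    if e <= min_norm:
        return 0.0
    x /= e
    e0 = 0.0
    for _ in range(max_iter):
        if abs(e - e0) <= tol * e:
            break
        e0 = e
        Sx = A @ x                       # Sx = A x
        sx_norm = np.linalg.norm(Sx)
        if sx_norm <= min_norm:          # re-seed rather than divide by ~0
            Sx = rng.random(m)
            sx_norm = np.linalg.norm(Sx)
            if sx_norm <= min_norm:
                return 0.0
        x = A.T @ Sx                     # x = A^T A x (one power step)
        nx = np.linalg.norm(x)
        if nx <= min_norm:
            return 0.0
        e = nx / sx_norm                 # converges to sigma_max(A)
        x /= nx
    return e

assert abs(norm2_estimate(np.diag([3.0, 1.0])) - 3.0) < 1e-6
assert norm2_estimate(np.zeros((2, 2))) == 0.0   # bails instead of NaN
```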

Comment on lines +623 to 624
dual_simplex::crossover_status_t crossover_status = dual_simplex::central_path(
lp, dual_simplex_settings, initial_solution, timer.get_tic_start(), vertex_solution, vstatus);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the context around lines 623-624 in solve.cu
sed -n '610,640p' cpp/src/pdlp/solve.cu

Repository: NVIDIA/cuopt

Length of output: 2034


🏁 Script executed:

# Find the dual_simplex crossover/central_path definitions
find . -type f \( -name "*.hpp" -o -name "*.cu" \) -path "*/dual_simplex/*" | head -20

Repository: NVIDIA/cuopt

Length of output: 829


🏁 Script executed:

# Search for central_path function definition
rg "central_path" --type cpp --type cuda -A 10 -B 2 cpp/src/dual_simplex/

Repository: NVIDIA/cuopt

Length of output: 84


🏁 Script executed:

# Search for the old crossover function to compare
rg "crossover\(" --type cpp --type cuda -A 10 -B 2 cpp/src/dual_simplex/ | head -100

Repository: NVIDIA/cuopt

Length of output: 84


🏁 Script executed:

# Read the crossover.hpp file for function declarations
cat -n cpp/src/dual_simplex/crossover.hpp

Repository: NVIDIA/cuopt

Length of output: 1950


🏁 Script executed:

# Read the crossover.cu file for implementations
cat -n cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 50368


🏁 Script executed:

# Find the crossover() template implementation in crossover.cu
grep -n "^.*crossover.*{" cpp/src/dual_simplex/crossover.cu | head -5

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find the central_path() template implementation in crossover.cu
grep -n "^.*central_path.*{" cpp/src/dual_simplex/crossover.cu | head -5

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for where central_path is actually defined
rg "crossover_status_t central_path" cpp/src/dual_simplex/crossover.cu -A 30 | head -100

Repository: NVIDIA/cuopt

Length of output: 1951


🏁 Script executed:

# Search for the old crossover() implementation to compare
rg "^template.*crossover_status_t crossover\(" cpp/src/dual_simplex/crossover.cu -A 50 | head -150

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for what central_path returns - does it produce a basic solution?
rg "central_path" cpp/src/dual_simplex/crossover.cu -A 100 | tail -200

Repository: NVIDIA/cuopt

Length of output: 4249


🏁 Script executed:

# Get more of central_path implementation to see the full function
sed -n '1100,1400p' cpp/src/dual_simplex/crossover.cu | tail -100

Repository: NVIDIA/cuopt

Length of output: 4580


🏁 Script executed:

# Search for what vstatus is set to in central_path
rg "vstatus\[" cpp/src/dual_simplex/crossover.cu | grep -A2 -B2 "central_path" | head -50

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at the full central_path function - get line ranges
rg -n "^template.*i_t, f_t.*central_path" cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find where central_path is defined in the file
grep -n "template.*crossover_status_t central_path" cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 117


🏁 Script executed:

# Get the line count of the file to understand its size
wc -l cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 94


🏁 Script executed:

# Search for the old crossover() function to see the pattern
grep -n "template.*crossover_status_t crossover" cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 112


🏁 Script executed:

# Search for where the actual template function for central_path starts (not the instantiation)
grep -n "^template <typename i_t, typename f_t>" cpp/src/dual_simplex/crossover.cu | head -20

Repository: NVIDIA/cuopt

Length of output: 814


🏁 Script executed:

# Get a broader view - look for both crossover and central_path function declarations
grep -n "crossover_status_t crossover\|crossover_status_t central_path" cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 313


🏁 Script executed:

# Read the section around line 2029 backwards to find where central_path actually starts
sed -n '1600,2030p' cpp/src/dual_simplex/crossover.cu | head -100

Repository: NVIDIA/cuopt

Length of output: 3724


🏁 Script executed:

# Read lines 1582 to 1900 to see the main logic of central_path
sed -n '1582,1900p' cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 11660


🏁 Script executed:

# Read lines 1900 to 2020 to see the end of central_path
sed -n '1900,2020p' cpp/src/dual_simplex/crossover.cu

Repository: NVIDIA/cuopt

Length of output: 4930


🏁 Script executed:

# Now let's check what happens with vstatus and vertex_solution after line 624 in solve.cu
sed -n '623,700p' cpp/src/pdlp/solve.cu

Repository: NVIDIA/cuopt

Length of output: 4733


🏁 Script executed:

# Look for the old crossover() implementation to compare it with central_path()
sed -n '1186,1400p' cpp/src/dual_simplex/crossover.cu | head -150

Repository: NVIDIA/cuopt

Length of output: 5833


API contract violation: central_path() produces an interior-point solution, not a basic solution.

The function dispatch change at line 623 replaces classical crossover (which produces a complete basic solution with vstatus properly classified) with central_path(), which runs a barrier method with PDHG and produces an interior point on the central path. This is a silent semantic change: identical signatures but fundamentally different algorithm and output type.

central_path() terminates by calling barrier_solver.solve(), not simplex pivoting. It does not produce a complete basis—vstatus will not correctly represent which variables are basic/nonbasic for downstream simplex operations. Code expecting a basic solution will be broken.

The mapping of crossover_status_t::DUAL_FEASIBLE to NumericalError in the switch statement suggests incomplete handling of possible output states from the new algorithm.

Required: Document why this API contract change is safe, or restore the original crossover behavior for the settings.crossover branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 623 - 624, The change replaces the
classical crossover with dual_simplex::central_path which returns an
interior-point solution (via barrier_solver.solve()) and does not produce a
basis nor correctly-populated vstatus; either revert to the original classical
crossover when settings.crossover is set or add an explicit crossover step after
central_path to convert the interior solution into a full basic solution and
populate vstatus. Specifically, restore or call the previous crossover routine
(e.g., classical_crossover / simplex_crossover) instead of relying on
central_path alone, or after central_path invoke a crossover routine that takes
vertex_solution and fills vstatus and returns a proper
dual_simplex::crossover_status_t; also update the switch/handling of
dual_feasible/other statuses to map all possible central_path/crossover outputs
to existing error enums and add a short comment documenting why
central_path+explicit crossover is safe if you choose to keep it.
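The contract gap the comment describes can be illustrated with a toy classification pass (names hypothetical; real crossover performs pivoting, not mere thresholding): a strictly interior point gives no variable a bound to rest on, so naive bound-snapping cannot recover a valid basis and vstatus:

```python
# Toy illustration: bound-snapping a strictly interior point cannot
# recover a basis (hypothetical names; real crossover pivots).
def classify(x, lower, upper, tol=1e-7):
    status = []
    for xj, lj, uj in zip(x, lower, upper):
        if abs(xj - lj) <= tol:
            status.append("NONBASIC_LOWER")
        elif abs(xj - uj) <= tol:
            status.append("NONBASIC_UPPER")
        else:
            status.append("BASIC")
    return status

# A central-path point sits strictly between its bounds, so every variable
# looks "basic": far more than the m slots an actual basis has room for.
assert classify([0.5, 0.3], [0.0, 0.0], [1.0, 1.0]) == ["BASIC", "BASIC"]
# A vertex from true crossover snaps cleanly onto bounds:
assert classify([0.0, 1.0], [0.0, 0.0], [1.0, 1.0]) == [
    "NONBASIC_LOWER", "NONBASIC_UPPER"]
```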

@anandhkb anandhkb added this to the 26.04 milestone Mar 12, 2026
@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/26.04 March 19, 2026 16:30