Master-Worker Model + Reliability Branching #766

nguidotti · 2026-01-13T17:26:58Z

This PR implements the master-worker model for B&B, where one thread (the master) is responsible for scheduling tasks to the other threads (the workers). With this model, the CPU resources can be shared efficiently among the different tasks within the MIP solver. It also allows the scheduling policy to be changed in runtime (e.g., in the future, the solver can dynamically set the number of workers allocated to each diving heuristics at runtime).

This also implements a parallel reliability branching (Section 5.7 from [1]).

Closes #526.
Closes #445.

Reference

[1] T. Achterberg, “Constraint Integer Programming,” PhD, Technischen Universität Berlin, Berlin, 2007. doi: 10.14279/depositonce-1634.

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

Summary by CodeRabbit

New Features
- Toggleable reliability-branching and explicit parallel/single-threaded MIP solve modes.
- Added a CPU PCG random generator.
Refactor
- Branch-and-bound redesigned to a master/worker scheduler with worker-driven interfaces.
- Bounds-strengthening and pseudo-cost workflows updated to support reliability-based branching and trial branching.
Concurrency / Stability
- Finer-grained per-variable locking, atomic counters, safer queue and worker synchronization.
Other
- Logging improvements and wider integer counters for solution metrics.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

… on the objective pseudcost estimate.

…eady explored.

…iteration and node limit to the diving threads.

…diving

nguidotti · 2026-01-26T14:12:50Z

/ok to test e46eba6

nguidotti · 2026-01-27T13:48:01Z

/coderabbit review

nguidotti · 2026-01-27T13:48:24Z

/ok to test a6e055c

nguidotti · 2026-01-27T14:59:43Z

@coderabbitai review

coderabbitai · 2026-01-27T14:59:52Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

nguidotti · 2026-01-27T15:01:06Z

/ok to test de4389b

nguidotti · 2026-01-27T15:01:23Z

@coderabbitai review

coderabbitai · 2026-01-27T15:01:31Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-01-27T15:02:50Z

📝 Walkthrough

Walkthrough

Refactors B&B to a worker-pool model, adds reliability-branching settings and CLI flag, implements reliability-based pseudo-cost selection with per-variable locks and trial branching, moves bounds-strengthening to accept external bound-change info, introduces a CPU PCG RNG, and updates many function signatures to use worker_data and solve-mode parameters.

Changes

Cohort / File(s)	Summary
CLI & top-level run `benchmarks/linear_programming/cuopt/run_mip.cpp`	Added `--enable-reliability-branching` flag; threaded and non-threaded run paths accept and pass `enable_rb`.
Solver settings & wiring `cpp/include/cuopt/linear_programming/mip/solver_settings.hpp`, `cpp/src/mip/solver.cu`	Added `enable_reliability_branching` to solver settings and wired it into B&B settings; removed manual thread-distribution assignments; B&B solve calls now pass `mip_solve_mode_t::BNB_PARALLEL`.
Worker framework `cpp/src/dual_simplex/bnb_worker.hpp`	New worker types enum, `bnb_stats_t`, `bnb_worker_data_t` and thread-safe `bnb_worker_pool_t` for worker allocation/return and lower-bound aggregation.
Branch-and-bound core `cpp/src/dual_simplex/branch_and_bound.hpp`, `cpp/src/dual_simplex/branch_and_bound.cpp`	Large refactor to worker-driven API: replace thread-type usage with `bnb_worker_data_t*`, add `mip_solve_mode_t` (BNB_PARALLEL/BNB_SINGLE_THREADED), introduce `plunge_with`/`dive_with`/`run_scheduler`/`single_threaded_solve`, update many signatures and integrate worker_pool and per-type accounting.
Pseudo-costs & selection `cpp/src/dual_simplex/pseudo_costs.hpp`, `cpp/src/dual_simplex/pseudo_costs.cpp`	Replace global pseudo-cost mutex with per-variable `pseudo_cost_mutex` vector; add `sb_total_lp_iter` atomic; add `reliable_variable_selection` and `trial_branching`; rework locking in update/selection/obj_estimate paths.
Diving heuristics concurrency `cpp/src/dual_simplex/diving_heuristics.cpp`	Replace coarse-grained pseudo-cost lock with per-index mutex locking around pc reads to reduce contention.
Node queue synchronization `cpp/src/dual_simplex/node_queue.hpp`	Add internal mutex guards to `pop_best_first`/`pop_diving`; remove public `lock()`/`unlock()` methods.
Bounds strengthening & presolve `cpp/src/dual_simplex/bounds_strengthening.hpp`, `cpp/src/dual_simplex/bounds_strengthening.cpp`, `cpp/src/dual_simplex/presolve.cpp`	`bounds_strengthening` now takes `const std::vector<bool>& bounds_changed`; removed internal member; presolve constructs/passes vector and uses `if constexpr` for compile-time control.
Sub-MIP / heuristics adjustments `cpp/src/mip/diversity/lns/rins.cu`, `cpp/src/mip/diversity/recombiners/sub_mip.cuh`	Simplified sub-MIP/RINS to single-threaded runs; disable reliability branching for those heuristics; call solve with `BNB_SINGLE_THREADED`.
Simplex settings: reliability config `cpp/src/dual_simplex/simplex_solver_settings.hpp`	Added `reliability_branching_settings_t` with tuning fields; removed `num_diving_workers`/`num_bfs_workers`; added `reliability_branching_settings` member.
Basis & solution structs `cpp/src/dual_simplex/basis_updates.hpp`, `cpp/src/dual_simplex/solution.hpp`	`basis_update_mpf_t`: defaulted copy ctor/assignment added. `mip_solution_t` counters `nodes_explored` and `simplex_iterations` changed to `int64_t`.
OMP helpers / atomics `cpp/src/utilities/omp_helpers.hpp`	`omp_atomic_t::operator T()` is now `const` for const-correctness.
New RNG utility `cpp/src/utilities/pcg.hpp`	Added CPU-side PCG RNG `cuopt::PCG` with seeding/skipahead, integer/float outputs, uniform and shuffle helpers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Master-Worker Model + Reliability Branching' accurately summarizes the main changes in the PR, which introduces a master-worker scheduling paradigm and parallel reliability branching.
Linked Issues check	✅ Passed	The PR implements the coding requirements from both linked issues: `#526` (separate reporting/repair task via new master-worker scheduling) and `#445` (improved lower bound calculation via worker pool's get_lower_bounds()).
Out of Scope Changes check	✅ Passed	All changes are in scope: master-worker architecture, reliability branching implementation, worker pool management, bounds strengthening refactoring, and supporting utilities (PCG RNG). No unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

cpp/src/dual_simplex/solution.hpp (1)
54-77: Initialize the new 64-bit counters.

nodes_explored and simplex_iterations are still uninitialized in the constructor; please set them to 0 to avoid undefined values.
🐛 Suggested fix
   mip_solution_t(i_t n)
     : x(n),
       objective(std::numeric_limits<f_t>::quiet_NaN()),
       lower_bound(-inf),
+      nodes_explored(0),
+      simplex_iterations(0),
       has_incumbent(false)
   {
   }
cpp/src/dual_simplex/pseudo_costs.cpp (1)
333-390: Add guards against empty fractional in variable selection methods.

The functions variable_selection (line 375) and reliable_variable_selection (line 408) access fractional[0] without checking if the list is empty. Additionally, score[select] at line 390 uses select = -1 as a sentinel when the fractional list is empty, causing an out-of-bounds access. While current callers at lines 584, 594, and 1442 in branch_and_bound.cpp guard against empty fractional before invoking these methods, the functions should not rely on caller discipline. Add an early check with appropriate handling (return sentinel value and log, or assert for internal consistency):
Example guard for `variable_selection`
+  if (fractional.empty()) {
+    log.debug("Pseudocost: no fractional variables to branch on\n");
+    return -1;
+  }
   const i_t num_fractional = fractional.size();
The same guard should be added to reliable_variable_selection at line 408.

🤖 Fix all issues with AI agents

In `@benchmarks/linear_programming/cuopt/run_mip.cpp`:
- Around line 363-365: The help text for the CLI flag added by
program.add_argument("--enable-reliability-branching") is incorrect ("track
allocations"); update it to a clear, accurate description such as "enable
reliability-based branching (t/f)" or "enable reliability branching (t/f)".
Locate the call to program.add_argument("--enable-reliability-branching") and
replace the .help(...) string accordingly, keeping the existing
.default_value(...) unchanged.

In `@cpp/src/dual_simplex/bnb_worker.hpp`:
- Around line 182-201: The get_idle_worker()/pop_idle_worker() split can return
the same idle worker multiple times; replace them with a single atomic operation
by adding get_and_pop_idle_worker() that under one std::lock_guard on mutex_
checks idle_workers_.empty(), if not pops the front index, decrements
num_idle_workers_, and returns workers_[idx].get(); remove or stop using
get_idle_worker() and pop_idle_worker() and update callers (e.g., in
run_scheduler()) to call get_and_pop_idle_worker() so a worker is reserved and
removed from idle_workers_ in one locked step.

In `@cpp/src/dual_simplex/branch_and_bound.cpp`:
- Around line 1466-1468: The new call to calculate_variable_locks from
branch_and_bound.cpp exposes a known bug in calculate_variable_locks (in
dual_simplex/diving_heuristics.cpp) that indexes variable bounds with row
indices and can go OOB when num_rows > num_cols; either fix that function or
gate this call. To fix: update calculate_variable_locks to iterate over column
indices (use num_cols) when mapping bounds and locks, index bounds arrays by
column rather than row, and treat constraints Ax=b as equalities when computing
up/down locks so locks reflect column variable bounds correctly; alternatively,
in branch_and_bound.cpp gate the call behind a safe flag so original_lp_,
var_up_locks_, var_down_locks_ are not passed to calculate_variable_locks until
the routine is corrected.

In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Around line 423-426: The calculation of bnb_lp_iter_per_node uses
bnb_stats.nodes_explored as a divisor and can divide by zero; update the code
that computes bnb_lp_iter_per_node (which currently uses bnb_total_lp_iter /
bnb_stats.nodes_explored) to guard the denominator by checking
bnb_stats.nodes_explored (or bnb_nodes_explored) == 0 and using a safe fallback
(e.g., set bnb_lp_iter_per_node to 0 or compute using 1) when zero, so no
division by zero occurs.
- Around line 560-567: Reads of pseudo_cost_sum_up, pseudo_cost_num_up,
pseudo_cost_sum_down and pseudo_cost_num_down are done after the per-variable
mutex was released, causing a race; re-acquire the same per-var mutex used for
updating pseudo-costs and perform the computation of pc_up, pc_down, f_down,
f_up and score inside that locked scope (i.e., move the calculations that
produce pc_up, pc_down and score into the mutex-protected region that guards
pseudo_cost_sum_*/pseudo_cost_num_*), ensuring you still fall back to
pseudo_cost_up_avg/pseudo_cost_down_avg when counts are zero.

In `@cpp/src/utilities/pcg.hpp`:
- Around line 143-151: The shuffle() implementation underflows when seq is
empty; guard early and skip work for sizes < 2 by returning immediately (e.g.,
if (seq.size() < 2) return;) or change the loop condition to avoid computing
seq.size() - 1 (e.g., for (size_t i = 0; i + 1 < seq.size(); ++i)). Ensure the
call to uniform(i, seq.size()) only occurs when seq.size() > i so uniform is
passed valid bounds; update the function shuffle(std::vector<T>& seq)
accordingly.

🧹 Nitpick comments (4)

cpp/src/dual_simplex/basis_updates.hpp (1)

226-227: Consider adding defaulted move operations to follow modern C++ best practices.

While declaring explicit copy operations does suppress implicit move semantics, the impact here is minimal. The class is primarily passed by reference in the codebase (lines 226-227 of basis_update_mpf_t), and the only notable copy occurs in pseudo_costs.cpp where it copies from a const reference—which must be a copy regardless. The class is not stored in containers or returned by value in patterns where move would provide meaningful benefit.

If adopting it as standard practice, consider adding:
basis_update_mpf_t(basis_update_mpf_t&& other) = default;
basis_update_mpf_t& operator=(basis_update_mpf_t&& other) = default;
This improves consistency with modern C++ conventions but is not required for correctness or performance in current usage.

cpp/src/dual_simplex/bounds_strengthening.cpp (1)

225-229: Variable shadowing: local bounds_changed shadows the parameter.

Line 228 declares a local bool bounds_changed that shadows the const std::vector<bool>& bounds_changed parameter from the function signature. While the code is functionally correct (the local is only used on line 229), this naming collision can cause confusion during maintenance.
♻️ Suggested fix: rename local variable
-      bool bounds_changed = lb_updated || ub_updated;
-      if (bounds_changed) { num_bounds_changed++; }
+      bool bound_updated = lb_updated || ub_updated;
+      if (bound_updated) { num_bounds_changed++; }

cpp/src/dual_simplex/diving_heuristics.cpp (1)

91-98: Prefer scoped locking for pseudo-cost reads.

Manual lock/unlock is easy to regress if these loops gain early returns; a scoped guard keeps this exception-safe. Apply the same pattern in both blocks.

♻️ Suggested change

-    pc.pseudo_cost_mutex[j].lock();
-    f_t pc_down = pc.pseudo_cost_num_down[j] != 0
-                    ? pc.pseudo_cost_sum_down[j] / pc.pseudo_cost_num_down[j]
-                    : pseudo_cost_down_avg;
+    std::lock_guard<omp_mutex_t> lock(pc.pseudo_cost_mutex[j]);
+    f_t pc_down = pc.pseudo_cost_num_down[j] != 0
+                    ? pc.pseudo_cost_sum_down[j] / pc.pseudo_cost_num_down[j]
+                    : pseudo_cost_down_avg;
@@
-    pc.pseudo_cost_mutex[j].unlock();

Also applies to: 170-177

cpp/src/dual_simplex/branch_and_bound.cpp (1)

565-619: Guard reliability branching when no idle workers/single-threaded.

num_tasks is derived from worker_pool_.num_idle_workers(), which can be 0 when all workers are busy or when the pool isn't initialized in BNB_SINGLE_THREADED. If reliable_variable_selection expects at least one task, this can lead to an invalid branch choice (asserts later). Consider falling back to pseudocost selection or clamping to a minimum of 1.

🔧 Suggested guard for zero idle workers

-      if (settings_.reliability_branching_settings.enable) {
-        simplex_solver_settings_t<i_t, f_t> rb_settings      = settings_;
-        rb_settings.reliability_branching_settings.num_tasks = worker_pool_.num_idle_workers();
-
-        branch_var = pc_.reliable_variable_selection(node_ptr,
-                                                     fractional,
-                                                     solution,
-                                                     rb_settings,
-                                                     var_types_,
-                                                     worker_data,
-                                                     exploration_stats_,
-                                                     upper_bound_,
-                                                     log);
-      } else {
-        branch_var = pc_.variable_selection(fractional, solution, log);
-      }
+      if (settings_.reliability_branching_settings.enable) {
+        auto idle = worker_pool_.num_idle_workers();
+        if (idle <= 0) {
+          branch_var = pc_.variable_selection(fractional, solution, log);
+        } else {
+          simplex_solver_settings_t<i_t, f_t> rb_settings      = settings_;
+          rb_settings.reliability_branching_settings.num_tasks = idle;
+          branch_var = pc_.reliable_variable_selection(node_ptr,
+                                                       fractional,
+                                                       solution,
+                                                       rb_settings,
+                                                       var_types_,
+                                                       worker_data,
+                                                       exploration_stats_,
+                                                       upper_bound_,
+                                                       log);
+        }
+      } else {
+        branch_var = pc_.variable_selection(fractional, solution, log);
+      }

coderabbitai · 2026-01-27T15:13:33Z

benchmarks/linear_programming/cuopt/run_mip.cpp

+  program.add_argument("--enable-reliability-branching")
+    .help("track allocations (t/f)")
+    .default_value(std::string("f"));


⚠️ Potential issue | 🟡 Minor

Fix CLI help text for reliability branching flag.
Current help string says “track allocations,” which is misleading for this option.

🐛 Proposed fix

program.add_argument("--enable-reliability-branching") - .help("track allocations (t/f)") + .help("enable reliability branching (t/f)") .default_value(std::string("f"));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

program.add_argument("--enable-reliability-branching")

.help("track allocations (t/f)")

.default_value(std::string("f"));

program.add_argument("--enable-reliability-branching")

.help("enable reliability branching (t/f)")

.default_value(std::string("f"));

🤖 Prompt for AI Agents

In `@benchmarks/linear_programming/cuopt/run_mip.cpp` around lines 363 - 365, The help text for the CLI flag added by program.add_argument("--enable-reliability-branching") is incorrect ("track allocations"); update it to a clear, accurate description such as "enable reliability-based branching (t/f)" or "enable reliability branching (t/f)". Locate the call to program.add_argument("--enable-reliability-branching") and replace the .help(...) string accordingly, keeping the existing .default_value(...) unchanged.

cpp/src/dual_simplex/bnb_worker.hpp

cpp/src/dual_simplex/branch_and_bound.cpp

cpp/src/dual_simplex/pseudo_costs.cpp

cpp/src/utilities/pcg.hpp

nguidotti · 2026-01-27T17:04:57Z

@CodeRabbit review

coderabbitai · 2026-01-27T17:05:09Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

cpp/src/dual_simplex/pseudo_costs.cpp (2)
273-284: Guard against near-zero fractional values.

When fractional_val is very close to an integer (within floating-point tolerance), frac can be extremely small, causing change_in_obj / frac to produce very large or infinite values that corrupt pseudo-cost statistics.
🐛 Suggested guard
   const f_t change_in_obj = leaf_objective - node_ptr->lower_bound;
+  constexpr f_t min_frac = 1e-6;
   const f_t frac          = node_ptr->branch_dir == rounding_direction_t::DOWN
                               ? node_ptr->fractional_val - std::floor(node_ptr->fractional_val)
                               : std::ceil(node_ptr->fractional_val) - node_ptr->fractional_val;
+  if (frac < min_frac) { return; }  // Skip update for near-integer values
   if (node_ptr->branch_dir == rounding_direction_t::DOWN) {
375-390: Guard against accessing score[select] with invalid index.

Static analysis flags that select could remain -1 if the loop never executes (empty fractional). While the function would crash earlier at line 375 accessing fractional[0], defensive coding would improve robustness.
🛡️ Suggested defensive guard
+  if (num_fractional == 0) {
+    log.debug("No fractional variables for branching\n");
+    return -1;  // Or handle appropriately
+  }
+
   i_t branch_var = fractional[0];
   f_t max_score  = -1;
   i_t select     = -1;

🧹 Nitpick comments (4)

cpp/src/dual_simplex/pseudo_costs.cpp (2)
498-530: Holding mutex during expensive LP solve in trial branching.

The mutex pseudo_cost_mutex[j] is held while calling trial_branching, which performs an LP solve. This blocks other tasks attempting to work on the same variable j for potentially long periods. Consider releasing the lock before the trial solve and reacquiring before updating:
♻️ Alternative pattern to reduce contention
     pseudo_cost_mutex[j].lock();
-    if (pseudo_cost_num_down[j] < reliable_threshold) {
-      // Do trial branching on the down branch
-      f_t obj = trial_branching(...);
+    bool need_down_trial = pseudo_cost_num_down[j] < reliable_threshold;
+    pseudo_cost_mutex[j].unlock();
+
+    if (need_down_trial) {
+      f_t obj = trial_branching(...);  // Expensive, done without lock
 
+      pseudo_cost_mutex[j].lock();
       if (!std::isnan(obj)) {
         // ... update pseudo costs
       }
+      pseudo_cost_mutex[j].unlock();
     }
-    pseudo_cost_mutex[j].unlock();
636-658: Thread-safety assumption for update_pseudo_costs_from_strong_branching.

This function modifies pseudo_cost_sum_* and pseudo_cost_num_* without locking. This is correct given the current call site (line 266) is after the parallel region completes. Consider adding a comment documenting this single-threaded assumption to prevent future misuse.
📝 Documentation suggestion
 template <typename i_t, typename f_t>
 void pseudo_costs_t<i_t, f_t>::update_pseudo_costs_from_strong_branching(
   const std::vector<i_t>& fractional, const std::vector<f_t>& root_soln)
 {
+  // NOTE: This function is not thread-safe. Must be called from single-threaded context
+  // (e.g., after strong_branching's parallel region completes).
   for (i_t k = 0; k < fractional.size(); k++) {
cpp/src/utilities/pcg.hpp (2)
10-11: Prefer <cstdint> over <stdint.h> for C++ code.

Using the C++ header <cstdint> is preferred in C++ code as it places types in the std namespace and follows C++ conventions.
♻️ Suggested change
-#include <stdint.h>
+#include <cstdint>
 `#include` <vector>
135-141: Integer uniform sampling has modulo bias.

When T is an integer type (as used by shuffle with size_t), multiplying a [0,1) double by range and truncating introduces modulo bias. For cryptographic or high-quality sampling needs, consider rejection sampling. For shuffling in this optimization context, the slight bias is likely acceptable.
📝 Consider documenting the bias for integer types
   /// Draws a sample from a uniform distribution. The samples are uniformly distributed over
   /// the semi-closed interval `[low, high)`. This routine may have a **slight bias** toward
-  /// some numbers in the range (scaling by floating-point).
+  /// some numbers in the range (scaling by floating-point). For integer types, this bias
+  /// is more pronounced; use rejection sampling if unbiased integers are required.
   template <typename T>
   T uniform(T low, T high)

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/dual_simplex/pseudo_costs.cpp (1)
375-392: Guard against empty fractional vector to prevent undefined behavior.

Static analysis flagged score[select] at line 390 as potentially accessing index -1. If fractional is empty, fractional[0] at line 375 and score[select] at line 390 cause undefined behavior.
🐛 Suggested fix: add an assertion or early return
 template <typename i_t, typename f_t>
 i_t pseudo_costs_t<i_t, f_t>::variable_selection(const std::vector<i_t>& fractional,
                                                  const std::vector<f_t>& solution,
                                                  logger_t& log)
 {
   const i_t num_fractional = fractional.size();
+  assert(num_fractional > 0 && "variable_selection called with no fractional variables");
+
   std::vector<f_t> pseudo_cost_up(num_fractional);

🤖 Fix all issues with AI agents

In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Around line 523-528: The division by change_in_x can be zero or extremely
small due to floating-point rounding of solution[j]; in the block that updates
pseudo_cost_sum_down[j] / pseudo_cost_num_down[j] (using change_in_obj and
change_in_x) add a guard that computes a small positive threshold (e.g., based
on fabs(solution[j]) and std::numeric_limits<f_t>::epsilon() or a fixed epsilon
like 1e-12) and only perform pseudo_cost_sum_down[j] += change_in_obj /
change_in_x and increment pseudo_cost_num_down[j] when fabs(change_in_x) >
threshold; apply the same guard to the up-branch code that updates
pseudo_cost_sum_up[j] / pseudo_cost_num_up[j] (the symmetric computation around
lines where change_in_x is computed for the up branch) so near-integer
denominators are skipped and do not corrupt statistics.

cpp/src/dual_simplex/pseudo_costs.cpp

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/dual_simplex/pseudo_costs.cpp (1)
375-392: Guard against empty fractional list to prevent out-of-bounds access.

The static analysis correctly identifies that select is initialized to -1 and used as an array index at line 390. While fractional is presumably non-empty when this function is called, adding a defensive check prevents undefined behavior if the precondition is violated.
🐛 Suggested fix
   i_t branch_var = fractional[0];
   f_t max_score  = -1;
   i_t select     = -1;
 
+  if (fractional.empty()) {
+    log.debug("variable_selection called with empty fractional list\n");
+    return -1;  // Or handle appropriately
+  }
+
   for (i_t k = 0; k < num_fractional; k++) {

🤖 Fix all issues with AI agents

In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Around line 498-572: The OpenMP taskloop uses the 'untied' clause while
holding std::lock_guard<omp_mutex_t> (locking pseudo_cost_mutex[j]) which can
suspend and resume on a different thread causing undefined behavior; remove the
'untied' clause from the "#pragma omp taskloop if (num_tasks > 1)
priority(task_priority) num_tasks(num_tasks) untied" so tasks remain tied to the
creating thread and the std::lock_guard on pseudo_cost_mutex (and the
score_mutex lock later) is always released by the same thread that acquired it;
after removing 'untied', re-run tests and ensure no locks span long suspension
points (keep critical regions minimal around the sections using
pseudo_cost_mutex and score_mutex).

🧹 Nitpick comments (3)

cpp/src/dual_simplex/pseudo_costs.cpp (3)
274-284: Guard against division by near-zero fractional value.

If node_ptr->fractional_val is very close to an integer due to floating-point precision, frac could be extremely small, causing the pseudo-cost update to produce infinity or corrupt statistics. Consider adding a guard similar to the pattern suggested for other pseudo-cost divisions.
♻️ Suggested guard
   const f_t frac          = node_ptr->branch_dir == rounding_direction_t::DOWN
                               ? node_ptr->fractional_val - std::floor(node_ptr->fractional_val)
                               : std::ceil(node_ptr->fractional_val) - node_ptr->fractional_val;
+  constexpr f_t eps = 1e-9;
+  if (frac < eps) { return; }  // Skip update for near-integer values
   if (node_ptr->branch_dir == rounding_direction_t::DOWN) {
602-614: Consider using std::lock_guard for exception safety.

Manual lock()/unlock() calls are exception-unsafe. If any operation between them throws (even though unlikely here), the mutex remains locked. Using std::lock_guard provides automatic cleanup.
♻️ Suggested refactor
-    pseudo_cost_mutex[j].lock();
+    {
+      std::lock_guard<omp_mutex_t> lock(pseudo_cost_mutex[j]);
     if (pseudo_cost_num_down[j] != 0) {
       pseudo_cost_down = pseudo_cost_sum_down[j] / pseudo_cost_num_down[j];
     } else {
       pseudo_cost_down = pseudo_cost_down_avg;
     }

     if (pseudo_cost_num_up[j] != 0) {
       pseudo_cost_up = pseudo_cost_sum_up[j] / pseudo_cost_num_up[j];
     } else {
       pseudo_cost_up = pseudo_cost_up_avg;
     }
-    pseudo_cost_mutex[j].unlock();
+    }
Apply the same pattern in variable_selection at lines 354-366.
354-366: Consider using std::lock_guard for exception safety.

Same concern as in obj_estimate - manual lock()/unlock() is exception-unsafe. Using std::lock_guard provides automatic cleanup.

cpp/src/dual_simplex/pseudo_costs.cpp

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

cpp/src/dual_simplex/pseudo_costs.cpp (2)

273-283: Guard pseudo‑cost updates against near‑zero fractional distance.

frac can be ~0 with near‑integer values, producing inf/NaN pseudo‑costs and destabilizing future scoring. Add an epsilon guard before dividing. As per coding guidelines, this improves numerical stability.

🐛 Proposed fix

   const f_t frac          = node_ptr->branch_dir == rounding_direction_t::DOWN
                               ? node_ptr->fractional_val - std::floor(node_ptr->fractional_val)
                               : std::ceil(node_ptr->fractional_val) - node_ptr->fractional_val;
+  constexpr f_t eps = 1e-9;
+  if (std::abs(frac) <= eps) { return; }
   if (node_ptr->branch_dir == rounding_direction_t::DOWN) {
     pseudo_cost_sum_down[node_ptr->branch_var] += change_in_obj / frac;
     pseudo_cost_num_down[node_ptr->branch_var]++;

375-390: Avoid OOB log access when all scores are NaN/inf.

If any score[k] becomes NaN/inf, select can remain -1, and score[select] becomes OOB (static analysis flagged this). Guard selection and logging.

🐛 Proposed fix

-  f_t max_score  = -1;
-  i_t select     = -1;
+  f_t max_score  = -std::numeric_limits<f_t>::infinity();
+  i_t select     = -1;
 
   for (i_t k = 0; k < num_fractional; k++) {
-    if (score[k] > max_score) {
+    if (std::isfinite(score[k]) && score[k] > max_score) {
       max_score  = score[k];
       branch_var = fractional[k];
       select     = k;
     }
   }
 
-  log.debug("Pseudocost branching on %d. Value %e. Score %e.\n",
-            branch_var,
-            solution[branch_var],
-            score[select]);
+  if (select < 0) {
+    log.debug("Pseudocost branching on %d. Value %e. Score NaN/inf.\n",
+              branch_var,
+              solution[branch_var]);
+    return branch_var;
+  }
+  log.debug("Pseudocost branching on %d. Value %e. Score %e.\n",
+            branch_var,
+            solution[branch_var],
+            score[select]);

cpp/src/dual_simplex/branch_and_bound.cpp (1)

281-286: Avoid divide‑by‑zero in iter/node logging.

If nodes_explored == 0, iter_node becomes inf/NaN. Guard the denominator.

🐛 Proposed fix

-  i_t nodes_explored   = exploration_stats_.nodes_explored;
+  i_t nodes_explored   = exploration_stats_.nodes_explored;
   i_t nodes_unexplored = exploration_stats_.nodes_unexplored;
   f_t user_obj         = compute_user_objective(original_lp_, obj);
   f_t user_lower       = compute_user_objective(original_lp_, lower_bound);
-  f_t iter_node        = (f_t)exploration_stats_.total_lp_iters / nodes_explored;
+  const i_t nodes_explored_safe = std::max<i_t>(nodes_explored, 1);
+  f_t iter_node        = (f_t)exploration_stats_.total_lp_iters / nodes_explored_safe;

nguidotti and others added 30 commits December 12, 2025 10:22

implemented diving heuristics. sorted starting nodes for diving based…

c9081a6

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

046a501

organized code. added toggle to disable each type of diving.

6ea6d72

restrict calling RINS to the best-first threads

5422b97

fix invalid branch var in line search diving

73c1a63

moved asserts

3a77cca

replace inf and max with STL calls

0f7af4e

Fix incorrect infeasible list

79368c3

implemented diving heuristics. sorted starting nodes for diving based…

6334ad7

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

2c94a7c

organized code. added toggle to disable each type of diving.

0e815e1

moved asserts

29a2a33

unified global node heap. fathom node in the diving node that was alr…

5a3ef60

…eady explored.

refactoring code

c181ccf

fix style

501c20d

small fixes

0ee757a

unified the best-first and diving heap into a single object. imposed …

546b116

…iteration and node limit to the diving threads.

adjusted column spacing in bnb logs. added opening mode for logger.

06d531a

revert fixes for dual simplex. changed RINS and SubMIP to use guided …

ae742c2

…diving

moved bound propagation logs to debug mode

b5f2c7e

addressing code rabbit suggestions

eb5c695

added explicit conversion to float

5cf5ac0

missing code revert in basis update

d7dfb0d

fixed variable type

55e64cc

Merge remote-tracking branch 'origin/main' into node-queue

a26a816

Merge remote-tracking branch 'origin/main' into diving-heuristics

17d3d7c

added comments

0b1e994

added missing spacing

9effdc8

updated logs

6d43e03

Merge branch 'main' into adjust-bnb-logs

98f670f

nguidotti added 3 commits January 26, 2026 18:18

set the reliable threshold dynamically

31b5285

fixed incorrect threshold formula. fixed time limit violation.

7bac62d

simplified parallel loop

a6e055c

added single-threaded mode for rins and submip

de4389b

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

nguidotti added 2 commits January 27, 2026 17:04

added missing mutexes

eb51080

fixed empty vector in shuffle

f3567cb

nguidotti changed the title ~~[WIP] Reliability Branching~~ Master-Worker Model + Reliability Branching Jan 27, 2026

nguidotti marked this pull request as ready for review January 27, 2026 17:06

nguidotti requested a review from a team as a code owner January 27, 2026 17:06

nguidotti requested review from akifcorduk and chris-maes January 27, 2026 17:06

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

reverted some code changes

233933f

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

cpp/src/dual_simplex/pseudo_costs.cpp Show resolved Hide resolved

replaced with lock_guards

527754c

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

cpp/src/dual_simplex/pseudo_costs.cpp Outdated Show resolved Hide resolved

fixed crash

2251b87

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

Master-Worker Model + Reliability Branching #766

Are you sure you want to change the base?

Master-Worker Model + Reliability Branching #766

Conversation

nguidotti commented Jan 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference

Checklist

Summary by CodeRabbit

Uh oh!

nguidotti commented Jan 26, 2026

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

coderabbitai bot commented Jan 27, 2026

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

coderabbitai bot commented Jan 27, 2026

Uh oh!

coderabbitai bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nguidotti commented Jan 27, 2026

Uh oh!

coderabbitai bot commented Jan 27, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nguidotti commented Jan 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 27, 2026 •

edited

Loading