Skip to content

Conversation

@nadav-govari
Copy link
Collaborator

@nadav-govari nadav-govari commented Jan 7, 2026

Rebalances all shards from unavailable ingesters. Moves to only look at ingesters with more shards than allowed as candidates for rebalancing. Also moves to integer division to simplify target calculation.

@nadav-govari nadav-govari marked this pull request as ready for review January 8, 2026 03:23
shards
})
.collect();
let target_max = total_open_shards.div_ceil(num_available_ingesters);
Copy link
Collaborator

@fulmicoton fulmicoton Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point of using 0.9/1.1 was to add some tolerance to make sure rebalance do not happen too often.

let mut per_leader_open_shards: HashMap<&str, Vec<&Shard>> = HashMap::new();

let mut total_open_shards: usize = 0;
let mut orphaned_shards: Vec<Shard> = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counting orphaned shards as needing to be moved is dangerous.

Right now moving a shard is done by opening a shard, and closing the shard moved.
Closing a shard requires contacting the node hosting it, so it can fail. In this exact case, it WILL fail because the ingester is not even in the pool.

Shards on inactive indexers cannot be moved, because they cannot be closed.
For this reason, we remove them from the equation.

Also, this PR reintroduce some leniency on our definition of what balanced
means.

We allow for a variation of 10% between the indexer with the highest number of
shards and the indexer with the lowest number of shards.

Signed-off-by: Nadav Gov-Ari <[email protected]>
@fulmicoton-dd fulmicoton-dd force-pushed the nadav/dailymotion branch 2 times, most recently from fac5569 to 640b9f1 Compare January 8, 2026 16:17
Comment on lines 3542 to 3543
/// - `active_ingester_shards`: shards per active ingester (these are in the pool)
/// - `inactive_ingester_shards`: shards per inactive ingester (NOT in the pool - orphaned)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// - `active_ingester_shards`: shards per active ingester (these are in the pool)
/// - `inactive_ingester_shards`: shards per inactive ingester (NOT in the pool - orphaned)
/// - `available_ingester_shards`: shards per available ingester (shards whose leader is in the pool)
/// - `unavailable_ingester_shards`: shards per unavailable ingester (NOT in the pool - orphaned)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the notion of orphaned. Aren't the shards whose leader is in NOT the pool the orphans already?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the comments got jumbled in between revisions. Let me fix.

Comment on lines 3545 to 3546
active_ingester_shards: &[usize],
inactive_ingester_shards: &[usize],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/active/available/ everywhere in this function.

Signed-off-by: Nadav Gov-Ari <[email protected]>
Signed-off-by: Nadav Gov-Ari <[email protected]>
Signed-off-by: Nadav Gov-Ari <[email protected]>
@nadav-govari nadav-govari enabled auto-merge (squash) January 8, 2026 18:46
@nadav-govari nadav-govari disabled auto-merge January 8, 2026 19:25
@nadav-govari nadav-govari enabled auto-merge (squash) January 8, 2026 19:44
@nadav-govari nadav-govari merged commit 09e7159 into main Jan 8, 2026
8 checks passed
@nadav-govari nadav-govari deleted the nadav/dailymotion branch January 8, 2026 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants