Add background task to monitor and cancel stuck pending pods#1167

Open
APErebus wants to merge 4 commits into main from ap/pending-pod-timeout

@APErebus APErebus commented Nov 21, 2025

Background

We have a situation where script pods can't be scheduled and remain in a Pending state. This is a problem because, unless a step-level timeout is configured, the deployment will run forever.

Results

Adds a new PendingPodWatchDog task that monitors all pods. Any pod that stays Pending for longer than a set period is marked as timed out, which then causes it to be deleted.
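
To illustrate the check described above, here is a minimal sketch in Python. It is not the code in this PR: `PodSnapshot` and `find_stuck_pending` are hypothetical names, and it assumes that a pod's time spent Pending can be approximated by its age since creation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PodSnapshot:
    """Hypothetical view of a pod: its name, phase, and creation time."""
    name: str
    phase: str
    created_at: datetime


def find_stuck_pending(pods, timeout, now):
    """Return the names of pods that have been Pending for longer than timeout.

    Assumption: time spent Pending is approximated by the pod's age.
    """
    return [
        pod.name
        for pod in pods
        if pod.phase == "Pending" and now - pod.created_at > timeout
    ]


now = datetime(2025, 11, 21, 12, 0, tzinfo=timezone.utc)
pods = [
    PodSnapshot("script-a", "Pending", now - timedelta(minutes=30)),
    PodSnapshot("script-b", "Pending", now - timedelta(minutes=2)),
    PodSnapshot("script-c", "Running", now - timedelta(hours=1)),
]

# Only script-a is both Pending and older than the 10 minute timeout.
stuck = find_stuck_pending(pods, timedelta(minutes=10), now)
print(stuck)  # ['script-a']
```

A background task would run a check like this on an interval and hand each returned pod to whatever marks it as timed out.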

The watchdog is controlled by a value in the Helm chart. If that value is not set, the watchdog does not run.
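
The opt-in behaviour can be sketched as follows, in Python rather than the project's own code. The setting name `PENDING_POD_TIMEOUT_MINUTES` and the idea that the Helm value reaches the process as an environment variable are assumptions made for illustration. The PR does not state the actual key or its unit.

```python
import os
from datetime import timedelta
from typing import Mapping, Optional

# Hypothetical setting name; the real Helm value key is not given in this PR.
TIMEOUT_SETTING = "PENDING_POD_TIMEOUT_MINUTES"


def read_pending_timeout(env: Mapping[str, str]) -> Optional[timedelta]:
    """Return the configured timeout, or None when the setting is absent."""
    raw = env.get(TIMEOUT_SETTING, "").strip()
    if not raw:
        return None  # not set: the watchdog should not run
    return timedelta(minutes=int(raw))


timeout = read_pending_timeout(os.environ)
if timeout is None:
    print("Pending pod watchdog disabled")
else:
    print(f"Pending pod watchdog enabled with timeout {timeout}")
```

Returning `None` for a missing value keeps the "disabled" case explicit, so the caller decides whether to start the background task at all.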

After

image

How to review this PR

Quality ✔️

Pre-requisites

  • I have read How we use GitHub Issues for help deciding when and where it's appropriate to make an issue.
  • I have considered informing or consulting the right people, according to the ownership map.
  • I have considered appropriate testing for my change.

@APErebus APErebus requested a review from a team as a code owner November 21, 2025 03:41
@zentron zentron left a comment

Makes sense.
Is there any information or event data that would be useful to log?
If it is marked as completed and cleaned up, it may be harder to diagnose why it was hanging.


2 participants