Provide ability to configure backup job retry limit in DevWorkspaceOperatorConfig #1579

@rohanKanojia

Description

Currently, the backup jobs created by the operator do not specify a backoffLimit, causing them to default to the Kubernetes standard of 6 retries. When a backup fails, this results in the creation of multiple failing pods (e.g., devworkspace-backup-xxxxx), which can clutter the namespace and consume unnecessary resources.

We need the ability to configure the .spec.backoffLimit for these backup jobs, ideally through the DevWorkspaceOperatorConfig (DWOC)'s backupConfig, to allow users to control the retry behavior.
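One way the new option could be modelled is sketched below. The `BackupConfig` type, the `backoffLimit` JSON key, and `ParseBackupConfig` are hypothetical names chosen for illustration, not the operator's actual API. The point is the pointer type: it lets "not set" (nil) be told apart from an explicit `0`, which would mean "do not retry".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackupConfig is a hypothetical stand-in for the backup section of the DWOC.
// Only the field discussed in this issue is modelled.
type BackupConfig struct {
	// BackoffLimit is a pointer so that an unset value (nil) is
	// distinguishable from an explicit 0 (no retries).
	BackoffLimit *int32 `json:"backoffLimit,omitempty"`
}

// ParseBackupConfig decodes a JSON fragment into BackupConfig and
// rejects negative retry limits.
func ParseBackupConfig(raw []byte) (*BackupConfig, error) {
	var cfg BackupConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("decoding backup config: %w", err)
	}
	if cfg.BackoffLimit != nil && *cfg.BackoffLimit < 0 {
		return nil, fmt.Errorf("backoffLimit must be >= 0, got %d", *cfg.BackoffLimit)
	}
	return &cfg, nil
}
```

With this shape, `{"backoffLimit": 0}` yields a non-nil pointer to zero, while `{}` leaves the field nil so a default can be applied later.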

Current failing backup pod behavior:

NAME                                     READY   STATUS    RESTARTS   AGE
devworkspace-backup-wwmkr-2fl56          0/1     Error     0          69s
devworkspace-backup-wwmkr-86g6g          0/1     Error     0          2m32s
devworkspace-backup-wwmkr-v6d4p          0/1     Error     0          3m39s
devworkspace-backup-wwmkr-vqxxh          0/1     Error     0          3m53s
devworkspace-backup-wwmkr-znz7k          0/1     Error     0          3m16s

Acceptance Criteria

  • Add a new field to the DevWorkspaceOperatorConfig (DWOC) to define the backoffLimit for backup jobs.
  • Update the backupcronjob_controller.go to inject this configured value into the Job .spec.backoffLimit.
  • If no value is specified in the DWOC, the system should either use the Kubernetes default (6) or a safe internal default.
  • Confirm that setting a lower backoffLimit (e.g., 1 or 2) successfully limits the number of pods created upon backup failure.
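The criteria above could be met roughly as follows. This is a minimal sketch, not the code in backupcronjob_controller.go: `JobSpec` is a local stand-in for the Kubernetes `batchv1.JobSpec` (whose `BackoffLimit` field is also a `*int32`), and `resolveBackoffLimit`, `applyBackoffLimit`, and `maxPodAttempts` are hypothetical helpers. The pod-count estimate assumes the backup pods use `restartPolicy: Never` and that the Job runs one pod at a time.

```go
package main

// defaultBackoffLimit mirrors the Kubernetes Job default mentioned in the issue.
const defaultBackoffLimit int32 = 6

// JobSpec is a minimal stand-in for batchv1.JobSpec; only the field
// relevant to this issue is modelled.
type JobSpec struct {
	BackoffLimit *int32
}

// resolveBackoffLimit returns the configured value, or the default
// when nothing was configured.
func resolveBackoffLimit(configured *int32) int32 {
	if configured == nil {
		return defaultBackoffLimit
	}
	return *configured
}

// applyBackoffLimit writes the resolved limit into the Job spec.
// It stores a pointer to a local copy so the spec does not alias the config.
func applyBackoffLimit(spec *JobSpec, configured *int32) {
	limit := resolveBackoffLimit(configured)
	spec.BackoffLimit = &limit
}

// maxPodAttempts estimates the pods a failing Job can create:
// the first attempt plus one per retry (assumes restartPolicy: Never).
func maxPodAttempts(backoffLimit int32) int32 {
	return backoffLimit + 1
}
```

Under these assumptions, a configured limit of 1 would allow at most two failing pods, which gives the last criterion something concrete to check against.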
