-
Notifications
You must be signed in to change notification settings - Fork 427
OCPBUGS-69447: feat(updates): enable CVO metrics access with RHOBS monitoring flag #7399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When --rhobs-monitoring=true is set (for ROSA HCP), automatically enable CVO access to OBO Prometheus for conditional update risk evaluation. The CVO deployment logic routes to different metrics endpoints based on the monitoring stack: - RHOBS stack (ROSA HCP): http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090 - CoreOS stack (Self-managed HyperShift on OpenShift): https://thanos-querier.openshift-monitoring.svc:9092 Key changes: - CVO deployment enables metrics access when either --rhobs-monitoring (for ROSA HCP) or --enable-cvo-management-cluster-metrics-access (for self-managed HyperShift on OpenShift) is set - Network policies updated to allow egress to the appropriate monitoring endpoint based on stack configuration - Flag description updated to document automatic CVO metrics access behavior - Flags remain mutually exclusive to prevent misconfiguration
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Excluded labels (none allowed) (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes add RHOBS (Red Hat Observability Services) monitoring support by introducing environment variable-driven logic to conditionally enable metrics access across installation, CVO deployment, and network policy components. This includes documentation updates, conditional metrics configuration with endpoint URL selection, and dynamic network policy rule generation based on the monitoring stack. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Comment |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/auto-cc |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Chee-Lu The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @Chee-Lu. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)
102-119: Consider extracting duplicate enableMetricsAccess check.The
enableMetricsAccessvariable is computed identically on lines 33 and 103. Consider computing it once at the function start and reusing it to reduce duplication.The conditional metrics URL configuration looks correct, routing to the appropriate monitoring endpoint (RHOBS Prometheus vs CoreOS Thanos) with the correct protocols.
Apply this diff to eliminate the duplicate check:
func (cvo *clusterVersionOperator) adaptDeployment(cpContext component.WorkloadContext, deployment *appsv1.Deployment) error { + // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set + enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess + - // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set - enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess - if enableMetricsAccess { if deployment.Spec.Template.Labels == nil { deployment.Spec.Template.Labels = map[string]string{} } deployment.Spec.Template.Labels[config.NeedMetricsServerAccessLabel] = "true" deployment.Spec.Template.Spec.ServiceAccountName = ComponentName } featureSet := configv1.Default if cpContext.HCP.Spec.Configuration != nil && cpContext.HCP.Spec.Configuration.FeatureGate != nil { featureSet = cpContext.HCP.Spec.Configuration.FeatureGate.FeatureSet } // ... (rest of the function) util.UpdateContainer(ComponentName, deployment.Spec.Template.Spec.Containers, func(c *corev1.Container) { util.UpsertEnvVar(c, corev1.EnvVar{ Name: "RELEASE_IMAGE", Value: dataPlaneReleaseImage, }) if updateService := cpContext.HCP.Spec.UpdateService; updateService != "" { c.Args = append(c.Args, "--update-service", string(updateService)) } - // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set - enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess - if enableMetricsAccess { c.Args = append(c.Args, "--use-dns-for-services=true") // Configure metrics endpoint based on monitoring stack var metricsURL string if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" { // RHOBS Prometheus uses HTTP without TLS metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace) } else { // CoreOS Thanos uses HTTPS with service CA c.Args = append(c.Args, "--metrics-ca-bundle-file=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt") metricsURL = fmt.Sprintf("https://thanos-querier.openshift-monitoring.svc:9092?namespace=%s", cpContext.HCP.Namespace) } c.Args = append(c.Args, fmt.Sprintf("--metrics-url=%s", metricsURL)) } }) return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
cmd/install/install.go(1 hunks)control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go(4 hunks)hypershift-operator/controllers/hostedcluster/network_policies.go(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
cmd/install/install.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.gohypershift-operator/controllers/hostedcluster/network_policies.go
🔇 Additional comments (4)
cmd/install/install.go (1)
272-272: LGTM: Documentation clarifies RHOBS monitoring behavior.The updated flag description accurately reflects the additional CVO metrics access capability enabled when
--rhobs-monitoringis set.control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)
32-41: LGTM: Metrics access properly gated.The logic correctly enables metrics access when either RHOBS monitoring is active (via environment variable) or the explicit flag is set. The label and service account configuration support the network policy egress rules.
hypershift-operator/controllers/hostedcluster/network_policies.go (2)
88-97: LGTM: Metrics access network policy properly gated.The logic correctly creates the metrics server network policy when either RHOBS monitoring is active or the explicit flag is set, consistent with the CVO deployment configuration.
836-910: LGTM: Network policy properly routes to monitoring stack.The refactored function correctly creates egress rules based on the monitoring stack:
- RHOBS: targets OBO Prometheus (port 9090)
- CoreOS: targets Thanos Querier (port 9092)
This aligns with the CVO deployment changes that configure the corresponding metrics URLs. The pod and namespace selectors appropriately identify the monitoring components.
|
@Chee-Lu: This pull request references Jira Issue OCPBUGS-69447, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go
Outdated
Show resolved
Hide resolved
|
/jira refresh |
|
@Chee-Lu: This pull request references Jira Issue OCPBUGS-69447, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| var metricsURL string | ||
| if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" { | ||
| // RHOBS Prometheus uses HTTP without TLS | ||
| metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that in ROSA land, the monitoring stack honors the namespace parameter (in fact I'd bet that it doesn't). Also hardcoding the URL might not work for other environments (e.g. GCP/ARO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback!
Also hardcoding the URL might not work for other environments (e.g. GCP/ARO).
ARO and GCP have separate tickets for their implementations(Ticket for ARO).
How should I scope this to ROSA only while leaving room for ARO/GCP teams to add their platform-specific configurations later? Or do you have other suggestions on how to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I scope this to ROSA only while leaving room for ARO/GCP teams to add their platform-specific configurations later? Or do you have other suggestions on how to handle this?
I'm not an HyperShift engineer so I can't really tell what's the best option. But anyway I've got bigger concerns about the design itself as I'm not sure that it will work as expected...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And about the URL, my gut feeling is that it isn't an HyperShift concern but rather something that would be provided via configuration (e.g what if ROSA decides to change the service name?).
control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go
Outdated
Show resolved
Hide resolved
| var metricsURL string | ||
| if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" { | ||
| // RHOBS Prometheus uses HTTP without TLS | ||
| metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can determine this using a check like the following:
func isROSAHCP(hc *hyperv1.HostedCluster) bool {
if hc.Spec.Platform.AWS == nil {
return false
}
for _, tag := range hc.Spec.Platform.AWS.ResourceTags {
if tag.Key == "red-hat-managed" && tag.Value == "true" {
return true
}
}
return false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added the checking to CVO and network policies
Remove redundant variable declaration on line 103 that duplicated the enableMetricsAccess variable already defined on line 33. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add ROSA HCP detection to ensure RHOBS monitoring configuration only applies to ROSA HCP clusters. Non-ROSA clusters (ARO, GCP, self-managed) will continue using the default CoreOS Thanos metrics stack. Changes: - Add isROSAHCP() function to detect ROSA HCP clusters via red-hat-managed tag - Update CVO deployment to use RHOBS Prometheus only for ROSA HCP - Update metrics server network policy to scope RHOBS egress rules to ROSA HCP - Clarify documentation to distinguish ROSA HCP vs self-managed HyperShift 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
17190ba to
c7c5b9c
Compare
What this PR does:
When
--rhobs-monitoring=trueis set (for ROSA HCP), enable CVO access to OBO Prometheus for conditional update risk evaluation.The CVO deployment logic routes to different metrics endpoints based on the monitoring stack:
http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090https://thanos-querier.openshift-monitoring.svc:9092Key changes:
--rhobs-monitoring(for ROSA HCP) or--enable-cvo-management-cluster-metrics-access(for self-managed HyperShift on OpenShift) is setWhich issue(s) this PR fixes:
fixes https://issues.redhat.com//browse/OCM-10395
fixes https://issues.redhat.com//browse/OCM-20970
Special notes for your reviewer:
Backport Requirements
This change should be backported to 4.20.z, 4.21.z as well to benefit customers on that version.