[SECURITY] Add input validation and CSP headers#1834
[SECURITY] Add input validation and CSP headers#1834hari-kuriakose wants to merge 6 commits intomainfrom
Conversation
- Add server-side HTML/script injection validation for name and description fields across all user-facing serializers (CWE-20) - Add Content-Security-Policy header via Django middleware (API) and nginx config (frontend) to mitigate XSS and data injection attacks - Change SESSION_COOKIE_SECURE and CSRF_COOKIE_SECURE defaults to True so cookies are only sent over HTTPS Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WalkthroughAdds regex-based input sanitizers and integrates them as field-level validators across multiple backend serializers; introduces a Content-Security-Policy middleware and adds a comprehensive CSP header to the frontend nginx configuration to emit CSP on all responses. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
backend/utils/tests/test_input_sanitizer.py (1)
7-69: Add a regression test for benignconnection=okinput.This will protect against false positives in event-handler detection.
Test addition
class TestValidateNoHtmlTags: @@ def test_rejects_event_handler_case_insensitive(self): with pytest.raises( ValidationError, match="must not contain event handler attributes" ): validate_no_html_tags("ONLOAD=alert(1)") + + def test_allows_benign_assignment_text(self): + assert validate_no_html_tags("connection=ok") == "connection=ok"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/utils/tests/test_input_sanitizer.py` around lines 7 - 69, The tests are missing a regression case for benign strings like "connection=ok" that could be mis-detected as an event-handler; add a new test method in TestValidateNoHtmlTags (e.g., test_allows_connection_equals_ok) that asserts validate_no_html_tags("connection=ok") returns "connection=ok" (no ValidationError), ensuring the event-handler detection in validate_no_html_tags doesn't yield false positives for this pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/adapter_processor_v2/serializers.py`:
- Around line 32-43: The validate method in the serializer overrides parent
validation and omits calling super(), so parent logic in AuditSerializer is
bypassed; update the validate(self, data) implementation to first call validated
= super().validate(data) (or merge with returned value) and then apply the
adapter-specific checks (validate_name_field and validate_no_html_tags) on that
validated dict, finally returning the validated dict so the AuditSerializer
validations are preserved.
In `@backend/api_v2/serializers.py`:
- Around line 69-70: The validate_description method can raise TypeError if
value is None because validate_no_html_tags expects a str; update
validate_description (in the same serializer) to guard against None the same way
BaseAdapterSerializer.validate does — e.g., if value is None return value or
coerce to "" before calling validate_no_html_tags — then call
validate_no_html_tags(value, field_name="Description") so validate_no_html_tags
never receives None.
In `@backend/middleware/content_security_policy.py`:
- Around line 17-27: Change the unconditional header assignment in the
middleware so it doesn't clobber any existing route-specific CSP: instead of
directly setting response["Content-Security-Policy"], use
response.setdefault("Content-Security-Policy", <policy string>) so the default
policy (the same multi-line string currently assigned) is applied only when the
header is not already present; update the code in content_security_policy.py
where the header is set to use response.setdefault to preserve any prior CSP set
by views or earlier middleware.
In `@backend/utils/input_sanitizer.py`:
- Around line 10-20: EVENT_HANDLER_PATTERN is too broad and matches substrings
inside benign words (e.g., "connection=..."); update the regex used by
validate_no_html_tags to only match true HTML attribute contexts by changing
EVENT_HANDLER_PATTERN to require a word boundary or a preceding '<' or
whitespace before the attribute name (for example use a pattern like a
lookbehind for whitespace or '<' followed by "on" + letters and "=" with
re.IGNORECASE) so it no longer flags tokens like "connection=" while still
catching real event-handler attributes referenced in validate_no_html_tags.
---
Nitpick comments:
In `@backend/utils/tests/test_input_sanitizer.py`:
- Around line 7-69: The tests are missing a regression case for benign strings
like "connection=ok" that could be mis-detected as an event-handler; add a new
test method in TestValidateNoHtmlTags (e.g., test_allows_connection_equals_ok)
that asserts validate_no_html_tags("connection=ok") returns "connection=ok" (no
ValidationError), ensuring the event-handler detection in validate_no_html_tags
doesn't yield false positives for this pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d27dfb1-b259-4452-adef-342aba0d383a
📒 Files selected for processing (13)
backend/account_v2/serializer.pybackend/adapter_processor_v2/serializers.pybackend/api_v2/serializers.pybackend/backend/settings/base.pybackend/connector_v2/serializers.pybackend/middleware/content_security_policy.pybackend/notification_v2/serializers.pybackend/prompt_studio/prompt_studio_core_v2/serializers.pybackend/utils/input_sanitizer.pybackend/utils/tests/__init__.pybackend/utils/tests/test_input_sanitizer.pybackend/workflow_manager/workflow_v2/serializers.pyfrontend/nginx.conf
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
for more information, see https://pre-commit.ci
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
Frontend Lint Report (Biome)✅ All checks passed! No linting or formatting issues found. |
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/utils/input_sanitizer.py (1)
10-20:⚠️ Potential issue | 🟠 MajorNarrow
EVENT_HANDLER_PATTERNfurther.This still rejects benign
on... =text such asoncall = primaryoronboarding = enabled, so valid names/descriptions can fail across every serializer using this helper. Please constrain the match to actual HTML attribute contexts or a vetted set of real DOM event names instead of anyon\w+=token.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/utils/input_sanitizer.py` around lines 10 - 20, EVENT_HANDLER_PATTERN is too broad and rejects benign tokens like "oncall" or "onboarding"; narrow it by matching only real DOM event names or actual HTML attribute contexts. Update EVENT_HANDLER_PATTERN (used by validate_no_html_tags) to either (a) match a vetted list of event names (e.g., click, change, load, submit, mouseover, keydown, input, focus, blur, etc.) as anchors like on(click|change|...) followed by optional whitespace and '=', or (b) require HTML attribute context by ensuring the token appears as an attribute (e.g., preceded by '<tag ' or whitespace within a tag) before the '='; replace the existing pattern with one of these stricter patterns and add a unit test demonstrating that "oncall = primary" and "onboarding = enabled" pass while real event attributes like "onclick=" are still rejected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/middleware/content_security_policy.py`:
- Around line 6-22: The CSP middleware's process_response currently sets
"script-src 'self'" and "style-src 'self'", which blocks the inline <script>
used by the login page; update the header value set in process_response to
permit inline scripts/styles (e.g., add 'unsafe-inline' to both script-src and
style-src) or implement a nonce-based CSP and inject matching nonces into the
login template rendered by authentication_service.render; also update the
middleware docstring to remove the incorrect "JSON API backend" claim. Use the
process_response function and the "Content-Security-Policy" header string as the
change points, or alternatively refactor backend/account_v2/templates/login.html
to use external JS and then keep the stricter CSP.
---
Duplicate comments:
In `@backend/utils/input_sanitizer.py`:
- Around line 10-20: EVENT_HANDLER_PATTERN is too broad and rejects benign
tokens like "oncall" or "onboarding"; narrow it by matching only real DOM event
names or actual HTML attribute contexts. Update EVENT_HANDLER_PATTERN (used by
validate_no_html_tags) to either (a) match a vetted list of event names (e.g.,
click, change, load, submit, mouseover, keydown, input, focus, blur, etc.) as
anchors like on(click|change|...) followed by optional whitespace and '=', or
(b) require HTML attribute context by ensuring the token appears as an attribute
(e.g., preceded by '<tag ' or whitespace within a tag) before the '='; replace
the existing pattern with one of these stricter patterns and add a unit test
demonstrating that "oncall = primary" and "onboarding = enabled" pass while real
event attributes like "onclick=" are still rejected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20bde572-2bc1-45f4-a2ac-5ffc1c2e7e5f
📒 Files selected for processing (4)
backend/adapter_processor_v2/serializers.pybackend/api_v2/serializers.pybackend/middleware/content_security_policy.pybackend/utils/input_sanitizer.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/api_v2/serializers.py
- backend/adapter_processor_v2/serializers.py
| """Middleware to add Content-Security-Policy header to all responses. | ||
|
|
||
| Since this is a JSON API backend, the policy is restrictive by default: | ||
| only 'self' is allowed for all directives, and no inline scripts or styles | ||
| are permitted. This prevents any injected content from being executed if a | ||
| response is ever rendered in a browser context. | ||
| """ | ||
|
|
||
| def process_response( | ||
| self, request: HttpRequest, response: HttpResponse | ||
| ) -> HttpResponse: | ||
| response.setdefault( | ||
| "Content-Security-Policy", | ||
| ( | ||
| "default-src 'self'; " | ||
| "script-src 'self'; " | ||
| "style-src 'self'; " |
There was a problem hiding this comment.
CSP blocks inline scripts required by the login page.
The docstring states this is a "JSON API backend" with no inline scripts permitted, but the backend serves HTML pages with inline JavaScript. Context snippet 3 shows backend/account_v2/templates/login.html contains an inline <script> block for the form submit spinner, and context snippet 4 confirms authentication_service.py renders these templates via Django's render().
The current script-src 'self' (line 21) without 'unsafe-inline' will block that inline script, breaking the login page functionality.
🐛 Proposed fix: Add 'unsafe-inline' for script-src and style-src
def process_response(
self, request: HttpRequest, response: HttpResponse
) -> HttpResponse:
response.setdefault(
"Content-Security-Policy",
(
"default-src 'self'; "
- "script-src 'self'; "
- "style-src 'self'; "
+ "script-src 'self' 'unsafe-inline'; "
+ "style-src 'self' 'unsafe-inline'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
)
return responseAlternatively, refactor the login template to use an external JS file or add a nonce-based CSP for stronger security. Also update the docstring to remove the misleading "JSON API backend" claim.
The AI summary states script-src 'self' 'unsafe-inline' but the actual code at line 21 shows script-src 'self' without 'unsafe-inline'.
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 15-15: Unused method argument: request
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/middleware/content_security_policy.py` around lines 6 - 22, The CSP
middleware's process_response currently sets "script-src 'self'" and "style-src
'self'", which blocks the inline <script> used by the login page; update the
header value set in process_response to permit inline scripts/styles (e.g., add
'unsafe-inline' to both script-src and style-src) or implement a nonce-based CSP
and inject matching nonces into the login template rendered by
authentication_service.render; also update the middleware docstring to remove
the incorrect "JSON API backend" claim. Use the process_response function and
the "Content-Security-Policy" header string as the change points, or
alternatively refactor backend/account_v2/templates/login.html to use external
JS and then keep the stricter CSP.



What
ContentSecurityPolicyMiddlewareto the middleware stackWhy
How
Input validation (
backend/utils/input_sanitizer.py):validate_name_field()andvalidate_no_html_tags()validators that reject HTML tags,javascript:protocols, and event handler attributes (onclick=, etc.)validate_<field>methods to serializers for all user-facing name/identifier and description fields:APIDeploymentSerializer—display_name,descriptionWorkflowSerializer—workflow_name,descriptionBaseAdapterSerializer—adapter_name,description(viavalidate()override)ConnectorInstanceSerializer—connector_nameNotificationSerializer—name(enhanced existing validator)CustomToolSerializer—tool_name,descriptionOrganizationSignupSerializer—name,display_nameprompt,preamble,postamble,summarize_prompt) are intentionally not validated — they must accept arbitrary text including code snippets for LLM extraction workflowsCSP headers:
backend/middleware/content_security_policy.py): Strict policy for the JSON API backend — all directives set to'self',frame-ancestors 'none'. Addedmiddleware.content_security_policy.ContentSecurityPolicyMiddlewareto thePRODUCTION_MIDDLEWARElist.frontend/nginx.conf): Policy tailored to the React SPA's third-party dependencies — Monaco Editor (cdn.jsdelivr.net), PDF.js (unpkg.com), PostHog, Google Tag Manager, reCAPTCHA, Stripe, Product Fruits, and WebSocket connections for Socket.IOCan this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
<and>characters (e.g.My Workflow <v2>). Normal names with spaces, hyphens, underscores, parentheses, periods, and other punctuation are unaffected. Prompt fields are not validated.Database Migrations
Relevant Docs
Related Issues or PRs
Dependencies Versions
Dependencies Versions
Notes on Testing
backend/utils/tests/test_input_sanitizer.pycovering clean input, HTML tags, script tags, JS protocols, event handlers, whitespace stripping, and empty string rejectioncd backend && python -m pytest utils/tests/test_input_sanitizer.py -v<script>alert(1)</script>via API — should return 400Content-Security-Policyheader present in both API and frontend responsesScreenshots
N/A — backend/infrastructure changes only
Checklist
I have read and understood the Contribution Guidelines.