Add cross-SDK request validator test vectors and fix URL normalization#27
Add cross-SDK request validator test vectors and fix URL normalization#27
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves callback signature validation by normalizing “no-scheme” URLs (treating them as http://...) and adds a set of cross-SDK URL normalization signature vectors to prevent future drift.
Changes:
- Add URL normalization test vectors covering ports, query/fragment, and userinfo cases
- Update URL normalization to prepend
http://when the input URL has no scheme
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_request_validator.py | Adds parametrized pytest vectors asserting expected signatures for various URL normalization cases |
| src/didww/callback/request_validator.py | Adjusts _normalize_url to handle URLs without an explicit scheme by prepending http:// |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not re.match(r'^[a-zA-Z]+://', url): | ||
| url = 'http://' + url | ||
| parsed = urlparse(url) |
There was a problem hiding this comment.
The no-scheme detection via re.match(r'^[a-zA-Z]+://', url) is both too restrictive for valid URI schemes (RFC 3986 allows digits/+/./- after the first char) and it breaks scheme-relative URLs like //foo.com/bar (it would become http:////foo.com/bar, leading to an empty hostname during normalization). Consider using urlparse(url) first and only prepending when parsed.scheme is empty (special-case url.startswith('//') to prepend http:), or broaden the scheme regex to RFC-compliant form.
| if not re.match(r'^[a-zA-Z]+://', url): | |
| url = 'http://' + url | |
| parsed = urlparse(url) | |
| # Parse the URL first to determine whether a scheme is present. | |
| parsed = urlparse(url) | |
| if not parsed.scheme: | |
| # Handle scheme-relative URLs (e.g., //example.com/path) by | |
| # prefixing "http:" so that urlparse correctly identifies the host. | |
| if url.startswith("//"): | |
| url = "http:" + url | |
| else: | |
| url = "http://" + url | |
| parsed = urlparse(url) |
| URL_NORMALIZATION_VECTORS = [ | ||
| ("http://foo.com/bar", "4d1ce2be656d20d064183bec2ab98a2ff3981f73"), | ||
| ("http://foo.com:80/bar", "4d1ce2be656d20d064183bec2ab98a2ff3981f73"), | ||
| ("http://foo.com:443/bar", "904eaa65c0759afac0e4d8912de424e2dfb96ea1"), | ||
| ("http://foo.com:8182/bar", "eb8fcfb3d7ed4b4c2265d73cf93c31ba614384d1"), | ||
| ("foo.com/bar", "4d1ce2be656d20d064183bec2ab98a2ff3981f73"), | ||
| ("http://foo.com/bar?baz=boo", "78b00717a86ce9df06abf45ff818aa94537e1729"), | ||
| ("http://user:pass@foo.com/bar", "88615a11a78c021c1da2e1e0bfb8cc165170afc5"), | ||
| ("http://foo.com/bar#test", "b1c4391fcdab7c0521bb5b9eb4f41f08529b8418"), | ||
| ("https://foo.com/bar", "f26a771c302319a7094accbe2989bad67fff2928"), | ||
| ("https://foo.com:443/bar", "f26a771c302319a7094accbe2989bad67fff2928"), | ||
| ("https://foo.com:80/bar", "bd45af5253b72f6383c6af7dc75250f12b73a4e1"), | ||
| ("https://foo.com:8384/bar", "9c9fec4b7ebd6e1c461cb8e4ffe4f2987a19a5d3"), | ||
| ("https://foo.com/bar?qwe=asd", "4a0e98ddf286acadd1d5be1b0ed85a4e541c3137"), | ||
| ("https://qwe:asd@foo.com/bar", "7a8cd4a6c349910dfecaf9807e56a63787250bbd"), | ||
| ("https://foo.com/bar#baz", "5024919770ea5ca2e3ccc07cb940323d79819508"), |
There was a problem hiding this comment.
Add a normalization vector for a scheme-relative URL (e.g. //foo.com/bar) to lock in the intended behavior for “no-scheme” inputs. With the current implementation, this form is normalized incorrectly (hostname becomes empty), and a test vector here would prevent regressions once the normalizer is adjusted.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not re.match(r'^[a-zA-Z]+://', url): | ||
| url = 'http://' + url # NOSONAR |
There was a problem hiding this comment.
String quoting is inconsistent with the rest of this module and the surrounding codebase (which predominantly uses double quotes). For consistency, consider changing the new raw regex string and the http:// literal to use double quotes.
| if not re.match(r'^[a-zA-Z]+://', url): | |
| url = 'http://' + url # NOSONAR | |
| if not re.match(r"^[a-zA-Z]+://", url): | |
| url = "http://" + url # NOSONAR |
Add NOSONAR comments to http:// URLs in request_validator.py and test_request_validator.py to suppress false-positive S5332 hotspots. These URLs are used for signature validation logic and test vectors, not for actual network communication.
b192e87 to
8bdd149
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| URL_NORMALIZATION_VECTORS = [ | ||
| ("http://foo.com/bar", "4d1ce2be656d20d064183bec2ab98a2ff3981f73"), # NOSONAR | ||
| ("http://foo.com:80/bar", "4d1ce2be656d20d064183bec2ab98a2ff3981f73"), # NOSONAR | ||
| ("http://foo.com:443/bar", "904eaa65c0759afac0e4d8912de424e2dfb96ea1"), # NOSONAR | ||
| ("http://foo.com:8182/bar", "eb8fcfb3d7ed4b4c2265d73cf93c31ba614384d1"), # NOSONAR | ||
|
|
||
| ("http://foo.com/bar?baz=boo", "78b00717a86ce9df06abf45ff818aa94537e1729"), # NOSONAR | ||
| ("http://user:pass@foo.com/bar", "88615a11a78c021c1da2e1e0bfb8cc165170afc5"), # NOSONAR | ||
| ("http://foo.com/bar#test", "b1c4391fcdab7c0521bb5b9eb4f41f08529b8418"), # NOSONAR | ||
| ("https://foo.com/bar", "f26a771c302319a7094accbe2989bad67fff2928"), | ||
| ("https://foo.com:443/bar", "f26a771c302319a7094accbe2989bad67fff2928"), | ||
| ("https://foo.com:80/bar", "bd45af5253b72f6383c6af7dc75250f12b73a4e1"), | ||
| ("https://foo.com:8384/bar", "9c9fec4b7ebd6e1c461cb8e4ffe4f2987a19a5d3"), | ||
| ("https://foo.com/bar?qwe=asd", "4a0e98ddf286acadd1d5be1b0ed85a4e541c3137"), | ||
| ("https://qwe:asd@foo.com/bar", "7a8cd4a6c349910dfecaf9807e56a63787250bbd"), | ||
| ("https://foo.com/bar#baz", "5024919770ea5ca2e3ccc07cb940323d79819508"), | ||
| ] |
There was a problem hiding this comment.
URL_NORMALIZATION_VECTORS currently contains 14 entries and none of them covers the PR’s stated “no-schema URL handling” (e.g., foo.com/bar that should normalize as http://foo.com:80/bar). Either add a no-scheme test vector (likely the intended 15th case) with the correct expected signature, or update the PR description to match what’s actually being tested.
Add IPv6, empty path, percent-encoded path vectors from Go SDK. Remove no-scheme URL support. Add NOSONAR comments for SonarCloud.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not path: | ||
| path = "/" |
There was a problem hiding this comment.
path = parsed.path or "/" already guarantees a non-empty path, so the subsequent if not path: path = "/" is redundant. Consider removing the extra check to keep _normalize_url simpler and avoid duplicated logic.
| if not path: | |
| path = "/" |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/test_request_validator.py
Outdated
| ("http://[::1]:80/bar", "e0e9b83e4046d097f54b3ae64b08cbb4a539f601"), | ||
| ("http://[::1]:9090/bar", "ebec110ec5debd0e0fd086ff2f02e48ca665b543"), | ||
| ("https://[::1]/bar", "f3cfe6f523fdf1d4eaadc310fcd3ed92e1e324b0"), | ||
| ("http://foo.com/hello%20world", "eb64035b2e8f356ff1442898a39ec94d5c3e2fc8"), # NOSONAR |
There was a problem hiding this comment.
In URL_NORMALIZATION_VECTORS, the tuple at line 26 is not indented consistently with the rest of the list items. While it still parses, it makes the test vectors harder to scan and is easy to miss during edits; align its indentation with the other entries for consistency.
| ("http://foo.com/hello%20world", "eb64035b2e8f356ff1442898a39ec94d5c3e2fc8"), # NOSONAR | |
| ("http://foo.com/hello%20world", "eb64035b2e8f356ff1442898a39ec94d5c3e2fc8"), # NOSONAR |
dc59985 to
a24704e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_sandbox(self): | ||
| validator = RequestValidator("SOMEAPIKEY") | ||
| url = "http://example.com/callback.php?id=7ae7c48f-d48a-499f-9dc1-c9217014b457&reject_reason=&status=approved&type=address_verifications" | ||
| url = "http://example.com/callback.php?id=7ae7c48f-d48a-499f-9dc1-c9217014b457&reject_reason=&status=approved&type=address_verifications" # NOSONAR |
There was a problem hiding this comment.
In test_sandbox, # NOSONAR was added to the long URL literal here, but the expected signature literal later in the same test remains unsuppressed. If the goal is SonarCloud compliance for hard-coded token/secret patterns, consider applying suppression consistently within this test (or centralizing the test vector).
|



Summary
_normalize_url(bracket wrapping)Test plan