fix(server): use ALPN-negotiated protocol for TLS connections#574
fix(server): use ALPN-negotiated protocol for TLS connections#574cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
|
All contributors have signed the DCO ✍️ ✅ |
|
Thank you for your interest in contributing to OpenShell, @cluster2600. This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer. To get vouched:
See CONTRIBUTING.md for details. |
|
I have read the DCO document and I hereby sign the DCO. |
|
Hi there — just a quick note to say we've done the needful on our end. The DCO is signed, and we've opened a vouch request as well. Happy to address any feedback on the implementation. The fix has been compiled and tested on our DGX Spark (aarch64) — 199 server lib tests passing, no regressions. Cheers! |
|
Remote validation on the DGX host after the rustfmt fix:
The ALPN/TLS routing change looks sound on the DGX box. The remaining workspace failure appears environmental rather than related to this change. |
The gateway server used hyper's auto-detection (byte-sniffing the HTTP/2 connection preface) for all connections, including TLS connections where ALPN already negotiated the protocol. This could misidentify h2 connections as HTTP/1.1 when the first read returned a partial preface, making gRPC unusable. Add MultiplexService::serve_h2() which starts the HTTP/2 state machine directly, bypassing auto-detection. After TLS handshake, check the ALPN-negotiated protocol and route h2 connections to serve_h2(). Plaintext connections continue using auto-detection. Also add TLS status logging (cert paths, ALPN config) on startup and per-connection ALPN protocol logging at debug level. Closes NVIDIA#535
4f7656b to
a97dd49
Compare
|
Are you encountering #535 directly? The fix here assumes TLS is starting up which was not the case for #535. The PR itself, to me, does not match the issue you are linking it to. I'm also trying to understand the bug you are describing in Hyper specifically on version 0.1.20. That does not seem to be the case from what I can tell. Please clarify if you directly encountered #535 and this PR fixed the issue. For now I will close the PR. |
Summary
Resolves #535 — the gateway server ignored the ALPN-negotiated protocol on TLS connections, falling back to byte-sniffing auto-detection which could misidentify HTTP/2 connections as HTTP/1.1, rendering gRPC unusable.
MultiplexService::serve_h2()which starts the HTTP/2 state machine directly, bypassing auto-detectionh2connections toserve_h2()Root Cause
Hyper's auto-detection works by reading the first bytes of a connection and checking for the HTTP/2 connection preface (
PRI * HTTP/2.0). On TLS connections, if the firstread()returns a partial preface (due to buffering, MTU fragmentation, or TLS record boundaries), the auto-detector falls through to HTTP/1.1 — even though ALPN already negotiatedh2during the TLS handshake.graph TD subgraph "Before (broken)" A[TLS handshake<br/>ALPN negotiates h2] --> B[hyper auto-detect] B -->|"partial preface read"| C[HTTP/1.1 assumed] B -->|"full preface read"| D[HTTP/2] C --> E["gRPC fails ❌"] end subgraph "After (fixed)" F[TLS handshake<br/>ALPN negotiates h2] --> G{ALPN protocol?} G -->|"h2"| H["serve_h2() — direct HTTP/2"] G -->|"other/none"| I[hyper auto-detect fallback] H --> J["gRPC works ✅"] endChanges
crates/openshell-server/src/lib.rsh2toserve_h2(); add TLS startup loggingcrates/openshell-server/src/multiplex.rsALPN_H2constant; implementserve_h2()usinghyper::server::conn::http2::Builderdirectlycrates/openshell-server/src/tls.rsfrom_files_rejects_missing_cert,load_certs_rejects_nonexistent_file,load_key_rejects_nonexistent_file)Test Plan
cargo test -p openshell-server --lib— 199 passed, 0 failedalpn_h2_constant_matches_standard_protocol_id— verifies theALPN_H2constant equalsb"h2"tls.rssequenceDiagram participant Client participant TLS as TLS Acceptor participant Server as MultiplexService Client->>TLS: ClientHello (ALPN: h2, http/1.1) TLS->>Client: ServerHello (ALPN: h2) TLS->>Server: tls_stream + ALPN = "h2" alt ALPN == h2 Server->>Server: serve_h2(tls_stream) Note over Server: HTTP/2 state machine starts immediately else ALPN != h2 or absent Server->>Server: serve(tls_stream) Note over Server: byte-sniffing auto-detection end Client->>Server: gRPC request (HTTP/2 POST) Server->>Client: gRPC response