SSL/ACVP Test Integration for FIPS - second try #1560

Merged
Enmk merged 14 commits into releases/25.3.8-fips from features/25.3/fips-ch-binary-extended-testing
Mar 20, 2026

Member

@Enmk Enmk commented Mar 19, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added three subcommands to the clickhouse binary that facilitate verification of the built-in SSL library: ssl-handshaker, ssl-shim, and acvp-server.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Basically the same code as in #1503, but compiled in a safer way (no more --allow-multiple-definition, hardcoded compiler paths, etc.).

Closes #1503

DimensionWieldr and others added 13 commits March 10, 2026 17:08
Made-with: Cursor
Signed-off-by: Julian Huang <jhuang@altinity.com>
Signed-off-by: Julian Huang <jhuang@altinity.com>
…ndshaker

Move the identical glibc_compat.c files from ssl-shim/ and ssl-handshaker/
into a shared programs/ssl-common/ directory. Also add the gtest include
path required by test_util.cc to both CMakeLists.

Signed-off-by: Julian Huang <jhuang@altinity.com>
The FIPS 2.0.0 shim sources do not include any gtest headers,
so this include path is not needed.

Signed-off-by: Julian Huang <jhuang@altinity.com>
Signed-off-by: Julian Huang <jhuang@altinity.com>
Gate ssl-shim/ssl-handshaker/acvp-server declarations in main.cpp with
per-target ENABLE_CLICKHOUSE_* defines (via config_tools.h) that match
the exact CMake conditions under which targets are created, preventing
unresolved symbols when FIPS_CLICKHOUSE is set without AWSLC_SRC_DIR
or on non-Linux platforms.

Move the --allow-multiple-definition linker flag from the global
clickhouse target into each of the three library targets as an INTERFACE
property, so the flag only enters the link when those specific libraries
are actually consumed.

Signed-off-by: Julian Huang <jhuang@altinity.com>
Made-with: Cursor
Replace the partial posix_spawn with the complete upstream musl
implementation (https://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c),
adapted for the glibc sysroot headers used by ClickHouse.

Key safety improvements from upstream:
- Pipe fd clobbering protection: if a file action targets the
  error-reporting pipe fd, dup it to an unoccupied fd first
- Close-on-exec set after file actions (pipe may have been moved)
- Block all signals before pipe2/clone; unblock after exec
- EPIPE-aware error reporting back to parent
- Support for POSIX_SPAWN_SETSID, SETPGROUP, RESETIDS, SETSIGDEF
- Larger stack (1024 + PATH_MAX)

Adaptations from upstream musl:
- Uses glibc sysroot field names (__ss/__sd vs __mask/__def)
- Keeps __posix_spawnx exec-function parameter (glibc attr has no __fn)
- Omits LOCK(__abort_lock) (musl-internal, not available)
- Omits __get_handler_set (musl-internal; signals are blocked for the
  child's brief pre-exec window so parent handlers cannot fire)
- Uses clone() instead of musl-internal __clone()

Signed-off-by: Julian Huang <jhuang@altinity.com>
Made-with: Cursor
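
For readers unfamiliar with the pipe-clobbering protection listed in the commit message above, here is a minimal Python sketch of the idea. It is not the C code from posix_spawn_2.c. The function names `protect_pipe_fd` and `finish_file_actions` are hypothetical, and the sketch assumes the child reports exec errors over a pipe descriptor while a file action is about to overwrite some target descriptor.

```python
import os


def protect_pipe_fd(pipe_fd: int, action_target_fd: int) -> int:
    """Return a descriptor for the error pipe that the next file action cannot clobber.

    If the action targets the pipe's own descriptor number, duplicate the pipe
    to an unoccupied descriptor and release the old number for the action.
    """
    if action_target_fd != pipe_fd:
        return pipe_fd
    moved = os.dup(pipe_fd)  # the kernel picks the lowest unoccupied descriptor
    os.close(pipe_fd)        # the file action may now reuse this number
    return moved


def finish_file_actions(pipe_fd: int) -> None:
    """Set close-on-exec only after all file actions, since the pipe may have moved."""
    os.set_inheritable(pipe_fd, False)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    # Pretend a dup2-style file action wants to overwrite write_end itself.
    write_end = protect_pipe_fd(write_end, write_end)
    finish_file_actions(write_end)
    os.write(write_end, b"E")  # the error report still reaches the parent side
    print(os.read(read_end, 1))
```

Setting close-on-exec last matters because the flag belongs to the descriptor, and the descriptor number can change while file actions are processed.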
…2, used only for ssl/acvp tests

Signed-off-by: Julian Huang <jhuang@altinity.com>
…t the rest of CH

Signed-off-by: Julian Huang <jhuang@altinity.com>
Signed-off-by: Julian Huang <jhuang@altinity.com>
…ading source twice

Signed-off-by: Julian Huang <jhuang@altinity.com>
Signed-off-by: Julian Huang <jhuang@altinity.com>
Compile ssl-shim, ssl-handshaker, and acvp-server test harness code
inside the Docker build rather than with manual /usr/bin/c++ invocations
in CMake. A new build_test_harness.sh script does partial linking
(ld -r) with libstdc++ and then uses objcopy --prefix-symbols to
namespace all internal symbols, avoiding collisions with ClickHouse's
libc++. This eliminates --allow-multiple-definition, hardcoded x86
library paths, and the AWSLC_SRC_DIR configure-time requirement.
The three per-program CMakeLists.txt are replaced by IMPORTED library
targets in contrib/openssl-cmake/CMakeLists.txt. posix_spawn_2.c
gains aarch64 raw-syscall wrappers so both Dockerfiles work natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member Author

Enmk commented Mar 20, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ab51f9538

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +81 to +83
${AWSLC_BINARIES_DIR}/libawslc_shim.a
${AWSLC_BINARIES_DIR}/libawslc_handshaker.a
${AWSLC_BINARIES_DIR}/libawslc_acvp_server.a

P1: Avoid requiring harness archives in every FIPS build

build-awslc is a dependency of the imported crypto target, so adding these three archives here makes every -DFIPS_CLICKHOUSE=1 build produce the SSL/ACVP harness before libcrypto.a is usable. That is a build regression on non-x86_64/aarch64 FIPS targets: the new harness path always compiles programs/ssl-common/posix_spawn_2.c, which hard-errors outside those two architectures, so Linux ppc64le/riscv64/s390x configurations now fail in the Docker step even if they never use clickhouse ssl-shim or clickhouse ssl-handshaker.

Member Author

We are not going to support non-x86_64/aarch64 builds, so this is not relevant.

aarch64 Linux does not define SYS_dup2 — the syscall was omitted from
the aarch64 kernel ABI since dup3 supersedes it. Use SYS_dup3 with
flags=0 which is semantically identical. Both call sites are already
guarded by fd != op->fd, so dup3's EINVAL-on-equal-fds is not a concern.
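
The dup2/dup3 relationship described in this commit message can be checked from user space. The following is a hedged Python sketch, assuming a Linux libc that exports `dup3`. It goes through the libc wrapper via ctypes rather than the raw `SYS_dup3` syscall used in posix_spawn_2.c, and the helper name `dup2_via_dup3` is hypothetical.

```python
import ctypes
import errno
import os

# Load the C library of the running process; use_errno lets ctypes capture errno.
libc = ctypes.CDLL(None, use_errno=True)


def dup2_via_dup3(oldfd: int, newfd: int) -> int:
    """Duplicate oldfd onto newfd using dup3 with flags=0.

    Callers must guarantee oldfd != newfd, because dup3 rejects equal
    descriptors with EINVAL where dup2 would simply return newfd.
    """
    ctypes.set_errno(0)
    result = libc.dup3(oldfd, newfd, 0)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    spare = os.dup(write_end)       # any other open descriptor number
    dup2_via_dup3(read_end, spare)  # spare now refers to the read end
    os.write(write_end, b"x")
    print(os.read(spare, 1))
    try:
        dup2_via_dup3(read_end, read_end)
    except OSError as exc:
        print(errno.errorcode[exc.errno])
```

The equal-descriptor case is the only behavioral difference exercised here, which is why the `fd != op->fd` guard at both call sites is what makes the substitution safe.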
@Enmk Enmk merged commit 87c961e into releases/25.3.8-fips Mar 20, 2026
168 of 310 checks passed
Member Author

Enmk commented Mar 20, 2026

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1560 (SSL/ACVP test harness integration for FIPS):

Confirmed defects:

High: Unconditional harness build breaks FIPS builds on non-x86_64/aarch64 Linux
Impact: FIPS builds can fail before libcrypto.a is usable on unsupported Linux architectures (e.g. ppc64le/s390x/riscv64), even when SSL harness binaries are not needed.
Anchor: contrib/openssl-cmake/CMakeLists.txt (line 76), contrib/openssl-cmake/CMakeLists.txt (line 124), programs/ssl-common/posix_spawn_2.c (line 125)
Trigger: Build with -DFIPS_CLICKHOUSE=1 on Linux architecture other than x86_64/aarch64.
Why defect: build-awslc now depends on harness archives unconditionally, crypto depends on build-awslc, and harness compilation hard-errors on unsupported architectures (#error "unsupported architecture").
Fix direction (short): Gate harness outputs/targets by supported arch (or feature flag), and keep base libssl/libcrypto path architecture-agnostic.
Regression test direction (short): Add FIPS configure/build CI on one unsupported arch and assert crypto builds without requiring harness artifacts.

Medium: clone() failure returns wrong posix_spawn error code
Impact: Callers receive incorrect error (1) instead of actual cause (EAGAIN, ENOMEM, etc.), degrading retry/error handling and diagnostics in harness subprocess paths.
Anchor: programs/ssl-common/posix_spawn_2.c (line 293), programs/ssl-common/posix_spawn_2.c (line 301)
Trigger: clone(...) fails in spawnx_impl (resource exhaustion/limits).
Why defect: glibc clone returns -1 and sets errno; current code uses ec = -pid, which collapses all failures to 1.
Fix direction (short): In pid <= 0 failure path, set ec = errno (or use a raw clone syscall returning -errno consistently).
Regression test direction (short): Inject clone failure and assert __ssl_posix_spawn returns the injected errno value.
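
The error-code mismatch described in this finding can be illustrated without touching clone() itself. The sketch below is a hypothetical Python illustration, not the harness code: it uses `close(-1)` as a stand-in for a failing libc wrapper (an assumption made purely for the demo) and compares the `ec = -pid` pattern with reading errno.

```python
import ctypes
import errno

# Load the C library of the running process; use_errno lets ctypes capture errno.
libc = ctypes.CDLL(None, use_errno=True)


def call_failing_wrapper():
    """Invoke a libc wrapper that is certain to fail; returns (ret, saved_errno).

    close(-1) stands in for a failing clone(): both report failure as -1
    and put the real cause in errno.
    """
    ctypes.set_errno(0)
    ret = libc.close(-1)
    return ret, ctypes.get_errno()


def naive_error_code(ret):
    """The `ec = -pid` pattern: only right for raw syscalls that return -errno."""
    return -ret


def wrapper_error_code(ret, saved_errno):
    """The suggested direction: take the cause from errno when the wrapper fails."""
    return saved_errno if ret < 0 else 0


if __name__ == "__main__":
    ret, saved = call_failing_wrapper()
    print(naive_error_code(ret), errno.errorcode[wrapper_error_code(ret, saved)])
```

With a wrapper-style return value the naive pattern always yields 1 regardless of the cause, while the errno-based variant preserves the actual error (EBADF in this stand-in).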

Coverage summary:

Scope reviewed: Full PR diff (d1e8b14..9c61bc5), all 29 changed files; deep path coverage on contrib/openssl-cmake/, programs/main.cpp, programs/ssl-common/, programs/ssl-/, Docker/CI/config deltas.
Categories failed: Build portability/arch gating, error-contract consistency (spawn failure propagation).
Categories passed: Entrypoint dispatch wiring, shared-state/threading safety in changed runtime paths, memory/iterator/UB classes in touched C++ wrappers, rollback behavior in spawn child fail path (static).
Assumptions/limits: Static audit only (no runtime execution or fault-injection run in this pass).

Member Author

Enmk commented Mar 20, 2026

High: Unconditional harness build breaks FIPS builds on non-x86_64/aarch64 Linux

Not an issue, since we are not going to support non-x86_64/aarch64 builds.

# Pass 2: restore undefined refs + entry point to original names,
# and weaken all defined symbols so duplicates across archives
# (same libstdc++ members pulled into shim, handshaker, acvp) don't clash.
objcopy --redefine-syms="$obj_dir/redefine.txt" --weaken "$obj_dir/combined.o"
Collaborator

Does this weaken all global symbols? I think we only want to weaken defined symbols (embedded libstdc++ internals) while keeping external references strong. I think this might fix the segfaults I'm running into with ssl-shim.

Something like this?

nm --defined-only "$obj_dir/combined.o" | awk 'NF>=3{print $3}' | sort -u > "$obj_dir/defined_syms.txt"
awk -v p="$PREFIX" '{print p $0}' "$obj_dir/defined_syms.txt" > "$obj_dir/weaken_list.txt"
objcopy --redefine-syms="$obj_dir/redefine.txt" --weaken-symbols="$obj_dir/weaken_list.txt" "$obj_dir/combined.o"
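
To make the list-building part of the commands above easier to reason about, here is a hypothetical Python sketch of the nm/awk steps only; it does not invoke objcopy. The sample nm output and the `shim_` prefix are made up for illustration, and the real prefix would come from `$PREFIX` in the build script.

```python
def defined_symbols(nm_output: str) -> list[str]:
    """Return sorted unique symbol names from `nm --defined-only` text output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # address, type, name (same filter as awk NF>=3)
            names.add(fields[2])
    return sorted(names)


def weaken_list(names: list[str], prefix: str) -> list[str]:
    """Prepend the namespace prefix, mirroring the second awk command."""
    return [prefix + name for name in names]


# Made-up sample: two defined symbols (one repeated) and one undefined reference.
SAMPLE_NM = """\
0000000000000000 T bssl_shim_main
0000000000000040 B _ZSt4cout
0000000000000040 B _ZSt4cout
                 U malloc
"""

if __name__ == "__main__":
    for symbol in weaken_list(defined_symbols(SAMPLE_NM), "shim_"):
        print(symbol)
```

The undefined `malloc` line has only two fields, so it never reaches the weaken list; that is the property the suggestion relies on to keep external references strong.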

Member Author

In light of the findings, reverted the PR #1565; I will re-open this proposed set of changes in the third one of the series, where we'll test the proposed fix as well.

BTW, @DimensionWieldr do you have a test that reliably reproduces the crashes? That would simplify the fix.

Member Author

This piece of code weakens all defined symbols so that duplicates across archives don't clash. It only affects symbols in the test-related code (ssl-shim, ssl-handshaker, and acvp-server), such as C++ standard library symbols and any other duplicates. Everything that is supposed to be known at run time is restored:

  • undefined symbols == external dependencies, like malloc, SSL-API, etc
  • entry-point symbols == bssl_shim_main, handshaker_main, acvp_modulewrapper_main

And symbols from the rest of clickhouse are left as is (and in fact there is no way to modify those here).

Member Author

@Enmk Enmk Mar 23, 2026

Only one possibility comes to mind: some symbols are missing, meaning we have not built or linked the corresponding source file. The weak symbols are then resolved to NULL at link time, which causes a segfault when they are used.

Collaborator

@DimensionWieldr DimensionWieldr Mar 23, 2026

Hmm, maybe some symbols are missing.

I have a test on the fips-testing branch at https://github.com/Altinity/clickhouse-regression/blob/fips-testing/ssl_server/tests/fips_140_3.py#L753 that I run with ./regression.py --local --clickhouse https://s3.amazonaws.com/altinity-build-artifacts/PRs/1560/9c61bc52473f908e9126ddb12284ba82850b238e/package_release/clickhouse-common-static_25.3.8.10134.altinitytest_amd64.deb --force-fips --only "/ssl server/part 2/fips 140-3/aws-lc test suites/*" -l test.log. (Run from clickhouse-regression/ssl_server)

Member Author

@Enmk Enmk Mar 23, 2026

I hastily merged this PR; it is redone in #1567.

@DimensionWieldr let's continue conversation there

Labels

fips Work related to Altinity FIPS releases fips-25.3

3 participants