feat: Support AI-assisted SQL generation via ClickHouse client integration #440
This PR is still experimental, so please feel free to reject it if it doesn’t align with chdb’s direction. I would appreciate any discussion or feedback, particularly regarding the items listed in the TODO section.
Description
This PR adds AI-powered SQL generation support to chdb, enabling users to translate natural language descriptions into executable SQL queries, and addresses #439.
Starting from ClickHouse 25.7, both the ClickHouse client and clickhouse-local include built-in AI capabilities for SQL generation. This contribution brings similar functionality to chdb, improving usability in interactive and exploratory workflows.
Solution
The ClickHouse client’s C++ implementation is reused without modification to handle prompting, tooling, and communication with AI providers. See src/Client/AI for more details.
chdb introduces the same ?? prefix used by the ClickHouse client and clickhouse-local as the trigger for AI-assisted query generation. It reuses the existing query API to accept user prompts, with additional glue code to integrate with the ClickHouse client’s AI-related components.
AI-generated output is returned in RAW format, allowing users to easily extract the generated SQL string and execute it separately.
Changelog category
Changelog entry
Add AI-powered SQL generation support, enabling natural-language-to-SQL translation through chdb’s language bindings.
Documentation entry for user-facing changes
To generate SQL from a natural-language prompt, prefix the query string with ??. For example, conn.query("?? list all users order by id") may generate a SQL statement such as SELECT * FROM users ORDER BY id.
Test
A single Python unit test has been added to validate this functionality. It requires certain environment variables to be set for the AI endpoint; otherwise, the test will be skipped.
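A skip-when-unconfigured test of this kind could look roughly like the sketch below. It is not the test added in this PR. The environment variable names are hypothetical placeholders (the PR does not list the real ones), and the assertion assumes that str() of the query result yields the generated SQL text.

```python
import os
import unittest

# Hypothetical variable names; substitute whatever the AI integration actually reads.
REQUIRED_ENV = ("AI_API_KEY", "AI_BASE_URL", "AI_MODEL")


def ai_env_ready(environ=os.environ):
    """Return True only if every required variable is set to a non-empty value."""
    return all(environ.get(name) for name in REQUIRED_ENV)


@unittest.skipUnless(ai_env_ready(), "AI endpoint environment variables are not set")
class TestAIGeneratedSQL(unittest.TestCase):
    def test_prompt_returns_sql(self):
        # Imported lazily so this module still loads where chdb is not installed.
        import chdb

        conn = chdb.connect(":memory:")
        try:
            result = conn.query("?? list all users order by id")
            # Assumption: str(result) is the generated SQL. Model output is not
            # deterministic, so only a loose check is made here.
            self.assertIn("SELECT", str(result).upper())
        finally:
            conn.close()
```

Run it with python -m unittest. When any of the variables is missing, the whole test class is reported as skipped rather than failed.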
TODO
There are a few clear areas that could be improved, including:
- Because prompts go through the existing query API, it becomes difficult to pass AI-related configuration parameters (such as temperature), since there’s no natural place to expose the options defined in src/Client/AI/AIConfiguration.h. I’d prefer not to bloat the current query API, and I’d welcome guidance on the preferred approach.
- google/gemma-3n-e4b is the weakest and barely usable, while qwen/qwen3-coder shows noticeably poorer tool-use capability compared to z-ai/glm-4.5-air.
References
[1] AI-powered SQL generation, https://clickhouse.com/docs/use-cases/AI/ai-powered-sql-generation
Demo
Here is a working demo video that consumes an OpenAI-compatible API from OpenRouter, using z-ai/glm-4.5-air as the model.
ai-gen-demo.mp4
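For readers who prefer code to video, the generate-then-execute workflow this PR describes can be sketched as a pair of small helpers. This is an illustration under stated assumptions, not code from the PR: generate_sql and generate_and_run are hypothetical names, conn is any object with a chdb-style query(sql, format) method, and str() of the RAW result is assumed to be the generated SQL text.

```python
AI_PREFIX = "??"  # trigger prefix described in the Solution section


def generate_sql(conn, prompt):
    """Send a natural-language prompt and return the generated SQL as a string.

    Assumption: conn.query() accepts a '??'-prefixed prompt and str() of the
    result is the generated SQL (the PR states the output is RAW-formatted).
    """
    result = conn.query(f"{AI_PREFIX} {prompt.strip()}")
    # Drop surrounding whitespace and any trailing semicolon.
    return str(result).strip().rstrip(";")


def generate_and_run(conn, prompt, fmt="CSV"):
    """Generate SQL from a prompt, then execute it as a second, ordinary query."""
    sql = generate_sql(conn, prompt)
    if not sql:
        raise ValueError("the AI endpoint returned no SQL")
    return sql, conn.query(sql, fmt)
```

With a real connection this might be called as generate_and_run(chdb.connect(":memory:"), "list all users order by id"). Returning the SQL alongside the result lets the caller inspect what the model produced. In practice, review generated SQL before executing it.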