@niyue niyue commented Dec 9, 2025

This PR is still experimental, so please feel free to reject it if it doesn’t align with chdb’s direction. I would appreciate any discussion or feedback, particularly regarding the items listed in the TODO section.

Description

This PR adds AI-powered SQL generation support to chdb, enabling users to translate natural language descriptions into executable SQL queries, and addresses #439.

Starting from ClickHouse 25.7, both the ClickHouse client and clickhouse-local include built-in AI capabilities for SQL generation. This contribution brings similar functionality to chdb, improving usability in interactive and exploratory workflows.

Solution

  • The ClickHouse client’s C++ implementation is reused without modification to handle prompting, tooling, and communication with AI providers. See src/Client/AI for more details.

  • chdb introduces the same ?? prefix used by the ClickHouse client and clickhouse-local as the trigger for AI-assisted query generation. It reuses the existing query API to accept user prompts, with additional glue code to integrate with the ClickHouse client’s AI-related components.

  • AI-generated output is returned in RAW format, allowing users to easily extract the generated SQL string and execute it separately.
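The routing decision behind the ?? trigger can be sketched in Python. This only illustrates the idea and is not the PR's C++ glue code: `split_ai_prompt` is a hypothetical helper, and trimming whitespace around the prompt is an assumption.

```python
from typing import Optional

AI_PREFIX = "??"  # the trigger named in the description above


def split_ai_prompt(text: str) -> Optional[str]:
    """Return the natural-language prompt if `text` starts with the AI prefix.

    Returns None for ordinary SQL, which would go through the normal query path.
    Hypothetical helper; the whitespace handling is an assumption.
    """
    stripped = text.lstrip()
    if not stripped.startswith(AI_PREFIX):
        return None
    return stripped[len(AI_PREFIX):].strip()
```

Under this sketch, `split_ai_prompt("SELECT 1")` returns `None`, while a `??`-prefixed input yields the prompt text that would be handed to the AI components.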

Changelog category

  • New Feature

Changelog entry

Add AI-powered SQL generation support, enabling natural-language-to-SQL translation through chdb’s language bindings.

Documentation entry for user-facing changes

  • Users can trigger AI-powered SQL generation by prefixing the input with ??. For example, conn.query("?? list all users order by id") may generate a SQL statement such as SELECT * FROM users ORDER BY id.
  • This PR does not include user-facing documentation yet, as the current approach, reusing the query API, may not be ideal (see the TODO section). Once the API design is finalized, additional documentation will be added.
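Until the API is finalized, the generate-then-execute flow might look roughly like the sketch below. It assumes that `conn.query` accepts the prefixed prompt and that `str()` of the result yields the generated SQL text; both are assumptions about the binding, and `generate_then_run` is a hypothetical helper, not part of this PR.

```python
def generate_then_run(conn, prompt: str):
    """Ask for SQL via the `??` prefix, then execute the generated statement.

    Assumptions: `conn` exposes a `query` method, and str() of the first
    result is the generated SQL (the PR returns it in RAW format).
    """
    generated = str(conn.query(f"?? {prompt}")).strip()
    if not generated:
        raise ValueError("the model returned no SQL")
    # Second call runs the generated statement as ordinary SQL.
    return generated, conn.query(generated)
```

With chdb, `conn` would presumably be a connection object such as the one returned by `chdb.connect()`. Because the SQL comes from a model, it is probably worth inspecting `generated` before running it against real data.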

Test

  • I tested several models, most of them free models served through OpenRouter:
    • z-ai/glm-4.5-air:free
    • qwen/qwen3-coder:free
    • openai/gpt-oss-120b:free
    • google/gemma-3n-e4b (tested locally via LM Studio)

A single Python unit test has been added to validate this functionality. It requires certain environment variables to be set for the AI endpoint; otherwise, the test will be skipped.
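The skip-when-unconfigured pattern described above can be sketched with `unittest`. The environment variable names here are hypothetical placeholders, not the ones the PR's test actually reads, and how chdb picks up the endpoint configuration is not shown.

```python
import os
import unittest

# Hypothetical variable names; the PR's actual test may use different ones.
REQUIRED_ENV = ("CHDB_TEST_AI_API_KEY", "CHDB_TEST_AI_BASE_URL", "CHDB_TEST_AI_MODEL")


def missing_ai_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


class TestAISQLGeneration(unittest.TestCase):
    def setUp(self):
        missing = missing_ai_env()
        if missing:
            self.skipTest("AI endpoint not configured; missing: " + ", ".join(missing))

    def test_generates_select(self):
        import chdb  # imported lazily so this module loads where chdb is absent

        conn = chdb.connect()
        # Assumption: str() of the result yields the generated SQL text.
        generated = str(conn.query("?? list all users order by id"))
        self.assertIn("SELECT", generated.upper())
```

Checking the environment in `setUp` rather than in a class decorator means the decision is made when the test runs, so the same module works in CI jobs with and without the endpoint configured.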

TODO

There are a few clear areas that could be improved, including:

  • I’m not certain whether the existing query API is the best interface for this feature. A dedicated API could return a plain string directly and might offer a cleaner user experience. However, adding a new API would increase complexity and require implementation across all language bindings. If we continue using the existing query API, it becomes difficult to pass AI-related configuration parameters (such as temperature), since there’s no natural place to expose the options defined in src/Client/AI/AIConfiguration.h. I’d prefer not to bloat the current query API, and I’d welcome guidance on the preferred approach.
  • CI will need to be configured with the appropriate AI API endpoint parameters for this feature to work reliably.
  • The final output depends heavily on the model, since models vary significantly in both code-generation quality and tool-use capability. A reasonably strong model is needed to benefit from this enhancement. In my testing, google/gemma-3n-e4b was the weakest and barely usable, while qwen/qwen3-coder showed noticeably poorer tool-use capability than z-ai/glm-4.5-air.
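To make the first TODO item concrete, here is one possible shape for a dedicated API that returns a plain string and has a natural place for options such as temperature. Everything in it is hypothetical: `AIOptions`, `generate_sql`, and the field names are illustrative and do not mirror src/Client/AI/AIConfiguration.h, and the backend is injected so the sketch stays independent of any binding.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AIOptions:
    """Hypothetical per-call options; field names are illustrative only."""
    model: Optional[str] = None
    temperature: Optional[float] = None

    def __post_init__(self):
        if self.temperature is not None and self.temperature < 0:
            raise ValueError("temperature must be non-negative")


def generate_sql(
    backend: Callable[[str, AIOptions], str],
    prompt: str,
    options: AIOptions = AIOptions(),
) -> str:
    """Return generated SQL as a plain string.

    `backend` is any callable (prompt, options) -> str; in a real binding it
    would wrap the reused ClickHouse AI components.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return backend(prompt.strip(), options).strip()
```

The trade-off noted above still applies: a function like this keeps the query API untouched, but each language binding would need its own equivalent.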

References

[1] AI-powered SQL generation, https://clickhouse.com/docs/use-cases/AI/ai-powered-sql-generation

Demo

Here is a working demo video that uses OpenRouter's OpenAI-compatible API with z-ai/glm-4.5-air as the model.

ai-gen-demo.mp4

@niyue niyue force-pushed the feature/ai-sql branch 3 times, most recently from 0a0a298 to 9f25950 Compare December 15, 2025 13:15