[FLINK-14925][table] Support precision-aware TO_TIMESTAMP with format-based inference #27793

Open
raminqaf wants to merge 1 commit into apache:master from raminqaf:FLINK-14925

@raminqaf
Contributor

What is the purpose of the change

This pull request makes the TO_TIMESTAMP function precision-aware when a format pattern is provided. Previously, TO_TIMESTAMP always returned TIMESTAMP(3) regardless of the format pattern's fractional-second precision, which forced users to lose sub-millisecond data. This is the TO_TIMESTAMP counterpart to the TO_TIMESTAMP_LTZ precision support added in FLINK-39244.

The output type for the 1-arg variant remains TIMESTAMP(3) for backward compatibility. For the 2-arg variant, precision is inferred from the number of trailing S characters in the format pattern (e.g., SSSSSS yields TIMESTAMP(6)), with a minimum of 3.
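The precision rule described above can be sketched in plain Java. This is an illustration under stated assumptions, not the PR's `precisionFromFormat` implementation: `trailingSCount` and `outputPrecision` are hypothetical names, "trailing" is read as the run of 'S' characters at the very end of the pattern, a `null` format stands in for the 1-arg variant, and the clamp to 9 (the maximum TIMESTAMP precision) is an added assumption that the PR text does not mention.

```java
// Hypothetical sketch of the stated rule: TIMESTAMP(3) for the 1-arg variant,
// TIMESTAMP(max(sCount, 3)) for the 2-arg variant. Not Flink's actual code.
final class ToTimestampPrecisionSketch {
    static final int DEFAULT_PRECISION = 3;
    // Assumption: clamp to 9 because TIMESTAMP supports at most precision 9.
    static final int MAX_PRECISION = 9;

    /** Counts the run of 'S' characters at the very end of the pattern. */
    static int trailingSCount(String format) {
        int count = 0;
        for (int i = format.length() - 1; i >= 0 && format.charAt(i) == 'S'; i--) {
            count++;
        }
        return count;
    }

    /** A null format models the 1-arg variant; otherwise max(sCount, 3), clamped to 9. */
    static int outputPrecision(String format) {
        if (format == null) {
            return DEFAULT_PRECISION;
        }
        int inferred = Math.max(trailingSCount(format), DEFAULT_PRECISION);
        return Math.min(inferred, MAX_PRECISION);
    }

    public static void main(String[] args) {
        String[] formats = {
            null,
            "yyyy-MM-dd HH:mm:ss",
            "yyyy-MM-dd HH:mm:ss.SS",
            "yyyy-MM-dd HH:mm:ss.SSSS",
            "yyyy-MM-dd HH:mm:ss.SSSSSS",
            "yyyy-MM-dd HH:mm:ss.SSSSSSSSS"
        };
        for (String f : formats) {
            String label = (f == null) ? "(1-arg variant)" : f;
            System.out.println(label + " -> TIMESTAMP(" + outputPrecision(f) + ")");
        }
    }
}
```

Under this reading, a pattern with 4 or 5 trailing S characters maps to TIMESTAMP(4) or TIMESTAMP(5), while 'SS' and patterns without fractional seconds stay at TIMESTAMP(3).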

As part of this change, TO_TIMESTAMP is migrated from the legacy Calcite-native function pattern (FlinkSqlOperatorTable + StringCallGen codegen) to the modern bridging function pattern (BuiltInFunctionDefinition + runtimeClass), matching how
TO_TIMESTAMP_LTZ is implemented. This was made possible by fixing the function name from camelCase "toTimestamp" to "TO_TIMESTAMP", which allows CoreModule to resolve it correctly for SQL queries without needing a separate FlinkSqlOperatorTable entry.

Brief change log

  • Type strategy (ToTimestampTypeStrategy): New output type strategy that returns TIMESTAMP(3) for the 1-arg variant
    and TIMESTAMP(max(sCount, 3)) for the 2-arg variant, where sCount is inferred from the format pattern's trailing S
    characters.
  • Runtime function (ToTimestampFunction): New runtime class with eval(StringData) and eval(StringData, StringData)
    methods. The 2-arg variant passes precisionFromFormat(format) to parseTimestampData for precision-aware parsing.
  • Function definition (BuiltInFunctionDefinitions): Changed name from "toTimestamp" to "TO_TIMESTAMP" (removing the need for explicit sqlName), added runtimeClass, and switched output type strategy to SpecificTypeStrategies.TO_TIMESTAMP.
  • Removed legacy plumbing: Removed FlinkSqlOperatorTable.TO_TIMESTAMP, DirectConvertRule mapping, StringCallGen
    cases, and BuiltInMethods.STRING_TO_TIMESTAMP / STRING_TO_TIMESTAMP_WITH_FORMAT — all superseded by the bridging function mechanism.
  • Documentation: Updated sql_functions.yml, sql_functions_zh.yml, and Python expressions.py / expression.py
    docstrings with precision-dependent output types and examples.

Verifying this change

This change added tests and can be verified as follows:

  • Added type strategy unit tests in ToTimestampTypeStrategyTest covering 1-arg default precision, 2-arg format-based
    precision (SSS/SSSSSS/SSSSSSSSS/no-S), invalid argument types, and argument count validation.
  • Added integration tests in TimeFunctionsITCase for 1-arg truncation to precision 3, 2-arg precision 6/9 from format, SSS format staying at precision 3, fewer input digits than format precision, unparsable string, and null input.
  • Removed redundant legacy tests from TemporalTypesTest.scala that are now covered by the new TimeFunctionsITCase tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no — the ToTimestampFunction.eval() methods call the same DateTimeUtils.parseTimestampData methods as before.
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs / JavaDocs / PyDocs

@snuyanzin marked this pull request as ready for review March 20, 2026 08:55
@flinkbot
Collaborator

flinkbot commented Mar 20, 2026

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@raminqaf marked this pull request as draft March 20, 2026 08:58
@raminqaf marked this pull request as ready for review March 20, 2026 08:58
Converts a datetime string to a TIMESTAMP without time zone.

- string1: the datetime string to parse
- string2: the format pattern (default 'yyyy-MM-dd HH:mm:ss'). The pattern follows Java's DateTimeFormatter syntax, where 'S' represents fractional seconds (e.g., 'SSS' for milliseconds, 'SSSSSS' for microseconds, 'SSSSSSSSS' for nanoseconds).
Contributor

Looks like we are going to heavily introduce Java classes here...

It would be great if we kept it consistent at least.

I'm asking since here we are talking about DateTimeFormatter, while for DATE_FORMAT there is SimpleDateFormat...

I think we should at least have a link to their docs explaining the formats; it will simplify the search for non-Java people (SQL, Python).
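As background for the pattern-syntax question, the snippet below uses only `java.time` to show what the 'S' letters in the docstring mean for a `DateTimeFormatter` pattern: a run of n 'S' letters maps to n fractional-second digits, so 'SSS', 'SSSSSS', and 'SSSSSSSSS' give milli-, micro-, and nanosecond resolution. It illustrates the JDK's pattern semantics only; `parseNanos` is a hypothetical helper, and Flink's `DateTimeUtils` parser may accept inputs that a strict `DateTimeFormatter` rejects.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrates JDK DateTimeFormatter semantics for the 'S' (fraction-of-second) letter.
final class FractionPatternDemo {
    /** Hypothetical helper: parses `text` with `pattern` and returns nano-of-second. */
    static int parseNanos(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern)).getNano();
    }

    public static void main(String[] args) {
        // Milliseconds: 3 digits are scaled up to nanoseconds.
        System.out.println(parseNanos("2023-01-01 12:30:00.123", "yyyy-MM-dd HH:mm:ss.SSS"));
        // Microseconds: 6 digits.
        System.out.println(parseNanos("2023-01-01 12:30:00.123456", "yyyy-MM-dd HH:mm:ss.SSSSSS"));
        // Nanoseconds: 9 digits, the full resolution of java.time.
        System.out.println(
            parseNanos("2023-01-01 12:30:00.123456789", "yyyy-MM-dd HH:mm:ss.SSSSSSSSS"));
    }
}
```

The `java.time.format.DateTimeFormatter` Javadoc, which documents every pattern letter, is one candidate for the link the comment asks for.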


The output precision depends on the variant used:
- 1-arg variant: always returns TIMESTAMP(3).
- 2-arg variant: precision is inferred from the number of trailing 'S' characters in the format pattern, with a minimum of 3. E.g., format 'yyyy-MM-dd HH:mm:ss.SS' returns TIMESTAMP(3), format 'yyyy-MM-dd HH:mm:ss.SSSSSS' returns TIMESTAMP(6).
Contributor

What happens when the number of S's is not 3, 6, or 9?

toTimestamp("2023-01-01 12:30:00"),
"TO_TIMESTAMP('2023-01-01 12:30:00')",
LocalDateTime.of(2023, 1, 1, 12, 30, 0),
TIMESTAMP(3).nullable())
Contributor

Can we have a couple of tests for precisions other than 0, 3, 6, and 9?
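Tests for precisions such as 4, 5, 7, or 8 would mostly check that the result keeps exactly p fractional-second digits. The sketch below shows that arithmetic with `java.time`, assuming digits beyond p are dropped rather than rounded (the PR's test list describes truncation for the 1-arg case). `truncateToPrecision` is a hypothetical helper, not Flink's code path.

```java
import java.time.LocalDateTime;

// Hypothetical helper (not Flink API): keeps only the first `precision` fractional-second digits.
final class TruncateToPrecisionSketch {
    static LocalDateTime truncateToPrecision(LocalDateTime value, int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be between 0 and 9: " + precision);
        }
        // Size of one unit of the last kept digit, in nanoseconds: 10^(9 - precision).
        int unit = 1;
        for (int i = 0; i < 9 - precision; i++) {
            unit *= 10;
        }
        int nanos = value.getNano();
        // Drop the remaining digits instead of rounding them.
        return value.withNano(nanos - nanos % unit);
    }

    public static void main(String[] args) {
        LocalDateTime parsed = LocalDateTime.of(2023, 1, 1, 12, 30, 0, 123_456_789);
        for (int p : new int[] {1, 2, 4, 5, 7, 8}) {
            System.out.println("p=" + p + " -> " + truncateToPrecision(parsed, p));
        }
    }
}
```

With the input .123456789, precision 4 keeps .1234 and precision 5 keeps .12345, which is the kind of expectation a non-3/6/9 test case could assert.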

super(BuiltInFunctionDefinitions.TO_TIMESTAMP, context);
}

public @Nullable TimestampData eval(StringData timestamp) {
Contributor

Suggested change
public @Nullable TimestampData eval(StringData timestamp) {
public @Nullable TimestampData eval(@Nullable StringData timestamp) {

return parseTimestampData(timestamp.toString());
}

public @Nullable TimestampData eval(StringData timestamp, StringData format) {
Contributor

Suggested change
public @Nullable TimestampData eval(StringData timestamp, StringData format) {
public @Nullable TimestampData eval(@Nullable StringData timestamp, @Nullable StringData format) {
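The suggested `@Nullable` annotations document that either argument may be SQL NULL, so the body should check before dereferencing; as written in the 1-arg hunk, `timestamp.toString()` would throw on a NULL input unless the caller filters nulls. A minimal null-safe sketch, assuming `String` in place of `StringData`, `LocalDateTime` in place of `TimestampData`, the documented default pattern 'yyyy-MM-dd HH:mm:ss' for the 1-arg variant, and NULL (rather than an error) for unparsable input. Flink's real `parseTimestampData` may accept inputs this sketch rejects.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for the runtime class: String replaces StringData and
// LocalDateTime replaces TimestampData. Not Flink's implementation.
final class NullSafeToTimestampSketch {
    // Default pattern documented for the 1-arg variant.
    private static final String DEFAULT_FORMAT = "yyyy-MM-dd HH:mm:ss";

    /** 1-arg variant: returns null for a null input instead of throwing. */
    static LocalDateTime eval(String timestamp) {
        return eval(timestamp, DEFAULT_FORMAT);
    }

    /** 2-arg variant: either argument may be SQL NULL. */
    static LocalDateTime eval(String timestamp, String format) {
        if (timestamp == null || format == null) {
            return null; // NULL in, NULL out; never call toString() on a null reference
        }
        try {
            return LocalDateTime.parse(timestamp, DateTimeFormatter.ofPattern(format));
        } catch (DateTimeParseException e) {
            return null; // assumption: unparsable input maps to NULL
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("2023-01-01 12:30:00"));
        System.out.println(eval(null));
        System.out.println(eval("2023-01-01 12:30:00.123456", "yyyy-MM-dd HH:mm:ss.SSSSSS"));
        System.out.println(eval("not a timestamp"));
    }
}
```

Checking for null once at the top of the 2-arg method keeps the 1-arg overload a thin delegate, which mirrors how the two `eval` signatures in the hunks relate.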


Contributor

E.g., format 'yyyy-MM-dd HH:mm:ss.SS' returns TIMESTAMP(3)

Why do we have such behavior?

What is the behavior for other vendors?

Contributor Author

This is kept for backward compatibility. It is the same behavior we have for TO_TIMESTAMP_LTZ: https://issues.apache.org/jira/browse/FLINK-39244

Contributor

Aren't the release notes saying something a bit different?

TO_TIMESTAMP_LTZ() function now supports up to precision 9 for both numeric and string conversions. While we kept backwards compatibility for numeric precision 0-3 which always returns TIMESTAMP_LTZ(3), precisions p=4-9 now return TIMESTAMP_LTZ(p). Function calls taking string formats such as ".SSSS" now return precision 4 instead of silently losing sub-second information.

4 participants