[FLINK-39244][table] Support precision 0-9 for TO_TIMESTAMP_LTZ function #27757

Merged
twalthr merged 4 commits into apache:master from raminqaf:FLINK-39244
Mar 19, 2026
@raminqaf
Contributor

@raminqaf raminqaf commented Mar 11, 2026

What is the purpose of the change

This pull request extends the TO_TIMESTAMP_LTZ function to support all precision values from 0 to 9, where precision specifies the unit of the epoch value as 10^(-precision) seconds. Previously only 0 (seconds) and 3 (milliseconds) were supported, which limited users working with higher-precision epoch values such as microseconds (6) or nanoseconds (9).

The output type is now precision-aware: TIMESTAMP_LTZ(3) for precision 0-3, and TIMESTAMP_LTZ(precision) for precision 4-9. The output type for precision 0-3 is kept at TIMESTAMP_LTZ(3) to preserve backward compatibility: previously TO_TIMESTAMP_LTZ always returned TIMESTAMP_LTZ(3), and changing it for the existing precision values (0 and 3) could break downstream schemas and queries that depend on the output type. For the string-based variant, the output precision is inferred from the format pattern's S count (e.g., SSSSSS → TIMESTAMP_LTZ(6)), with a minimum of 3 for the same backward compatibility reason.
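
To make the rule above concrete, here is a minimal sketch of the precision mapping. It is not the code in ToTimestampLtzTypeStrategy: the class and method names are hypothetical, the format variant assumes that only the trailing run of S characters counts, and clamping the inferred value to 9 is an assumption of this sketch.

```java
/** Hypothetical illustration of the output-precision rule; not Flink's actual type strategy. */
final class OutputPrecisionSketch {

    static final int MIN_OUTPUT_PRECISION = 3; // kept for backward compatibility
    static final int MAX_PRECISION = 9;

    /** Numeric variant: TIMESTAMP_LTZ(3) for precision 0-3, TIMESTAMP_LTZ(precision) for 4-9. */
    static int outputPrecisionForEpoch(int precision) {
        if (precision < 0 || precision > MAX_PRECISION) {
            throw new IllegalArgumentException(
                    "Precision must be between 0 and 9, got " + precision);
        }
        return Math.max(MIN_OUTPUT_PRECISION, precision);
    }

    /** String variant: count the trailing 'S' characters of the format, with a minimum of 3. */
    static int outputPrecisionForFormat(String format) {
        int count = 0;
        for (int i = format.length() - 1; i >= 0 && format.charAt(i) == 'S'; i--) {
            count++;
        }
        // Assumption of this sketch: clamp to the maximum supported precision.
        return Math.max(MIN_OUTPUT_PRECISION, Math.min(MAX_PRECISION, count));
    }
}
```

Under these assumptions, a precision argument of 1 and a format ending in ss.S both map to TIMESTAMP_LTZ(3), while a precision of 7 maps to TIMESTAMP_LTZ(7).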

Brief change log

  • Type strategy (ToTimestampLtzTypeStrategy): Output type follows input precision — TIMESTAMP_LTZ(3) for precision 0-3, TIMESTAMP_LTZ(precision) for 4-9. For string variants, precision is inferred from the format pattern's trailing S count. Validates precision is between 0 and 9.
  • Runtime conversion (DateTimeUtils): Replaced per-precision switch cases with a generic epochToTimestampData method using Math.pow for factor computation. The double overload now preserves fractional parts by converting to nanoseconds before truncation. Added toEpochValue(Instant, precision) for the inverse conversion and precisionFromFormat(String) as shared logic for format-based precision inference.
  • Runtime parsing (DateTimeUtils, ToTimestampLtzFunction): Added parseTimestampData(String, String, int) overload that accepts precision. The string+format variant of TO_TIMESTAMP_LTZ now passes the format's precision to the parser, preserving sub-millisecond fractional digits. The existing 2-arg overload remains hardcoded to precision 3 for TO_TIMESTAMP compatibility (FLINK-14925).
  • Input validation (ToTimestampLtzInputTypeStrategy): Added compile-time range validation for numeric literal arguments, so out-of-range epoch values fail at validation time rather than silently returning null at runtime.
  • Literal serialization (ValueLiteralExpression): TIMESTAMP_LTZ literals now serialize using the string-based TO_TIMESTAMP_LTZ('timestamp', 'format', 'UTC') variant with a precision-matching format pattern. Previously the literal was serialized as a numeric epoch value in milliseconds; carrying that numeric approach over to nanosecond precision would overflow long for dates after April 11, 2262.
  • Function versioning (BuiltInFunctionDefinitions): Bumped TO_TIMESTAMP_LTZ to version 2 for plan serialization compatibility.
  • Documentation: Updated sql_functions.yml, sql_functions_zh.yml, and Python expressions.py docstrings with precision-dependent output types, parameter descriptions, and examples.
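
The generic conversion described in the runtime bullet can be sketched as follows. This is an illustration under stated assumptions rather than the merged DateTimeUtils code: it uses java.time.Instant in place of Flink's TimestampData, the class name is hypothetical, and only the long overload is shown. The method name toEpochValue mirrors the one mentioned above, but its body here is a guess at the behavior.

```java
import java.time.Instant;

/** Hypothetical sketch of precision-generic epoch conversion; not Flink's DateTimeUtils. */
final class EpochConversionSketch {

    private static void checkPrecision(int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException(
                    "Precision must be between 0 and 9, got " + precision);
        }
    }

    /** Interprets epoch as a count of 10^(-precision) second units since 1970-01-01T00:00:00Z. */
    static Instant epochToInstant(long epoch, int precision) {
        checkPrecision(precision);
        // Math.pow is exact for these small integer powers of ten.
        long unitsPerSecond = (long) Math.pow(10, precision);
        long nanosPerUnit = (long) Math.pow(10, 9 - precision);
        // floorDiv/floorMod keep the fractional part non-negative for pre-1970 values.
        long seconds = Math.floorDiv(epoch, unitsPerSecond);
        long fraction = Math.floorMod(epoch, unitsPerSecond);
        return Instant.ofEpochSecond(seconds, fraction * nanosPerUnit);
    }

    /** Inverse conversion; digits finer than the requested precision are truncated. */
    static long toEpochValue(Instant instant, int precision) {
        checkPrecision(precision);
        long unitsPerSecond = (long) Math.pow(10, precision);
        long nanosPerUnit = (long) Math.pow(10, 9 - precision);
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), unitsPerSecond),
                instant.getNano() / nanosPerUnit);
    }
}
```

Computing both factors from the precision argument is what removes the need for one switch case per precision. A single code path then covers intermediate values such as 1 (deciseconds) or 7.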

Verifying this change

This change added tests and can be verified as follows:

  • Added parameterized tests for all precisions 0-9 in ExpressionTest to verify TIMESTAMP_LTZ literal serialization round-trips correctly at every precision, including edge cases beyond the long nanosecond overflow (year 2262, year 9999).
  • Added type strategy unit tests in ToTimestampLtzTypeStrategyTest covering precision 0-9 (numeric and format-based), out-of-range precision, and nullable input combinations.
  • Added integration tests in TimeFunctionsITCase for string-based parsing at precision 4/6/8/9, format-based precision inference, and the case where input has fewer fractional digits than the format pattern.
  • Added serialization integration tests in LiteralExpressionsSerializationITCase for precisions 0, 3, 6, and 9.
  • Updated TemporalTypesTest with tests for precision 1 (deciseconds), precision 9 (nanoseconds), string+format variants at different precisions, DOUBLE/DECIMAL at higher precisions, and the fewer-input-digits-than-format case.
  • Updated golden files (select.q, select_batch.q) for CliClientITCase to match new TIMESTAMP_LTZ(3) display format.
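
The overflow edge case that these tests exercise can be reproduced with plain java.time. The sketch below is an assumption-laden illustration, not the ValueLiteralExpression code: the class, the method names, and the yyyy-MM-dd HH:mm:ss base pattern are all hypothetical. It shows where a long count of nanoseconds runs out and how a string with a precision-matching pattern sidesteps that limit.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch of the long-nanosecond limit and string-based literal formatting. */
final class LiteralSerializationSketch {

    /** The latest instant whose epoch nanoseconds still fit into a signed 64-bit long. */
    static Instant lastInstantRepresentableAsLongNanos() {
        return Instant.ofEpochSecond(0, Long.MAX_VALUE);
    }

    /** Overflow check; slightly conservative within one second of the lower long bound. */
    static boolean fitsInLongNanos(Instant instant) {
        try {
            Math.addExact(
                    Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L),
                    instant.getNano());
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    /** Builds a pattern with one 'S' per fractional digit (assumed base pattern). */
    static String formatPattern(int precision) {
        StringBuilder pattern = new StringBuilder("yyyy-MM-dd HH:mm:ss");
        if (precision > 0) {
            pattern.append('.');
            for (int i = 0; i < precision; i++) {
                pattern.append('S');
            }
        }
        return pattern.toString();
    }

    /** Formats the instant in UTC, keeping exactly the given number of fractional digits. */
    static String formatUtc(Instant instant, int precision) {
        return DateTimeFormatter.ofPattern(formatPattern(precision))
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }
}
```

The string form has no numeric ceiling of this kind, so the same formatting path works for year 2262 and year 9999 alike.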

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes — DateTimeUtils.epochToTimestampData is called per-record but the change replaces a switch statement with Math.pow computations, which has negligible performance impact.
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs / JavaDocs

@flinkbot
Collaborator

flinkbot commented Mar 11, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@davidradl
Contributor

@raminqaf I notice PR 27770 is involved with precision and the code seems to round to 3, 6, or 9, whereas your code deals with the other precisions like 7. Is this consistent?

@raminqaf
Contributor Author

raminqaf commented Mar 16, 2026

> @raminqaf I notice PR 27770 is involved with precision and the code seems to round to 3, 6, or 9, whereas your code deals with the other precisions like 7. Is this consistent?

@davidradl Thanks for taking a look and raising this!

To address your point about PR #27770 rounding to 3, 6, or 9: The discrepancy comes down to the difference between the SQL function layer (this PR) and the serialization layer (PR #27770). They do not collide; they simply operate at different scopes.

The Serialization Layer (Avro Limitations)

PR #27770 deals with Avro, and the Avro 1.12.0 specification only defines the following timestamp logical types:

  • timestamp-millis → precision 3
  • timestamp-micros → precision 6
  • timestamp-nanos → precision 9

There is no Avro logical type for intermediate precisions like 1, 2, 4, 5, 7, or 8. These are Avro spec limitations, not Flink limitations. If a user executes TO_TIMESTAMP_LTZ(12, 1) (from this PR) and tries to write it to an Avro sink (PR #27770), the Avro converter would either throw an unsupported precision error or round up to the next Avro-supported precision (e.g., precision 1 → use millis). The Avro format must handle the rounding/truncation gracefully for non-standard precisions, but that is the connector's responsibility—not TO_TIMESTAMP_LTZ's.
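
As an illustration of the "round up" option mentioned above, a converter could map each Flink precision to the next Avro-supported logical type. This sketch is hypothetical and reflects neither this PR nor PR #27770; the class and method names are invented, and only the three logical type names come from the Avro list above.

```java
/** Hypothetical mapping from timestamp precision to an Avro logical type; not connector code. */
final class AvroPrecisionSketch {

    /** Rounds the precision up to the next one Avro defines: 3, 6, or 9. */
    static String avroLogicalTypeFor(int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException(
                    "Precision must be between 0 and 9, got " + precision);
        }
        if (precision <= 3) {
            return "timestamp-millis";
        }
        if (precision <= 6) {
            return "timestamp-micros";
        }
        return "timestamp-nanos";
    }
}
```

Rounding up never loses digits, which is why it is the lossless choice for intermediate precisions such as 1 or 7. A connector could equally decide to reject them.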

The SQL Function Layer (Standard Database Behavior)

For the SQL layer, supporting all intermediate precisions (0-9) is the correct and standard behavior for timestamp types. Restricting Flink's SQL precision strictly to powers of 1000 (3, 6, 9) would be an artificial limitation.

Here is how other major databases handle this, which FlinkSQL aligns with by supporting 0-9:

  • Snowflake (TIMESTAMP_LTZ): Fully supports any integer precision from 0 to 9. Intermediate precisions like 7 will exactly truncate/round to 7 decimal places.

  • Oracle (TIMESTAMP WITH LOCAL TIME ZONE): Accepts any fractional seconds precision from 0 to 9. It natively enforces intermediate precisions perfectly.

  • PostgreSQL (TIMESTAMPTZ): Supports any integer from 0 to 6 (Postgres maxes out at microseconds). Specifying a precision of 1, 2, 4, or 5 is completely valid and strictly enforced.

  • MySQL (TIMESTAMP): Similar to Postgres, allows any fractional precision from 0 to 6.

By supporting 0-9 here, we bring Flink completely in line with standard ANSI SQL database behavior! Let me know what you think.

@raminqaf raminqaf force-pushed the FLINK-39244 branch 3 times, most recently from da9d166 to fd0e599 on March 16, 2026 17:44
@github-actions github-actions bot added the community-reviewed label (PR has been reviewed by the community) Mar 17, 2026
@raminqaf raminqaf force-pushed the FLINK-39244 branch 4 times, most recently from 22f561d to 8e58476 on March 18, 2026 10:03
Contributor

@twalthr twalthr left a comment

Thank you @raminqaf. I left some hopefully final comments.

Contributor

@twalthr twalthr left a comment

LGTM, thanks for tackling this time problem!

@raminqaf
Contributor Author

@flinkbot run azure

…eral serialization for TO_TIMESTAMP_LTZ

  - Output type now follows input precision: TIMESTAMP_LTZ(3) for precision
    0-3, TIMESTAMP_LTZ(precision) for 4-9. String-based variants infer
    precision from the format pattern's 'S' count.
  - Literal serialization uses string-based TO_TIMESTAMP_LTZ('timestamp',
    'format', 'UTC') instead of numeric epoch to avoid long overflow beyond
    year 2262.
  - Add function version 2 for plan serialization compatibility.
  - Update docs (sql_functions.yml, sql_functions_zh.yml, Python docstrings)
    with precision-dependent output types, parameter descriptions, and examples.
  - Update tests to match new output types and display formats.
@twalthr twalthr merged commit 25493ac into apache:master Mar 19, 2026