[FLINK-14925][table] Support precision-aware TO_TIMESTAMP with format-based inference #27793
raminqaf wants to merge 1 commit into apache:master
| Converts a datetime string to a TIMESTAMP without time zone.
|
| - string1: the datetime string to parse
| - string2: the format pattern (default 'yyyy-MM-dd HH:mm:ss'). The pattern follows Java's DateTimeFormatter syntax, where 'S' represents fractional seconds (e.g., 'SSS' for milliseconds, 'SSSSSS' for microseconds, 'SSSSSSSSS' for nanoseconds).

Looks like we are going to rely heavily on Java classes here. It would be great to at least be consistent.
I'm asking because here we refer to DateTimeFormatter, while DATE_FORMAT refers to SimpleDateFormat.
I think we should at least link to their docs explaining the formats; that would simplify the search for non-Java people (SQL, Python).
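
For readers who do not work with Java, a small sketch of how DateTimeFormatter treats runs of 'S' may help. It uses only java.time from the JDK; the class name FractionPatternDemo and the sample strings are made up for illustration and are not part of the PR.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; not part of the Flink code base.
class FractionPatternDemo {

    // Parses text with a DateTimeFormatter pattern; each 'S' is one fraction-of-second digit.
    static LocalDateTime parse(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // 'SSS' reads exactly three fractional digits (milliseconds).
        System.out.println(parse("2023-01-01 12:30:00.123", "yyyy-MM-dd HH:mm:ss.SSS").getNano());
        // prints 123000000

        // 'SSSSSS' reads exactly six fractional digits (microseconds).
        System.out.println(parse("2023-01-01 12:30:00.123456", "yyyy-MM-dd HH:mm:ss.SSSSSS").getNano());
        // prints 123456000

        // 'SSSSSSSSS' reads exactly nine fractional digits (nanoseconds).
        System.out.println(parse("2023-01-01 12:30:00.123456789", "yyyy-MM-dd HH:mm:ss.SSSSSSSSS").getNano());
        // prints 123456789
    }
}
```

This only shows the JDK's own parsing behavior; how Flink's parseTimestampData handles inputs with fewer digits than the pattern is not covered by this sketch.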

| The output precision depends on the variant used:
| - 1-arg variant: always returns TIMESTAMP(3).
| - 2-arg variant: precision is inferred from the number of trailing 'S' characters in the format pattern, with a minimum of 3. E.g., format 'yyyy-MM-dd HH:mm:ss.SS' returns TIMESTAMP(3), format 'yyyy-MM-dd HH:mm:ss.SSSSSS' returns TIMESTAMP(6).

What happens when the number of S's is not 3, 6, or 9?

| toTimestamp("2023-01-01 12:30:00"),
| "TO_TIMESTAMP('2023-01-01 12:30:00')",
| LocalDateTime.of(2023, 1, 1, 12, 30, 0),
| TIMESTAMP(3).nullable())

Can we have a couple of tests for precisions other than 0, 3, 6, and 9?

| super(BuiltInFunctionDefinitions.TO_TIMESTAMP, context);
| }
|
| public @Nullable TimestampData eval(StringData timestamp) {

Suggested change:
- public @Nullable TimestampData eval(StringData timestamp) {
+ public @Nullable TimestampData eval(@Nullable StringData timestamp) {

| return parseTimestampData(timestamp.toString());
| }
|
| public @Nullable TimestampData eval(StringData timestamp, StringData format) {

Suggested change:
- public @Nullable TimestampData eval(StringData timestamp, StringData format) {
+ public @Nullable TimestampData eval(@Nullable StringData timestamp, @Nullable StringData format) {
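
A minimal sketch of what the @Nullable parameters imply for the method bodies: if an argument can be null, a call such as timestamp.toString() needs a guard first. Plain String and LocalDateTime stand in for Flink's StringData and TimestampData here, the class name NullSafeEvalSketch is hypothetical, and returning null for null or unparsable input is an assumption rather than behavior confirmed in this thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for the runtime class; uses JDK types only.
class NullSafeEvalSketch {

    // Default pattern taken from the documented default of the 1-arg variant.
    private static final DateTimeFormatter DEFAULT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // 1-arg variant: guard against null before touching the argument.
    static LocalDateTime eval(String timestamp) {
        if (timestamp == null) {
            return null; // assumption: null in, null out
        }
        try {
            return LocalDateTime.parse(timestamp, DEFAULT_FORMAT);
        } catch (DateTimeParseException e) {
            return null; // assumption: unparsable input yields null
        }
    }

    // 2-arg variant: either argument may be null.
    static LocalDateTime eval(String timestamp, String format) {
        if (timestamp == null || format == null) {
            return null; // assumption: null in, null out
        }
        try {
            return LocalDateTime.parse(timestamp, DateTimeFormatter.ofPattern(format));
        } catch (DateTimeParseException | IllegalArgumentException e) {
            // IllegalArgumentException covers an invalid pattern string.
            return null;
        }
    }
}
```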

| The output precision depends on the variant used:
| - 1-arg variant: always returns TIMESTAMP(3).
| - 2-arg variant: precision is inferred from the number of trailing 'S' characters in the format pattern, with a minimum of 3. E.g., format 'yyyy-MM-dd HH:mm:ss.SS' returns TIMESTAMP(3), format 'yyyy-MM-dd HH:mm:ss.SSSSSS' returns TIMESTAMP(6).

E.g., format 'yyyy-MM-dd HH:mm:ss.SS' returns TIMESTAMP(3)
Why do we have this behavior?
What is the behavior for other vendors?

This is kept for backward compatibility. We have the same behavior for TO_TIMESTAMP_LTZ: https://issues.apache.org/jira/browse/FLINK-39244

Don't the release notes say something a bit different?
TO_TIMESTAMP_LTZ() function now supports up to precision 9 for both numeric and string conversions. While we kept backwards compatibility for numeric precision 0-3 which always returns TIMESTAMP_LTZ(3), precisions p=4-9 now return TIMESTAMP_LTZ(p). Function calls taking string formats such as ".SSSS" now return precision 4 instead of silently losing sub second information.

What is the purpose of the change

This pull request makes the TO_TIMESTAMP function precision-aware when a format pattern is provided. Previously, TO_TIMESTAMP always returned TIMESTAMP(3) regardless of the format pattern's fractional second precision, which forced users to lose sub-millisecond data. This is the TO_TIMESTAMP counterpart to the TO_TIMESTAMP_LTZ precision support added in FLINK-39244.

The output type for the 1-arg variant remains TIMESTAMP(3) for backward compatibility. For the 2-arg variant, precision is inferred from the format pattern's trailing S count (e.g., SSSSSS → TIMESTAMP(6)), with a minimum of 3.

As part of this change, TO_TIMESTAMP is migrated from the legacy Calcite-native function pattern (FlinkSqlOperatorTable + StringCallGen codegen) to the modern bridging function pattern (BuiltInFunctionDefinition + runtimeClass), matching how TO_TIMESTAMP_LTZ is implemented. This was made possible by fixing the function name from camelCase "toTimestamp" to "TO_TIMESTAMP", which allows CoreModule to resolve it correctly for SQL queries without needing a separate FlinkSqlOperatorTable entry.

Brief change log

- ToTimestampTypeStrategy: New output type strategy that returns TIMESTAMP(3) for the 1-arg variant and TIMESTAMP(max(sCount, 3)) for the 2-arg variant, where sCount is inferred from the format pattern's trailing S characters.
- ToTimestampFunction: New runtime class with eval(StringData) and eval(StringData, StringData) methods. The 2-arg variant passes precisionFromFormat(format) to parseTimestampData for precision-aware parsing.
- BuiltInFunctionDefinitions: Changed name from "toTimestamp" to "TO_TIMESTAMP" (removing the need for explicit sqlName), added runtimeClass, and switched output type strategy to SpecificTypeStrategies.TO_TIMESTAMP.
- FlinkSqlOperatorTable.TO_TIMESTAMP, DirectConvertRule mapping, StringCallGen cases, and BuiltInMethods.STRING_TO_TIMESTAMP/STRING_TO_TIMESTAMP_WITH_FORMAT: all superseded by the bridging function mechanism.
- sql_functions.yml, sql_functions_zh.yml, and Python expressions.py/expression.py docstrings with precision-dependent output types and examples.
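
The inference rule described above, TIMESTAMP(max(sCount, 3)), can be sketched in a few lines. This is an illustration of the stated rule only, not the PR's ToTimestampTypeStrategy or precisionFromFormat: the class and method names below are hypothetical, and the sketch applies no upper bound because the description does not say how runs longer than nine 'S' characters are treated.

```java
// Hypothetical illustration of the rule TIMESTAMP(max(sCount, 3)); not Flink code.
class ToTimestampPrecisionSketch {

    // Counts the consecutive 'S' characters at the end of the format pattern.
    static int trailingSCount(String format) {
        int count = 0;
        for (int i = format.length() - 1; i >= 0 && format.charAt(i) == 'S'; i--) {
            count++;
        }
        return count;
    }

    // Output precision for the 2-arg variant under the stated rule, with a floor of 3.
    static int outputPrecision(String format) {
        return Math.max(trailingSCount(format), 3);
    }

    public static void main(String[] args) {
        System.out.println(outputPrecision("yyyy-MM-dd HH:mm:ss"));        // prints 3
        System.out.println(outputPrecision("yyyy-MM-dd HH:mm:ss.SS"));     // prints 3
        System.out.println(outputPrecision("yyyy-MM-dd HH:mm:ss.SSSS"));   // prints 4
        System.out.println(outputPrecision("yyyy-MM-dd HH:mm:ss.SSSSSS")); // prints 6
    }
}
```

Under this reading of the rule, a pattern ending in four 'S' characters maps to precision 4, which is the kind of case the review asks to see covered by tests.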

Verifying this change

This change added tests and can be verified as follows:

- ToTimestampTypeStrategyTest covering 1-arg default precision, 2-arg format-based precision (SSS/SSSSSS/SSSSSSSSS/no-S), invalid argument types, and argument count validation.
- TimeFunctionsITCase for 1-arg truncation to precision 3, 2-arg precision 6/9 from format, SSS format staying at precision 3, fewer input digits than format precision, unparsable string, and null input.
- TemporalTypesTest.scala: tests that are now covered by the new TimeFunctionsITCase tests.

Does this pull request potentially affect one of the following parts:

- @Public(Evolving): no
- ToTimestampFunction.eval() methods call the same DateTimeUtils.parseTimestampData methods as before.

Documentation