refactor(datasets-derived): validate functions at deserialization #1905

Open
LNSD wants to merge 1 commit into main from lnsd/refactor-datasets-derived-func-validate
LNSD (Contributor) commented Mar 4, 2026

Decouple datasets-derived from ScalarUDF and js-runtime by moving Arrow type validation to the deserialization boundary, so a successfully deserialized Function is always valid.

  • Move Function/FunctionSource from datasets-common into datasets-derived with custom Deserialize that validates Arrow types against JS UDF-supported primitives
  • Remove js-runtime dependency from datasets-derived; Dataset::function_by_name returns &Function instead of constructing ScalarUDF
  • Simplify SelfSchemaProvider::from_manifest_udfs by removing redundant schema_name parameter
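
To illustrate the idea of validating at the deserialization boundary, here is a minimal, std-only sketch. It is not the PR's code: `RawFunction`, `JsPrimitive`, `UnsupportedType`, the field names, and the list of supported type names are all hypothetical, and the PR itself uses a custom `Deserialize` impl rather than `TryFrom`. With serde, a conversion like this one could be wired in through the `#[serde(try_from = "RawFunction")]` container attribute.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical subset of Arrow types that a JS UDF can accept or return.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JsPrimitive {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Unvalidated shape, as it might appear in a manifest (hypothetical fields).
#[derive(Debug, Clone)]
pub struct RawFunction {
    pub input_types: Vec<String>,
    pub output_type: String,
    pub source: String,
}

/// Validated function: the fields are private, so the only way to build one
/// is through `TryFrom<RawFunction>`, which rejects unsupported types.
#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    input_types: Vec<JsPrimitive>,
    output_type: JsPrimitive,
    source: String,
}

/// Error carrying the name of the type that failed validation.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedType(pub String);

impl fmt::Display for UnsupportedType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "type `{}` is not supported by JS UDFs", self.0)
    }
}

impl std::error::Error for UnsupportedType {}

/// Map a type name to a supported primitive, or report the offending name.
fn parse_type(name: &str) -> Result<JsPrimitive, UnsupportedType> {
    match name {
        "Boolean" => Ok(JsPrimitive::Boolean),
        "Int32" => Ok(JsPrimitive::Int32),
        "Int64" => Ok(JsPrimitive::Int64),
        "Float64" => Ok(JsPrimitive::Float64),
        "Utf8" => Ok(JsPrimitive::Utf8),
        other => Err(UnsupportedType(other.to_string())),
    }
}

impl TryFrom<RawFunction> for Function {
    type Error = UnsupportedType;

    fn try_from(raw: RawFunction) -> Result<Self, Self::Error> {
        // Fail on the first unsupported input type.
        let input_types = raw
            .input_types
            .iter()
            .map(|t| parse_type(t))
            .collect::<Result<Vec<_>, _>>()?;
        let output_type = parse_type(&raw.output_type)?;
        Ok(Function {
            input_types,
            output_type,
            source: raw.source,
        })
    }
}

impl Function {
    pub fn input_types(&self) -> &[JsPrimitive] {
        &self.input_types
    }

    pub fn output_type(&self) -> JsPrimitive {
        self.output_type
    }

    pub fn source(&self) -> &str {
        &self.source
    }
}
```

Because every `Function` value has already passed validation, callers that receive a `&Function` do not need to re-check its types before building a UDF from it.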

@LNSD LNSD requested review from leoyvens and shiyasmohd March 4, 2026 12:31
@LNSD LNSD self-assigned this Mar 4, 2026
@LNSD LNSD force-pushed the lnsd/refactor-datasets-derived-func-validate branch 2 times, most recently from 1f1cb30 to ca4cccf Compare March 4, 2026 12:54
LNSD (Contributor, Author) commented Mar 4, 2026

This is another spin-off from #1900.

@LNSD LNSD requested a review from mitchhs12 March 4, 2026 13:24
@LNSD LNSD force-pushed the lnsd/refactor-datasets-derived-func-validate branch from ca4cccf to 3c19bec Compare March 4, 2026 14:15
Decouple `datasets-derived` from `ScalarUDF` and `js-runtime` by moving Arrow type validation to the deserialization boundary, so a successfully deserialized `Function` is always valid.

- Move `Function`/`FunctionSource` from `datasets-common` into `datasets-derived` with custom `Deserialize` that validates Arrow types against JS UDF-supported primitives
- Remove `js-runtime` dependency from `datasets-derived`; `Dataset::function_by_name` returns `&Function` instead of constructing `ScalarUDF`
- Simplify `SelfSchemaProvider::from_manifest_udfs` by removing redundant `schema_name` parameter

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
@LNSD LNSD force-pushed the lnsd/refactor-datasets-derived-func-validate branch from 3c19bec to 0937306 Compare March 4, 2026 15:41
@LNSD LNSD requested a review from mitchhs12 March 4, 2026 15:57
mitchhs12 (Contributor) left a comment

LGTM