ValidationError for primary_key failure could be a lot more helpful #282

@chriswithers-fuse

Hello!

So, we have had some failures like this:

  File "/app/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py", line 6716, in pipe
    return function(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/dataframely/schema.py", line 574, in validate
    raise ValidationError(
dataframely.exc.ValidationError: 1 rules failed validation:
 - 'primary_key' failed for 156 rows

dataframely.exc.ValidationError: 1 rules failed validation:
 - 'primary_key' failed for 312 rows

It's great to catch the problem, but when it occurs in the middle of a multi-hour pipeline, getting back to this point with a debugger to find out what happened is pretty painful, especially as in our case it's usually just one value repeated many times.

Please could I put in a feature request for this exception to carry more detail? Something like the following would be amazing:

dataframely.exc.ValidationError: 1 rules failed validation:
 - 'primary_key' failed for 312 rows with 4 distinct combinations, examples: [...distinct key 1..., ...distinct key 2..., etc]
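To make the requested message concrete, here is a minimal sketch of how such a summary could be computed. It is not dataframely's implementation: `summarize_primary_key_failure` and `max_examples` are hypothetical names, the rows are plain dicts (for example from polars' `DataFrame.to_dicts()`), and it assumes that every row sharing its key with another row counts as failing.

```python
from collections import Counter


def summarize_primary_key_failure(rows, key_columns, max_examples=3):
    """Describe duplicated primary keys in the style of the message proposed above.

    Hypothetical helper, not part of dataframely. `rows` is a list of dicts.
    """
    # Count how often each key combination occurs.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    # A combination violates the primary key if it occurs more than once.
    duplicated = {key: n for key, n in counts.items() if n > 1}
    # Assumption: all rows of a duplicated combination count as failing.
    failed_rows = sum(duplicated.values())
    # Show the first few offending combinations in order of first appearance.
    examples = [dict(zip(key_columns, key)) for key in list(duplicated)[:max_examples]]
    return (
        f"'primary_key' failed for {failed_rows} rows "
        f"with {len(duplicated)} distinct combinations, examples: {examples}"
    )


rows = [
    {"id": 1, "day": "mon"},
    {"id": 1, "day": "mon"},
    {"id": 1, "day": "mon"},
    {"id": 2, "day": "tue"},
    {"id": 3, "day": "wed"},
    {"id": 3, "day": "wed"},
]
message = summarize_primary_key_failure(rows, ["id", "day"])
print(message)
```

Capping the number of examples keeps the exception readable even when thousands of keys are duplicated, while the distinct count still shows whether the problem is one repeated value or many.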
