ENH: Machine-readable validate output with store/reload by yarikoptic · Pull Request #1822 · dandi/dandi-cli

yarikoptic · 2026-03-19T19:25:40Z

Summary

Design plan for machine-readable validate output with store/reload capability. Adds structured output formats, automatic persistence of validation results alongside log files, and the ability to reload and re-render results with different grouping/filtering options.

Key design decisions:

All structured formats (json, json_pp, json_lines, yaml) emit a uniform flat list of ValidationResult records — no envelope/non-envelope split
JSONL as the primary interchange format: cat/jq/grep/vd (VisiData) composable
_record_version field on each record for forward-compatible deserialization
Grouping affects human display only; structured output is always a stable flat schema
Auto-save _validation.jsonl sidecar next to existing .log files
Refactor dandi/validate.py + dandi/validate_types.py into dandi/validate/ subpackage
Closes #1515 (validate: Add -f|--format option to optionally serialize into json, json_pp, json_lines or yaml)
Closes #1753 (Provide easy means for introspecting upload validation failures)
Enhances #1743 (Add filtering of issues by type/ID or by file location), #1737 (upload,validate: Add --validators option), #1748 (Tidy up the validate command function in cmd_validate.py)
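
To make the "flat list of records, one per line" decision concrete, here is a minimal sketch of a JSONL round trip. It is not the code from this PR: `Record` is a hypothetical stand-in for ValidationResult with illustrative field names, `dump_jsonl`/`load_jsonl` are made-up helper names, and the version field is spelled `record_version` as in the commit that adds it.

```python
import json
from dataclasses import asdict, dataclass
from io import StringIO
from typing import IO, Iterable, List


@dataclass
class Record:
    """Hypothetical stand-in for ValidationResult; fields are illustrative."""

    id: str
    severity: str
    path: str
    message: str
    record_version: str = "1"


def dump_jsonl(records: Iterable[Record], out: IO[str]) -> None:
    # One JSON object per line keeps the file usable with cat/grep/jq.
    for rec in records:
        out.write(json.dumps(asdict(rec), sort_keys=True) + "\n")


def load_jsonl(src: IO[str]) -> List[Record]:
    # Blank lines are skipped so that concatenated files still load.
    return [Record(**json.loads(line)) for line in src if line.strip()]


records = [
    Record("DANDI.NO_DANDISET_FOUND", "ERROR", "2d_mb_pcasl",
           "Path is not inside a Dandiset"),
    Record("BIDS.JSON_KEY_RECOMMENDED", "HINT", "dataset_description.json",
           "A JSON file is missing a key"),
]
buf = StringIO()
dump_jsonl(records, buf)
buf.seek(0)
reloaded = load_jsonl(buf)
```

Because every line is a complete record, two such files can be joined with plain `cat` and still be read back by the same loader.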

TODO

`--max-per-group` feature (Step 5)

Limits how many results are shown per leaf group (or in the flat list when no grouping). Excess results are replaced by a TruncationNotice placeholder — a distinct data structure (not a ValidationResult), so it won't be confused with real results if the output is saved/reloaded.

Examples (against 147k+ validation results from bids-examples)

Flat truncation — --max-per-group 5 with no grouping:

[DANDI.NO_DANDISET_FOUND] .../2d_mb_pcasl — Path is not inside a Dandiset
[BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
[BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
[BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
[BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
... and 147581 more issues

Grouped truncation — -g severity --max-per-group 3:

=== ERROR (9569 issues) ===
  [DANDI.NO_DANDISET_FOUND] .../2d_mb_pcasl — Path is not inside a Dandiset
  [BIDS.NIFTI_HEADER_UNREADABLE] .../sub-1_T1w.nii.gz — We were unable to parse header data ...
  [BIDS.NIFTI_HEADER_UNREADABLE] .../sub-1_dir-AP_epi.nii.gz — We were unable to parse header data ...
  ... and 9566 more issues
=== HINT (138017 issues) ===
  [BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
  [BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
  [BIDS.JSON_KEY_RECOMMENDED] .../dataset_description.json — A JSON file is missing a key ...
  ... and 138014 more issues

These lines are also colored when the output is not redirected.

Multi-level leaf-only truncation — -g severity -g id --max-per-group 2:

=== ERROR (9569 issues) ===
  === DANDI.NO_DANDISET_FOUND (107 issues) ===
    [DANDI.NO_DANDISET_FOUND] .../2d_mb_pcasl — Path is not inside a Dandiset
    [DANDI.NO_DANDISET_FOUND] .../7t_trt — Path is not inside a Dandiset
    ... and 105 more issues
  === BIDS.NIFTI_HEADER_UNREADABLE (4336 issues) ===
    [BIDS.NIFTI_HEADER_UNREADABLE] .../sub-1_T1w.nii.gz — We were unable to parse header data ...
    [BIDS.NIFTI_HEADER_UNREADABLE] .../sub-1_dir-AP_epi.nii.gz — We were unable to parse header data ...
    ... and 4334 more issues
  === BIDS.EMPTY_FILE (4954 issues) ===
    ...

Structured output — -g severity -f json_pp --max-per-group 2 emits _truncated placeholders:

{
  "ERROR": [
    { "id": "DANDI.NO_DANDISET_FOUND", "severity": "ERROR", ... },
    { "id": "BIDS.NIFTI_HEADER_UNREADABLE", "severity": "ERROR", ... },
    { "_truncated": true, "omitted_count": 9567 }
  ],
  "HINT": [
    { "id": "BIDS.JSON_KEY_RECOMMENDED", "severity": "HINT", ... },
    { "id": "BIDS.JSON_KEY_RECOMMENDED", "severity": "HINT", ... },
    { "_truncated": true, "omitted_count": 138015 }
  ]
}

Headers show original counts (e.g. "9569 issues") even when only a few are displayed. The _truncated sentinel follows the _record_version naming convention for metadata fields.
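
The leaf-only truncation described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's `_truncate_leaves()`: the tree is assumed to be nested dicts with lists at the leaves, and `truncate_leaves` and `to_jsonable` are hypothetical names. Only `TruncationNotice`, `_truncated`, and `omitted_count` come from the description.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class TruncationNotice:
    """Placeholder for omitted results; deliberately not a result record."""

    omitted_count: int


Tree = Union[Dict[str, Any], List[Any]]


def truncate_leaves(tree: Tree, max_per_group: int) -> Tree:
    # Inner nodes (dicts) are walked unchanged; only leaf lists are capped.
    if isinstance(tree, dict):
        return {key: truncate_leaves(sub, max_per_group)
                for key, sub in tree.items()}
    if len(tree) <= max_per_group:
        return list(tree)
    kept = list(tree[:max_per_group])
    kept.append(TruncationNotice(omitted_count=len(tree) - max_per_group))
    return kept


def to_jsonable(item: Any) -> Any:
    # Structured output replaces the notice with the sentinel object.
    if isinstance(item, TruncationNotice):
        return {"_truncated": True, "omitted_count": item.omitted_count}
    return item


grouped = {
    "ERROR": {"A": ["a1", "a2", "a3"], "B": ["b1"]},
    "HINT": {"C": ["c1", "c2", "c3", "c4"]},
}
result = truncate_leaves(grouped, 2)
```

Passing a plain list instead of a dict covers the flat, ungrouped case. Header counts would have to be computed from the untruncated tree for them to show the original totals.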

Test plan

CLI unit tests for each --format output via click.CliRunner
Round-trip serialization tests for ValidationResult JSONL
--load with multi-file concatenation, mutual exclusivity enforcement
Sidecar auto-save creation and suppression when --output is used
Upload sidecar integration test with Docker Compose fixture
Extended grouping: section headers, counts, structured output unaffected
--max-per-group flat truncation, grouped truncation, multi-level, JSON placeholder, no-truncation when under limit
Unit test for _truncate_leaves() helper

Some demos

Codecov Report

❌ Patch coverage is 95.78207% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.06%. Comparing base (5f03d9b) to head (2ddd5d4).

Files with missing lines	Patch %	Lines
dandi/cli/cmd_validate.py	93.60%	8 Missing ⚠️
dandi/upload.py	25.00%	3 Missing ⚠️
dandi/validate/types.py	0.00%	3 Missing ⚠️
dandi/files/zarr.py	0.00%	2 Missing ⚠️
dandi/validate/__init__.py	0.00%	2 Missing ⚠️
dandi/bids_validator_deno/_validator.py	0.00%	1 Missing ⚠️
dandi/files/bases.py	0.00%	1 Missing ⚠️
dandi/files/bids.py	50.00%	1 Missing ⚠️
dandi/organize.py	0.00%	1 Missing ⚠️
dandi/pynwb_utils.py	0.00%	1 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1822      +/-   ##
==========================================
+ Coverage   75.12%   76.06%   +0.93%     
==========================================
  Files          84       87       +3     
  Lines       11930    12457     +527     
==========================================
+ Hits         8963     9475     +512     
- Misses       2967     2982      +15

Flag	Coverage Δ
unittests	`76.06% <95.78%> (+0.93%)`	⬆️

…e/ subpackage

Pure file move with no content changes, plus __init__.py re-exports for backward compatibility. Imports will be updated in the next commit.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Update imports across 13 files to use the new subpackage structure:
- dandi.validate_types → dandi.validate.types
- dandi.validate → dandi.validate.core (for explicit imports)
- Relative imports adjusted accordingly

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

- test_validate.py → dandi/validate/tests/test_core.py
- test_validate_types.py → dandi/validate/tests/test_types.py
- Update relative imports in moved test files
- Fix circular import: don't eagerly import core in __init__.py

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

…ate CLI

Decompose the monolithic validate() click command into helpers:
- _collect_results(): runs validation and collects results
- _filter_results(): applies min-severity and ignore filters
- _process_issues(): simplified, no longer handles ignore (moved to _filter)

No behavior changes; all existing tests pass unchanged.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add record_version: str = "1" for forward-compatible serialization. Uses no underscore prefix since Pydantic v2 excludes underscore-prefixed fields from serialization. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add -f/--format {human,json,json_pp,json_lines,yaml} to produce structured output using existing formatter infrastructure. Structured formats suppress colored text and 'No errors found' message. Exit code still reflects validation results. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

- Create dandi/validate/io.py with write/append/load JSONL utilities and validation_sidecar_path() helper
- Add -o/--output option to write structured output to file
- Auto-save _validation.jsonl sidecar next to logfile when using structured format without --output

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
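
As a rough illustration of the sidecar idea, the sketch below derives a `_validation.jsonl` path next to a log file and appends records to it. The naming rule (log file stem plus `_validation.jsonl`) and the helper names `sidecar_path_for` and `append_records` are assumptions for this example; the real `validation_sidecar_path()` may behave differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def sidecar_path_for(logfile: Path) -> Path:
    # Assumed rule: same directory, log stem + "_validation.jsonl".
    return logfile.with_name(logfile.stem + "_validation.jsonl")


def append_records(path: Path, records: Iterable[Dict[str, str]]) -> None:
    # Append mode lets several validation passes share one sidecar file.
    with path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "run-001.log"  # hypothetical log file name
    sidecar = sidecar_path_for(log)
    append_records(sidecar, [{"id": "X.ONE", "severity": "ERROR"}])
    append_records(sidecar, [{"id": "X.TWO", "severity": "HINT"}])
    sidecar_name = sidecar.name
    saved_lines = sidecar.read_text(encoding="utf-8").splitlines()
```

Appending line by line means a partially completed run still leaves a readable file behind.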

Add --summary/--no-summary flag that shows statistics after validation: total issues, breakdown by severity, validator, and standard. For human output, printed to stdout; for structured formats, printed to stderr. Also refactors _process_issues into _render_human (no exit) + _exit_if_errors, keeping _process_issues as backward-compatible wrapper. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
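
A summary of this kind can be built from simple counters. The sketch below is an assumption about the shape of the output, not the PR's `_print_summary`: records are plain dicts, and the function name and text layout are made up for illustration.

```python
import sys
from collections import Counter
from io import StringIO
from typing import IO, Dict, List


def print_summary(records: List[Dict[str, str]], out: IO[str]) -> None:
    # Total first, then one breakdown per field of interest.
    print(f"Total issues: {len(records)}", file=out)
    for field in ("severity", "validator", "standard"):
        counts = Counter(rec.get(field, "unknown") for rec in records)
        print(f"By {field}:", file=out)
        for value, n in counts.most_common():
            print(f"  {value}: {n}", file=out)


sample = [
    {"severity": "ERROR", "validator": "bids-validator-deno", "standard": "BIDS"},
    {"severity": "ERROR", "validator": "dandi", "standard": "DANDI"},
    {"severity": "HINT", "validator": "bids-validator-deno", "standard": "BIDS"},
]
captured = StringIO()
print_summary(sample, captured)
summary_text = captured.getvalue()
```

Passing `sys.stderr` as `out` for structured formats keeps stdout parseable, which matches the stdout/stderr split described in the commit message.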

Add --load to reload previously-saved JSONL validation results and re-render them with different formats/filters/grouping. Mutually exclusive with positional paths. Exit code reflects loaded results. Skip auto-save sidecar when loading. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
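
Reloading from several files amounts to concatenating their records before filtering and rendering. The following is only a sketch with hypothetical helper names (`load_many`, `exit_code_for`), and it assumes that "exit code reflects loaded results" means a non-zero code when any record has severity ERROR.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def load_many(paths: Iterable[Path]) -> List[Dict[str, str]]:
    # Records from all files end up in one flat list, in file order.
    records: List[Dict[str, str]] = []
    for path in paths:
        with path.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records


def exit_code_for(records: List[Dict[str, str]]) -> int:
    # Assumption: only ERROR-level records make the command fail.
    return 1 if any(r.get("severity") == "ERROR" for r in records) else 0


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.jsonl"
    second = Path(tmp) / "b.jsonl"
    first.write_text('{"id": "X.ONE", "severity": "HINT"}\n', encoding="utf-8")
    second.write_text('{"id": "X.TWO", "severity": "ERROR"}\n', encoding="utf-8")
    loaded = load_many([first, second])
    hints_only = load_many([first])
```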

- Add validation_log_path parameter to upload()
- In upload validation loop, append results to sidecar via append_validation_jsonl() when validation_log_path is set
- CLI cmd_upload derives sidecar path from logfile and passes it

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Fix mypy errors by using IO[str] instead of object for file-like output parameters in _print_summary, _get_formatter, and _render_structured. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

When --output is given without explicit --format, infer the format from the file extension: .json → json_pp, .jsonl → json_lines, .yaml/.yml → yaml. Error only if extension is unrecognized. Update design doc to reflect this behavior. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
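
The extension-to-format mapping above is small enough to sketch directly. The function name `infer_format` and the use of `ValueError` are assumptions for this example; the mapping itself is taken from the commit message.

```python
from pathlib import Path
from typing import Optional

# Mapping as stated in the commit message.
EXTENSION_TO_FORMAT = {
    ".json": "json_pp",
    ".jsonl": "json_lines",
    ".yaml": "yaml",
    ".yml": "yaml",
}


def infer_format(output: str, explicit: Optional[str] = None) -> str:
    # An explicit --format always wins over the file extension.
    if explicit is not None:
        return explicit
    suffix = Path(output).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot infer format from extension {suffix!r}; pass --format"
        )
    return EXTENSION_TO_FORMAT[suffix]


inferred = infer_format("results.jsonl")
```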

Add severity, id, validator, standard, and dandiset as --grouping options. Uses section headers with counts (e.g. "=== ERROR (5 issues) ===") for human output. Structured output is unaffected (always flat). Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
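
Multi-level grouping with counted section headers can be sketched as a recursive bucket-then-render pass. This is an illustration only: records are plain dicts, `group_records`, `count`, and `render` are hypothetical names, and the record line layout is simplified compared to the real human output.

```python
from typing import Dict, List, Sequence, Union

Record = Dict[str, str]
Group = Union[List[Record], Dict[str, "Group"]]


def group_records(records: List[Record], keys: Sequence[str]) -> Group:
    # Bucket by the first key, then recurse with the remaining keys.
    if not keys:
        return list(records)
    buckets: Dict[str, List[Record]] = {}
    for rec in records:
        buckets.setdefault(rec[keys[0]], []).append(rec)
    return {name: group_records(items, keys[1:])
            for name, items in buckets.items()}


def count(group: Group) -> int:
    if isinstance(group, list):
        return len(group)
    return sum(count(sub) for sub in group.values())


def render(group: Group, indent: int = 0) -> List[str]:
    pad = "  " * indent
    if isinstance(group, list):
        return [f"{pad}[{r['id']}] {r['path']}: {r['message']}" for r in group]
    lines: List[str] = []
    for name, sub in group.items():
        n = count(sub)
        noun = "issue" if n == 1 else "issues"
        lines.append(f"{pad}=== {name} ({n} {noun}) ===")
        lines.extend(render(sub, indent + 1))
    return lines


sample = [
    {"id": "DANDI.NO_DANDISET_FOUND", "severity": "ERROR",
     "path": "2d_mb_pcasl", "message": "Path is not inside a Dandiset"},
    {"id": "DANDI.NO_DANDISET_FOUND", "severity": "ERROR",
     "path": "7t_trt", "message": "Path is not inside a Dandiset"},
    {"id": "BIDS.JSON_KEY_RECOMMENDED", "severity": "HINT",
     "path": "dataset_description.json",
     "message": "A JSON file is missing a key"},
]
lines = render(group_records(sample, ["severity", "id"]))
```

Keeping grouping as a separate step from rendering is what lets structured output ignore it and stay flat.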

yarikoptic · 2026-03-20T00:30:58Z

dandi/cli/tests/test_cmd_validate.py

+def test_validate_load_with_format(simple2_nwb: Path, tmp_path: Path) -> None:
+    """Test --load combined with --format."""
+    outfile = tmp_path / "results.jsonl"
+    r = CliRunner().invoke(


In general, to fix "variable defined multiple times" when an earlier assignment is unused, you remove or simplify the earlier assignment so that you no longer bind the result to the variable, while preserving any necessary side effects of the expression on the right-hand side.

Here, the best fix is to keep the first CliRunner().invoke(...) call (because it creates outfile), but stop assigning its result to r. The second assignment to r at line 320 is the one that is actually used and should remain unchanged. Concretely, in dandi/cli/tests/test_cmd_validate.py, in test_validate_load_with_format, replace r = CliRunner().invoke( at line 314 with just CliRunner().invoke( so the first invoke is called for its side effects only. No imports or additional definitions are needed.

In tests, this might come in handy for step-by-step debugging.

Limit how many results are shown per leaf group (or in the flat list when no grouping is used). Excess results are replaced by a TruncationNotice placeholder, a distinct dataclass (not a ValidationResult), so consumers can isinstance() check.
- TruncationNotice dataclass + LeafItem/TruncatedResults type aliases
- _truncate_leaves() walks the grouped tree, caps leaf lists
- Human output: "... and N more issues" in cyan
- Structured output: {"_truncated": true, "omitted_count": N} sentinel
- Headers show original counts including omitted items
- Works without grouping (flat list) and with multi-level grouping

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

yarikoptic added the cmd-validate label Mar 19, 2026

yarikoptic mentioned this pull request Mar 19, 2026

Provide Python and CLI interfaces for validation of OME zarrs ome-zarr-models/ome-zarr-models-py#183

Open

yarikoptic and others added 13 commits March 19, 2026 15:53

fix: add proper type annotations to cmd_validate helpers

cabefcf

Fix mypy errors by using IO[str] instead of object for file-like output parameters in _print_summary, _get_formatter, and _render_structured. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

github-advanced-security bot found potential problems Mar 19, 2026

yarikoptic added the enhancement New feature or request label Mar 19, 2026

yarikoptic requested a review from candleindark March 19, 2026 22:52

yarikoptic mentioned this pull request Mar 19, 2026

What if all validation records were in .jsonl files?! bids-standard/bids-examples#548

Closed

yarikoptic added UX cmd-upload labels Mar 19, 2026

yarikoptic force-pushed the enh-validators branch from 4894aa1 to 2ddd5d4 Compare March 20, 2026 00:10

yarikoptic added the minor Increment the minor version when merged label Mar 20, 2026

yarikoptic requested review from CodyCBakerPhD and bendichter March 20, 2026 00:13

yarikoptic mentioned this pull request Mar 20, 2026

Please submit pointers to your validator output formats con/validation#1

Open

yarikoptic marked this pull request as ready for review March 20, 2026 00:30

yarikoptic wants to merge 15 commits into master from enh-validators