ENH: Machine-readable validate output with store/reload #1822
yarikoptic wants to merge 15 commits into master
Design plan for enhancing `dandi validate` with:
- Structured output formats (-f json/json_pp/json_lines/yaml)
- Auto-save _validation.jsonl sidecar alongside .log files
- --load to reload/re-render stored results with different groupings
- Upload validation persistence for later inspection
- Extended grouping options (severity, id, validator, standard, dandiset)
- Refactoring into dandi/validate/ subpackage (git mv separately)
- _record_version field on ValidationResult for forward compatibility
- VisiData integration via native JSONL support
Addresses #1515, #1753, #1748; enhances #1743.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1822 +/- ##
==========================================
+ Coverage 75.12% 76.06% +0.93%
==========================================
Files 84 87 +3
Lines 11930 12457 +527
==========================================
+ Hits 8963 9475 +512
- Misses 2967 2982 +15
…e/ subpackage
Pure file move with no content changes, plus __init__.py re-exports for backward compatibility. Imports will be updated in the next commit.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Update imports across 13 files to use the new subpackage structure:
- dandi.validate_types → dandi.validate.types
- dandi.validate → dandi.validate.core (for explicit imports)
- Relative imports adjusted accordingly
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
- test_validate.py → dandi/validate/tests/test_core.py
- test_validate_types.py → dandi/validate/tests/test_types.py
- Update relative imports in moved test files
- Fix circular import: don't eagerly import core in __init__.py
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
…ate CLI
Decompose the monolithic validate() click command into helpers:
- _collect_results(): runs validation and collects results
- _filter_results(): applies min-severity and ignore filters
- _process_issues(): simplified, no longer handles ignore (moved to _filter)
No behavior changes; all existing tests pass unchanged.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add record_version: str = "1" for forward-compatible serialization. Uses no underscore prefix since Pydantic v2 excludes underscore-prefixed fields from serialization.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
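A minimal sketch of how a per-record version field can support forward-compatible loading. It uses a plain dataclass as a stand-in for `ValidationResult` (the real class is a Pydantic model with more fields); `Record`, `SUPPORTED_RECORD_VERSIONS`, `dump_record` and `load_record` are hypothetical names, and treating a missing version as "1" is an assumption, not something this commit states.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: the set of record versions this reader knows how to parse.
SUPPORTED_RECORD_VERSIONS = {"1"}


@dataclass
class Record:
    """Illustrative stand-in for ValidationResult with only two real fields."""

    id: str
    severity: str
    record_version: str = "1"


def dump_record(rec: Record) -> str:
    """Serialize one record as a single JSON line, version included."""
    return json.dumps(asdict(rec))


def load_record(line: str) -> Record:
    """Parse one JSON line, rejecting versions this reader does not know."""
    data = json.loads(line)
    # Assumption: records written before the field existed count as version "1".
    version = data.get("record_version", "1")
    if version not in SUPPORTED_RECORD_VERSIONS:
        raise ValueError(f"unsupported record_version: {version!r}")
    return Record(id=data["id"], severity=data["severity"], record_version=version)
```

Checking the version at load time lets a later schema change fail loudly instead of silently misreading old or newer files.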
Add -f/--format {human,json,json_pp,json_lines,yaml} to produce
structured output using existing formatter infrastructure. Structured
formats suppress colored text and 'No errors found' message. Exit
code still reflects validation results.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
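For illustration, a stdlib-only sketch of what the three JSON-based formats could look like for a flat list of records. This does not use dandi's existing formatter infrastructure; `render_structured` is a hypothetical helper, the exact layout of each format in dandi may differ, and `yaml` is left out because it needs a third-party library.

```python
import json
from typing import Any


def render_structured(records: list[dict[str, Any]], fmt: str) -> str:
    """Render a flat list of record dicts in one of the JSON-based formats."""
    if fmt == "json":
        # Assumption: compact, the whole list on one line.
        return json.dumps(records)
    if fmt == "json_pp":
        # Pretty-printed variant of the same list.
        return json.dumps(records, indent=2)
    if fmt == "json_lines":
        # One JSON object per line, which cat/jq/grep can process line by line.
        return "\n".join(json.dumps(r) for r in records)
    raise ValueError(f"unsupported format: {fmt!r}")
```

All three carry the same records, so a consumer can switch formats without changing how it interprets an individual record.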
- Create dandi/validate/io.py with write/append/load JSONL utilities and validation_sidecar_path() helper
- Add -o/--output option to write structured output to file
- Auto-save _validation.jsonl sidecar next to logfile when using structured format without --output
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
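A rough sketch of the kind of JSONL utilities described above, written against plain dicts. These are not the functions in `dandi/validate/io.py`: `sidecar_path_for`, `write_jsonl`, `append_jsonl` and `load_jsonl` are hypothetical names, and deriving the sidecar name by swapping the `.log` suffix for `_validation.jsonl` is an assumption about the naming scheme.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def sidecar_path_for(logfile: Path) -> Path:
    """Assumed naming: 'run.log' -> 'run_validation.jsonl' in the same directory."""
    return logfile.with_name(logfile.stem + "_validation.jsonl")


def write_jsonl(path: Path, records: Iterable[dict[str, Any]]) -> None:
    """Overwrite path with one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def append_jsonl(path: Path, records: Iterable[dict[str, Any]]) -> None:
    """Append records, creating the file if it does not exist yet."""
    with path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def load_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read records back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is a complete record, appending never requires rewriting earlier content, which suits results that arrive incrementally.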
Add --summary/--no-summary flag that shows statistics after validation: total issues, breakdown by severity, validator, and standard. For human output, printed to stdout; for structured formats, printed to stderr.
Also refactors _process_issues into _render_human (no exit) + _exit_if_errors, keeping _process_issues as backward-compatible wrapper.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
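The summary described above could be computed roughly as follows. This is a sketch over plain dicts: `print_summary` here is a hypothetical stand-in for `_print_summary`, and the flat `severity`/`validator`/`standard` keys are an assumption about the record layout. The caller picks the stream, so structured output on stdout stays parseable when the summary goes to stderr.

```python
import io
import sys
from collections import Counter
from typing import IO, Any


def print_summary(records: list[dict[str, Any]], out: IO[str]) -> None:
    """Print the total count and per-field breakdowns to the given stream."""
    print(f"Total issues: {len(records)}", file=out)
    for field in ("severity", "validator", "standard"):
        counts = Counter(str(r.get(field)) for r in records)
        print(f"By {field}:", file=out)
        # most_common() orders by count, largest first.
        for key, n in counts.most_common():
            print(f"  {key}: {n}", file=out)


if __name__ == "__main__":
    demo = [
        {"severity": "ERROR", "validator": "bids", "standard": "BIDS"},
        {"severity": "HINT", "validator": "bids", "standard": "BIDS"},
    ]
    # With a structured format on stdout, the summary would go to stderr.
    print_summary(demo, sys.stderr)
```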
Add --load to reload previously-saved JSONL validation results and re-render them with different formats/filters/grouping. Mutually exclusive with positional paths. Exit code reflects loaded results. Skip auto-save sidecar when loading.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
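A sketch of the reload flow under stated assumptions: several JSONL files are concatenated in the order given, combining them with positional paths is rejected, and the exit code is non-zero when any loaded record has severity `ERROR`. The helper names and the exact exit-code rule are hypothetical, not taken from the dandi code.

```python
import json
from pathlib import Path
from typing import Any, Sequence


def check_exclusive(paths: Sequence[str], load: Sequence[str]) -> None:
    """Reject mixing paths to validate with previously saved result files."""
    if paths and load:
        raise ValueError("--load cannot be combined with positional paths")


def load_many(files: Sequence[str]) -> list[dict[str, Any]]:
    """Concatenate records from several JSONL files, keeping file order."""
    results: list[dict[str, Any]] = []
    for name in files:
        with Path(name).open(encoding="utf-8") as fh:
            results.extend(json.loads(line) for line in fh if line.strip())
    return results


def exit_code_for(records: list[dict[str, Any]]) -> int:
    """Assumption: any ERROR record makes the command exit with status 1."""
    return 1 if any(r.get("severity") == "ERROR" for r in records) else 0
```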
- Add validation_log_path parameter to upload()
- In upload validation loop, append results to sidecar via append_validation_jsonl() when validation_log_path is set
- CLI cmd_upload derives sidecar path from logfile and passes it
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Fix mypy errors by using IO[str] instead of object for file-like output parameters in _print_summary, _get_formatter, and _render_structured.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
When --output is given without explicit --format, infer the format from the file extension: .json → json_pp, .jsonl → json_lines, .yaml/.yml → yaml. Error only if extension is unrecognized. Update design doc to reflect this behavior.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
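The extension mapping above can be sketched as a small lookup. `infer_format` and `EXTENSION_TO_FORMAT` are hypothetical names; the mapping itself comes from the commit message, while lowercasing the suffix before lookup is an added assumption.

```python
from pathlib import Path
from typing import Optional

# Mapping taken from the commit message above.
EXTENSION_TO_FORMAT = {
    ".json": "json_pp",
    ".jsonl": "json_lines",
    ".yaml": "yaml",
    ".yml": "yaml",
}


def infer_format(output: str, explicit: Optional[str] = None) -> str:
    """Return the explicit format if given, else infer it from the file suffix."""
    if explicit is not None:
        return explicit
    # Assumption: matching is case-insensitive.
    suffix = Path(output).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer format from extension {suffix!r}; pass --format"
        ) from None
```

An explicit `--format` always wins, so the lookup only runs, and can only fail, when the user left the format unspecified.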
Add severity, id, validator, standard, and dandiset as --grouping options. Uses section headers with counts (e.g. "=== ERROR (5 issues) ===") for human output. Structured output is unaffected (always flat).
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
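A minimal sketch of single-level grouping with counted section headers in the style quoted above. `render_grouped` is a hypothetical helper working on plain dicts; the per-record line layout and the singular "issue" for a count of one are assumptions.

```python
from typing import Any


def render_grouped(records: list[dict[str, Any]], key: str) -> list[str]:
    """Group records by one field and emit a counted header per group."""
    groups: dict[str, list[dict[str, Any]]] = {}
    for rec in records:
        # dict preserves insertion order, so groups appear in first-seen order.
        groups.setdefault(str(rec.get(key)), []).append(rec)

    lines: list[str] = []
    for name, items in groups.items():
        noun = "issue" if len(items) == 1 else "issues"
        lines.append(f"=== {name} ({len(items)} {noun}) ===")
        for rec in items:
            # Assumption: one indented line per record showing its id.
            lines.append(f"  {rec.get('id')}")
    return lines
```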
dandi/cli/tests/test_cmd_validate.py
Dismissed
def test_validate_load_with_format(simple2_nwb: Path, tmp_path: Path) -> None:
    """Test --load combined with --format."""
    outfile = tmp_path / "results.jsonl"
    r = CliRunner().invoke(
Check warning
Code scanning / CodeQL
Variable defined multiple times Warning test
Copilot Autofix
AI about 17 hours ago
In general, to fix "variable defined multiple times" when an earlier assignment is unused, you remove or simplify the earlier assignment so that you no longer bind the result to the variable, while preserving any necessary side effects of the expression on the right-hand side.
Here, the best fix is to keep the first CliRunner().invoke(...) call (because it creates outfile), but stop assigning its result to r. The second assignment to r at line 320 is the one that is actually used and should remain unchanged. Concretely, in dandi/cli/tests/test_cmd_validate.py, in test_validate_load_with_format, replace r = CliRunner().invoke( at line 314 with just CliRunner().invoke( so the first invoke is called for its side effects only. No imports or additional definitions are needed.
@@ -311,7 +311,7 @@
 def test_validate_load_with_format(simple2_nwb: Path, tmp_path: Path) -> None:
     """Test --load combined with --format."""
     outfile = tmp_path / "results.jsonl"
-    r = CliRunner().invoke(
+    CliRunner().invoke(
         validate,
         ["-f", "json_lines", "-o", str(outfile), str(simple2_nwb)],
     )
In tests, this might come in handy for step-by-step debugging.
Limit how many results are shown per leaf group (or in the flat list
when no grouping is used). Excess results are replaced by a
TruncationNotice placeholder — a distinct dataclass (not a
ValidationResult) so consumers can isinstance() check.
- TruncationNotice dataclass + LeafItem/TruncatedResults type aliases
- _truncate_leaves() walks the grouped tree, caps leaf lists
- Human output: "... and N more issues" in cyan
- Structured output: {"_truncated": true, "omitted_count": N} sentinel
- Headers show original counts including omitted items
- Works without grouping (flat list) and with multi-level grouping
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
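The truncation described in this commit can be sketched as a recursive walk over a nested dict whose leaves are lists. This is an illustration, not the PR's `_truncate_leaves()`: `truncate_leaves` and `to_jsonable` are hypothetical names, and the real `TruncationNotice` may carry other fields.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class TruncationNotice:
    """Placeholder for omitted results; deliberately not a result record."""

    omitted_count: int


# A grouped tree is a dict of subtrees; a leaf (or ungrouped input) is a list.
Tree = Union[dict, list]


def truncate_leaves(tree: Tree, max_per_group: int) -> Tree:
    """Cap every leaf list at max_per_group items, appending a notice if cut."""
    if isinstance(tree, dict):
        return {k: truncate_leaves(v, max_per_group) for k, v in tree.items()}
    if len(tree) <= max_per_group:
        return list(tree)
    kept = list(tree[:max_per_group])
    return kept + [TruncationNotice(omitted_count=len(tree) - max_per_group)]


def to_jsonable(item: Any) -> Any:
    """Turn a notice into the sentinel dict used in structured output."""
    if isinstance(item, TruncationNotice):
        return {"_truncated": True, "omitted_count": item.omitted_count}
    return item
```

Keeping the notice as its own type means a consumer can separate real results from placeholders with a single `isinstance()` check, and the same function handles the flat case because a bare list is just a tree with one leaf.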
4894aa1 to 2ddd5d4
Summary
Design plan for machine-readable `validate` output with store/reload capability. Adds structured output formats, automatic persistence of validation results alongside log files, and the ability to reload and re-render results with different grouping/filtering options.

Key design decisions:
- All structured formats (json, json_pp, json_lines, yaml) emit a uniform flat list of `ValidationResult` records, with no envelope/non-envelope split
- JSONL as the primary interchange format: `cat`/`jq`/`grep`/`vd` (VisiData) composable
- `_record_version` field on each record for forward-compatible deserialization
- Grouping affects human display only; structured output is always a stable flat schema
- Auto-save `_validation.jsonl` sidecar next to existing `.log` files
- Refactor `dandi/validate.py` + `dandi/validate_types.py` into `dandi/validate/` subpackage

Closes validate: Add -f|--format option to optionally serialize into json, json_pp, json_lines or yaml #1515
Closes Provide easy means for introspecting upload validation failures #1753
Enhances Add filtering of issues by type/ID or by file location #1743, upload,validate: Add --validators option #1737, Tidy up the `validate` command function in `cmd_validate.py` #1748

TODO
- `dandi/validate/` subpackage (`git mv` committed separately from import updates)
- `cmd_validate.py`: extract `_collect_results()`, `_filter_results()`, `_render_results()`
- `_record_version` to `ValidationResult`
- `--format` (`-f`) option: `human|json|json_pp|json_lines|yaml`
- `--output` (`-o`) + auto-save `_validation.jsonl` sidecar
- `--summary` flag
- `--load` (multiple paths, mutually exclusive with positional args)
- `dandi upload`
- `severity`, `id`, `validator`, `standard`, `dandiset`
- `--max-per-group` truncation: cap results per leaf group with placeholder notice

`--max-per-group` feature (Step 5)

Limits how many results are shown per leaf group (or in the flat list when no grouping). Excess results are replaced by a `TruncationNotice` placeholder: a distinct data structure (not a `ValidationResult`), so it won't be confused with real results if the output is saved/reloaded.

Examples (against 147k+ validation results from bids-examples)

Flat truncation (`--max-per-group 5` with no grouping):

Grouped truncation (`-g severity --max-per-group 3`):

and actually those are colored if output is not redirected

Multi-level leaf-only truncation (`-g severity -g id --max-per-group 2`):

Structured output (`-g severity -f json_pp --max-per-group 2`) emits `_truncated` placeholders:

    {
      "ERROR": [
        { "id": "DANDI.NO_DANDISET_FOUND", "severity": "ERROR", ... },
        { "id": "BIDS.NIFTI_HEADER_UNREADABLE", "severity": "ERROR", ... },
        { "_truncated": true, "omitted_count": 9567 }
      ],
      "HINT": [
        { "id": "BIDS.JSON_KEY_RECOMMENDED", "severity": "HINT", ... },
        { "id": "BIDS.JSON_KEY_RECOMMENDED", "severity": "HINT", ... },
        { "_truncated": true, "omitted_count": 138015 }
      ]
    }

Headers show original counts (e.g. "9569 issues") even when only a few are displayed. The `_truncated` sentinel follows the `_record_version` naming convention for metadata fields.

Test plan
- `--format` output via `click.CliRunner`
- `ValidationResult` JSONL
- `--load` with multi-file concatenation, mutual exclusivity enforcement
- `--output` is used
- `--max-per-group` flat truncation, grouped truncation, multi-level, JSON placeholder, no-truncation when under limit
- `_truncate_leaves()` helper

Some demos

See also `dandi validate` across bids-examples and then using visidata for navigation of composite dump of records

Generated with Claude Code