-
Notifications
You must be signed in to change notification settings - Fork 528
Enhanced server.json validation (phase 1) #636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Enhanced server.json validation (phase 1) #636
Conversation
|
There is an issue to consider with this PR, which is that I am embedding schema.server.json and using it to determine the current version string. This is in conflict (potentially) with the hard coded model CurrentSchemaVersion and CurrentSchemaURL. If we go with the embedded schema approach and we have a workflow that assures the correct schema file is used, we should use that as the source of truth and remove the hardcoded values (IMO). Since they're used extremely widely throughout the codebase, I didn't feel comfortable making that change in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this all makes sense and I'm inclined to approve as-written + let you drive the validation roadmap here onward 🚀 Just want to hear from @rdimitrov and/or @domdomegg if they are aligned with the general approach
Edit: Thought on it some more and I think I do have one fundamental gripe here -- to align with the core MCP spec, we might want to have TypeScript as a source of truth rather than JSON schema. This actually might not really impact this PR (except the design doc), but it would introduce another step we should do soon - especially before trying to expand the JSON schema - of creating the TypeScript types source of truth.
|
@claude review this PR |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
tadasant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had Claude review locally:
Claude's review
Now I have enough context to provide a comprehensive review. Here's the markdown review:
PR #636 Review: Enhanced server.json validation (phase 1)
Summary
This PR adds schema validation and improves the validation infrastructure for server.json files. The main changes include:
- A new
mcp-publisher validatecommand - JSON schema validation using embedded schema files
- Exhaustive validation that collects all errors instead of failing fast
- Rich validation results with JSON paths and references
Overall Assessment
Recommendation: Request Changes - The PR has solid architectural foundations but needs refinement before merging.
What's Good 👍
1. Well-designed validation types
The ValidationIssue and ValidationResult types are clean and well-structured:
type ValidationIssue struct {
Type ValidationIssueType
Path string
Message string
Severity ValidationIssueSeverity
Reference string
}2. Backward compatibility preserved
The approach of wrapping ValidateServerJSONExhaustive with the existing ValidateServerJSON signature is smart:
func ValidateServerJSON(serverJSON *apiv0.ServerJSON) error {
result := ValidateServerJSONExhaustive(serverJSON, false)
// Returns first error, maintaining existing behavior
}3. Immutable context building
The ValidationContext pattern for tracking JSON paths is elegant:
ctx.Field("packages").Index(0).Field("transport") // → "packages[0].transport"4. Comprehensive test coverage
The tests cover multiple error scenarios, path verification, and schema resolution.
5. Thorough design documentation
The proposed-enhanced-validation.md doc is exceptionally detailed and shows careful thought about the problem space.
Issues to Address
🔴 Critical
1. Schema embedding approach is fragile
The current approach copies the schema file during build (make prep-schema), which creates several problems:
- CI/CD complexity: Requires adding
make prep-schemato CI workflow - Developer friction: Easy to forget, causing confusing
go:embederrors - Gitignored generated files: The schema is in
.gitignorebut required for compilation
Suggestion: Consider one of these alternatives:
- Check in the copied schema file and update it via a dedicated script/workflow
- Use
go generatewith a clear directive at the top ofschema.go - Fetch the schema at runtime (with caching) rather than embedding
2. Panic in production code
In cmd/publisher/commands/init.go:
currentSchema, err := validators.GetCurrentSchemaVersion()
if err != nil {
// Should never happen (schema is embedded)
panic(fmt.Sprintf("failed to get embedded schema version: %v", err))
}Never use panic in user-facing code. Even "impossible" errors should return gracefully:
if err != nil {
return apiv0.ServerJSON{}, fmt.Errorf("internal error: failed to get schema version: %w", err)
}3. Duplicate schema version checking logic
Schema version validation appears in three places with slightly different implementations:
cmd/publisher/commands/publish.gocmd/publisher/commands/validate.gointernal/validators/validators.go(inValidateServerJSONExhaustive)
This violates DRY and risks inconsistent behavior. The validate command and publish command should both delegate to a single source of truth.
🟡 Important
4. Inconsistent path format between schema and semantic validation
Schema validation produces paths like packages.0.transport while semantic validation uses packages[0].transport. This inconsistency will confuse users.
From the test:
assert.True(t, issuePaths["packages.0.packageArguments.0.format"]) // Schema style
assert.True(t, issuePaths["packages[0].version"]) // Semantic styleFix: Normalize all paths to use the bracket notation for array indices.
5. Missing --help flag handling in validate command
func ValidateCommand(args []string) error {
for _, arg := range args {
if !strings.HasPrefix(arg, "-") {
serverFile = arg
}
}
// No handling for --help
}Other commands likely have --help support. This should too.
6. Error message still references model.CurrentSchemaVersion
The semantic validator still uses model.CurrentSchemaVersion:
case !strings.Contains(serverJSON.Schema, model.CurrentSchemaVersion):But the CLI commands now use validators.GetCurrentSchemaVersion(). If these ever diverge, you'll have inconsistent validation.
7. Schema validation errors for discriminated unions are noisy
As noted in the PR description, anyOf transport validation produces confusing errors (one error per possible type). The PR acknowledges this but doesn't fix it. Consider:
- Filing a follow-up issue to track this
- Adding a comment in the code explaining the limitation
🟢 Minor/Suggestions
8. Consider structured logging for validation results
The validate command uses fmt.Fprintf with emojis. For programmatic use, consider adding a --json flag to output structured validation results.
9. Test file organization
validation_detailed_test.go is in validators_test package (external test), while existing tests appear to be internal. Consider consistency.
10. Reference format is verbose
Reference: #/definitions/Repository/properties/url/format from: [#/definitions/ServerDetail]/properties/repository/[#/definitions/Repository]/properties/url/format
This is very long. Consider:
- Truncating for display, full in
--verbosemode - Just showing the final constraint:
#/definitions/Repository/properties/url/format
11. Missing newline at end of .gitignore
# Generated schema directory for embedding
internal/validators/schema/
No trailing newline - minor but can cause issues with some tools.
Questions for Discussion
-
Schema versioning strategy: When the embedded schema is updated, how will you ensure
model.CurrentSchemaVersionstays in sync? Should there be a single source of truth? -
Linter validation timing: The design doc mentions linter validation as "phase 2". Is there a tracking issue for this?
-
Performance impact: Has exhaustive validation been benchmarked against the fail-fast approach? For large
server.jsonfiles with many packages, this could be noticeably slower. -
Breaking change path: The PR mentions eventually removing redundant semantic validators. How will you handle the transition without breaking existing error message expectations in downstream code?
Files Changed Summary
| File | Assessment |
|---|---|
internal/validators/schema.go |
Good implementation, but embedding strategy needs work |
internal/validators/validation_types.go |
Clean, well-designed types |
internal/validators/validators.go |
Good refactor, path inconsistency needs fixing |
cmd/publisher/commands/validate.go |
Needs --help, remove panic potential |
cmd/publisher/commands/init.go |
Remove panic |
cmd/publisher/commands/publish.go |
DRY violation with validate.go |
docs/explanations/proposed-enhanced-validation.md |
Excellent documentation |
Makefile |
Works but fragile |
.github/workflows/ci.yml |
Required workaround for embedding |
Recommended Next Steps
- Fix the critical issues (panic, schema embedding strategy)
- Normalize path formats between schema and semantic validation
- Consolidate schema version checking to a single location
- Add
--helpto validate command - Consider adding a follow-up issue for the
anyOferror noise problem
From its feedback, I think:
1. Schema embedding approach is fragile- I don't know enough about Go to have an informed opinion on this. Maybe @toby can help? Let's not let this block on merging but maybe there's a more elegantgo:generatepowered solution here for the long term we can do as a fast follow.2. Panic in production code- Reasonable feedback; let's fix3. Duplicate schema version checking logic- I assume this will be made DRY in phase 2+, right? Can ignore.5. Missing--helpflag handling in validate command- worth adding?6. Error message still referencesmodel.CurrentSchemaVersion`` - worth fixing I think?
Besides that, my only feedback is a few inline comments (particularly on not using the draft version) -- otherwise basically looks good to merge! Thanks so much for all the work you've put into this.
After merging upstream/main, Repository field changed from value to pointer
type. Update test file to use &model.Repository{} syntax.
|
OK, I'm back to work on this. I think the first order of business is to determine what schema versions we want to validate and where we get them (this is also related to the workflow around the embedded schema file copied at build time). Cleary using the checked-in "draft" schema is not correct (and tests added since this PR are failing because they expect 2025-10-17). In my TypeScript validator what I did was to make a GHA that runs daily and gets any new schema directories from modelcontextprotocol/static, copies them into my repo, and checks them in. The validator then runs with all published schemas and validates server.json against the schema that it specifies (with a warning if that is not the latest schema). I think it would make sense to do the same thing here: use a GHA to sync official schema versions to the validator (checking them directly into /validators/schema). This allows us to validate against all available schemas (with a warning if not current schema), and eliminates the embedding workflow step and concerns (including cleanup/gitignore). I would suggest a README.md in the schema directory to explain why those files are there and how they got there. Does that sound OK? Should we make the GHA run daily or on demand (meaning it needs to be run whenever a new schema version is published)? Is there any use case where we'd want to also be able to validate against the draft schema (as generated from the OpenAPI spec)? |
- Update test to expect namespace validation error for mismatched website URL - Update test for old schema version (now allowed, not rejected) - Fix remote-servers.mdx example to use namespace that matches remote URL domain
We could potentially do this on the
I think we'd want this ability for development? e.g. we make a big change to a schema and we want to make sure our semantic validators are playing nicely with it before locking everything in. |
|
@BobDickinson I raised the GHA line of thinking with @modelcontextprotocol/registry-wg maintainers -- feedback was: can we avoid this kind of git-ops based workflow and instead just treat the public URLs as the canonical place where these live without trying to duplicate/sync them across repos. That would mean we either on-the-fly pull the schemas for every validation we process, or we cache them on e.g. startup of the registry app. Seems reasonable and less complex than GHA flows I think? |
@BobDickinson - hey, just to add to what @tadasant shared. The idea is that we already have a const pointing to the current/latest schema supported within the registry server - registry/pkg/model/constants.go Line 41 in fdcb385
Instead of ensuring a replicated state between the two repositories (static & registry), can we use that constant that points to the latest schema and initialise the schema validator by leveraging it? |
…ity to do "default" validation and return first error as before) Renamed `ValidateServerJSONExhaustive` (that returns all results) to `ValidateServerJSON` Modified all callers of validation to handle validation results (as opposed to single error) `ValidateServerJSON` now takes validation options which indicate the validation to be performed. Since all callers doing validation now call this same function directly, and pass their own validation options, it is easy to see (and later, change) what validation is being done where by just updation the options passed. We now have a GHA to migrate "official" schemas from the static repo into our repo such that they can be directly embedded in the validator. This allows us to validate whether a schema version is valid (represents an actual offical schema) and to validate a server against the schema it indicates (when desired). The previous make targets and CI steps for schema preparation / embedding have been removed (the schemas are now checked into the validators directory). It is stil possible to test a "draft" schema locally. Instructions are included in the design doc (should we put those instuctions somewhere more visible?) Validation logic is largely unchanged (from before this PR), with the following exceptions: - Empty/absent schema version is considered an error in all cases - In cases where schema version validation is performed, we now validate not only that it is present, but that it referes to an available public schema - Consolidated "current schema version" logic to use use model.CurrentSchemaURL constant (instead of runtime extraction from draft/only schema as before) Note: Behavior change - The publish command now requires a non-empty, valid, and current schema version - The API endpoint now requires a non-empty and valid schema version (but will allow a non-current schema version, as before). Moved registry service update validation into validator to make consistent with create validation. Command-level consolidation: - Created shared runValidationAndPrintIssues() and printSchemaValidationErrors() to eliminate duplication between validate and publish commands - Both commands now use ValidateServerJSONSchema for consistent schema version validation Minor: fixed identified typos, .gitignore blank line, clarified "embedding" usage to represent go:embed context Added design documentation for future /v0/validate API endpoint Addressed (per guidance): - Fix the critical issues (panic, schema embedding strategy) - Consolidate schema version checking to a single location - Normalize path formats between schema and semantic validation - Add --help to validate command Not yet addressed: - Consider adding a follow-up issue for the anyOf error noise problem (added it do design doc, still need to create issue assuming we don't address it in this PR) Open issues: - Alternate mechanisms have been discussed for fetching the schemas at runtime (versus building them in as current) - needs further discussion
|
I implemented the git-ops workflow before I saw this feedback and committed it because it was working and I was at a good checkpoint. I think the GHA could be improved (triggered by the commit of a new schema in static, as @tadasant suggested). One note is that I want the validator to be able to know if a schema version is valid and to be able to validate non-current schemas (I think there are several uses cases for this), so this means we need all of the schemas from the static site, not just the current one. On the question of git-ops versus dynamic/runtime schema download, I think the real question is, do we want the schemas to be built-in to the applications versus dynamically fetched at runtime. Dynamically fetching the schemas has the following properties:
Anyway, I'd personally like to avoid the dynamic downloading approach for now - maybe reconsider when we have a final resolution on the upstream schema sourcing (ServerCard or otherwise). But if there's strong consensus that it needs to be dynamic, I can work on it. |
This is pretty compelling to me. And because these versions are immutable (i.e. after we publish |
Added schema validation and support for exhaustive and more detailed validation to existing validators. Added new
mcp-publisher validatecommand. No change to any existing tests or behavior other than the new command.Motivation and Context
Many currently published servers fail schema validation. Of those that pass, many have semantic/logical errors or other errors that make them impossible to configure, or their configuration when applied doesn't generate correct MCP server configuration (per their own documentation).
To address this, we need better tooling to validate servers and to inform server publishers of errors, issues, and best practices in a way that is clear and actionable, both during server development and at time of publishing.
I have started a discussion about the broader topic: #635
There is also a fairly detailed document describing the design, architecture, implementation, future plans, etc. See Enhanced Validation Design
This PR is the first step in the process of improving validation. It adds schema validation and updates all existing validators to use a new model that allows them to track context and return rich and exhaustive results. This has been done in a way that is backward compatible (ValidateServerJSON is unchanged as are all existing validation unit tests).
There is a new
mcp-publishercommand calledvalidatethat is currenty the only place that schema validation and enhanced (exhaustive/rich) validation results are exposed.Changes
Internal Validation improvements
A new
validatecommand has been added tomcp-publisherso publishers can evaluate their server.json before publishingUsage
Given a server.json that looks like this:
{ "$schema": "https://static.modelcontextprotocol.io/schemas/2025-09-29/server.schema.json", "name": "io.github.BobDickinson/registry", "description": "An MCP server that provides [describe what your server does]", "repository": { "url": "", "source": "" }, "version": "=1.0.0", "packages": [ { "registryType": "oci", "registryBaseUrl": "https://docker.io", "identifier": "registry", "version": "1.0.0", "transport": { "type": "sse" }, "packageArguments": [ { "type": "named", "name": "--mode", "description": "Operation mode", "format": "foo" } ], "environmentVariables": [ { "description": "Your API key for the service", "isRequired": true, "format": "string", "isSecret": true, "name": "YOUR_API_KEY" } ] } ] }Run the command:
mcp-publisher validate server.jsonWhich will produce the output:
Note in the results above:
Referenceprovided contains an absolute path to the schema rule that was violated, followed by the full schema path that enforced that rule (with $ref redirect values substited and enclosed in square brackets).In the final solution we would remove semantic validators that are redundant to schema validation (as in number 5 above). We have the option to use the structured data produced by schema validation to replace the generic schema validation message with a more descriptive message if that is an issue (in particular, if the only argument for an ad-hoc validator is that is produces a better/cleaner message, we would just move that message creation code to the schema error result formatter and special case it as opposed to having a redundant validator).
What's Next
If this general direction is supported and this PR is accepted, a fast follow would include:
Issues
Schema validation requires embedding the
server.schema.jsonfile into the validators package viago:embed, which is restricted from embedding files outside of the package. For this reason we copy the schema file into a schema subdir (via a prep target in the makefile) and we .gitignore it. I tried cleaning up the copy in the make clean target, but then the linter would complain about the embed target being missing if it wasn't there during development. I also considered just checking in the schema file and assuming a separate workflow for updating the embedded schema file, but it's not clear what that workflow would be or how it would be maintained. I'm open to a different approach, but the schema file does need to be embedded somehow for schema validation to work.How Has This Been Tested?
All existing validation tests pass, new tests implemented and all pass.
Breaking Changes
No breaking changes, no behavior change except new
validatecommandTypes of changes
Checklist