Skip to content

Conversation

@naterubin
Copy link

This PR came out of the FilmDrop sprint in which I was given time to work on improvements to the STAC workflow developer experience. I was interested in something that would give more visibility into the content of a Task beyond what can be viewed in a Terraform definition, and an idea that came out of that is a FilmDrop task registry. The metadata command that gets added here is in service of that.

The other change is a refactoring of the CLI functionality to live in its own module and not necessarily be tied to a single task. This lets the run command execute one of many tasks registered to a CLI instance, which is helpful when packaging multiple tasks in a single Docker image.

A note on a breaking change that this PR introduces: with the run command now able to execute multiple tasks, a mandatory --task argument has been added. This means that existing invocations of this command will now fail. I had thought about a solution that would allow omitting the --task argument if only one task is registered to a CLI instance, but wasn't totally sold on that approach. Any discussion is welcome.

Related Issue(s):

None

Proposed Changes:

  1. Factor CLI out of Task class into its own module.
  2. Allow run command to run one of multiple tasks, as opposed to just one.
  3. Add metadata command to output metadata of all tasks registered with CLI instance.

PR Checklist:

  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

@jkeifer
Copy link
Member

jkeifer commented Jan 9, 2026

First off, can you resolve the merge conflicts on this branch?

Second, I think there more of a breaking change than just the addition of --task, but that might point to an idea...

The problem is that projects using this currently have to set entrypoints or project.scripts pointing to each task's CLI method. This refactor means that project will also have to change to the new CLI there else they won't have a valid entrypoint.

Instead of making this change breaking with the remove of the existing per-task CLI, maybe we should keep the old CLI. We could add a deprecation warning to indicate to users that they'll need to migrate off it, but it wouldn't be a big maintenance burden to have both for a little while, and it allows existing projects to upgrade to get some of the recent changes without having to worry about their CLIs breaking.

Normally I'm all for breaking changes, but this is one where everything calling a stac-task will need to be updated. That's a lot of things in a lot of different places. Enabling projects to even support both CLIs for a time greatly simplifies the migration process.

#.idea/

# vim
*.swp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that you need to remove this, but I'd recommend setting an ignore for this in your personal global gitignore.


import fsspec
from boto3utils import s3
from pydantic import BaseModel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic is not in the project dependencies, is it? I know it is installed in the environment because it is a transitive dependency, but if we're going to pull this in as a direct dependency then we need to make sure it is explicitly part of the project dependency list.

version = "0.1.0"

input_model: BaseModel | None = None
output_model: BaseModel | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this. It feels like an appendix or vestigial tail...

That is, these models are useless when it comes to the actual task operation. Worse than that, the tests show that they aren't even enforcing the current payload format in any way, which means that completely invalid models could be supplied here.

Is it worth including this metadata command stuff in this PR versus maybe extracting the relevant code into an issue or something so we don't lose what you've done but don't merge it prematurely?

I think to make this change viable, we would need to either make a determination to change the payload format entirely by having no defined payload format (which we could do but I think would be a bad move, especially as part of a new version of stac-task as opposed to a completely new project), or to convert the current payload model to a pydantic one (which is probably a good move even if we do have concerns about the model long-term and might want to change it sooner than later). If we had a base payload pydantic model then these models make more sense as more strictly-defined subclasses of that base model. Until we have that I think these model additions are premature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants