Represent statespace metadata with dataclasses #607

Dekermanjian · 2025-11-02T17:50:36Z

This is a draft proposal for #598

The idea is to handle each component separately using _set_{component} methods, with all of the information stored in dataclasses for easy mapping.

I believe this will simplify our tests of these components and will reduce redundancies where we have the same information spread across multiple sub-components like data_names and data_info.
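To make the redundancy point concrete, here is a minimal sketch of the idea, not the code in this PR. The Data record, its fields, and the DataInfo.names property are hypothetical names chosen for illustration; the point is only that a list like data_names can be derived from one immutable container instead of being stored alongside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    """Hypothetical record describing one data input of a component."""

    name: str
    shape: tuple[int, ...]
    is_exogenous: bool = True


@dataclass(frozen=True)
class DataInfo:
    """Hypothetical container: the single source of truth for data metadata."""

    items: tuple[Data, ...] = ()

    @property
    def names(self) -> tuple[str, ...]:
        # Derived on demand, so there is no separate data_names list to keep in sync.
        return tuple(d.name for d in self.items)


data_info = DataInfo(items=(Data("exog_data", (100, 3)),))
print(data_info.names)
```

Because the names are computed from the records, a test only needs to check the container rather than several parallel attributes.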

@jessegrabowski, let me know what you think. I put a little notebook together to showcase the changes.

review-notebook-app · 2025-11-02T17:50:42Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

jessegrabowski

This is a great first pass, much cleaner than what we have now.

pymc_extras/statespace/models/structural/components/regression_dataclass.py

jessegrabowski · 2025-11-06T20:06:12Z

We can also keep all of the existing properties like state_names, shock_names, state_dims, etc., but move them to the base class and just extract the requested info from the relevant Info objects.
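A rough sketch of what that delegation could look like. The State record, its observed flag, and the constructor signature are assumptions made for illustration; only the property names state_names and observed_state_names come from the existing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical record for a single state."""

    name: str
    observed: bool = False


@dataclass(frozen=True)
class StateInfo:
    """Hypothetical container of State records."""

    states: tuple[State, ...] = ()


class Component:
    """Hypothetical base class; only the delegation pattern matters here."""

    def __init__(self, state_info: StateInfo):
        self.state_info = state_info

    @property
    def state_names(self) -> tuple[str, ...]:
        # Nothing is stored twice: the names are read from the Info object.
        return tuple(s.name for s in self.state_info.states)

    @property
    def observed_state_names(self) -> tuple[str, ...]:
        return tuple(s.name for s in self.state_info.states if s.observed)


component = Component(StateInfo(states=(State("level", observed=True), State("trend"))))
print(component.state_names, component.observed_state_names)
```

Downstream code that reads component.state_names keeps working, while the Info object remains the only place the data lives.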

jessegrabowski · 2025-11-07T14:10:12Z

Reflecting on it, I am convinced this is the way to go. It's 1000x more ergonomic. I made some changes to your initial code to make the API more "dictionary like", and to reduce code duplication. I moved everything to statespace/core/properties.py, because this is ultimately going to replace what we have in both the core models and the components.
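For readers who have not opened properties.py, a "dictionary like" container could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual implementation: the Parameter record and the items field are hypothetical, while the _index attribute and the object.__setattr__ call mirror what appears in the review diff further down.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class Parameter:
    """Hypothetical record for a single parameter."""

    name: str
    shape: tuple[int, ...] = ()


@dataclass(frozen=True)
class ParameterInfo:
    """Hypothetical container with name-based lookup."""

    items: tuple[Parameter, ...] = ()
    _index: dict = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Frozen dataclasses block normal assignment, hence object.__setattr__.
        object.__setattr__(self, "_index", {p.name: p for p in self.items})

    def __getitem__(self, name: str) -> Parameter:
        return self._index[name]

    def __contains__(self, name: object) -> bool:
        return name in self._index

    def __iter__(self) -> Iterator[Parameter]:
        return iter(self.items)

    def __len__(self) -> int:
        return len(self.items)


param_info = ParameterInfo(items=(Parameter("beta", (3,)), Parameter("sigma")))
print(param_info["beta"].shape, "sigma" in param_info, len(param_info))
```

Lookups by name, membership tests, iteration, and len all behave as they would on a dict, but the container itself stays immutable.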

Dekermanjian · 2025-11-07T14:37:48Z

@jessegrabowski, this is looking really cool! What can I do to help push this forward?

jessegrabowski · 2025-11-07T15:04:06Z

Delete the new regression_dataclass.py and simply refactor regression.py to use the new stuff.

We should keep your notebook, with the plan to add it as a new example for the docs or to merge it into the custom statespace notebook. It should therefore also be updated to import from the new properties.py file.

Dekermanjian · 2025-11-07T15:15:27Z

Perfect! I'll work on that today!! It is really looking cool!

…sing the new dataclasses API

pymc_extras/statespace/core/properties.py

pymc_extras/statespace/models/structural/components/regression.py

pymc_extras/statespace/models/structural/components/regression_dataclass.py

pymc_extras/statespace/models/structural/core.py

pymc_extras/statespace/models/utilities.py

tests/statespace/core/test_properties.py

tests/statespace/models/structural/components/test_regression.py

tests/statespace/models/structural/conftest.py

Dekermanjian · 2025-11-11T12:20:08Z

@jessegrabowski, I agree with all of your comments above. I am going to start making those changes.

…uplicate with warning 2. removed unnecessary imports from __init__ after deleting regression_dataclass 3. updated components and structural classes to only utilize dataclasses and pull other objects from <foo>_info dataclasses 4. updated tests to conform to dataclass api

pymc_extras/statespace/core/properties.py

2. created tests for add and merge methods 3. added utility to convert from snake to pascal and integrated it in error messaging

… and placed default shoch and state setters

jessegrabowski

Incomplete review, I'll continue tomorrow AM

jessegrabowski · 2025-12-08T01:24:41Z

pymc_extras/statespace/core/properties.py

+            # if key in index:
+            #     raise ValueError(f"Duplicate {self.key_field} '{key}' detected.") # This needs to be possible for shared states


That shouldn't happen here though. It should come up in merge or add, right? We handle it there with the allow_duplicates flag.

I think what happens is that, because our dataclasses are immutable, merge/add always return a new object of the same dataclass. __post_init__ therefore runs on the result and sees the duplicate keys, even though merge/add allowed them via allow_duplicates.
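The interaction described above can be reproduced in a few lines. This is a sketch with hypothetical classes (StrictInfo, LenientInfo, and a names field), not the PR's code. It assumes one possible arrangement in which the duplicate check lives in merge rather than in __post_init__, and it assumes duplicates are kept as-is when allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrictInfo:
    """Hypothetical container that validates uniqueness in __post_init__."""

    names: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if len(set(self.names)) != len(self.names):
            raise ValueError("Duplicate name detected.")

    def merge(self, other: "StrictInfo", allow_duplicates: bool = False) -> "StrictInfo":
        # Returning a new instance re-runs __post_init__, which knows nothing
        # about allow_duplicates and rejects the shared name anyway.
        return StrictInfo(names=self.names + other.names)


@dataclass(frozen=True)
class LenientInfo:
    """Same idea, but the duplicate check lives in merge itself."""

    names: tuple[str, ...] = ()

    def merge(self, other: "LenientInfo", allow_duplicates: bool = False) -> "LenientInfo":
        overlap = set(self.names) & set(other.names)
        if overlap and not allow_duplicates:
            raise ValueError(f"Duplicate names detected: {sorted(overlap)}")
        return LenientInfo(names=self.names + other.names)


a, b = StrictInfo(("level",)), StrictInfo(("level", "trend"))
try:
    a.merge(b, allow_duplicates=True)
    strict_rejected = False
except ValueError:
    strict_rejected = True

merged = LenientInfo(("level",)).merge(LenientInfo(("level", "trend")), allow_duplicates=True)
print(strict_rejected, merged.names)
```

Whether shared states should be kept twice or collapsed into one entry is a separate design decision that this sketch does not settle.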

jessegrabowski · 2025-12-08T01:25:41Z

pymc_extras/statespace/core/properties.py

+            raise AttributeError(f"Items missing attribute '{self.key_field}': {missing_attr}")
+        object.__setattr__(self, "_index", index)
+
+    def _key(self, item: T) -> str:


Is this used?

It doesn't look like we are using this at the moment. I think it could come in handy later on to check for specific attributes.

pymc_extras/statespace/core/properties.py

pymc_extras/statespace/models/structural/components/regression.py

pymc_extras/statespace/models/structural/core.py

jessegrabowski · 2025-12-13T00:09:16Z

pymc_extras/statespace/models/structural/core.py

-        self.state_names = list(state_names) if state_names is not None else []
-        self.observed_state_names = (
-            list(observed_state_names) if observed_state_names is not None else []
+        self.param_info = ParameterInfo(


Should we change the component signature to just take the Info objects directly?

I think that would be nice. Right now, there is this intermediate conversion step that needs to take place.

OK, let's leave it for the next PR then, and make the priority here getting each model represented in the new way.

pymc_extras/statespace/models/structural/core.py

pymc_extras/statespace/utils/message_tools.py

…_duplicates is False 2. converted component attributes into properties 3. removed _combine_property method 4. removed redundant observed_states property 5. fixed indentation bug

Dekermanjian · 2025-12-22T23:22:25Z

Hey @jessegrabowski, by switching a lot of the component attributes to properties I was able to simplify a good number of downstream methods. Would you mind taking a look at the current state of this before I go ahead and do the same with the rest of the SSM components?

jessegrabowski · 2025-12-22T23:38:25Z

pymc_extras/statespace/models/structural/components/regression.py


        self.coords_info = CoordInfo(coords=[regression_state_coord, endogenous_state_coord])

    def populate_component_properties(self) -> None:


This method won't be unique to regression, right? We will want to move it up to the base class.

@jessegrabowski, in the base class there is a populate_component_properties method that raises NotImplementedError. Did you want to replace that with a generic method that calls the _set_<foo> methods for the two defaults (shocks and states) that we provide?

jessegrabowski · 2025-12-22T23:41:34Z

pymc_extras/statespace/models/structural/core.py

            self.make_symbolic_graph()

-        self._component_info = {
+        self._component_info = {  # Should this be a dataclass??


Probably :)

Let's leave it for now though. The mission creep on this PR is already pretty terrible.

jessegrabowski · 2025-12-22T23:45:33Z

pymc_extras/statespace/models/structural/core.py

        raise NotImplementedError

-    def _set_shocks(self) -> None:
+    def _set_shocks(self) -> ShockInfo:


Make a comment in each of these methods that these are generic defaults that can/should be replaced with component-specific logic. I got momentarily confused because they are called _set but they actually return info (i.e. they do a get, not a set). The name makes sense in the context of populate_component_properties, but not in a vacuum; hence the comment.

Oh, that is right. These shouldn't be doing a get; they should actually be doing a set. I will fix it and add a docstring explaining that these are defaults that should be replaced.
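One way the agreed layout could look, as a sketch only: the base class owns populate_component_properties, the _set_ methods assign attributes instead of returning them, and their docstrings flag them as replaceable defaults. The constructor arguments, the names fields, and the default naming scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInfo:
    names: tuple[str, ...] = ()


@dataclass(frozen=True)
class ShockInfo:
    names: tuple[str, ...] = ()


class Component:
    """Hypothetical base class showing the template-method layout."""

    def __init__(self, name: str, k_states: int):
        self.name = name
        self.k_states = k_states
        self.populate_component_properties()

    def populate_component_properties(self) -> None:
        # Generic driver: each _set_ method assigns one Info attribute.
        self._set_states()
        self._set_shocks()

    def _set_states(self) -> None:
        """Generic default; subclasses can replace it with component-specific logic."""
        self.state_info = StateInfo(tuple(f"{self.name}_{i}" for i in range(self.k_states)))

    def _set_shocks(self) -> None:
        """Generic default; subclasses can replace it with component-specific logic."""
        self.shock_info = ShockInfo(tuple(f"{self.name}_shock_{i}" for i in range(self.k_states)))


class Regression(Component):
    def _set_shocks(self) -> None:
        # Assumption for illustration: this component has no shocks.
        self.shock_info = ShockInfo()


reg = Regression("beta", k_states=2)
print(reg.state_info.names, reg.shock_info.names)
```

A subclass overrides only the setters it needs, and populate_component_properties no longer has to be reimplemented per component.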

jessegrabowski · 2025-12-22T23:46:37Z

Yeah, it looks really great! Go ahead and do the others. Excited to get this over the finish line.

proposal for updating propogate_component_properties using data classes

a4dacd8

Dekermanjian marked this pull request as draft November 2, 2025 17:50

Dekermanjian added the enhancements, request discussion, and statespace labels Nov 2, 2025

jessegrabowski reviewed Nov 6, 2025

Iterate on proposal

7f32a48

jessegrabowski changed the title ~~proposal for updating propogate_component_properties using data classes~~ Represent statespace metadata with dataclasses Nov 7, 2025

Jesse Grabowski and others added 3 commits November 7, 2025 11:45

Fix iterator, add to_dict method to CoordsInfo

d65fc0a

Add observed_states helper to StateInfo

c6a48fc

made necessary changes to get the regression component test to pass u…

92e333f

…sing the new dataclasses API

jessegrabowski requested changes Nov 11, 2025

jessegrabowski reviewed Nov 15, 2025

Dekermanjian added 2 commits November 16, 2025 09:44

1. added add and merge methods to base class

228acff

2. created tests for add and merge methods 3. added utility to convert from snake to pascal and integrated it in error messaging

removed data & coords setters in _set<foo> medthod in Component class…

1ae433f

… and placed default shoch and state setters

jessegrabowski requested changes Dec 8, 2025

jessegrabowski requested changes Dec 13, 2025

1. updated properties base class to handle duplicate names when allow…

834f96a

…_duplicates is False 2. converted component attributes into properties 3. removed _combine_property method 4. removed redundant observed_states property 5. fixed indentation bug

jessegrabowski reviewed Dec 22, 2025
