Make index check on statespace data less strict #434

Merged (3 commits) on Mar 19, 2025

@jessegrabowski
Member

commented Mar 15, 2025

Closes #384

I was assuming that any pandas index of integers was a RangeIndex; this will now accept anything.
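
To illustrate the distinction (this is a sketch, not the code in this PR): an index built from a list of integers is a plain integer-dtype `Index`, not a `RangeIndex`, so an `isinstance(..., pd.RangeIndex)` check rejects it. The variable names here are made up for the example, and checking the dtype is only one possible way to relax the test.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Two indexes holding the same values 0..3
range_idx = pd.RangeIndex(4)
int_idx = pd.Index([0, 1, 2, 3])

# A strict type check only accepts the first one
strict_ok = [isinstance(idx, pd.RangeIndex) for idx in (range_idx, int_idx)]

# A dtype-based check accepts any index of integers
relaxed_ok = [is_integer_dtype(idx) for idx in (range_idx, int_idx)]
```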

One issue is that if the index has holes (like the index goes 0, 1, 3, 4, ...), it will just be totally ignored, and internally we'll treat it as though it was continuous. I might be able to check for this, but it might also be the type of thing that you have to trust users to do the right thing with. Open to suggestions.

@ricardoV94
Member

For the holes, can you check whether len == (max - min + 1)?
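
A minimal sketch of that check, assuming an integer-valued pandas index. The helper name `is_contiguous` is hypothetical and not part of the library. The uniqueness test is an extra assumption added here, since duplicate labels could otherwise hide a gap (for example `[0, 0, 2]` has length 3 and `max - min + 1 == 3`).

```python
import pandas as pd


def is_contiguous(index: pd.Index) -> bool:
    """Return True if an integer index covers every value from min to max."""
    if len(index) == 0:
        # Nothing to skip in an empty index
        return True
    span = int(index.max()) - int(index.min()) + 1
    # Duplicates would make the length comparison unreliable
    return bool(index.is_unique) and len(index) == span
```

With this, `is_contiguous(pd.Index([0, 1, 3, 4]))` is `False`, while any `RangeIndex` with step 1 passes.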

@AlexAndorra
Contributor

left a comment

Thanks, @jessegrabowski! Looks good, modulo Ricardo's comment.

> One issue is that if the index has holes (like the index goes 0, 1, 3, 4, ...), it will just be totally ignored, and internally we'll treat it as though it was continuous. I might be able to check for this, but it might also be the type of thing that you have to trust users to do the right thing with. Open to suggestions.

Should we error out and tell users they should pass in the missing data points so that they are inferred? Or at least display a warning that highlights that with this behavior, all data points are assumed to be linearly spaced?

@jessegrabowski
Member Author

The thing already warns pretty aggressively, so I'm not really inclined to go that route (if anything I think we should go the other way and remove some of the useless warnings).

I don't have a good sense of whether an index [1, 2, 3, 5] is a valid time series index. What to do hinges on the answer to that question.

@AlexAndorra
Contributor

> I don't have a good sense of whether an index [1, 2, 3, 5] is a valid time series index

My intuition is that it's shorthand for data at [1, 2, 3, 5] with a missing (NaN) observation at 4. So if statespace treats it like [1, 2, 3, 4], we might want to error out and ask the user to provide the explicit time index, i.e. [1, 2, 3, 4, 5], with the missing point included. Does that make sense?
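
On the user side, making the gap explicit could look like the sketch below. It assumes the data is a pandas Series with an integer index, and it uses only standard pandas calls (`RangeIndex`, `reindex`). The values are made up for illustration, and nothing here is taken from the statespace code itself.

```python
import pandas as pd

# Observations at t = 1, 2, 3, 5; nothing was recorded at t = 4
data = pd.Series([0.3, 0.1, 0.4, 0.2], index=[1, 2, 3, 5])

# Build the full index from min to max, then reindex so the hole becomes NaN
full_index = pd.RangeIndex(int(data.index.min()), int(data.index.max()) + 1)
explicit = data.reindex(full_index)
```

After reindexing, `explicit` has index 1 through 5 and a NaN at 4, so the missing time step is represented as missing data rather than silently dropped.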

@jessegrabowski jessegrabowski merged commit 257fcec into pymc-devs:main Mar 19, 2025
5 checks passed
Successfully merging this pull request may close these issues.

Check on Index of data too strict in statespace
3 participants