[WIP] Imprecise indexer #22043
Hello @WeatherGod! Thanks for updating the PR.
Comment last updated on July 25, 2018 at 21:19 Hours UTC
Is this in reference to any particular issue or discussion?
You can already do this with reindex. What is the use case?
The most relevant issues are probably #9530 and #9817, as well as pydata/xarray#2217 downstream. The use case here is the ability to make indexes that always align using a tolerance. Pandas' current automatic alignment is not very useful with floating-point indexes, because it matches labels exactly and ignores near matches.
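To illustrate the alignment problem described above, here is a minimal sketch using only existing pandas behavior (not anything from this PR). The series names and the `1e-9` tolerance are arbitrary choices for the example. It shows automatic alignment failing on a near match, and the explicit `reindex` workaround that has to be repeated at every call site:

```python
import pandas as pd

# Two series whose float labels are "logically" the same, but the last label
# of `b` was computed as 0.1 + 0.2, which is not bit-identical to 0.3.
a = pd.Series([1.0, 2.0, 3.0], index=[0.1, 0.2, 0.3])
b = pd.Series([10.0, 20.0, 30.0], index=[0.1, 0.1 + 0.1, 0.1 + 0.2])

# Automatic alignment matches labels exactly, so 0.3 and 0.1 + 0.2 are
# treated as two different labels and both end up as NaN in the result.
total = a + b

# Existing workaround: reindex one operand onto the other with an explicit
# tolerance before doing arithmetic. The tolerance here is an assumption
# chosen for the example.
b_aligned = b.reindex(a.index, method="nearest", tolerance=1e-9)
total_aligned = a + b_aligned
```

The workaround is fine for a one-off, but the tolerance is not remembered by the index, so every operation that aligns implicitly needs the same manual step.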
Right. To summarize a bit: quite often with float64 indexes, you have two indexes that logically have the same keys but, because they were computed slightly differently or came from different sources, are not bit-for-bit identical. I first tried implementing this within Float64Index alone, but quickly ran into issues that needed support in the base class. Once that happened, a lot of this had to be implemented in the other index classes as well. The basic premise of the design is that explicit always overrides implicit (the implicit value being the tolerance attribute), which is why tolerance was added as an argument to many of the set operations. Also, any index resulting from these operations has its own tolerance attribute set to the tolerance that was used.
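The "explicit overrides implicit" rule can be sketched with a toy class. This is only an illustration of the design premise: `TolerantIndex` and its `union` method are hypothetical names invented for this example, not the PR's code and not a pandas API.

```python
import numpy as np


class TolerantIndex:
    """Hypothetical stand-in for an index that carries a tolerance."""

    def __init__(self, values, tolerance=0.0):
        self.values = np.sort(np.asarray(values, dtype="float64"))
        self.tolerance = tolerance

    def union(self, other, tolerance=None):
        # An explicit argument wins; otherwise fall back to the attribute.
        tol = self.tolerance if tolerance is None else tolerance
        merged = np.sort(np.concatenate([self.values, other.values]))
        keep = []
        for v in merged:
            # Keep a value only if it is farther than `tol` from the last
            # value kept, so near matches collapse onto a single key.
            if not keep or v - keep[-1] > tol:
                keep.append(v)
        # The result remembers the tolerance that was actually used.
        return TolerantIndex(keep, tolerance=tol)


left = TolerantIndex([0.1, 0.2, 0.3], tolerance=1e-9)
right = TolerantIndex([0.1, 0.2, 0.1 + 0.2])

implicit = left.union(right)                 # uses left.tolerance
explicit = left.union(right, tolerance=0.0)  # explicit value overrides it
```

With the implicit tolerance the near-duplicate of 0.3 collapses and the union has three keys; with an explicit tolerance of 0.0 the two labels stay distinct and the union has four. In both cases the result carries the tolerance that was used.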
a48b7d0 to 9701987
* took care of wrappers in datetimes and interval
* fix tolerance handling in extended dtype index construction
* fix unpickling of old pickles and a bug in numeric index unpickling
* fix tolerance for constructor delegation in `__new__`.
91fdf83 to c3e583c
My employer has changed my priorities, so I have been unable to pursue this work any further, and I don't foresee having any free time to spend on it. I hope someone else can take this work further, even if it is just going through and adding documentation. The other major effort needed in this PR is updating the Cython helpers for tolerance support, along with unit tests.
Nice idea, but the PR is stale. If you'd like to continue, please ping.
This is a work-in-progress to add a `tolerance` attribute to the `Index` class, and to plumb its use throughout the Index machinery. My immediate goal at this point is to not break anything. There is still a lot more work to do before this is ready for prime time, but hopefully I can get some input on best practices for mucking about in such deep internals of pandas.

`git diff upstream/master -u -- "*.py" | flake8 --diff`