DOC: better explain the automatic alignment process #49939
Comments
I wouldn’t object to deprecating automatic alignment |
Really? I hadn't even realised this was on the table. If so, then I love this idea. Would the idea be:
Like this, then advanced users who really use the power of indices just need to add an extra
This would also be similar to @blazespinnaker's comment #49694 (comment), except that instead of there being a global option to control this, users would get a loud and clear error.
In which case, thanks @blazespinnaker, and I'm sorry for having said that your comment was off-topic (I still think it's better to keep this discussion separate from PDEP0005 though).
Example of where to get to:
>>> ser1 = pd.Series([1,2,3])
>>> ser2 = pd.Series([4, 1, 2], index=[0, 1, 3])
>>> ser1 + ser2
---
ValueError: Operands are not aligned. Do `left, right = left.align(right, axis=0, copy=False)` before operating. |
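For contrast with the proposed error above, here is a minimal sketch of what the same two Series do today. The `ValueError` is only a proposal in this thread; current pandas aligns the operands on the union of their indexes and fills unmatched labels with `NaN`. The variable names `total`, `left`, `right` and `explicit_total` are illustrative.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3])                   # labels 0, 1, 2
ser2 = pd.Series([4, 1, 2], index=[0, 1, 3])  # labels 0, 1, 3

# Current behaviour: `+` silently aligns on the union of the labels.
# Labels 2 and 3 each exist in only one operand, so they become NaN.
total = ser1 + ser2

# The explicit spelling suggested in the proposed error message
# (Series.align defaults to an outer join on the index).
left, right = ser1.align(ser2)
explicit_total = left + right

print(total)
print(total.equals(explicit_total))
```

Both spellings give the same result; the proposal would only change whether the first one is allowed without the explicit `align` call.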
I guess it's time for another @pandas-dev/pandas-core @pandas-dev/pandas-triage tag ... before putting together another PDEP, anyone have any initial thoughts on deprecating automatic alignment? I like the idea, as it would mean:
|
Related #47554
One thing to consider is that this would mess up chaining. For example, right now, you can do
There are some advantages to automatic alignment: when exploring data, it helps you identify missing data quite easily. |
For chaining, you could do
functools.reduce(lambda lhs, rhs: lhs + rhs, s1.align(s2)).dropna()
This is kinda advanced, perhaps, but I think it's only advanced users that intentionally rely on automated alignment anyway.
For identifying missing data, I think this would be even better - a loud and clear error immediately alerts you of missing data, whereas silent automated alignment introduces |
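A runnable sketch of the `functools.reduce` chain from the comment above may help. The thread does not define `s1` and `s2`, so the values here are assumed (borrowed from the earlier `ser1`/`ser2` example). `Series.align` returns a `(left, right)` tuple, which is why it can be fed straight into `reduce`.

```python
import functools

import pandas as pd

# Assumed sample data; the thread does not define s1 and s2.
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 1, 2], index=[0, 1, 3])

# Align explicitly, add the two aligned Series, then drop unmatched labels.
chained = functools.reduce(lambda lhs, rhs: lhs + rhs, s1.align(s2)).dropna()

# Under today's automatic alignment, this shorter chain is equivalent.
implicit = (s1 + s2).dropna()

print(chained)
print(chained.equals(implicit))
```

Only labels 0 and 1 appear in both inputs, so only those two rows survive the `dropna()`.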
I'm pretty negative on deprecating this; to me, auto-alignment is one of the key features of pandas that makes data wrangling significantly easier. I think having In my use, data at various levels of aggregation for products can be uniquely indexed by
This is natural and pleasant. Without this, each operation needs to first join to a temporary frame. |
That's a nice example, thanks Richard! An alternative suggested by Brock was to only deprecate auto-alignment in dunder operations. So in your example,
would throw an error, but
would work as it currently does.
This would retain the natural and pleasant functionality for advanced users, whilst ending up with fewer surprises for others. |
Thanks @MarcoGorelli - while still having a way to use these dunders with alignment does make me less negative to this proposal, I still think there are issues. Please correct any of these if they are wrong!
More generally, it seems to me the main motivation behind this proposal is that it would make pandas easier for new users. If there are other motivations (@jbrockmendel - I'm curious what your motivation is in particular), I think they would be good to identify. While I'm all for making pandas easier for new users, I do not think we should be doing so at the expense of expert usage. |
I am not actively advocating the idea. I suggested it as an alternative to the NoIndex-mode given that the motivation seemed to be "automatic alignment is a major pain point". |
A place where documentation could be very helpful is in auto alignment and correlation. The results can be very surprising at their confidence even though what you're doing makes no sense at all. |
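The correlation point can be illustrated with a small sketch. The data below is made up for this purpose and does not come from the thread. `Series.corr` aligns the two Series on their index labels before computing, so the answer can differ sharply from a comparison of the values in stored order.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the thread): b holds a's values in reverse
# order, but its labels are reversed too.
a = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3])
b = pd.Series([4, 3, 2, 1], index=[3, 2, 1, 0])

# Series.corr matches on labels first, so it compares b[0]=1 with a[0]=1,
# b[1]=2 with a[1]=2, and so on: a perfect positive correlation.
by_label = a.corr(b)

# Comparing the raw values in stored order ignores the labels and gives
# a perfect negative correlation.
by_position = np.corrcoef(a.to_numpy(), b.to_numpy())[0, 1]

print(by_label, by_position)
```

Neither number is wrong; they answer different questions. That difference is what a note about alignment in the `corr` documentation could spell out.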
Thanks @rhshadrach, those are some valid points. You're right about what my main motivation is. Maybe we just need to document this more clearly, with a visible note in every operation which aligns, redirecting to some page in the user guide. I've updated this to be a docs issue. If anyone would like to work on it, please do comment; happy to help out. It would be good to get better docs on this one. |
take |
Hey @MarcoGorelli I can take this up. I would need some guidance though :) |
nice, thanks! I think all that's needed are some notes saying that in general, pandas operations align on the index, perhaps starting with |
take |
@MarcoGorelli once this looks fine, I can create PRs for other functions. |
@MarcoGorelli please review the PR when you get time. |
Pandas version checks
main
Location of the documentation
Throughout https://pandas.pydata.org/pandas-docs/stable/index.html
Documentation problem
Originally reported by @blazespinnaker here
Suggested fix for documentation
Originally reported by @blazespinnaker here