API: Inferring dtype from iterables in pandas vs numpy #47673
By @simonjayhawkins #47294 (comment)
By @simonjayhawkins #47294 (comment)
Speaking generally, in situations where pandas and NumPy disagree and there is no clear best choice, I think we should value consistency with NumPy. If we think the NumPy choice is odd, even for NumPy itself, we should try to raise the issue with them. However, the use cases for NumPy and pandas can differ, so there might be a choice that is sensible for NumPy and a different one that is sensible for pandas. I do not know of any such examples currently. For the particular case @simonjayhawkins raised above, e.g.
To add to the discussion, groupby-apply with a user-defined function and integers is currently broken in pandas (1.4.2). For example, aggregating over a column with a dtype like uint16 effectively does this:

pd.Series([np.uint16(1), np.uint16(41_000)])

Which is definitely not "the right thing"...
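For reference, a minimal sketch contrasting how NumPy and pandas infer a dtype from the same iterable of uint16 scalars. NumPy's promotion rules keep uint16 here; what pandas infers depends on the pandas version (1.4.x is the behavior under discussion), so no pandas result is asserted:

```python
import numpy as np
import pandas as pd

data = [np.uint16(1), np.uint16(41_000)]

# NumPy promotes the scalar dtypes and keeps uint16, so no overflow occurs.
arr = np.array(data)
print(arr.dtype, arr.tolist())  # uint16 [1, 41000]

# pandas runs its own inference over the list; depending on the version this
# may or may not preserve the unsigned 16-bit dtype (the behavior this issue
# discusses).
ser = pd.Series(data)
print(ser.dtype, ser.tolist())
```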
There are a number of situations where pandas must take an iterable and infer a single dtype from the data it contains. Two examples are Series/DataFrame construction and groupby.apply when provided a user-defined function (UDF). In #47294 there was some discussion of how pandas treats this situation vs numpy; the relevant parts are quoted above. I'm moving this to its own issue for better tracking.