COMPAT: .map iterates over python types rather than storage type #13236
Comments
But the underlying data type IS a numpy dtype. In this example it is reasonable to expect every value to have the same type, float32.
Thank you
So < 0.18.1 we were NOT using
I think the new behavior is actually more correct (I don't think this was tested before), and we never guaranteed, either way, which types would result. cc @sinhrks
I will reopen for discussion.
Assuming
I feel it is natural for numeric values to be coerced to their Python representation as well. Can you provide a use case which
Yeah, I think this only matters for non-extension and i8 types (e.g. int/float). I think it does make sense to return native types ( Let's update the docstring to indicate this? So I am repurposing this issue.
@glaucouri do you want to do a pull request?
Before making the proposal I need to understand what you mean by "it does make sense to return native types". Why is it preferable to convert every single element of a series when that is not explicitly required? It is probably more efficient to iterate over the stored values and let the user cast only if necessary (with the useful map method). Thank you
You shouldn't use
When iterating you want to have native types if at all possible, so the user doesn't even need to think about this. numpy types are not normal for iterating in Python; yes, they are the 'same', but they often don't have the right methods or behaviors, especially for things like `np.timedelta64`/`np.datetime64`, not to mention extension types. Since we are already converting to Pythonic types there, it makes sense to complete the picture with int/float.
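The point about missing methods can be illustrated with a small sketch. This is not code from the thread; the variable names are made up for illustration, and it assumes a reasonably recent pandas and numpy. It shows that a raw `np.timedelta64` scalar has no `total_seconds()` method, while the boxed values produced by iterating a timedelta Series do.

```python
import numpy as np
import pandas as pd

# A raw numpy scalar: no total_seconds() method is available on it.
raw = np.timedelta64(90, "s")
raw_has_method = hasattr(raw, "total_seconds")

# Iterating a timedelta Series yields boxed pandas Timedelta objects,
# which do provide total_seconds().
s = pd.Series(pd.to_timedelta(["90s", "2min"]))
boxed = [td.total_seconds() for td in s]

print(raw_has_method)
print(boxed)
```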
This is true, map is not for performance. But why do you expect that the map iterator must (sometimes) change the underlying types? In the example above, if I cast a series to float64 I expect the type I cast to in the same Thank you
@glaucouri well, pandas is not simply a layer over numpy; it hides numpy more and more (and will continue to do so). The iterator will always change the type to a Python one. I don't see any argument not to do this. Essentially you are creating scalars, and Python scalars are perfectly fine types, vastly superior to numpy scalars. I would appreciate it if you would like to update the documentation. so
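A minimal sketch of the behavior being described, assuming a recent pandas release (the versions discussed in this thread behaved differently, and this is not the reporter's original sample). The Series and variable names are illustrative only. With float32 storage, `map`, `tolist`, and plain iteration all hand back Python `float` objects rather than `numpy.float32` scalars.

```python
import pandas as pd

# Storage dtype is float32, but element access through these paths is boxed.
s = pd.Series([1.0, 2.0, 3.0], dtype="float32")

mapped_types = s.map(type).tolist()         # type seen by the mapped function
list_types = [type(x) for x in s.tolist()]  # type of each tolist() element
iter_types = [type(x) for x in s]           # type of each iterated element

print(s.dtype)
print(mapped_types[0], list_types[0], iter_types[0])
```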
Hi Jeff, do you want to extend the box-to-Python-type behaviour to all kinds of iteration over a Series? Take the first example:
And now we try some kinds of iteration on it:
If I'm not wrong, some of these examples call iter explicitly but actually work differently from map. Am I missing something? Thank you Jeff, I will work on the docs (just figured out how to do it). Gla
@glaucouri use master and you will see that |
OK, I did not realize they were already available in master.
So the approach is to implement a kind of 'Python nativization' explicitly, only in the tolist and map methods? To be honest, I would prefer a solution where the type is not changed at all. Moreover, this casting has a non-negligible ~20% overhead.
Thank you again Jeff.
@glaucouri see #13258 to fix the iteration. If you care about perf you are going about this the wrong way.
You should provide enough reason to break existing users' code that relies on numpy dtypes.
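For readers who do need the storage type or care about speed, here is a hedged sketch of two alternatives to element-wise `map`. It assumes a pandas version that provides `Series.to_numpy()`; the names are illustrative and not taken from the thread. Going through the underlying array keeps numpy scalars, and vectorized arithmetic avoids per-element boxing altogether.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], dtype="float32")

# Indexing the underlying numpy array keeps the storage scalar type.
arr = s.to_numpy()
numpy_scalar = arr[0]

# tolist() boxes each element into a plain Python float instead.
python_scalar = s.tolist()[0]

# Vectorized operations never create per-element scalars and keep the dtype.
doubled = s * 2

print(type(numpy_scalar), type(python_scalar), doubled.dtype)
```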
Code Sample, a copy-pastable example if possible
pandas 0.18 + numpy 1.10:
pandas 0.18.1 + numpy 1.11.0:
I expect to get the same dtype for the 3 prints. Why did this change in the latest version?
output of
pd.show_versions()
Thank you
Gla