COMPAT: box int/floats in __iter__ #13258

Closed
jreback opened this issue May 23, 2016 · 12 comments · Fixed by #17491
Labels
Compat (pandas objects compatibility with Numpy or Python functions), Dtype Conversions (unexpected or buggy dtype conversions)

Comments

jreback (Contributor) commented May 23, 2016

xref #13236: .map was recently clarified to box ints/floats (and all other dtypes) to python/pandas types rather than numpy scalars. __iter__ should do the same (this is already done for datetimelikes); we need to add it for ints/floats.

Furthermore, Series.tolist() already returns python types (#10904).

This will make things more consistent with the pandas iteration strategy.

We also already do this for object dtype:

In [12]: type(list(Series([Timestamp('20130101'),Timestamp('20130101',tz='US/Eastern'),1,1.0,'foo']))[2])
Out[12]: int

.tolist()

In [3]: type(Series([1,2,3]).tolist()[2])
Out[3]: int

Let's fix this for plain numeric dtypes:

In [14]: type(list(Series([1,2,3]))[2])
Out[14]: numpy.int64
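
For illustration, a minimal sketch of the desired boxing behaviour (the iter_boxed helper is hypothetical, not an actual pandas API):

import numpy as np
import pandas as pd

def iter_boxed(s):
    # yield Python ints/floats instead of numpy scalars
    for v in s.values:
        yield v.item() if isinstance(v, np.generic) else v

type(next(iter_boxed(pd.Series([1, 2, 3]))))  # int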

When solving this issue, we also have to look at:

jreback added the Difficulty Novice, Dtype Conversions, and Compat labels on May 23, 2016
jreback added this to the 0.18.2 milestone on May 23, 2016
jreback (Contributor, Author) commented May 23, 2016

cc @glaucouri
cc @gliptak
@sinhrks

jreback (Contributor, Author) commented May 23, 2016

In [4]: arr = np.array([1,2,3])

In [7]: type(list(arr)[0])
Out[7]: numpy.int64

In [9]: type(arr.tolist()[0])
Out[9]: int

interesting that numpy converts with .tolist() but NOT with list(..); seems very odd to me.

jreback (Contributor, Author) commented May 23, 2016

cc @njsmith do you know why numpy is like this? (my last comment)
@shoyer

njsmith commented May 23, 2016

There was a big debate in the early days about whether numpy indexing should return python scalars or not. The compromise was that it doesn't, but there is also this bolted-on notion of "convert to a Python-style object" put in to appease the other side. This python-conversion API lost the battle for users and mindshare, and these days no one uses it intentionally or even realizes that it's in there, but it's still in there. Specifically, the python-object API is .item(...) for index-and-convert-to-python, and .tolist() as a shorthand for, well, .item(...) (i.e. passing in a literal ellipsis analogous to arr[...], so converting the whole array recursively to python objects). That's what you're invoking with .tolist(). The names of these functions are very misleading.

OTOH list(arr) just uses list's usual convert-from-iterable logic, so you get the expected thing.
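
To make the distinction concrete, a small sketch (using a flat index with .item() rather than a literal ellipsis):

import numpy as np

arr = np.array([1, 2, 3])

# the "convert to Python objects" API described above
type(arr.item(0))       # int
type(arr.tolist()[0])   # int

# plain iteration keeps numpy scalars
type(list(arr)[0])      # numpy.int64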

@glaucouri

To be honest, I'm as puzzled as Jeff. The question is: why, inside a numpy algorithm, must you convert an ndarray to a python list? Is that unequivocally a move into python types (so that it is better to convert to native data types)?

I think it is clearer to always keep the native types, without doing any automatic conversion.
"""Explicit is better than implicit"""

Should someone ever need to step out of this logic, they would know exactly what they are doing, and why.
A dedicated method for this purpose would probably be clearer, something like:
to_py_list()
iter_py()

I appreciate this approach used in pandas for Timestamp.

my 50 cents.
Gla

sinhrks (Member) commented May 26, 2016

+1. When we need to iterate with numpy dtypes, we can use .values.
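
For example (a small sketch, assuming __iter__ boxes to Python scalars as proposed):

import pandas as pd

s = pd.Series([1, 2, 3])
type(list(s)[0])         # int, once __iter__ boxes to Python scalars
type(list(s.values)[0])  # numpy.int64, via the underlying ndarray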

jstray commented Sep 1, 2017

This has just bitten me in a surprising way, along the lines of #16048: converting a dataframe to a dict results in an object that cannot be converted to json.

import pandas as pd
import json

t = pd.DataFrame({'A': [1], 'B': [2]})
d = t.to_dict(orient="records")

# at this point type(d[0]['A']) == numpy.int64

json.dumps(d)  # fails with "TypeError: 2 is not JSON serializable"

There seems to be no easy way to generate a dict of purely Python types. Maybe one could hack something together using np.asscalar()? But it seems like this should not be necessary. Perhaps an option to use only native Python types in to_dict()?

And before someone suggests t.to_json(), that won't work if you want the dataframe to be just one part of the json output. I've resorted to concatenating strings to produce my final json result. Like a savage.
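
A possible workaround sketch (an assumption, not a pandas API and not necessarily the "easy way" referenced below): push the conversion into json.dumps via its default hook.

import json
import numpy as np
import pandas as pd

t = pd.DataFrame({'A': [1], 'B': [2]})
d = t.to_dict(orient="records")

# fall back to .item() for any numpy scalar json can't serialize natively
json.dumps(d, default=lambda o: o.item() if isinstance(o, np.generic) else str(o))
# '[{"A": 1, "B": 2}]'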

jreback (Contributor, Author) commented Sep 1, 2017

@jstray actually there is an easy way

PR to fix this issue! it's pretty straightforward :)

jreback modified the milestones: Interesting Issues, Next Major Release on Sep 1, 2017
jstray commented Sep 1, 2017

@jreback what is the easy way?

jreback modified the milestones: 0.21.0, Interesting Issues on Sep 10, 2017
jstray commented Sep 12, 2017

Thanks @jreback for your fix! This was way deeper into the Pandas codebase than I was qualified to go.

@makmanalp (Contributor)

Thanks @jreback ! This saves my butt!

@alexlenail (Contributor)

HUUUGE +1
