shouldn't pandas.Index.tolist convert from numpy datatypes to native Python datatypes? #12715

Closed
gdementen opened this issue Mar 25, 2016 · 11 comments
Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request

Comments

@gdementen
Contributor

I wonder if it wouldn't be better/less surprising if Index.tolist did that conversion. FWIW, I was bitten by it via xlwings, which tries to send an index's values to Excel via COM by using index.tolist(); since the COM layer only handles basic Python types, it breaks. The patch seems trivial (I can submit a PR if you like), but I don't know whether you'd accept it, or whether it would have any implications.

- return list(self.values)
+ return self.values.tolist()

FWIW, it seems to be related to #10904.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame(np.zeros((3, 4)), index=np.arange(3))
l = df.index.tolist()
print(type(l[0]))
# <class 'numpy.int64'>

Expected Output

<class 'int'>

output of pd.show_versions()

python: 3.5.1
pandas: 0.18.0
numpy: 1.10.4

@jreback
Contributor

jreback commented Mar 25, 2016

your patch won't do anything - you still end up with int64s
you have to iterate over the list and possibly convert to Python objects
it's not hard, might be slightly non performant but could be done

@gdementen
Contributor Author

Sorry to disagree, but I did test that patch and it does work. Index.values is a numpy array, so the patch calls tolist on that array, which DOES convert to Python-native types (which is why I think it would be preferable for Index.tolist to do it too).
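The difference between the two lines of the patch can be seen with plain NumPy. This is a minimal sketch, not the pandas implementation; the variable names are illustrative only. Iterating an array (which is what `list(...)` does) yields NumPy scalar objects, while `ndarray.tolist()` converts each element to a built-in Python type.

```python
import numpy as np

arr = np.arange(3)

# list(arr) iterates the array, so each element stays a NumPy scalar.
via_list = list(arr)

# ndarray.tolist() converts each element to a built-in Python object.
via_tolist = arr.tolist()

print(isinstance(via_list[0], np.integer))  # True
print(type(via_tolist[0]) is int)           # True
```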

@jreback
Contributor

jreback commented Mar 25, 2016

In [1]: df = pd.DataFrame(np.zeros((3, 4)), index=np.arange(3))

In [2]: l = df.index.tolist()

In [3]: l
Out[3]: [0, 1, 2]

In [4]: df.index.tolist()
Out[4]: [0, 1, 2]

In [5]: df.index.tolist()[0]
Out[5]: 0

In [6]: type(df.index.tolist()[0])
Out[6]: numpy.int64

In [7]: list(df.index)
Out[7]: [0, 1, 2]

In [8]: list(df.index)[0]
Out[8]: 0

In [9]: type(list(df.index)[0])
Out[9]: numpy.int64

@jreback
Contributor

jreback commented Mar 25, 2016

Something like this would work. Though I would actually write in cython to avoid any perf issues. Of course need to see where this is actually used.

Note this does not touch things like Timestamp/Timedelta, which are already subclasses of Python objects (where the ints/floats are not).

In [12]: def converter(x):
   ....:     if isinstance(x, np.integer):
   ....:         x = int(x)
   ....:     elif isinstance(x, np.floating):
   ....:         x = float(x)
   ....:     return x
   ....: 

In [13]: l = [ converter(x) for x in df.index ]

In [14]: type(l[0])
Out[14]: int
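A standalone version of the element-wise converter idea, as a sketch only. The helper name `to_native` is hypothetical and not part of pandas. It checks against the abstract scalar types `np.integer` and `np.floating`, which cover every integer and float width; the old `np.float` name was only an alias for the built-in `float` and was removed in NumPy 1.24.

```python
import numpy as np


def to_native(x):
    """Hypothetical helper: unwrap NumPy integer/float scalars to int/float."""
    if isinstance(x, np.integer):
        return int(x)
    if isinstance(x, np.floating):
        return float(x)
    # Anything else (str, Timestamp-like objects, ...) is passed through as-is.
    return x


values = [to_native(x) for x in np.array([1, 2, 3], dtype=np.int64)]
floats = [to_native(x) for x in np.array([1.5], dtype=np.float32)]
other = to_native("label")

print(type(values[0]) is int)    # True
print(type(floats[0]) is float)  # True
```

A pure-Python loop like this touches every element, which is the performance concern raised above.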

@jreback
Contributor

jreback commented Mar 25, 2016

this is a dupe of #10904. So if you'd like to submit a pull request, do it with that issue number.

thanks!

@jreback jreback closed this as completed Mar 25, 2016
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Compat pandas objects compatability with Numpy or Python functions Difficulty Novice Duplicate Report Duplicate issue or pull request labels Mar 25, 2016
@jorisvandenbossche
Member

@jreback I think you were missing the .values in the original post:

In [14]: df = pd.DataFrame(np.zeros((3, 4)), index=np.arange(3))

In [15]: l = df.index.tolist()

In [16]: type(df.index.tolist()[0])
Out[16]: numpy.int64

In [17]: type(df.index.values.tolist()[0])
Out[17]: long

But, @gdementen this patch won't work for all dtypes (the pandas-specific I mean, eg Timestamp)
But it's indeed a duplicate
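One reason a bare `.values.tolist()` would not cover every dtype can be sketched with plain NumPy, assuming the underlying values are a `datetime64[ns]` array (the usual storage for timezone-naive pandas datetimes). For nanosecond resolution, `ndarray.tolist()` returns integers rather than datetime objects, so the Timestamp information would be lost.

```python
import datetime

import numpy as np

ns = np.array(["2013-01-01"], dtype="datetime64[ns]")
us = ns.astype("datetime64[us]")

# Nanosecond resolution does not fit datetime.datetime, so NumPy
# falls back to an integer count of nanoseconds since the epoch.
ns_item = ns.tolist()[0]

# Microsecond resolution converts to datetime.datetime.
us_item = us.tolist()[0]

print(type(ns_item).__name__)
print(type(us_item).__name__)
```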

@jreback
Contributor

jreback commented Mar 25, 2016

@jorisvandenbossche right!

actually mixed dtypes are ok (as they are object) under the hood:

In [21]: s = Series(pd.date_range('20130101',periods=3))

In [22]: s.tolist()
Out[22]: 
[Timestamp('2013-01-01 00:00:00'),
 Timestamp('2013-01-02 00:00:00'),
 Timestamp('2013-01-03 00:00:00')]

In [23]: s = Series(pd.date_range('20130101',periods=3).tolist() + [3])

In [24]: s
Out[24]: 
0    2013-01-01 00:00:00
1    2013-01-02 00:00:00
2    2013-01-03 00:00:00
3                      3
dtype: object

In [25]: s.tolist()
Out[25]: 
[Timestamp('2013-01-01 00:00:00', offset='D'),
 Timestamp('2013-01-02 00:00:00', offset='D'),
 Timestamp('2013-01-03 00:00:00', offset='D'),
 3]

In [26]: type(s.tolist()[-1])
Out[26]: int

Note that this would have to address both Series.tolist() and Index.tolist() (they are not implemented the same way, though they could be, e.g. by moving the method to pandas/core/base.py).

Further, this will actually require a bit of testing (e.g. going through all the dtypes to make sure conversions are handled).
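The kind of dtype sweep described above might look like the following sketch. It is not the pandas test suite: `native_list` is a hypothetical stand-in for a shared conversion, applied the same way to a Series and an Index, and only three dtypes are covered for brevity.

```python
import numpy as np
import pandas as pd


def native_list(obj):
    """Hypothetical helper: list of elements with NumPy scalars unwrapped."""
    return [x.item() if isinstance(x, np.generic) else x for x in obj]


# Collect the set of element types produced for each (container, dtype) pair.
results = {}
for dtype in ("int64", "float64", "bool"):
    data = np.array([0, 1, 2]).astype(dtype)
    for ctor in (pd.Series, pd.Index):
        out = native_list(ctor(data))
        results[(ctor.__name__, dtype)] = {type(v) for v in out}

for key, types in sorted(results.items()):
    print(key, sorted(t.__name__ for t in types))
```

Running the same helper over both containers reflects the point that a general fix should behave identically for Series and Index.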

@gdementen
Contributor Author

I was about to show you what @jorisvandenbossche did when GitHub crashed on me (site unavailable for a few minutes). I don't see this as a duplicate, given that we are talking about two different classes that do not inherit from each other, but you are the maintainers, so never mind. I have spent enough time on this already, and it will apparently be easier to fix downstream.

@jreback
Contributor

jreback commented Mar 25, 2016

@gdementen see my comment above. I would accept a general fix, so these issues are exactly the same. An Index and a Series should be as close as possible (and they are now, as most methods ARE shared, just not all). We really, really try not to do special-case fixes unless there is no other choice.

Generally, fixing upstream is a good idea; it depends on the complexity. Here it's straightforward, so if you have the time, please submit a PR.

@gdementen
Contributor Author

I know about upstream. I maintain a few projects myself. But sorry, the effort/usefulness_for_my_employer ratio is too high in this case and I am running out of time. PS: sorry if this whole conversation comes across as unfriendly (I am in an awful mood these days -- I work in Brussels)

@jreback
Contributor

jreback commented Mar 25, 2016

@gdementen no worries and be safe!
