BUG: Wrong dtype using range in DataFrame constructor on Windows #16804
Comments
I believe we're following the behavior of numpy here, where the size of
|
Though that's not really a good explanation to the user for why the list would be int64, while the |
This is annoying, but agree with @TomAugspurger it's correct. Ultimately the range object is passed on to numpy which expands using the
|
I suppose we could intercept it and force |
We structure our tests to not use range at all for this reason. However, it is possible to change this by explicitly intercepting a range object in the Series constructor and introspecting the indices (we do this for RangeIndex already), so we could mark this if someone wants to do it. |
Yeah, I actually think this would be a good idea. Not a big deal, but there is also some performance to be picked up, as numpy apparently expands the list.
In [24]: r = range(1000000)
In [25]: %timeit np.array(r)
139 ms ± 755 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [27]: %timeit np.arange(r.start, r.stop, r.step)
1.36 ms ± 8.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) |
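For anyone picking this up, here is a rough sketch of the interception idea discussed above. The helper name `range_to_int64_array` is hypothetical and not part of pandas; it only illustrates building the array from the range's `start`, `stop`, and `step` with an explicit int64 dtype instead of letting numpy expand the range and pick the platform default integer.

```python
import numpy as np


def range_to_int64_array(values):
    """Hypothetical helper (not a pandas API): turn a range into an int64 array.

    A range is converted via np.arange from its start/stop/step, so it is
    never expanded element by element and the dtype does not depend on the
    platform's default integer. Anything else is handed to np.asarray as-is.
    """
    if isinstance(values, range):
        # Assumes the range bounds fit into a signed 64-bit integer.
        return np.arange(values.start, values.stop, values.step, dtype=np.int64)
    return np.asarray(values)


arr = range_to_int64_array(range(2, 20, 3))
print(arr.dtype)
```

This is only a sketch under the assumption that all values fit in int64; a real change in the constructor would also need to handle very large ranges.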
When using range (Python 3.5) in the DataFrame constructor I get different dtypes depending on the system I'm running on:
Problem description
On Unix:
On Windows:
The problem appeared in a unit test of an Arrow PR: apache/arrow#790
Windows tests on appveyor:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/build/1.0.2169
Unix tests on travis:
https://travis-ci.org/apache/arrow/builds/248514835
Expected Output
All systems create the same dtypes, in this case int64.
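Until the constructor behaves the same everywhere, one possible workaround is to not rely on the inferred dtype at all. The snippet below is a minimal sketch; the column name `a` and the range `range(5)` are made up for illustration and are not taken from the original code sample.

```python
import numpy as np
import pandas as pd

r = range(5)

# Build the column with an explicit int64 dtype instead of passing the
# range object and letting the platform default integer decide.
df = pd.DataFrame({"a": np.arange(r.start, r.stop, r.step, dtype=np.int64)})

# Alternative: construct first, then cast the column explicitly.
df_cast = pd.DataFrame({"a": list(r)}).astype({"a": "int64"})

print(df.dtypes)
print(df_cast.dtypes)
```

Both variants pin the dtype explicitly, so the result should not depend on the operating system.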
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.0
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None