BUG: Transpose construction/block error with certain dtypes #16362

Closed
RobinFiveWords opened this issue May 15, 2017 · 3 comments
Labels
Bug Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@RobinFiveWords

Code Sample, a copy-pastable example if possible

import pandas as pd
from io import StringIO

data = "dt1,obj1,obj2,obj3,num1\n,a,c,e,0\n,b,d,f,1"

df1 = pd.read_csv(StringIO(data), dtype={'num1': float}, parse_dates=['dt1'])
df2 = pd.read_csv(StringIO(data), dtype={'num1': int}, parse_dates=['dt1'])

print(df1.T)
print(df2.T)  # ValueError in v0.20.1 but not v0.19.2

Problem description

The code above is meant as a minimal example. I originally encountered the error with a DataFrame containing datetime, object, and int dtypes, and triggered it with this...

df.iloc[[0, -1]].T

...and even found things like this:

df.iloc[[0, 1], [2, 3, 4, 9]].T  # okay
df.iloc[[0, 1], [2, 3, 4, 4, 9]].T  # ValueError
df.iloc[[0, 1], [2, 3, 4, 9, 4]].T  # okay (?!)
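
To compare column selections like the ones above without stopping at the first traceback, a small probe can help. This is only a sketch: `transpose_ok` is a hypothetical helper, and the frame is a stand-in built directly rather than the original data, so the column positions differ from the report. It catches the two exception types mentioned in this thread (ValueError and IndexError).

```python
import pandas as pd


def transpose_ok(frame):
    """Hypothetical helper: True if frame.T succeeds, False if it raises
    one of the errors reported in this thread."""
    try:
        frame.T
    except (ValueError, IndexError):
        return False
    return True


# Stand-in data (an assumption): one all-NaT datetime column,
# three object columns, and one int column.
df = pd.DataFrame({
    "dt1": pd.to_datetime([None, None]),
    "obj1": ["a", "b"],
    "obj2": ["c", "d"],
    "obj3": ["e", "f"],
    "num1": [0, 1],
})

# Try several selections, including ones that repeat a column.
for cols in ([0, 1, 2, 4], [0, 1, 2, 2, 4], [0, 1, 4, 2, 2]):
    print(cols, transpose_ok(df.iloc[[0, 1], cols]))
```

Which selections fail, if any, depends on the installed pandas version, so no output is shown here.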

Expected Output

        0    1
dt1   NaT  NaT
obj1    a    b
obj2    c    d
obj3    e    f
num1    0    1
        0    1
dt1   NaT  NaT
obj1    a    b
obj2    c    d
obj3    e    f
num1    0    1

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.1
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

This also reproduces with direct construction:

import numpy as np
import pandas as pd

pd.DataFrame(np.array([[pd.NaT, 'a', 'b', 0],
                       [pd.NaT, 'b', 'c', 1]]))
# IndexError: tuple index out of range

But it doesn't if there is only one string column:

pd.DataFrame(np.array([[pd.NaT, 'a', 0],
                       [pd.NaT, 'b', 1]]))
# fine!
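
The construction above could be turned into a regression-style check along these lines. It is a sketch, not the test that was eventually added to pandas: the variable names are made up, `dtype=object` is passed explicitly so the input array does not depend on NumPy's dtype inference, and the shape assertion is expected to hold only on a pandas release that includes the fix.

```python
import numpy as np
import pandas as pd

# 2-D object array mixing NaT, strings, and ints, as in the repro above.
# dtype=object is spelled out here (an assumption for clarity).
arr = np.array([[pd.NaT, 'a', 'b', 0],
                [pd.NaT, 'b', 'c', 1]], dtype=object)

# On a fixed release, construction should keep the 2 x 4 layout.
frame = pd.DataFrame(arr)
assert frame.shape == (2, 4)

# The string cells should come through unchanged.
assert frame.iloc[0, 1] == 'a'
assert frame.iloc[1, 2] == 'c'
```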

@chris-b1 chris-b1 added Bug Regression Functionality that used to work in a prior pandas version labels May 15, 2017
@RobinFiveWords
Author

I see some inconsistency in the maybe_infer_to_datetimelike function in pandas/core/dtypes/cast.py. v is raveled first, but it is only reshaped back to its original shape in the cases that reach line 830 or 838. When I added .reshape(shape) to the end of line 840, Chris's test ran fine. What's the best approach here: remove the reshapes in lines 830 and 838 and apply the reshape just once, at the end of the return statement in line 865?
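
The "reshape once at the end" idea can be sketched in isolation as follows. This is not the pandas source: `convert_keeping_shape` and `convert_flat` are hypothetical names, and the conversion step is left as a caller-supplied function. The point is only that a single reshape on the way out covers every branch of the conversion, so none of them can return a flattened result by accident.

```python
import numpy as np


def convert_keeping_shape(values, convert_flat):
    """Hypothetical sketch: flatten, convert, then restore the shape once."""
    values = np.asarray(values, dtype=object)
    shape = values.shape              # remember the original layout
    flat = values.ravel()             # conversion logic sees a 1-D array
    converted = np.asarray(convert_flat(flat))
    return converted.reshape(shape)   # single reshape for all code paths


block = np.array([['a', 0], ['b', 1]], dtype=object)

# An identity "conversion" stands in for a branch that does nothing.
out = convert_keeping_shape(block, lambda flat: flat)
print(out.shape)
```

The trade-off is that every branch must return the same number of elements it was given, otherwise the final reshape raises.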

@chris-b1
Contributor

@RobinFiveWords - easiest to review if you would submit a PR with that change - main thing will be that existing tests still pass.

@jreback jreback added this to the 0.20.2 milestone May 24, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017
closes pandas-dev#16362

Author: RobinFiveWords <[email protected]>

Closes pandas-dev#16395 from RobinFiveWords/cast-infer-datetime-reshape-fix and squashes the following commits:

7ad1e7d [RobinFiveWords] redid lost changes to cast.py and test_cast.py
afa2eeb [RobinFiveWords] added whatsnew0.20.2 entry
7a35624 [RobinFiveWords] removed whatsnew entry again
2ec60a6 [RobinFiveWords] added back whatsnew change

(cherry picked from commit 05d0667)
TomAugspurger pushed a commit that referenced this issue May 30, 2017
stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017