BUG: melt changes type of tz-aware columns #15785


Closed
stigviaene opened this issue Mar 23, 2017 · 1 comment · Fixed by #20292
Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (concat, merge/join, stack/unstack, explode); Timezones (timezone data dtype)
Comments

@stigviaene
Code Samples

import pandas as pd
frame = pd.DataFrame({
    'klass': range(5),
    'ts': [
        pd.Timestamp('2017-03-23 08:22:42.173378+01'),
        pd.Timestamp('2017-03-23 08:22:42.178578+01'),
        pd.Timestamp('2017-03-23 08:22:42.173578+01'),
        pd.Timestamp('2017-03-23 08:22:42.178378+01'),
        pd.Timestamp('2017-03-23 08:22:42.163378+01'),
    ],
    'attribute': ['att1', 'att2', 'att3', 'att4', 'att5'],
    'value': ['a', 'b', 'c', 'd', 'd'],
})
# At this point, frame.ts is of dtype datetime64[ns, pytz.FixedOffset(60)]
frame.set_index(['ts', 'klass'], inplace=True)
queried_index = frame.query('value=="d"').index
pivoted_frame = frame.reset_index().pivot_table(index=['klass', 'ts'], columns='attribute', values='value', aggfunc='first')
melted_frame = pd.melt(pivoted_frame.reset_index(), id_vars=['klass', 'ts'], var_name='attribute', value_name='value')
# At this point, melted_frame.ts is of dtype datetime64[ns]
queried_after_melted_index = melted_frame.query('value=="d"').set_index(['ts', 'klass']).index
frame.loc[queried_index]  # Works
frame.loc[queried_index] = 'test'  # Works
frame.loc[queried_after_melted_index]  # Works
frame.loc[queried_after_melted_index] = 'test'  # Breaks

The last statement gives:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 140, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 127, in _get_setitem_indexer
    return self._convert_to_indexer(key, is_setter=True)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1230, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "MultiIndex(levels=[[2017-03-23 07:22:42.163378, 2017-03-23 07:22:42.173378, 2017-03-23 07:22:42.173578, 2017-03-23 07:22:42.178378, 2017-03-23 07:22:42.178578], [0, 1, 2, 3, 4]],\n           labels=[[3, 0], [3, 4]],\n           names=['ts', 'klass']) not in index"

Problem description

  • It is counter-intuitive for an operation to change the dtype of a column unless its documentation explicitly says that it does.
  • It is also counter-intuitive that frame.loc behaves differently in a lookup than in an assignment.
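
The KeyError above lists the ts level as 07:22:42..., which is the UTC wall-clock time of the original 08:22:42...+01 values, so the melted column appears to hold naive UTC timestamps. A minimal sketch of a possible workaround on affected versions follows. The helper name restore_tz is hypothetical (it is not part of pandas), and it assumes that any naive values it receives are UTC wall-clock times, as they seem to be in this report.

```python
import pandas as pd


def restore_tz(column: pd.Series, reference: pd.Series) -> pd.Series:
    """Hypothetical helper: reattach the timezone of `reference` to `column`.

    Assumption: if `column` is tz-naive, its values are UTC wall-clock times.
    """
    target_tz = reference.dt.tz
    # Nothing to do if the reference is naive or the column is already tz-aware.
    if target_tz is None or column.dt.tz is not None:
        return column
    # Interpret the naive values as UTC, then convert to the reference timezone.
    return column.dt.tz_localize("UTC").dt.tz_convert(target_tz)


# A tz-aware column with a fixed +01:00 offset, similar to frame.ts in the report.
aware = pd.Series(pd.to_datetime([
    "2017-03-23 08:22:42.173378+01:00",
    "2017-03-23 08:22:42.163378+01:00",
]))

# Simulate what the report observes after melt: naive timestamps in UTC.
naive_utc = aware.dt.tz_convert("UTC").dt.tz_localize(None)

restored = restore_tz(naive_utc, aware)
print(restored.dtype)
```

In the reproduction above this would be used as melted_frame['ts'] = restore_tz(melted_frame['ts'], frame.reset_index()['ts']) before building the index. On versions where melt already keeps the timezone, the helper returns the column unchanged.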

Expected Output

  • melted_frame.ts and frame.ts have the same dtype.
  • DataFrame.loc behaves consistently: it either fails for both the lookup and the assignment, or succeeds for both.
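
The first expectation can be written as a small check. This is a reduced sketch rather than the exact reproduction above: it skips the pivot_table step and melts a small wide frame directly, and the column names att1 and att2 are made up for illustration. On pandas versions that include the fix referenced by this issue (#20292), the comparison is expected to hold.

```python
import pandas as pd

# Small wide frame with a tz-aware 'ts' column (fixed +01:00 offset).
frame = pd.DataFrame({
    "klass": [0, 1],
    "ts": pd.to_datetime([
        "2017-03-23 08:22:42.173378+01:00",
        "2017-03-23 08:22:42.178578+01:00",
    ]),
    "att1": ["a", "b"],  # illustrative value columns
    "att2": ["c", "d"],
})

melted = pd.melt(
    frame,
    id_vars=["klass", "ts"],
    var_name="attribute",
    value_name="value",
)

# Expected behaviour: melt leaves the dtype of id_vars columns untouched.
dtype_preserved = melted["ts"].dtype == frame["ts"].dtype
print(frame["ts"].dtype, melted["ts"].dtype, dtype_preserved)
```

A check like this could serve as the starting point for a regression test in the melt test suite.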

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 23, 2017

@stigviaene .melt doesn't have the battery of tests that most other things have, so it's not surprising that this doesn't convert correctly. You're welcome to submit a patch to fix it, or at least to see if you can locate the problem.

Your comments on indexing are orthogonal. If you have a specific bug or comment, you can raise it in another issue.

jreback added the Bug, Difficulty Intermediate, Dtype Conversions, Reshaping, and Timezones labels on Mar 23, 2017
jreback added this to the Next Major Release milestone on Mar 23, 2017
jreback changed the title from "melt changes type of timestamp columns" to "BUG: melt changes type of timestamp columns" on Mar 23, 2017
jreback changed the title from "BUG: melt changes type of timestamp columns" to "BUG: melt changes type of tz-aware columns" on Mar 23, 2017
jreback changed the milestone from Next Major Release to 0.23.0 on Mar 12, 2018