TypeError when setting with enlargement of timezoned Timestamp to an empty DataFrame #16044

Closed
Dmitrii-I opened this issue Apr 18, 2017 · 4 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Timezones Timezone data dtype

@Dmitrii-I

This is an obscure edge case, but I feel it needs to be reported.

For an empty DataFrame with two or more columns, adding a row by setting with enlargement (df.loc[len(df)] = ...) fails if the row includes a timezone-aware Timestamp.

from pandas import DataFrame, Timestamp

# works:
df = DataFrame(columns=['a', 'b'])
df.loc[len(df)] = [Timestamp.now(), 55]

# does not work:
df = DataFrame(columns=['a', 'b'])
df.loc[len(df)] = [Timestamp.now(tz='Europe/Paris'), 55]
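A minimal sketch for checking whether a given pandas install is affected, built only from the repro above. `tz_enlargement_works` is a hypothetical helper, not a pandas API; it treats the TypeError seen in the traceback as "affected" and otherwise checks that the stored value kept its timezone.

```python
import pandas as pd


def tz_enlargement_works(tz="Europe/Paris"):
    """Hypothetical helper: True if setting with enlargement of a tz-aware
    Timestamp into an empty two-column DataFrame succeeds."""
    df = pd.DataFrame(columns=["a", "b"])
    try:
        df.loc[len(df)] = [pd.Timestamp.now(tz=tz), 55]
    except TypeError:
        # pandas 0.19.2 raised "TypeError: data type not understood" here
        return False
    # one row was added and the Timestamp still carries its timezone
    return len(df) == 1 and df.loc[0, "a"].tzinfo is not None


print(pd.__version__, tz_enlargement_works())
```

The check looks at the value's tzinfo rather than the column dtype, because the resulting dtype (object vs. tz-aware datetime) can differ between pandas versions.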

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-53-b144415de186>", line 1, in <module>
    df.loc[len(df)] = [Timestamp.now(tz='Europe/Paris'), 55]
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/indexing.py", line 141, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/indexing.py", line 385, in _setitem_with_indexer
    self.obj._data = self.obj.append(value)._data
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 4435, in append
    verify_integrity=verify_integrity)
  File "/usr/local/lib/python3.4/dist-packages/pandas/tools/merge.py", line 1452, in concat
    return op.get_result()
  File "/usr/local/lib/python3.4/dist-packages/pandas/tools/merge.py", line 1650, in get_result
    copy=self.copy)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4825, in concatenate_block_managers
    placement=placement) for placement, join_units in concat_plan]
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4825, in <listcomp>
    placement=placement) for placement, join_units in concat_plan]
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4922, in concatenate_join_units
    for ju in join_units]
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4922, in <listcomp>
    for ju in join_units]
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 5195, in get_reindexed_values
    missing_arr = np.empty(self.shape, dtype=empty_dtype)
TypeError: data type not understood
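The last frame fails inside np.empty. Presumably (an inference from the traceback, not confirmed in this thread) empty_dtype there was pandas' tz-aware datetime dtype, which is a pandas extension dtype rather than a NumPy dtype, so NumPy cannot allocate an array with it. A small sketch of that difference, assuming a recent pandas that exposes pd.DatetimeTZDtype; `numpy_can_allocate` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumption: the failing empty_dtype was a tz-aware pandas dtype like this one.
tz_dtype = pd.DatetimeTZDtype(tz="Europe/Paris")


def numpy_can_allocate(dtype):
    """Hypothetical helper: True if np.empty accepts `dtype`, False on TypeError."""
    try:
        np.empty((1, 2), dtype=dtype)
    except TypeError as exc:
        # the exact message depends on the NumPy version
        print(f"{dtype!s}: TypeError: {exc}")
        return False
    return True


print(numpy_can_allocate(np.dtype("M8[ns]")))  # naive datetime64 is a NumPy dtype
print(numpy_can_allocate(tz_dtype))            # tz-aware dtype is pandas-only
```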

Output of pd.show_versions()

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.6.1
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Apr 18, 2017

yeah, thought we already had an issue for this. You are welcome to dive in.

@jreback jreback added Bug Difficulty Intermediate Indexing Related to indexing on series/frames, not to indexes themselves Timezones Timezone data dtype labels Apr 18, 2017
@jreback jreback added this to the Next Major Release milestone Apr 18, 2017
@mroeschke
Member

xref #12985. Appending a dataframe with a tz-aware column has the same traceback.
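The traceback above shows DataFrame.append delegating to concat, so the append case can be reproduced with pd.concat directly. A minimal sketch, assuming a recent pandas (DataFrame.append was removed in pandas 2.0); the frame and column names are illustrative only.

```python
import pandas as pd

empty = pd.DataFrame(columns=["a", "b"])
row = pd.DataFrame({"a": [pd.Timestamp.now(tz="Europe/Paris")], "b": [55]})

try:
    # same code path that DataFrame.append used internally
    result = pd.concat([empty, row], ignore_index=True)
except TypeError as exc:
    # affected versions fail here with the same TypeError
    result = None
    print("concat failed:", exc)
else:
    print(result.dtypes)
```

Newer pandas versions may emit a FutureWarning about concatenating empty entries; the concatenation itself still succeeds.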

@kstohr

kstohr commented May 20, 2018

If you are encountering this issue and need a quick workaround (as I did), try temporarily setting a timestamp in the columns of the DataFrame that is missing timestamps, then set those values back to np.nan or None after concatenating.

Looking forward to a permanent fix.

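import pandas as pd

# old_df and new_df are your existing DataFrames
# create timestamp
ts = pd.Timestamp.now(tz='UTC')

# set a placeholder timestamp on the frame that lacks one
new_df.loc[:, 'created_at'] = ts
new_df.loc[:, 'updated_at'] = ts

# concat the DataFrames; with ignore_index=True the rows of new_df come last
n_old = len(old_df)
new_df = pd.concat([old_df, new_df], ignore_index=True)

# reset the placeholder values on the new rows to null
new_rows_index = new_df.index[n_old:]
new_df.loc[new_rows_index, 'created_at'] = None
new_df.loc[new_rows_index, 'updated_at'] = None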

@WillAyd
Member

WillAyd commented May 20, 2018

@kstohr thanks for the tip, but this is now working on master. See #21014

@WillAyd WillAyd closed this as completed May 20, 2018