BUG: Cannot append to DataFrame with Timestamp column with non-nanosecond unit #55374


Closed
3 tasks done
AdrianDAlessandro opened this issue Oct 3, 2023 · 9 comments
Labels
Bug, Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)

Comments

@AdrianDAlessandro
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"time": pd.Timestamp(1513393355, unit="us"), "A": [0]})  # Note the microsecond unit
df.loc[1] = df.loc[0] # <-- This raises a ValueError for incompatible shapes

Issue Description

Appending a row to a DataFrame with .loc is broken when one of the columns contains a timestamp whose unit is not nanoseconds. Attempting to do so raises a ValueError.
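As a workaround sketch (assuming, per the report above, that only non-nanosecond resolutions are affected), casting the timestamp column to nanosecond resolution before appending avoids the error:

```python
import pandas as pd

# Workaround sketch (assumption: only non-nanosecond resolutions trigger
# the bug): cast the column to "ns" before appending with .loc.
df = pd.DataFrame({"time": pd.Timestamp(1513393355, unit="us"), "A": [0]})
df["time"] = df["time"].astype("datetime64[ns]")
df.loc[1] = df.loc[0]  # no longer raises
```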

Expected Behavior

The new row should be appended to the DataFrame successfully. The equivalent example below, using the nanosecond unit, works:

import pandas as pd
df = pd.DataFrame({"time": pd.Timestamp(1513393355, unit="ns"), "A": [0]})  # Note the nanosecond unit
df.loc[1] = df.loc[0]

Installed Versions

INSTALLED VERSIONS

commit : 77bc67a
python : 3.11.1.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Jul 5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0+untagged.33100.g77bc67a.dirty
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : 0.29.33
pytest : 7.4.0
hypothesis : 6.82.5
sphinx : 6.2.1
blosc : 1.11.1
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat: None
fastparquet : 2023.7.0
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.2
numba : 0.57.1
numexpr : 2.8.5
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : 1.2.2
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2023.6.0
scipy : 1.11.2
sqlalchemy : 2.0.20
tables : None
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

@AdrianDAlessandro AdrianDAlessandro added the Bug and Needs Triage labels Oct 3, 2023
AdrianDAlessandro added a commit to ImperialCollegeLondon/gridlington-datahub that referenced this issue Oct 3, 2023
@jadelsoufi

I could reproduce the issue and found that the error disappears if the timestamp is greater than pandas.Timestamp.max, which equals Timestamp('2262-04-11 23:47:16.854775807'), whatever the unit ('m', 's', 'ms', ...).

So, for example, the code below with the 'us' unit works fine:

df = pd.DataFrame({"time": pd.Timestamp(9.3e15, unit="us"), "A": [0]})
df.loc[1] = df.loc[0]

I know it doesn't help much, but it may give some direction for the investigation.

@AdrianDAlessandro
Contributor Author

I should say that when I first dug into this error, it occurred in the BlockManager inside the concat function. But concat doesn't raise the same error when used directly. It seems the blocks are computed differently when assigning a Series to a DataFrame row with .loc:

>>> import pandas as pd
>>> df = pd.DataFrame({"time": pd.Timestamp(1513393355, unit="us"), "A":[0]})
>>> pd.concat([df,df])
                        time  A
0 1970-01-01 00:25:13.393355  0
0 1970-01-01 00:25:13.393355  0
>>> df.loc[1] = df.loc[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 885, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1883, in _setitem_with_indexer
    self._setitem_with_indexer_missing(indexer, value)
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 2241, in _setitem_with_indexer_missing
    self.obj._mgr = self.obj._append(value)._mgr
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 10227, in _append
    result = concat(
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 680, in get_result
    new_data = concatenate_managers(
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/internals/concat.py", line 199, in concatenate_managers
    return BlockManager(tuple(blocks), axes)
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 916, in __init__
    self._verify_integrity()
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 923, in _verify_integrity
    raise_construction_error(tot_items, block.shape[1:], self.axes)
  File "/.../.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 2118, in raise_construction_error
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (1, 2), indices imply (2, 2)
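Since concat works when called directly, a sketch of an append that routes through pd.concat instead of the .loc enlargement path (a workaround under that observation, not a fix):

```python
import pandas as pd

# Append the first row by concatenating it back on, bypassing the
# .loc enlargement path that raises for non-nanosecond timestamps.
df = pd.DataFrame({"time": pd.Timestamp(1513393355, unit="us"), "A": [0]})
out = pd.concat([df, df.loc[[0]]], ignore_index=True)
print(out)
```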

@jadelsoufi

jadelsoufi commented Oct 9, 2023

Currently investigating the concat.py files: dtype datetime64[ns] is always assigned to self.objs[1]._mgr, whatever the unit of the pd.Timestamp() instance, in pandas/core/reshape/concat.py around line 665:

            for obj in self.objs:
                indexers = {}
                for ax, new_labels in enumerate(self.new_axes):
                    # ::-1 to convert BlockManager ax to DataFrame ax
                    if ax == self.bm_axis:
                        # Suppress reindexing on concat axis
                        continue

                    # 1-ax to convert BlockManager axis to DataFrame axis
                    obj_labels = obj.axes[1 - ax]
                    if not new_labels.equals(obj_labels):
                        indexers[ax] = obj_labels.get_indexer(new_labels)

                mgrs_indexers.append((obj._mgr, indexers))
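For reference, a small illustration of the get_indexer call in that loop (hypothetical labels, not taken from the reproducer): it maps the new axis labels onto positions in the object's existing axis, with -1 marking labels that would need reindexing.

```python
import pandas as pd

# get_indexer maps target labels onto positions in the existing index;
# -1 marks labels absent from it (these drive the reindexing above).
obj_labels = pd.Index(["time", "A"])
new_labels = pd.Index(["time", "A", "B"])
print(obj_labels.get_indexer(new_labels))  # [ 0  1 -1]
```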

@jadelsoufi

take

@maxdolle

Not necessarily a duplicate but pretty similar to #55067

@rachtsingh

I think I have a similar issue, with pandas 2.1.1 and pyarrow 13.0.0:

ipdb> for c in inputs:
    print(c.dtypes)

ts       datetime64[us, UTC]
ident                 object
val                   object
dtype: object
ts       datetime64[us, UTC]
ident                 object
val                   object
dtype: object
ts       datetime64[ns, UTC]
ident         string[python]
val                  Float32
dtype: object
ts       datetime64[us, UTC]
ident         string[python]
val                  Float64
dtype: object
ts       datetime64[us, UTC]
ident                 object
val                  Float64
dtype: object
ts       datetime64[us, UTC]
ident                 object
val                  Float64
dtype: object

ipdb> print(pd.concat(inputs).dtypes)
ts       object
ident    object
val      object
dtype: object

Let me know if I should open a new issue in case this is substantially different; I know some of these have already been addressed in main (and I haven't had a chance to check main), so I didn't want to spam.

@rachtsingh

Ah, this is a bug in pyarrow==13.0.0; downgrading to 12.0.0 solves it.

@jorisvandenbossche jorisvandenbossche added the Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) label and removed the Needs Triage label Oct 26, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.1.3 milestone Oct 26, 2023
@jorisvandenbossche jorisvandenbossche modified the milestones: 2.1.3, 2.1.4 Nov 13, 2023
@jbrockmendel
Member

No longer raises on main, possibly closed by #53641?

@lithomas1 lithomas1 modified the milestones: 2.1.4, 2.2 Dec 8, 2023
@lithomas1 lithomas1 modified the milestones: 2.2, 2.2.1 Jan 20, 2024
@mroeschke
Member

Looks to have been addressed by #53641 so closing
