REGR: copy() within apply() raises ValueError: cannot create a DatetimeTZBlock without a tz, as of 1.0.0 #31505

Closed
DomKennedy opened this issue Jan 31, 2020 · 0 comments · Fixed by #31614
Labels: Regression (functionality that used to work in a prior pandas version)

DomKennedy commented Jan 31, 2020

Code Sample, a copy-pastable example if possible

Minimal example:

import pandas as pd

df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object")
df.apply(lambda col: col.copy())  # raises exception below

Real-life usage:

def filter_dataframe_by_dict(df, filters):
    """
    Filter the specified dataframe to only those rows which match the specified filters

    Parameters
    ----------
    df : pd.DataFrame
    
    filters : Mapping
        dict, keyed by a subset of `df.columns`

    Returns
    -------
    pd.DataFrame
        Same columns as `df`, including only those rows which match `filters` on all specified values.
    """
    filters = pd.Series(filters, dtype="object")
    mask = df[filters.index].apply(
        # astype("object") calls `copy()` internally, and is necessary to ensure dtype-agnostic 
        # comparisons.
        lambda row: row.astype("object").equals(filters), axis="columns"
    )
    return df[mask]

records = pd.DataFrame(columns=["foo", "bar", "baz"])

records.loc[0] = {"foo": pd.Timestamp("2019", tz="UTC"), "bar": 1, "baz": 6.283}
records.loc[1] = {"foo": pd.Timestamp("2020", tz="UTC"), "bar": 2, "baz": 6.283}

filters = {"foo": pd.Timestamp("2020", tz="UTC")}

filter_dataframe_by_dict(records, filters)  # raises below exception

Exception:

Traceback (most recent call last):
  File "pandas_bug.py", line 29, in <module>
    df.apply(lambda col: col.copy())
  File ".venv/lib/python3.6/site-packages/pandas/core/frame.py", line 6875, in apply
    return op.get_result()
  File ".venv/lib/python3.6/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File ".venv/lib/python3.6/site-packages/pandas/core/apply.py", line 296, in apply_standard
    values, self.f, axis=self.axis, dummy=dummy, labels=labels
  File "pandas/_libs/reduction.pyx", line 617, in pandas._libs.reduction.compute_reduction
  File "pandas/_libs/reduction.pyx", line 127, in pandas._libs.reduction.Reducer.get_result
  File "pandas_bug.py", line 29, in <lambda>
    df.apply(lambda row: row.copy(), axis="columns")
  File ".venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5810, in copy
    data = self._data.copy(deep=deep)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 794, in copy
    res = self.apply("copy", deep=deep)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 696, in copy
    return self.make_block_same_class(values, ndim=self.ndim)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 281, in make_block_same_class
    return make_block(values, placement=placement, ndim=ndim, klass=type(self))
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3028, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 1723, in __init__
    values = self._maybe_coerce_values(values)
  File ".venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2306, in _maybe_coerce_values
    raise ValueError("cannot create a DatetimeTZBlock without a tz")
ValueError: cannot create a DatetimeTZBlock without a tz

Problem description

Essentially, the problem arises when .copy() is called from within an .apply call, on a row/column of a DataFrame which:

  • has dtype object
  • consists solely of tz-aware Timestamp objects
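
To check whether a given row or column falls under the two conditions above, something like the following could be used. This is only a sketch: `meets_error_conditions` is a hypothetical helper name, not part of pandas, and treating an empty Series as unaffected is an assumption on my part.

```python
import pandas as pd


def meets_error_conditions(series):
    """Return True if `series` has dtype object and holds only tz-aware Timestamps.

    Hypothetical helper for illustration; an empty Series is treated as
    unaffected, which is an assumption rather than something tested on 1.0.0.
    """
    if series.dtype != object or series.empty:
        return False
    return all(
        isinstance(value, pd.Timestamp) and value.tzinfo is not None
        for value in series
    )


# Object dtype, solely tz-aware Timestamps: both conditions hold.
affected = pd.Series([pd.Timestamp("2020", tz="UTC")], dtype="object")
# Object dtype, but mixed contents: second condition fails.
mixed = pd.Series([pd.Timestamp("2020", tz="UTC"), 1], dtype="object")
# Object dtype, but the Timestamp is tz-naive: second condition fails.
naive = pd.Series([pd.Timestamp("2020")], dtype="object")

print(meets_error_conditions(affected))
print(meets_error_conditions(mixed))
print(meets_error_conditions(naive))
```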

While this seems like a fairly artificial set of conditions, it can indeed occur "organically", as the "Real-life usage" example above is intended to demonstrate. Appending rows to an initially empty DataFrame results in the dtype defaulting to object for all columns, so if the filters passed to filter_dataframe_by_dict happen to consist only of timestamp-valued columns, the error conditions are met.
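
To see how the frame ends up in this state, the dtypes can be inspected after building `records` row by row. This is a small sketch reusing the data from the example above. I observed all-object dtypes on 1.0.0, but the inferred dtypes may differ on other pandas versions, so no expected output is shown.

```python
import pandas as pd

# Build the frame row by row, starting from an empty DataFrame.
records = pd.DataFrame(columns=["foo", "bar", "baz"])
records.loc[0] = {"foo": pd.Timestamp("2019", tz="UTC"), "bar": 1, "baz": 6.283}
records.loc[1] = {"foo": pd.Timestamp("2020", tz="UTC"), "bar": 2, "baz": 6.283}

# Per-column dtypes; reported as all `object` on pandas 1.0.0.
print(records.dtypes)

# The subset that filter_dataframe_by_dict hands to .apply when the
# filters only name the timestamp column.
subset = records[["foo"]]
print(subset.dtypes)
```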

Note that the error only seems to arise in .apply calls; making the same calls on the rows returned by iterrows() or iloc works just fine.
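
For comparison, here is a minimal sketch of the two non-apply routes mentioned above, using the same one-cell frame as the minimal example. The variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object")

# Copy each row obtained from iterrows(), without going through .apply.
rows_via_iterrows = [row.copy() for _, row in df.iterrows()]

# Copy each row obtained positionally with iloc.
rows_via_iloc = [df.iloc[i].copy() for i in range(len(df))]

print(rows_via_iterrows[0])
print(rows_via_iloc[0])
```

Neither loop touches DataFrame.apply, which is the only code path where I have seen the exception.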

This problem first appears in the pandas 1.0.0 release; the same code worked in earlier versions.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-74-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 1.0.0
numpy            : 1.17.4
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 19.3.1
setuptools       : 41.6.0
Cython           : None
pytest           : 5.3.0
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.0
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : 1.3.11
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None
jorisvandenbossche added the Regression label (functionality that used to work in a prior pandas version) on Feb 1, 2020
jorisvandenbossche added this to the 1.0.1 milestone on Feb 1, 2020
jorisvandenbossche changed the title from "copy() within apply() raises ValueError: cannot create a DatetimeTZBlock without a tz, as of 1.0.0" to "REGR: copy() within apply() raises ValueError: cannot create a DatetimeTZBlock without a tz, as of 1.0.0" on Feb 4, 2020