BUG: Reindexing two tz-aware indices drops tz on the target index when tolerance and method is specified for only "ffill" and "bfill" #38566

Closed
2 of 3 tasks
ketozhang opened this issue Dec 18, 2020 · 3 comments · Fixed by #39095
Labels: Bug; Regression (functionality that used to work in a prior pandas version); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Timezones (Timezone data dtype)

ketozhang commented Dec 18, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd

df = pd.DataFrame({'value': [0, 1, 2, 3]},
                  index=[pd.Timestamp('2020-01-01 05:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 06:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 07:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 08:00:00+0000', tz='UTC')
                         ]
                  )

new_index = pd.Series([pd.Timestamp('2020-01-01 5:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 6:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 7:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 8:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 9:30:00+0000', tz='UTC')]
                      )

new_df = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))

Problem description

The following exception is raised when method is "ffill" or "bfill" (but not "nearest", see #32740) and tolerance is also specified:

TypeError: DatetimeArray subtraction must have the same timezones or no timezones
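
The conditions described above can be checked with a small, version-tolerant sketch. It tries each fill method with and without a tolerance and records whether a TypeError occurs. The helper name `try_reindex` and the `outcomes` dictionary are illustrative assumptions, and on pandas versions that include the fix every combination is expected to succeed.

```python
import pandas as pd

step = pd.Timedelta(hours=1)
df = pd.DataFrame(
    {"value": [0, 1, 2, 3]},
    index=pd.date_range("2020-01-01 05:00", periods=4, freq=step, tz="UTC"),
)
new_index = pd.date_range("2020-01-01 05:30", periods=5, freq=step, tz="UTC")


def try_reindex(method, tolerance):
    """Hypothetical helper: return 'ok' or the TypeError message."""
    try:
        df.reindex(new_index, method=method, tolerance=tolerance)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


# Key: (method, whether a tolerance was passed); value: outcome string.
outcomes = {}
for method in ("ffill", "bfill", "nearest"):
    for tolerance in (None, pd.Timedelta("1 hour")):
        outcomes[(method, tolerance is not None)] = try_reindex(method, tolerance)

for (method, has_tolerance), status in outcomes.items():
    print(f"method={method!r:10} tolerance={has_tolerance!s:5} -> {status}")
```

On an affected version, only the rows with `ffill` or `bfill` and a tolerance should report the error.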

I found that the timezone is dropped in this function, on lines 3024 and 3036:

target_values = target._get_engine_target()
if self.is_monotonic_increasing and target.is_monotonic_increasing:
    engine_method = (
        self._engine.get_pad_indexer
        if method == "pad"
        else self._engine.get_backfill_indexer
    )
    indexer = engine_method(target_values, limit)
else:
    indexer = self._get_fill_indexer_searchsorted(target, method, limit)
if tolerance is not None:
    indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)

Here target is the tz-aware target index. Once it is converted to target_values, however, the result is a plain numpy array that no longer carries the tz info.
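
The loss of tz info can be illustrated without touching pandas internals. The sketch below uses the public `.values` accessor as a stand-in for the private `_get_engine_target()` (an assumption: the two are not guaranteed to be equivalent), and shows that subtracting the resulting naive values from a tz-aware index fails.

```python
import numpy as np
import pandas as pd

aware = pd.date_range(
    "2020-01-01 05:00", periods=4, freq=pd.Timedelta(hours=1), tz="UTC"
)

# Stand-in for the private _get_engine_target(): a plain datetime64 ndarray.
raw = aware.values
print(aware.dtype)  # tz-aware pandas dtype
print(raw.dtype)    # numpy datetime64 dtype, no timezone attached

# Wrapping the raw array again yields a tz-naive index.
rewrapped = pd.DatetimeIndex(raw)
print(rewrapped.tz)  # None

# Mixing tz-aware and tz-naive values in a subtraction raises TypeError.
try:
    aware - rewrapped
    mixed_subtraction_error = None
except TypeError as exc:
    mixed_subtraction_error = exc
print(mixed_subtraction_error)
```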

I found a working solution, but I am unsure whether this change affects other functionality:

- indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)
+ indexer = self._filter_indexer_tolerance(target, indexer, tolerance)
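
To show why passing the tz-aware index matters, here is a minimal standalone sketch of a tolerance filter. The function `filter_by_tolerance` is a hypothetical helper written for illustration, not the pandas implementation. It keeps both sides of the subtraction tz-aware, so the distance computation succeeds.

```python
import numpy as np
import pandas as pd


def filter_by_tolerance(source, target, indexer, tolerance):
    """Hypothetical helper: set matches farther than `tolerance` to -1."""
    # An indexer value of -1 picks the last element here; the
    # (indexer != -1) mask below discards those positions again.
    matched = source.take(indexer)
    distance = abs(matched - target)  # both operands are tz-aware
    within = np.asarray(distance <= tolerance)
    return np.where((indexer != -1) & within, indexer, -1)


step = pd.Timedelta(hours=1)
source = pd.date_range("2020-01-01 05:00", periods=4, freq=step, tz="UTC")
target = pd.date_range("2020-01-01 05:30", periods=5, freq=step, tz="UTC")

# Forward-fill positions without a tolerance, then apply the filter.
indexer = source.get_indexer(target, method="ffill")
filtered = filter_by_tolerance(source, target, indexer, pd.Timedelta("1 hour"))
print(filtered)
```

The last target, 09:30, is 1.5 hours after the nearest earlier source value, so its position becomes -1. This corresponds to the NaN row in the expected output.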

Expected Output

                           value
2020-01-01 05:30:00+00:00    0.0
2020-01-01 06:30:00+00:00    1.0
2020-01-01 07:30:00+00:00    2.0
2020-01-01 08:30:00+00:00    3.0
2020-01-01 09:30:00+00:00    NaN

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.7.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.0-11-cloud-amd64
Version          : #1 SMP Debian 4.19.146-1 (2020-09-17)
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.5
numpy            : 1.19.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : 6.1.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 2.11.2
IPython          : 7.18.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.4
fastparquet      : 0.4.1
gcsfs            : 0.7.1
matplotlib       : 3.3.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 1.0.1
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.3
sqlalchemy       : 1.3.20
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : 0.51.2
ketozhang added the Bug and Needs Triage labels Dec 18, 2020

ketozhang commented Dec 18, 2020

A simpler example

import pandas as pd

df = pd.DataFrame(
    {'value': [0, 1, 2, 3]},
    index=pd.date_range('2020-01-01 00:00:00', periods=4, freq='H', tz="UTC")
)
new_index = pd.date_range('2020-01-01 00:01:00', periods=4, freq='H', tz="UTC")
new_df = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))
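
On an affected version, one possible workaround is to do the reindex on tz-naive UTC values and re-attach the timezone afterwards. This is a sketch under that assumption and was not proposed in this thread. It relies only on the public `tz_convert(None)` and `tz_localize` methods.

```python
import pandas as pd

step = pd.Timedelta(hours=1)
df = pd.DataFrame(
    {"value": [0, 1, 2, 3]},
    index=pd.date_range("2020-01-01 00:00:00", periods=4, freq=step, tz="UTC"),
)
new_index = pd.date_range("2020-01-01 00:01:00", periods=4, freq=step, tz="UTC")

# Convert both sides to UTC and drop the timezone before reindexing.
naive_df = df.tz_convert(None)
naive_index = new_index.tz_convert(None)
naive_result = naive_df.reindex(
    naive_index, method="ffill", tolerance=pd.Timedelta("1 hour")
)

# Re-attach the UTC timezone to the resulting index.
new_df = naive_result.tz_localize("UTC")
print(new_df)
```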

simonjayhawkins (Member) commented

Thanks @ketozhang for the report.

pandas 0.25.3 gave the expected output, so I will label this as a regression pending further investigation.

simonjayhawkins added the Regression, Reshaping, and Timezones labels and removed the Needs Triage label Dec 18, 2020
simonjayhawkins added this to the Contributions Welcome milestone Dec 18, 2020
jreback modified the milestones: Contributions Welcome, 1.3 Jan 11, 2021
ketozhang (Author) commented

Great work, thanks @phofl
