df.loc modifies datetime columns #28837

Closed
bdepardo opened this issue Oct 8, 2019 · 5 comments · Fixed by #28964
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions)

@bdepardo

bdepardo commented Oct 8, 2019

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({"date": [1485264372711, 1485265925110, 1540215845888, 1540282121025]})
>>> df["date_dt"] = pd.to_datetime(df["date"], unit='ms', cache=True)
>>> df
            date                 date_dt
0  1485264372711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025
>>> df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]
>>> df
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025
>>> df.loc[[2,3], "date_dt_cp"] = df.loc[[2,3], "date_dt"]
>>> df
            date                 date_dt                    date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711000064
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110000128
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888000000
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.024999936

Problem description

Assigning to a datetime column through .loc[] changes the stored dates.
When .loc[] selects all the rows (df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]), the dates are unchanged:

>>> df
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025

but when only a subset of rows is selected (df.loc[[2,3], "date_dt_cp"] = df.loc[[2,3], "date_dt"]), the dates in the target column change by a few nanoseconds, including rows 0 and 1, which were not part of the assignment:

>>> df
            date                 date_dt                    date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711000064
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110000128
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888000000
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.024999936
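
A note on the numbers above: the offsets in date_dt_cp (+64 ns, +128 ns, 0 ns, -64 ns) match what you get if the integer nanosecond values are converted to a 64-bit float and back. Whether pandas 0.25.1 actually takes such a path internally is an assumption here, not something confirmed in this thread. The sketch below uses only plain Python; the helper name float64_round_trip is made up for illustration.

```python
# Epoch timestamps from the report, in milliseconds.
millis = [1485264372711, 1485265925110, 1540215845888, 1540282121025]


def float64_round_trip(ns: int) -> int:
    """Convert integer nanoseconds to a Python float (IEEE 754 double) and back."""
    return int(float(ns))


# datetime64[ns] stores nanoseconds since the epoch as 64-bit integers.
nanos = [ms * 1_000_000 for ms in millis]

# Difference introduced by the round trip, in nanoseconds.
drift = [float64_round_trip(ns) - ns for ns in nanos]
print(drift)  # [64, 128, 0, -64]
```

Values of this magnitude lie between 2**60 and 2**61, where consecutive doubles are 256 apart, so any nanosecond count that is not a multiple of 256 gets rounded to the nearest one.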

Expected Output

The last assignment in the example above should leave every value unchanged:

            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 3.7.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : None
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.0
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : 1.3.0
xlsxwriter : None

@mroeschke
Member

mroeschke commented Oct 9, 2019

I can't reproduce this on master, but I do get the same result as you on 0.25.1.

It must have been fixed in the meantime and could use a regression test. Care to contribute a test?

In [4]: df
Out[4]:
            date                 date_dt
0  1485264372711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025

In [5]: df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]

In [6]: df
Out[6]:
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025

In [7]: df.loc[[2,3], "date_dt_cp"] = df.loc[[2,3], "date_dt"]

In [8]: df
Out[8]:
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025

mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels on Oct 9, 2019
@bdepardo
Author

bdepardo commented Oct 9, 2019

Thanks.
I'm happy to write the test if you can point me to where it should go.
Should it be in pandas/tests/indexing/test_loc.py?

@mroeschke
Member

Sure. A new test in pandas/tests/indexing/test_loc.py sounds good.
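
For anyone picking this up, a regression test along these lines might look like the sketch below. It is not the test that was merged for this issue; the function name is hypothetical, and it uses the public pandas.testing module rather than the helpers used inside the pandas test suite.

```python
import pandas as pd
import pandas.testing as tm


def test_loc_partial_setitem_keeps_datetime_values():
    # Hypothetical test name; the data comes from the report above.
    df = pd.DataFrame(
        {"date": [1485264372711, 1485265925110, 1540215845888, 1540282121025]}
    )
    df["date_dt"] = pd.to_datetime(df["date"], unit="ms", cache=True)

    # Copy the whole column, then reassign only rows 2 and 3.
    df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]
    df.loc[[2, 3], "date_dt_cp"] = df.loc[[2, 3], "date_dt"]

    # The copy must match the source column exactly, down to the nanosecond.
    expected = df["date_dt"].rename("date_dt_cp")
    tm.assert_series_equal(df["date_dt_cp"], expected)
```

On 0.25.1 this should fail, going by the output in the report; on a version with the fix it should pass.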

@bdepardo
Author

Thanks @rohitsanj

@bdepardo
Author

FYI, I tested with 0.25.2 and the bug is still there.
The fix must be in a commit that has not been released yet.

mroeschke pushed a commit that referenced this issue Oct 29, 2019
* TST: added test for df.loc modifies datetime columns

Issue number #28837

* ran black pandas command

* MAINT: Address reviewer comments