df.loc modifies datetime columns #28837

Closed
@bdepardo

Description

Code Sample, a copy-pastable example if possible

>>> df = pd.DataFrame.from_dict({"date": [1485264372711, 1485265925110, 1540215845888, 1540282121025]})
>>> df["date_dt"] = pd.to_datetime(df["date"], unit='ms', cache=True)
>>> df
            date                 date_dt
0  1485264372711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025
>>> df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]
>>> df
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025
>>> df.loc[[2,3], "date_dt_cp"] = df.loc[[2,3], "date_dt"]
>>> df
            date                 date_dt                    date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711000064
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110000128
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888000000
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.024999936

Problem description

Assigning to a datetime column through .loc[] can modify dates in rows that were not part of the assignment.
When .loc[] selects all rows (df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]), the dates are unchanged:

>>> df
            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025

But when only a subset of rows is selected (df.loc[[2,3], "date_dt_cp"] = df.loc[[2,3], "date_dt"]), the dates in date_dt_cp change by a few nanoseconds, including in rows 0 and 1, which were not selected:

>>> df
            date                 date_dt                    date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711000064
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110000128
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888000000
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.024999936
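The cause is not identified in this report. As a hedged observation only: the nanosecond offsets above (+64, +128, 0, -64) are exactly what a round trip through a 64-bit float would produce, so one plausible explanation is that the partial assignment temporarily converts the column to float64. The sketch below uses plain Python (no pandas) to reproduce the same offsets; the variable names are illustrative, and the float64 conversion is an assumption about pandas internals, not something confirmed here.

```python
# Datetimes from the report as integer nanoseconds since the epoch
# (the millisecond values in the "date" column times 1_000_000).
millis = [1485264372711, 1485265925110, 1540215845888, 1540282121025]
nanos = [ms * 1_000_000 for ms in millis]

# Assumption: the partial .loc assignment passes the values through float64.
# float() rounds to the nearest representable double; near 1.5e18 adjacent
# doubles are 256 apart, so most nanosecond counts cannot be stored exactly.
round_tripped = [int(float(ns)) for ns in nanos]

# Difference in nanoseconds between the round-tripped and original values.
drift = [after - before for before, after in zip(nanos, round_tripped)]
print(drift)  # [64, 128, 0, -64]
```

These offsets match the trailing digits shown in date_dt_cp (…711000064, …110000128, …888000000, …024999936).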

Expected Output

The last assignment in the example above should not change any values; date_dt_cp should still match date_dt exactly:

            date                 date_dt              date_dt_cp
0  1485264372711 2017-01-24 13:26:12.711 2017-01-24 13:26:12.711
1  1485265925110 2017-01-24 13:52:05.110 2017-01-24 13:52:05.110
2  1540215845888 2018-10-22 13:44:05.888 2018-10-22 13:44:05.888
3  1540282121025 2018-10-23 08:08:41.025 2018-10-23 08:08:41.025
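To check whether a given pandas installation shows the behaviour, the example can be reduced to a small script that lists the rows where the copy no longer equals the original. This is a minimal sketch; the name changed_rows is illustrative, and the result depends on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({"date": [1485264372711, 1485265925110, 1540215845888, 1540282121025]})
df["date_dt"] = pd.to_datetime(df["date"], unit="ms")

# Full-column copy, then the partial assignment from the report.
df.loc[:, "date_dt_cp"] = df.loc[:, "date_dt"]
df.loc[[2, 3], "date_dt_cp"] = df.loc[[2, 3], "date_dt"]

# Index labels of rows where the copy differs from the original column.
mismatch = df["date_dt_cp"] != df["date_dt"]
changed_rows = df.index[mismatch].tolist()
print(changed_rows)
```

An empty list means the expected output above was produced; any listed row labels are the rows whose timestamps were altered.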

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 3.7.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : None
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.0
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : 1.3.0
xlsxwriter : None
