
Assign back converted multiple columns to datetime failed #20511

Closed
jesrael opened this issue Mar 28, 2018 · 8 comments · Fixed by #46982
Labels: good first issue · Indexing (Related to indexing on series/frames, not to indexes themselves) · Needs Tests (Unit test(s) needed to prevent regressions)


jesrael commented Mar 28, 2018

I want to convert multiple columns to datetimes:

df = pd.DataFrame(np.random.choice(['2015-01-01','2016-01-01'], size=(5,6))).add_prefix('date')
print (df)
        date0       date1       date2       date3       date4       date5
0  2015-01-01  2016-01-01  2015-01-01  2015-01-01  2016-01-01  2016-01-01
1  2016-01-01  2016-01-01  2016-01-01  2015-01-01  2016-01-01  2015-01-01
2  2015-01-01  2015-01-01  2016-01-01  2016-01-01  2016-01-01  2015-01-01
3  2016-01-01  2015-01-01  2015-01-01  2015-01-01  2015-01-01  2015-01-01
4  2015-01-01  2016-01-01  2015-01-01  2016-01-01  2016-01-01  2015-01-01

i = range(0, 6, 2)
df.iloc[:, i] = df.iloc[:, i].apply(lambda x: pd.to_datetime(x, errors='coerce'))
print (df.dtypes)
date0    object
date1    object
date2    object
date3    object
date4    object
date5    object
dtype: object

print (df)
                 date0       date1                date2       date3  \
0  1420070400000000000  2016-01-01  1420070400000000000  2015-01-01   
1  1451606400000000000  2016-01-01  1451606400000000000  2015-01-01   
2  1420070400000000000  2015-01-01  1451606400000000000  2016-01-01   
3  1451606400000000000  2015-01-01  1420070400000000000  2015-01-01   
4  1420070400000000000  2016-01-01  1420070400000000000  2016-01-01   

                 date4       date5  
0  1451606400000000000  2016-01-01  
1  1451606400000000000  2015-01-01  
2  1451606400000000000  2015-01-01  
3  1420070400000000000  2015-01-01  
4  1451606400000000000  2015-01-01  

I tested the conversion:

print (df.iloc[:,i].apply(lambda x: pd.to_datetime(x, errors='coerce')).dtypes)
date0    datetime64[ns]
date2    datetime64[ns]
date4    datetime64[ns]
dtype: object

But assigning back converts the datetimes to unix timestamps. I also tested loc, with the same problem.

print (pd.show_versions())

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: sk_SK
LOCALE: None.None

pandas: 0.22.0
pytest: 2.8.5
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.23.4
numpy: 1.12.1
scipy: 0.19.0
pyarrow: 0.7.0
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.0
openpyxl: 2.4.8
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.4.0
None
TomAugspurger (Contributor) commented:

It's not entirely clear what you're trying to do. For

I want to convert multiple columns to datetimes:

you can use apply and to_datetime

In [15]: df2 = df.apply(pd.to_datetime, errors='coerce')

In [16]: df2.dtypes
Out[16]:
date0    datetime64[ns]
date1    datetime64[ns]
date2    datetime64[ns]
date3    datetime64[ns]
date4    datetime64[ns]
date5    datetime64[ns]
dtype: object

@TomAugspurger TomAugspurger added this to the No action milestone Mar 28, 2018
jesrael (Author) commented Mar 29, 2018

@TomAugspurger - sorry, maybe you missed this:

But assigning back converts the datetimes to unix timestamps. I also tested loc, with the same problem.

It converts nicely if assigned to a new DataFrame, but fails if assigned back to a subset: the datetime columns are converted to unix timestamps. Or is something else going on?
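
As a side note, a workaround sketch for affected pandas versions (the loop below is an illustration, not part of the original report): converting and assigning one column at a time replaces each column wholesale, so the datetime64[ns] dtype is kept.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice(['2015-01-01', '2016-01-01'], size=(5, 6))).add_prefix('date')

# Convert only the even-numbered columns, one at a time; whole-column
# assignment keeps the datetime64[ns] dtype instead of falling back to object.
for col in df.columns[::2]:
    df[col] = pd.to_datetime(df[col], errors='coerce')

print(df.dtypes)
# date0    datetime64[ns]
# date1            object
# date2    datetime64[ns]
# ...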

TomAugspurger (Contributor) commented Mar 29, 2018 via email

jesrael (Author) commented Mar 29, 2018

@TomAugspurger - Hmm, I am the user https://stackoverflow.com/users/2901002/jezrael

I will try to explain more:

df = pd.DataFrame(np.random.choice(['2015-01-01','2016-01-01'], size=(5,6))).add_prefix('date')
print (df)
        date0       date1       date2       date3       date4       date5
0  2015-01-01  2016-01-01  2015-01-01  2015-01-01  2016-01-01  2016-01-01
1  2016-01-01  2016-01-01  2016-01-01  2015-01-01  2016-01-01  2015-01-01
2  2015-01-01  2015-01-01  2016-01-01  2016-01-01  2016-01-01  2015-01-01
3  2016-01-01  2015-01-01  2015-01-01  2015-01-01  2015-01-01  2015-01-01
4  2015-01-01  2016-01-01  2015-01-01  2016-01-01  2016-01-01  2015-01-01

i = range(0, 6, 2)
df.iloc[:, i] = df.iloc[:, i].apply(lambda x: pd.to_datetime(x, errors='coerce'))

I get:

print (df.dtypes)
date0    object
date1    object
date2    object
date3    object
date4    object
date5    object
dtype: object

and I expected:

print (df.dtypes)
date0    datetime64[ns]
date1    object
date2    datetime64[ns]
date3    object
date4    datetime64[ns]
date5    object
dtype: object

shippy commented Nov 29, 2018

@TomAugspurger This really looks like a bug. Let's see if I can explain it more clearly.

The bug presents in two ways:

  1. .apply(pd.to_datetime) called on a multi-column slice converts the columns to datetime64 after the call, but not during the assignment to the same multi-column slice.
  2. Further, assignment of the result of multi-column .apply(pd.to_datetime) transforms the datetime string to a nanosecond timestamp.

This violates expectations in two ways:

  1. The column doesn't look as expected, and there's no hint as to how to get the expected format back.
  2. The column doesn't behave as expected, e.g. it cannot be subtracted from other datetime columns.

To demonstrate, let's set up a six-column DataFrame. The leftmost column will stay as-is, for side-by-side comparison; the rest will have pd.to_datetime called upon them in various forms. All five rightmost columns should be datetime64 and equal to one another by the end of the exercise. Only the three rightmost will be.

We'll recreate the bug in apply_multislice1 and apply_multislice2.

test_df = (pd.DataFrame(np.random.choice(['2016-01-01'], size=(2, 6)))
        .add_prefix('date')
        .rename(columns={
            'date0': 'original',
            'date1': 'apply_multislice1',
            'date2': 'apply_multislice2',
            'date3': 'apply_unislice',
            'date4': 'apply_noslice',
            'date5': 'assigned'}))
     original apply_multislice1 apply_multislice2 apply_unislice apply_noslice    assigned
0  2016-01-01        2016-01-01        2016-01-01     2016-01-01    2016-01-01  2016-01-01
1  2016-01-01        2016-01-01        2016-01-01     2016-01-01    2016-01-01  2016-01-01

Bug demonstration

I'll quickly demonstrate the case that succeeds without assignment:

assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime) == pd.to_datetime('2016-01-01')).all(axis=None)

If it's assigned back to itself, however, it changes value to a nanosecond timestamp that still has dtype object:

# Assignment
test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] = test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime)

# Succeeds, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == 1451606400000000000).all(axis=None)
assert test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].dtypes.isin([object]).all()

# Fails, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == pd.to_datetime('2016-01-01')).all(axis=None)
assert test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].dtypes.isin(['datetime64']).all()

Variations without bug

I could think of three other ways to do the datetime conversion - directly, and through .apply with different slicing:

test_df.loc[:, 'apply_unislice'] = test_df.loc[:, ['apply_unislice']].apply(pd.to_datetime)
test_df.loc[:, 'apply_noslice'] = test_df.loc[:, 'apply_noslice'].apply(pd.to_datetime)
test_df.loc[:, 'assigned'] = pd.to_datetime(test_df.loc[:, 'assigned'])

# Succeeds, as expected:
assert (test_df.loc[:, ['apply_unislice', 'apply_noslice', 'assigned']] == pd.to_datetime('2016-01-01')).all(axis=None)
assert not test_df.loc[:, ['apply_unislice', 'apply_noslice', 'assigned']].dtypes.isin([object]).any(axis=None)

Final outcome

All five rightmost columns should have the same content and dtype, but by now we know they don't.

     original    apply_multislice1    apply_multislice2 apply_unislice apply_noslice   assigned
0  2016-01-01  1451606400000000000  1451606400000000000     2016-01-01    2016-01-01 2016-01-01
1  2016-01-01  1451606400000000000  1451606400000000000     2016-01-01    2016-01-01 2016-01-01
original                     object
apply_multislice1            object
apply_multislice2            object
apply_unislice       datetime64[ns]
apply_noslice        datetime64[ns]
assigned             datetime64[ns]

Recovery from the bug

If the bugged outcome in apply_multislice* is converted as nanoseconds and assigned back, pd.to_datetime recovers the values correctly, so at least nothing was lost.

# Continuing where we left off: Succeeds, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == 1451606400000000000).all(axis=None)
# Call pd.to_datetime(unit='ns')
test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] = test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime, unit='ns')
# Operates as expected
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == pd.to_datetime('2016-01-01')).all(axis=None)

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-137-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 4.0.0
pip: 18.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger (Contributor) commented:

@jesrael sorry I missed the fact that assignment was the buggy part.

In [48]: df = pd.DataFrame({"A": ['2015-01-01', '2015-01-02'], 'B': ['2015', '2016']})

In [49]: df2 = df.copy()

In [50]: df2.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016'])})

In [51]: df2
Out[51]:
                     A     B
0  1420070400000000000  2015
1  1451606400000000000  2016

A few observations:

This doesn't occur when all of the columns are being updated:

In [64]: df2.iloc[:, [0, 1]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016']), 'B': pd.to_datetime(['2015', '2016'])})

In [65]: df2
Out[65]:
           A          B
0 2015-01-01 2015-01-01
1 2016-01-01 2016-01-01

The internal blocks are, unsurprisingly, incorrect:

In [69]: df2.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016'])})

In [70]: df2._data
Out[70]:
BlockManager
Items: Index(['A', 'B'], dtype='object')
Axis 1: RangeIndex(start=0, stop=2, step=1)
ObjectBlock: slice(0, 2, 1), 2 x 2, dtype: object

We'd like to split that object block so that the newly assigned column becomes a DatetimeBlock.
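
For comparison, a minimal illustration of the layout a correct assignment should end up with, assuming a pandas build of this era where ._data is still the BlockManager accessor (newer versions expose it as ._mgr); the exact block ordering in the repr may differ:

In [71]: df3 = pd.DataFrame({"A": pd.to_datetime(['2015', '2016']), 'B': ['2015', '2016']})

In [72]: df3._data
Out[72]:
BlockManager
Items: Index(['A', 'B'], dtype='object')
Axis 1: RangeIndex(start=0, stop=2, step=1)
DatetimeBlock: slice(0, 1, 1), 1 x 2, dtype: datetime64[ns]
ObjectBlock: slice(1, 2, 1), 1 x 2, dtype: object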

@TomAugspurger TomAugspurger reopened this Nov 29, 2018
@TomAugspurger TomAugspurger modified the milestones: No action, Contributions Welcome Nov 29, 2018
@TomAugspurger TomAugspurger added the Indexing, Datetime, Dtype Conversions, and Difficulty Advanced labels and removed the Usage Question label Nov 29, 2018
shippy added a commit to ABCD-STUDY/fitbit-pipeline that referenced this issue Dec 13, 2018
@mroeschke mroeschke added the Bug label Apr 1, 2020
mroeschke (Member) commented:

This looks okay on master. Could use a test:

In [13]: df = pd.DataFrame({"A": ['2015-01-01', '2015-01-02'], 'B': ['2015', '2016']})
    ...: df2 = df.copy()
    ...: df2.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016'])})

In [14]: df2
Out[14]:
                     A     B
0  2015-01-01 00:00:00  2015
1  2016-01-01 00:00:00  2016
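
For reference, a hypothetical sketch of what such a regression test could look like (the test actually merged in #46982 may differ; pandas._testing is the helper module pandas' own test suite uses):

import pandas as pd
import pandas._testing as tm


def test_iloc_setitem_datetime_frame_keeps_values():
    # GH#20511: assigning a datetime frame into a column subset must not
    # leave raw nanosecond integers behind.
    df = pd.DataFrame({"A": ["2015-01-01", "2015-01-02"], "B": ["2015", "2016"]})
    df.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(["2015", "2016"])})

    expected = pd.DataFrame(
        {
            # Per the output above, "A" holds Timestamps (still in an object
            # column); adjust the expected dtype if setitem semantics change.
            "A": pd.Series([pd.Timestamp("2015-01-01"), pd.Timestamp("2016-01-01")], dtype=object),
            "B": ["2015", "2016"],
        }
    )
    tm.assert_frame_equal(df, expected)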

@mroeschke mroeschke added the good first issue label and removed the Bug, Dtype Conversions, Indexing, and Datetime labels Jun 19, 2021
@mroeschke mroeschke added the Needs Tests label Jun 19, 2021
parthi-siva (Contributor) commented:

take

parthi-siva added a commit to parthi-siva/pandas that referenced this issue Mar 18, 2022
This test makes sure that when converting multiple columns to datetimes and assigning them back, they remain datetimes, not unix dates, as mentioned in GH pandas-dev#20511.
@jreback jreback modified the milestones: Contributions Welcome, 1.5 Mar 18, 2022
@jreback jreback added the Indexing label Mar 18, 2022
parthi-siva added a commit to parthi-siva/pandas that referenced this issue Mar 22, 2022
parthi-siva added a commit to parthi-siva/pandas that referenced this issue Mar 24, 2022
parthi-siva added a commit to parthi-siva/pandas that referenced this issue May 10, 2022