Assign back converted multiple columns to datetime failed #20511
It's not entirely clear what you're trying to do. For
you can use
In [15]: df2 = df.apply(pd.to_datetime, errors='coerce')
In [16]: df2.dtypes
Out[16]:
date0 datetime64[ns]
date1 datetime64[ns]
date2 datetime64[ns]
date3 datetime64[ns]
date4 datetime64[ns]
date5 datetime64[ns]
dtype: object |
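For anyone unsure what `errors='coerce'` does in the suggestion above, here is a minimal sketch. The two-column frame and the "not a date" value are made up for illustration and are not from the original report.

```python
import pandas as pd

# Hypothetical data: one value cannot be parsed as a date.
df = pd.DataFrame({
    "date0": ["2016-01-01", "not a date"],
    "date1": ["2016-01-02", "2016-01-03"],
})

# apply() forwards errors='coerce' to pd.to_datetime for every column,
# so unparseable entries become NaT instead of raising.
df2 = df.apply(pd.to_datetime, errors="coerce")

print(df2.dtypes)
print(df2)
```

Note that this builds a new DataFrame; it does not assign back into a subset of the original one, which is the case the rest of this thread is about.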
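@TomAugspurger - sorry, maybe you missed this part:
But assigning back converts the datetimes to Unix dates. I also tested loc and got the same problem.
The conversion works fine when the result is assigned to a new DataFrame, but it fails when assigned back to a subset of columns: the datetime columns end up as Unix timestamps. Or is something else going on? |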
I'm not sure I understand. You might want to try searching stackoverflow
under the pandas tag: https://stackoverflow.com/questions/tagged/pandas
|
@TomAugspurger - Hmm, I am the Stack Overflow user https://stackoverflow.com/users/2901002/jezrael. Let me try to explain in more detail:
I get:
and I expected:
|
@TomAugspurger This really looks like a bug. Let's see if I can explain it more clearly. The bug presents in two ways:
This violates expectations in two ways:
To demonstrate, let's set up a six-column DataFrame. The leftmost column will stay as-is, for side-by-side comparison; the rest will have We'll recreate the bug in
import numpy as np
import pandas as pd

test_df = (pd.DataFrame(np.random.choice(['2016-01-01'], size=(2, 6)))
           .add_prefix('date')
           .rename(columns={
               'date0': 'original',
               'date1': 'apply_multislice1',
               'date2': 'apply_multislice2',
               'date3': 'apply_unislice',
               'date4': 'apply_noslice',
               'date5': 'assigned'}))
Bug demonstration
I'll quickly demonstrate the case that succeeds without assignment:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime) == pd.to_datetime('2016-01-01')).all(axis=None)
If it's assigned back to itself, however, it changes value to a nanosecond timestamp that still has dtype
# Assignment
test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] = test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime)
# Succeeds, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == 1451606400000000000).all(axis=None)
assert test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].dtypes.isin([object]).all()
# Fails, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == pd.to_datetime('2016-01-01')).all(axis=None)
assert test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].dtypes.isin(['datetime64']).all()
Variations without bug
I could think of three other ways to do the datetime conversion - directly, and through
test_df.loc[:, 'apply_unislice'] = test_df.loc[:, ['apply_unislice']].apply(pd.to_datetime)
test_df.loc[:, 'apply_noslice'] = test_df.loc[:, 'apply_noslice'].apply(pd.to_datetime)
test_df.loc[:, 'assigned'] = pd.to_datetime(test_df.loc[:, 'assigned'])
# Succeeds, as expected:
assert (test_df.loc[:, ['apply_unislice', 'apply_noslice', 'assigned']] == pd.to_datetime('2016-01-01')).all(axis=None)
assert not test_df.loc[:, ['apply_unislice', 'apply_noslice', 'assigned']].dtypes.isin([object]).any(axis=None)
Final outcome
All five rightmost columns should have the same content and dtype, but by now we know they don't.
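As a side note on where the large integer in the assertions comes from: it appears to be the timestamp expressed as nanoseconds since the Unix epoch. A small sketch to check that, using only `pd.Timestamp` and `pd.to_datetime` (the variable names are mine):

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01")

# Timestamp.value is the number of nanoseconds since 1970-01-01.
as_int = ts.value
print(as_int)

# Reading the integer back as nanoseconds gives the original timestamp.
assert pd.to_datetime(as_int, unit="ns") == ts
```

So the buggy assignment seems to keep the underlying nanosecond integers and lose the datetime interpretation, rather than producing arbitrary values.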
Recovery from the bug
If the bugged outcome in
# Continuing where we left off: Succeeds, but shouldn't:
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == 1451606400000000000).all(axis=None)
# Call pd.to_datetime(unit='ns')
test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] = test_df.loc[:, ['apply_multislice1', 'apply_multislice2']].apply(pd.to_datetime, unit='ns')
# Operates as expected
assert (test_df.loc[:, ['apply_multislice1', 'apply_multislice2']] == pd.to_datetime('2016-01-01')).all(axis=None)
Environment
|
@jesrael sorry I missed the fact that assignment was the buggy part.
In [48]: df = pd.DataFrame({"A": ['2015-01-01', '2015-01-02'], 'B': ['2015', '2016']})
In [49]: df2 = df.copy()
In [50]: df2.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016'])})
In [51]: df2
Out[51]:
A B
0 1420070400000000000 2015
1 1451606400000000000 2016
A few observations:
This doesn't occur when all of the columns are being updated:
In [64]: df2.iloc[:, [0, 1]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016']), 'B': pd.to_datetime(['2015', '2016'])})
In [65]: df2
Out[65]:
A B
0 2015-01-01 2015-01-01
1 2016-01-01 2016-01-01
The internal blocks are unsurprisingly incorrect:
In [69]: df2.iloc[:, [0]] = pd.DataFrame({"A": pd.to_datetime(['2015', '2016'])})
In [70]: df2._data
Out[70]:
BlockManager
Items: Index(['A', 'B'], dtype='object')
Axis 1: RangeIndex(start=0, stop=2, step=1)
ObjectBlock: slice(0, 2, 1), 2 x 2, dtype: object
We'd like to split that object block so that the newly assigned column becomes a DatetimeBlock. |
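Until the block is split correctly, one possible workaround is to convert and assign one column at a time through `df[col]`, which matches the single-column variant reported as working earlier in the thread. This is only a minimal sketch reusing the two-column frame from the comment above; the loop and the `cols` list are my own illustration, not something proposed by the maintainers.

```python
import pandas as pd

df = pd.DataFrame({"A": ["2015-01-01", "2015-01-02"], "B": ["2015", "2016"]})

# Columns to convert (assumption for this example); B is left as strings.
cols = ["A"]

# df[col] = ... replaces the whole column, so the new datetime values
# are not written into the existing object-dtype storage.
for col in cols:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```

Here only `A` should come out as a datetime column while `B` keeps its original strings.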
This looks okay on master. Could use a test
|
take |
This test makes sure that when multiple columns are converted to datetimes and assigned back, they remain datetimes rather than Unix timestamps, as reported in GH pandas-dev#20511.
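For readers who want to check this on their own install, a rough sketch of such a check is below. It is not the test that was added to pandas: the function name and the frame are hypothetical, and it assigns through `df[cols]` rather than `.loc`, since full-column `.loc` assignment has behaved differently across pandas releases.

```python
import pandas as pd


def check_multi_column_datetime_assignment():
    # Hypothetical helper for illustration only.
    df = pd.DataFrame({
        "date1": ["2016-01-01", "2016-01-01"],
        "date2": ["2016-01-01", "2016-01-01"],
        "other": ["x", "y"],
    })
    cols = ["date1", "date2"]

    # Convert several columns at once and assign the result back.
    df[cols] = df[cols].apply(pd.to_datetime)

    expected = pd.Timestamp("2016-01-01")
    # Values must compare equal to the timestamp, not to a nanosecond integer.
    assert (df[cols] == expected).all(axis=None)
    # Every converted column must carry a datetime64 dtype.
    assert all(pd.api.types.is_datetime64_any_dtype(dt) for dt in df[cols].dtypes)
    return df


result = check_multi_column_datetime_assignment()
print(result.dtypes)
```

If either assertion fails on a given version, that would suggest the multi-column assignment is still falling back to object dtype there.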
I want to convert multiple columns to datetimes:
I tested the conversion:
But assigning back converts the datetimes to Unix dates. I also tested loc and got the same problem.