Converting float64 values to datetime64[ns] format using pd.to_datetime results in NaT if errors='coerce' #13180

Closed
lvphj opened this issue May 15, 2016 · 2 comments
Labels
Bug Datetime Datetime data dtype

@lvphj
lvphj commented May 15, 2016

Code Sample, a copy-pastable example if possible

A pandas DataFrame contains a date column stored as float64:

import pandas as pd

tempDF = pd.DataFrame({'group': ['a', 'b'],
                       'date': [1.434692e+18, 1.432766e+18]})

print(tempDF)

Which appears as:

           date group
0  1.434692e+18     a
1  1.432766e+18     b
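The float values are consistent with nanoseconds since the Unix epoch (1970-01-01 UTC), which is how pandas stores datetime64[ns] values internally. As a quick sanity check, assuming that interpretation, the first value can be converted by hand with the standard library:

```python
from datetime import datetime, timezone

# Assumption: the float is a count of nanoseconds since 1970-01-01 UTC.
raw = 1.434692e+18
seconds = raw / 1e9  # nanoseconds -> seconds

print(datetime.fromtimestamp(seconds, tz=timezone.utc))  # 2015-06-19 05:33:20+00:00
```

This agrees with row 0 of the converted output.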

The date column can be converted to datetime64[ns] as follows:

tempDF['date'] = pd.to_datetime(tempDF['date'], format=None)
print(tempDF)

Which works as expected:

                 date group
0 2015-06-19 05:33:20     a
1 2015-05-27 22:33:20     b
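In current pandas releases the numeric interpretation can also be stated explicitly with `unit='ns'`, which the `to_datetime` documentation lists as the default unit for numeric input. A minimal sketch, assuming a recent pandas version rather than the 0.18.1 build reported here:

```python
import pandas as pd

tempDF = pd.DataFrame({'group': ['a', 'b'],
                       'date': [1.434692e+18, 1.432766e+18]})

# Interpret the floats explicitly as nanoseconds since the Unix epoch.
converted = pd.to_datetime(tempDF['date'], unit='ns')
print(converted.tolist())
```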

However, if errors='coerce' is passed to to_datetime(), the dates are returned as NaT:

tempDF['date'] = pd.to_datetime(tempDF['date'], format=None, errors='coerce')
print(tempDF)

Which produces the following:

  date group
0  NaT     a
1  NaT     b
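Going by the maintainer's explanation in this thread (the unit-processing path only triggered when a unit was passed explicitly), passing `unit='ns'` together with `errors='coerce'` looks like a plausible workaround on affected versions. That is an untested assumption for 0.18.1; on current pandas the call below keeps the valid values:

```python
import pandas as pd

tempDF = pd.DataFrame({'group': ['a', 'b'],
                       'date': [1.434692e+18, 1.432766e+18]})

# Passing the unit explicitly tells to_datetime to treat the floats as epoch
# nanoseconds. Assumption: on 0.18.1 this avoids the path that coerced them to NaT.
tempDF['date'] = pd.to_datetime(tempDF['date'], unit='ns', errors='coerce')
print(tempDF)
```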

This issue came up while working around a previously reported bug (#12821), where datetime64[ns] columns are returned as float64 after agg() when all dates in a group are NaT, so the columns have to be converted back to datetime64[ns]. However, treating valid numeric values as errors under errors='coerce' and returning NaT for them causes unnecessary data loss.
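In the #12821 scenario the float column can legitimately contain NaN for groups whose dates were all NaT. The sketch below, using a hypothetical helper named `restore_datetime`, shows the desired round trip on current pandas: NaN becomes NaT while the valid epoch-nanosecond values survive. The example data and the helper name are illustrative assumptions, not part of the original report.

```python
import numpy as np
import pandas as pd


def restore_datetime(df, columns):
    """Hypothetical helper: convert float64 epoch-nanosecond columns back to
    datetimes. NaN becomes NaT; valid numbers are kept, not coerced away."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col], unit='ns', errors='coerce')
    return out


# Illustrative aggregated frame: group 'c' had only NaT dates, so its value is NaN.
agg = pd.DataFrame({'group': ['a', 'b', 'c'],
                    'date': [1.434692e+18, 1.432766e+18, np.nan]})

restored = restore_datetime(agg, ['date'])
print(restored)
```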

Expected Output

                 date group
0 2015-06-19 05:33:20     a
1 2015-05-27 22:33:20     b

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
lvphj changed the title from "Converting integer values to datetime results in NaT if errors='coerce'" to "Converting float64 values to datetime64[ns] format using pd.to_datetime results in NaT if errors='coerce'" on May 15, 2016
jreback added the Bug and Datetime labels on May 15, 2016
jreback added this to the 0.18.2 milestone on May 15, 2016
@jreback
Contributor

jreback commented May 15, 2016

fixed in #13183

Yeah, I had previously removed the unit-processing path from array_to_datetime, as it was subsumed by array_with_unit_to_datetime. But I also changed the default unit from ns to None (IOW, the unit path would only trigger if a unit was passed explicitly). So this was an oversight.
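A regression check for the reported behaviour could simply compare the coerced and uncoerced conversions of the same floats. The test function name below is hypothetical and not taken from #13183, and `pd.testing.assert_series_equal` assumes pandas 0.20 or later:

```python
import pandas as pd


def test_to_datetime_float_coerce():
    # Hypothetical regression test: errors='coerce' must not turn valid
    # float epoch-nanosecond values into NaT.
    s = pd.Series([1.434692e+18, 1.432766e+18])
    expected = pd.to_datetime(s)
    result = pd.to_datetime(s, errors='coerce')
    pd.testing.assert_series_equal(result, expected)
    assert result.notna().all()


test_to_datetime_float_coerce()
```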

thanks for the report.

@lvphj
Author

lvphj commented May 15, 2016

Thanks for the rapid response!

jreback added a commit to jreback/pandas that referenced this issue May 18, 2016