BUG Decode to UTF-8 the dtype string read from a hdf file #31756

Merged
merged 1 commit into from
Feb 9, 2020

Conversation

pedroreys
Contributor

Fixes GH31750

The dtype value wasn't being decoded to UTF-8 when reading a DataFrame
from an HDF file. This was a problem when reading a fixed-format HDF file
created with Python 2: the dtype was read as `b'datetime'`
instead of `datetime`, which caused `HDFStore` to read the data as
`int64` instead of coercing it to the correct `datetime64` dtype.
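
To illustrate the failure mode described above, here is a minimal sketch that does not touch pandas internals. The helper names (`ensure_decoded`, `coerce_without_decode`, `coerce_with_decode`) and the exact tag string are hypothetical stand-ins chosen for this example, not the code changed in this PR. It only shows why a `bytes` tag never matches a `str` comparison in Python 3, so the datetime branch is skipped unless the tag is decoded first.

```python
import numpy as np


def ensure_decoded(value, encoding="UTF-8"):
    """Hypothetical helper: return str for bytes input, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def coerce_without_decode(values, dtype_tag):
    # In Python 3, b"datetime" == "datetime" is False, so this branch
    # is never taken for a tag written by Python 2.
    if dtype_tag == "datetime":
        return values.astype("M8[ns]")
    return values


def coerce_with_decode(values, dtype_tag):
    # Decoding first makes the comparison succeed for both bytes and str tags.
    if ensure_decoded(dtype_tag) == "datetime":
        return values.astype("M8[ns]")
    return values


# Nanoseconds since the epoch, as the raw stored integers (illustrative data).
raw = np.array([0, 86_400_000_000_000], dtype="int64")
tag_from_py2_file = b"datetime"  # assumed shape of the stored attribute

broken = coerce_without_decode(raw, tag_from_py2_file)
fixed = coerce_with_decode(raw, tag_from_py2_file)
print(broken.dtype, fixed.dtype)
```

With the undecoded tag the values stay `int64`; once the tag is decoded they are interpreted as `datetime64[ns]`, which matches the behaviour the description reports.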

@pedroreys pedroreys requested a review from jreback February 6, 2020 20:31
Member

@WillAyd WillAyd left a comment

minor nit on whatsnew otherwise lgtm

@WillAyd WillAyd added the IO HDF5 read_hdf, HDFStore label Feb 6, 2020
move doc to right file
Member

@WillAyd WillAyd left a comment

lgtm @jreback

@jreback jreback added this to the 1.1 milestone Feb 9, 2020
Contributor

@jreback jreback left a comment

lgtm. thanks @pedroreys

Note that this might close some older issues, if you'd like to have a look. (Comment on the issue if this is the case; we might need additional validation tests.)

@jreback jreback merged commit a96bdbd into pandas-dev:master Feb 9, 2020
@jreback
Contributor

jreback commented Feb 9, 2020

thanks

@pedroreys
Contributor Author

thanks folks

@pedroreys
Contributor Author

@jreback to get this fix backported to 1.0.2, do I need to open a second PR targeting that branch, or is it going to get picked up automatically by that backport bot?

@jreback
Contributor

jreback commented Feb 10, 2020

We don't backport regular bug fixes;
they come out in the next major release.

Labels
IO HDF5 read_hdf, HDFStore
Development

Successfully merging this pull request may close these issues.

Decoding issue when reading in py3 a datetime64 hdf data that was created in py2
3 participants