Fixed read_csv with CategoricalDtype with boolean categories (20498) #20826
Codecov Report
@@            Coverage Diff             @@
##           master   #20826      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%
==========================================
  Files         161      161
  Lines       51497    51502       +5
==========================================
+ Hits        47530    47535       +5
  Misses       3967     3967
Continue to review full report at Codecov.
I don't think I'll have time before the RC, and we're simply fixing a bug in the existing code.
I'd like to merge only bug fixes between the RC and the final release, so it's your call if you want to push this to 0.23.1.
On Fri, Apr 27, 2018 at 6:56 AM, Jeff Reback wrote:
In pandas/core/arrays/categorical.py:
> @@ -523,6 +527,11 @@ def _from_inferred_categories(cls, inferred_categories, inferred_codes,
cats = to_datetime(inferred_categories, errors='coerce')
elif is_timedelta64_dtype(dtype.categories):
cats = to_timedelta(inferred_categories, errors='coerce')
+ elif dtype.categories.is_boolean():
This isn't necessary for the RC. This needs to be fixed in this PR.
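The hunk above shows only the new `elif dtype.categories.is_boolean():` condition, not its body. The pure-Python sketch below illustrates why such a branch is needed: the parser's inferred categories are strings, so recoding them against boolean categories finds no matches and every code becomes -1 (missing), which is consistent with the all-NaN column reported later in this thread. `coerce_bool_categories`, `recode_to_dtype`, and the default true-value spellings are hypothetical names chosen for illustration, not pandas internals or the code from this PR.

```python
from typing import Hashable, Iterable, List, Optional, Sequence

# Assumed default spellings treated as True (illustrative, not taken from the PR).
DEFAULT_TRUE_VALUES = ("True", "TRUE", "true")


def coerce_bool_categories(
    inferred_categories: Iterable[str],
    true_values: Optional[Iterable[str]] = None,
) -> List[bool]:
    """Map string categories inferred by a CSV parser onto booleans.

    Strings found in true_values become True; every other string becomes False.
    """
    truthy = set(DEFAULT_TRUE_VALUES if true_values is None else true_values)
    return [value in truthy for value in inferred_categories]


def recode_to_dtype(
    codes: Sequence[int],
    inferred_categories: Sequence[Hashable],
    dtype_categories: Sequence[Hashable],
) -> List[int]:
    """Translate codes indexing inferred_categories into codes indexing dtype_categories.

    A category that does not appear in dtype_categories gets code -1 (missing).
    """
    position = {category: i for i, category in enumerate(dtype_categories)}
    recoded = []
    for code in codes:
        if code == -1:
            recoded.append(-1)  # value was already missing in the input
        else:
            recoded.append(position.get(inferred_categories[code], -1))
    return recoded


inferred = ["False", "True"]  # categories as the parser sees them: strings
codes = [1, 0, 1]             # observations: "True", "False", "True"
dtype_categories = [False, True]

# Without conversion, "True" never equals True, so everything is missing.
print(recode_to_dtype(codes, inferred, dtype_categories))  # [-1, -1, -1]
# Converting the inferred categories first lets the lookup succeed.
print(recode_to_dtype(codes, coerce_bool_categories(inferred), dtype_categories))  # [1, 0, 1]
```

Treating every non-true spelling as False keeps the sketch short; a stricter version could map unrecognized strings to missing instead.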
Then push it to 0.23.1 (alternatively, as @jorisvandenbossche mentioned, we can make a 0.23rc milestone tag).
@TomAugspurger, do you have time to look at this one? (I don't know how far off it is.)
With pandas 0.23, I have code where the CSV is correctly generated by to_csv, but read_csv (with the appropriate dtypes) fills the column with NaN.
Does this PR address that?
Possibly. If you can provide a unit test, I can include it, or you can make a follow-up PR that includes it.
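As a hedged sketch of the kind of unit test being asked for, the function below round-trips a boolean-categorical column through `to_csv` and `read_csv`. It assumes a pandas release that contains this fix; the test name, column name, and data are made up for illustration and are not the test added in this PR.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype


def test_read_csv_categorical_bool_roundtrip():
    # Hypothetical test: a column whose categories are booleans.
    dtype = CategoricalDtype(categories=[False, True])
    original = pd.DataFrame({"flag": pd.Series([True, False, True], dtype=dtype)})

    # to_csv writes the boolean values as the strings "True" / "False".
    buffer = StringIO()
    original.to_csv(buffer, index=False)
    buffer.seek(0)

    # read_csv must turn those strings back into the boolean categories.
    result = pd.read_csv(buffer, dtype={"flag": dtype})

    # The reported symptom was a column full of NaN, so check that first.
    assert result["flag"].isna().sum() == 0
    # Values, categories, and orderedness should all survive the round trip.
    pd.testing.assert_frame_equal(result, original)
```

Run it with pytest, or call the function directly.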
... around to_csv/read_csv and pd.api.types.CategoricalDtype: for instance, when using pd.api.types.CategoricalDtype(categories=ConnectionRoles, ordered=True), to_csv correctly writes the column, but read_csv fails because it expects a string (pandas-dev/pandas#20826). I ended up writing my own converter.
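The commenter's converter isn't shown, and the contents of `ConnectionRoles` are unknown, so the sketch below uses boolean categories instead. It assumes one possible workaround for pandas versions without this fix: parse the column with `read_csv`'s `converters` argument, then apply the `CategoricalDtype` afterwards. `parse_bool`, `read_flags`, and `FLAG_DTYPE` are hypothetical names, not part of pandas.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical target dtype for the column.
FLAG_DTYPE = CategoricalDtype(categories=[False, True])


def parse_bool(text: str) -> bool:
    # Hypothetical converter: the spellings below count as True, anything else as False.
    return text in ("True", "TRUE", "true")


def read_flags(csv_text: str) -> pd.DataFrame:
    # Let the converter produce real booleans instead of strings.
    frame = pd.read_csv(StringIO(csv_text), converters={"flag": parse_bool})
    # Cast afterwards, so read_csv never matches strings against boolean categories.
    frame["flag"] = frame["flag"].astype(FLAG_DTYPE)
    return frame


df = read_flags("flag\nTrue\nFalse\nTrue\n")
print(df["flag"].tolist())                # [True, False, True]
print(df["flag"].cat.categories.tolist())  # [False, True]
```

Casting after parsing sidesteps the string-to-category matching inside `read_csv`; the trade-off is that any spelling not listed in `parse_bool` silently becomes False.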
IIRC this was fine; can you rebase?
Hello @TomAugspurger! Thanks for updating the PR.
This looks good; I haven't looked at what is failing.
@TomAugspurger, can you merge master?
@gfyoung, if you'd like to rebase this, maybe we can get it in.
Yikes! Looks like this one got lost in the myriad of other PRs. Sure thing.
Previously, boolean categories were parsed as object instead of boolean. Closes gh-20498. Original author: @TomAugspurger. Rebased by @gfyoung due to merge conflicts.
@jreback @TomAugspurger: I rebased this PR and wanted to note a couple of things (this also addresses many of the outstanding comments that were still current):
One extra thing I had to do was check for
Looks OK to me. @TomAugspurger, could you have a look?
@TomAugspurger: Any thoughts on this? I think we're waiting on your review before merging.
Sorry for the delay; thanks for rebasing and fixing things up.
git diff upstream/master -u -- "*.py" | flake8 --diff