-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: read_csv raising ValueError for tru_values/false_values and boolean dtype #39012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would instead allow passing in the true/false values into BooleanArrary._from_sequence_from_strings (and if they aren't passed, then use what is there now). and handle the conversion there rather than doing it this way.
Hm this sounds good. Will try this tomorrow or Friday. Thx. |
This is way better. To avoid special casing the boolean case, I added **kwargs to |
Passing such additional keywords to (alternative could be to try/except, catch the error, and then try again without passing the additional keywords (and potentially raising a warning so that external EAs can update to accept |
we don't have these kinds of guarantees on EA. I think we should accept them. |
pandas/pandas/core/arrays/base.py Lines 213 to 237 in 510bc20
(and in the online API docs) If we can easily avoid breaking third-party EA implementations, then we should do that. |
I think we need to in this case (for BooleanArray) to make a proper api. yes I agree we can avoid breaking all others (see above my comment). |
Yep, no problem for adding it to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
@@ -257,6 +257,8 @@ class BooleanArray(BaseMaskedArray): | |||
|
|||
# The value used to fill '_data' to avoid upcasting | |||
_internal_fill_value = False | |||
_TRUE_VALUES = {"True", "TRUE", "true", "1", "1.0"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these could also be module level (doesn't matter), note in the parser itself we only have
object _true_values = [b'True', b'TRUE', b'true']
object _false_values = [b'False', b'FALSE', b'false']
but yeah i think we agreed the 1/1.0 are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case let's keep it there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
thanks @phofl very nice! |
I am not really experienced with cython, so I would appreciate feedback on the switiching function. This was not done previously in case of ea boolean dtype, hence why this was failing before.