Skip to content

BUG/DTYPES: preserve bools in convert_objects #7416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 10, 2014
Merged

BUG/DTYPES: preserve bools in convert_objects #7416

merged 1 commit into from
Jun 10, 2014

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Jun 10, 2014

closes #7126

@cpcloud cpcloud added this to the 0.14.1 milestone Jun 10, 2014
@cpcloud cpcloud self-assigned this Jun 10, 2014
@jreback
Copy link
Contributor

jreback commented Jun 10, 2014

do a perf test on the suite just in case

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

yep

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

needs a doc to note that now convert_objects won't upcast bool

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

hm actually it never did up cast ... only issue was when there was mixed bool + others

@jreback
Copy link
Contributor

jreback commented Jun 10, 2014

ahh..yes mixed is always tricky

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

i think just an oversight, as there IS a util.is_bool_object function

@jreback
Copy link
Contributor

jreback commented Jun 10, 2014

yep....i have always modified maybe_convert_objects (except for a long time ago)

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

maybe_convert_objects differentiates between bool and general numeric types so it ends up returning its input if the input is mixed bool + others, so then the issue was that in maybe_convert_numeric no checking for bools was done so it would fail all the elif checks and go to nan.

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

rats looks like cparser depends a lot on the current behavior

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

i wonder if there's a huge ML discussion somewhere about the distinction between bool and int even tho bool inherits from int

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

@jreback good 2 go?

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

oh wait perf

@jreback
Copy link
Contributor

jreback commented Jun 10, 2014

looks fine (but check perf)

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

ran it 4 times, pretty variable on different parts of the codebase:

Round 1:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_ctor_dtindex_BQuarterEndx2             |   1.4610 |   1.2843 |   1.1376 |
frame_assign_timeseries_index                |   0.6957 |   0.6104 |   1.1398 |
strings_contains_few                         |   5.3413 |   4.6817 |   1.1409 |
frame_ctor_dtindex_BDayx1                    |   1.4300 |   1.1743 |   1.2177 |
frame_ctor_dtindex_BYearBeginx1              |   1.5349 |   1.2330 |   1.2449 |
reindex_multiindex                           |   1.5914 |   1.2546 |   1.2684 |
frame_ctor_dtindex_BYearEndx1                |   1.7547 |   1.2757 |   1.3755 |
frame_mult                                   |   6.4913 |   4.0673 |   1.5959 |
frame_add_st                                 |   6.1157 |   3.8283 |   1.5975 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Round 2 just on a regex that matches most of the previous round's benches:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_ctor_dtindex_BMonthBeginx1             |   1.4087 |   1.2794 |   1.1011 |
frame_ctor_dtindex_CBMonthEndx1              |   3.1866 |   2.7313 |   1.1667 |
eval_frame_mult_python                       |  19.4876 |  14.1617 |   1.3761 |
frame_mult_st                                |   5.5260 |   3.9387 |   1.4030 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Round 3 (same as the previous round):

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_ctor_dtindex_CustomBusinessDayx2       |   1.2734 |   1.2634 |   1.0079 |
frame_mult_st                                |   3.9593 |   3.9230 |   1.0093 |
frame_ctor_dtindex_Secondx2                  |   0.9654 |   0.9537 |   1.0123 |
eval_frame_mult_python_one_thread            |  14.0200 |  13.7023 |   1.0232 |
frame_ctor_dtindex_BYearBeginx2              |   1.3304 |   1.2457 |   1.0680 |
strings_contains_few                         |   5.2733 |   4.8534 |   1.0865 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Round 4 (same as round 1):

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
join_dataframe_index_single_key_small        |   6.4983 |   5.9470 |   1.0927 |
frame_reindex_columns                        |   0.3057 |   0.2790 |   1.0954 |
strings_contains_many                        |   5.3923 |   4.8310 |   1.1162 |
eval_frame_and_python                        |  44.8277 |  38.6840 |   1.1588 |
panel_shift_minor                            |   0.1193 |   0.0870 |   1.3708 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@cpcloud
Copy link
Member Author

cpcloud commented Jun 10, 2014

i think this is ok, just variability rearing its ugly head

@jreback
Copy link
Contributor

jreback commented Jun 10, 2014

yep
anytime I change key code I give a check

cpcloud added a commit that referenced this pull request Jun 10, 2014
BUG/DTYPES: preserve bools in convert_objects
@cpcloud cpcloud merged commit 18d0155 into pandas-dev:master Jun 10, 2014
@cpcloud cpcloud deleted the convert-bools-fix-7126 branch June 10, 2014 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

convert_objects incorrectly converting bools to nans
2 participants