Skip to content

BUG: make mean() raise an exception for strings #44131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

kinshukdua
Copy link

@kinshukdua kinshukdua commented Oct 21, 2021

This PR makes mean() raise an exception when sum() returns a string. Sum performs as expected since it concatenates. However _ensure_numeric causes implicit conversion of str to a numeric datatype

the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))

Instead of this, the return value of sum() will be checked and if it is a string, an exception will be raised.
This is a breaking change.

@jbrockmendel
Copy link
Member

There are a bunch of issues about reductions with strings. can you see if this closes any of the others?

@kinshukdua
Copy link
Author

Added all the issues I could find that this PR would close, I think we also need to decide whether we want to keep the behavior of sum() concatenating strings, add a note about the same or change it.
Regarding the failing tests, I'll try to see why they are failing and try to fix that.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change broke tests. Please add tests for every related issue

@jorisvandenbossche
Copy link
Member

Regarding the failing tests (reporting from gitter here): I also can't reproduce those locally when checking out this branch. Not fully sure what would cause this discrepancy (in any case it doesn't seem to be related to the python or numpy version, since CI is failing for several versions of those).

@kinshukdua
Copy link
Author

@phofl I reran the tests after installing the environments from ci using conda and the tests pass locally even on the environments (variants of actions-38.yaml) for which the tests are failing on ci. Any idea why this is happening?

@jreback
Copy link
Contributor

jreback commented Oct 24, 2021

ok if you are going to close an issue, pls explicitly make sure its listed in the test its (e.g. a reference) or have an explict test.

@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 24, 2021
@jreback
Copy link
Contributor

jreback commented Oct 24, 2021

dont' worry about .sum it is fine.

@phofl
Copy link
Member

phofl commented Oct 24, 2021

Not really, no

@jbrockmendel
Copy link
Member

@kinshukdua can you add tests specific to each of the issues this is closing

@kinshukdua
Copy link
Author

@jbrockmendel yes, I'll add them, sorry for the delay.

@kinshukdua kinshukdua requested a review from phofl November 13, 2021 13:59
@jreback jreback added this to the 1.4 milestone Nov 13, 2021
@jreback
Copy link
Contributor

jreback commented Nov 13, 2021

can you replicate these examples in tests:
#22642
#36703
#44008

also pls list the issues in the whatsnew note

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this warrents a sub-section in the whatsnew e.g. showing what previously worked and the new

@kinshukdua
Copy link
Author

can you replicate these examples in tests: #22642 #36703 #44008

also pls list the issues in the whatsnew note

The column "A" tests the string to integer implicit conversion in #22642 and #44008, since the column won't be dropped if they are integers.

The column "D" in the test dataframe takes care of the cascading to complex-128 issue in #36703. If J is converted to complex, we would see that in the output series.

Should I create a different test for each issue?

@kinshukdua kinshukdua requested a review from jreback November 15, 2021 12:14
@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

can you merge master and will have a look

@kinshukdua
Copy link
Author

@jreback, sorry for the delay, I rebased to fix the merge conflict.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2021

@kinshukdua this is failing tests

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

this is worthile but needs to be passing, and pls merge master

@jreback jreback removed this from the 1.4 milestone Dec 23, 2021
@kinshukdua
Copy link
Author

@jreback as I've explained before, for some reason the tests are all passing on my local environment, I've tried to reproduce the CI as close as possible by using the yaml files to recreate my testing environment. However, the tests still seem to pass. @jorisvandenbossche confirmed it by doing the same . I'm not sure what I should do.

@jreback
Copy link
Contributor

jreback commented Jan 28, 2022

@jreback as I've explained before, for some reason the tests are all passing on my local environment, I've tried to reproduce the CI as close as possible by using the yaml files to recreate my testing environment. However, the tests still seem to pass. @jorisvandenbossche confirmed it by doing the same . I'm not sure what I should do.

well sure, we are running multiple environments in CI which may have slightly different behavior.

you need to install one of the failing environments and investigate.

we cannot merge until CI is green.

@jbrockmendel
Copy link
Member

FWIW i wasn't able to reproduce the errors on either my "normal" build or a py38+np1.18.5 build. @kinshukdua is right that something weird is going on here.

Doesn't change the fact that the CI needs to get to green before this can be merged. Just saying it's not an easy problem.

@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

@kinshukdua if you can merge master and get this passing

@mroeschke
Copy link
Member

Thanks for the effort on this pull request, but it appears to have gone stale. If interested in continuing, please merge in main and we can reopen.

@mroeschke mroeschke closed this May 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
6 participants