BUG: in DataFrame.count not returning subclassed data types. #31139

Closed
wants to merge 2 commits into from

@EmilianoJordan EmilianoJordan commented Jan 19, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Once #30945 is merged these tests will pass.

I also didn't know whether a PR without first submitting an issue was appropriate, but to cut down on workload I decided to give it a try. If I'm mistaken, please let me know.

As for the test, there are four logical branches in df.count() that needed to be addressed within the test:

  1. Non-homogeneous data types (this is the portion of the test that depends on BUG: Use self._constructor_sliced in df._reduce to respect subclassed… #30945, as it uses .sum()).
        df = tm.SubclassedDataFrame(
            {
                "Person": ["John", "Myla", "Lewis", "John", "Myla"],
                "Age": [24.0, np.nan, 21.0, 33, 26],
                "Single": [False, True, True, True, False],
            }
        )
        result = df.count()
        assert isinstance(result, tm.SubclassedSeries)
  2. Homogeneous data.
        df = tm.SubclassedDataFrame({"A": [1, 0, 3], "B": [0, 5, 6], "C": [7, 8, 0]})
        result = df.count()
        assert isinstance(result, tm.SubclassedSeries)
  3. MultiIndex with the level kwarg.
        df = tm.SubclassedDataFrame(
            [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]],
            index=MultiIndex.from_tuples(
                list(zip(list("AABB"), list("cdcd"))), names=["aaa", "ccc"]
            ),
            columns=MultiIndex.from_tuples(
                list(zip(list("WWXX"), list("yzyz"))), names=["www", "yyy"]
            ),
        )
        result = df.count(level=1)
        assert isinstance(result, tm.SubclassedDataFrame)
  4. Force the length of the axis to be 0.
        df = tm.SubclassedDataFrame()
        result = df.count()
        assert isinstance(result, tm.SubclassedSeries)
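
For readers unfamiliar with the pattern these tests guard, here is a minimal pure-Python sketch of constructor propagation. It is a toy model and not pandas' implementation: the classes `ToySeries`, `ToyFrame`, `SubSeries`, `SubFrame` and the methods `count_buggy` and `count_fixed` are hypothetical names invented for illustration. Only the hook name `_constructor_sliced` is borrowed from the title of #30945. The sketch assumes a missing value is represented by `None`.

```python
# Toy model of constructor propagation; NOT pandas' actual implementation.
# Every class and method name below is hypothetical, except the hook name
# `_constructor_sliced`, which is borrowed from the title of #30945.


class ToySeries:
    """Minimal 1-D container: label mapped to value."""

    def __init__(self, data=None):
        self.data = dict(data or {})


class ToyFrame:
    """Minimal 2-D container: column name mapped to a list of values."""

    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    @property
    def _constructor_sliced(self):
        # Subclasses override this hook to name their own 1-D type.
        return ToySeries

    def _counts(self):
        # Number of non-missing (here: non-None) entries per column.
        return {
            name: sum(value is not None for value in values)
            for name, values in self.columns.items()
        }

    def count_buggy(self):
        # Hard-codes the base class, so the subclass type is lost.
        return ToySeries(self._counts())

    def count_fixed(self):
        # Asks the instance which 1-D type to build.
        return self._constructor_sliced(self._counts())


class SubSeries(ToySeries):
    pass


class SubFrame(ToyFrame):
    @property
    def _constructor_sliced(self):
        return SubSeries


frame = SubFrame({"Age": [24.0, None, 21.0], "Single": [False, True, True]})
buggy = frame.count_buggy()       # plain ToySeries: subclass information lost
fixed = frame.count_fixed()       # SubSeries: subclass information preserved
empty = SubFrame().count_fixed()  # zero-length case also goes through the hook
```

Each branch of a reduction has to route its result through the hook. A single branch that builds the base type directly is enough to drop the subclass, which is why the test above exercises all four branches separately.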


pep8speaks commented Jan 19, 2020

Hello @EmilianoJordan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-19 20:23:00 UTC

@jreback added the Compat, API - Consistency, and Numeric Operations labels and removed the Compat label on Jan 20, 2020

jreback commented Jan 20, 2020

You can include the changes from the other PR (#30945) in this one to get the tests passing.

@EmilianoJordan EmilianoJordan deleted the subclass-count branch January 21, 2020 13:13
@EmilianoJordan EmilianoJordan restored the subclass-count branch January 21, 2020 13:13
Labels
API - Consistency (Internal Consistency of API/Behavior), Numeric Operations (Arithmetic, Comparison, and Logical operations)