groupby, as_index=False, with pandas.Series.count() as an agg #8381
Comments
Please show the output of pd.show_versions(). |
|
I updated my original comment. I realized my last example didn't make sense; it was counting strangely (according to my intuition). |
count has a different implementation than most other methods (e.g. they are in cython or are essentially a python loop over the groups). count you get for 'free', as it is available by definition when you group. The other routines handle nuisance columns (e.g. trying to perform a numeric operation on a string column) by excluding them. count needs to do the same. This works for
Care to do a pull-request? |
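To make the nuisance-column point concrete, here is a minimal sketch. The frame and its column names (`key`, `num`, `text`) are made up for illustration, and the behaviour shown assumes a recent pandas release rather than the version discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column and one string ("nuisance") column.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "num": [1.0, np.nan, 3.0],
    "text": ["x", "y", None],
})

grouped = df.groupby("key")

# A numeric reduction has to leave the string column out.
means = grouped.mean(numeric_only=True)

# count() only tallies non-null entries per group, so it can be
# computed for the string column as well.
counts = grouped.count()

print(means)
print(counts)
```

Here the string column drops out of the mean but still gets a per-group count of its non-null values.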
Sorry, but I have no C programming experience, otherwise I would help. Hopefully others find value in fixing this bug. |
No C involved, it's all Python. |
OK! @jcauteru and I will give it a try. |
gr8! Create a test to compare against an expected result. Keep in mind there are lots of other tests which have to pass as well. |
FYI, I haven't forgotten about this. What's going wrong is that astype('int64') is being applied to the nuisance columns (the strings). The bug can be fixed (at least for the small test case originally posted) by removing the requirement that the count is of dtype int64 or, alternatively, by passing the function to _python_agg_general, which iterates through everything except the exclusions in groupby.py. Both of these fixes fail the nose tests (primarily AssertionError: attr is not equal [dtype]: dtype('float64') != dtype('int64')), so I'm exploring a different method, perhaps requiring int64 at a different point in the routine. @jcauteru |
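The failure mode described above can be reproduced outside of groupby. This is only an illustration with plain NumPy arrays (made-up values), not the actual pandas internals: casting strings to int64 raises, while casting an array of counts does not.

```python
import numpy as np

# An object array of strings stands in for a nuisance column.
strings = np.array(["x", "y", "z"], dtype=object)

try:
    strings.astype("int64")
    cast_failed = False
except ValueError as exc:
    # Strings that are not integer literals cannot be cast to int64.
    cast_failed = True
    print("cast failed:", exc)

# Counts, by contrast, are whole numbers and cast without trouble.
counts = np.array([2.0, 1.0])
as_int = counts.astype("int64")
print(as_int.dtype)
```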
I am experiencing a similar problem when the column used for groupby is of type float. No exception is raised, but the resulting column is cast to int64:
x y
My version is: commit: None, pandas: 0.16.2 |
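A small check for the float-key case might look like the following. The data is invented (only the column names `x` and `y` are taken from the report above), and the behaviour shown assumes a recent pandas release in which the group key keeps its float dtype.

```python
import pandas as pd

# Hypothetical frame with a float column used as the group key.
df = pd.DataFrame({"x": [0.5, 0.5, 1.5], "y": [1, 2, 3]})

result = df.groupby("x", as_index=False).count()

# The key column should keep its float values instead of
# being truncated to integers.
print(result)
print(result["x"].dtype)
```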
Is this issue related to #10355 ? |
They look similar. Do you want to take a crack at writing some tests and using the fix I suggested in that issue to see if it fixes this? |
I will try, but it will take some time. I am still having problems with the test environment (I've cloned the repository and run the existing tests before changing any code, and I get FAILED (SKIP=543, errors=1, failures=2); now I am trying to check out a release tag). I am also using a less powerful computer for development. |
I've been stung by this issue too. Running @livia-b's test on latest master gives a different error:
It works for
|
It seems to work for SeriesGroupBy objects but not DataFrameGroupBy. We only get a SeriesGroupBy if we select a column after using
|
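For readers unsure which kind of groupby object they are holding, a quick way to tell is to inspect the type. This is a sketch with a made-up frame; the column names are placeholders.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"x": [1, 1, 2], "y": [10, 20, 30]})

# Grouping the whole frame gives a DataFrameGroupBy.
frame_gb = df.groupby("x")

# Selecting a single column afterwards gives a SeriesGroupBy.
series_gb = df.groupby("x")["y"]

# Selecting with a list keeps it a DataFrameGroupBy.
frame_gb_subset = df.groupby("x")[["y"]]

names = [type(g).__name__ for g in (frame_gb, series_gb, frame_gb_subset)]
print(names)
```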
These are all fixed in master (you need a very recent master).
xref #11079 (it was fixed in an earlier commit) |
Why doesn't the pandas.Series.count() method work as a valid aggregation with groupby when as_index=False?
Now, if I try to do a group by
Here is the error I get:
When I set as_index=True, I get
When I change the agg function and set as_index=False, I get a weird result too:
UPDATE: Realized my last result was not counting correctly and am now thoroughly confused.
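Since the original snippets did not survive, here is a hedged reconstruction of the kind of call the question is about. The frame and its column names (`a`, `b`, `c`) are invented, and the sketch assumes a recent pandas release where, per the comments above, the problem is reported as fixed.

```python
import pandas as pd

# Hypothetical data with a string column alongside a numeric one.
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": [1, 2, 3],
    "c": ["p", None, "q"],
})

# The built-in groupby count with the key kept as a regular column.
by_method = df.groupby("a", as_index=False).count()

# The same aggregation expressed through agg with pandas.Series.count.
by_agg = df.groupby("a", as_index=False).agg(pd.Series.count)

print(by_method)
print(by_agg)
```

If both calls behave, the two results should have the same columns and the same per-group counts of non-null values.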