unique aggregation unexpectedly returning different type #22558
Comments
How weird! I would have expected a …
After some digging, I found the cause of the mismatch in behavior: lines 538 to 544 in commit a5fe9cf.
The 'unique' aggregation first produces its result as an OrderedDict, for example OrderedDict([('a', array([1, 2, 3])), ('b', array([1]))]) or OrderedDict([('a', array([2])), ('b', array([1]))]). The code I'm linking then tries to cast that result to a DataFrame and returns a Series of arrays if the cast fails. In other words, when all the arrays in result have the same length a DataFrame is returned, and otherwise a Series.
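To make the type flip concrete, the try-DataFrame-then-Series pattern can be reproduced in isolation. This is a minimal sketch, not the pandas source at a5fe9cf: `combine_unique_results_pre_fix` is a hypothetical helper that mimics the described logic on the two OrderedDict results quoted above.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd


def combine_unique_results_pre_fix(result):
    """Hypothetical sketch of the pre-fix logic (not the actual pandas code).

    Try to build a DataFrame from the per-column results. If the arrays have
    unequal lengths, the DataFrame constructor raises ValueError and we fall
    back to a Series holding one array per column.
    """
    try:
        return pd.DataFrame(result)
    except ValueError:
        # Build the object array element by element so NumPy stores each
        # array as a single object instead of treating it as a new dimension.
        values = np.empty(len(result), dtype=object)
        for i, arr in enumerate(result.values()):
            values[i] = arr
        return pd.Series(values, index=list(result.keys()))


# The two results quoted in the comment above.
uneven = OrderedDict([("a", np.array([1, 2, 3])), ("b", np.array([1]))])
even = OrderedDict([("a", np.array([2])), ("b", np.array([1]))])

print(type(combine_unique_results_pre_fix(uneven)).__name__)  # Series
print(type(combine_unique_results_pre_fix(even)).__name__)    # DataFrame
```

The only thing that decides the return type here is whether the arrays happen to have equal lengths, which is exactly the data-dependent behavior reported in this issue.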
I've confirmed that we can get the desired behavior if we don't try to cast to a DataFrame. Thoughts?
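A rough sketch of the proposed change, under the same assumptions: skip the DataFrame cast and always wrap the per-column arrays in an object Series. `combine_unique_results_fixed` is a hypothetical name used only for illustration; the actual patch may be structured differently.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd


def combine_unique_results_fixed(result):
    """Hypothetical sketch: always return an object Series with one array of
    unique values per aggregated column, whatever the array lengths are."""
    values = np.empty(len(result), dtype=object)
    for i, arr in enumerate(result.values()):
        # Element-wise assignment stores each array as a single object.
        values[i] = arr
    return pd.Series(values, index=list(result.keys()))


uneven = OrderedDict([("a", np.array([1, 2, 3])), ("b", np.array([1]))])
even = OrderedDict([("a", np.array([2])), ("b", np.array([1]))])

for res in (uneven, even):
    out = combine_unique_results_fixed(res)
    print(type(out).__name__, out.dtype)  # Series object, for both inputs
```

Because no construction step can fail, there is no fallback path and the return type no longer depends on the data.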
I'd say give it a try, run the tests, and see what happens!
These look correct on master now. Could use a test.
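A regression test could pin this down by checking both the uneven case and the all-single-value case. This is a sketch, not the test that was eventually merged: since the original code sample did not survive, it assumes the dict form of DataFrame.agg (one 'unique' entry per column) and a pandas version that includes the fix; `check_unique_agg_returns_series` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def check_unique_agg_returns_series(data, expected):
    """Hypothetical helper: aggregate every column with 'unique' and check
    that the result is a Series of arrays, whatever the lengths are."""
    df = pd.DataFrame(data)
    result = df.agg({col: "unique" for col in df.columns})
    assert isinstance(result, pd.Series)
    assert list(result.index) == list(expected)
    for col, values in expected.items():
        np.testing.assert_array_equal(result[col], values)


def test_unique_agg_type_is_consistent():
    # Columns with different numbers of unique values.
    check_unique_agg_returns_series(
        {"a": [1, 2, 3], "b": [1, 1, 1]}, {"a": [1, 2, 3], "b": [1]}
    )
    # Every column has exactly one unique value (the special case reported here).
    check_unique_agg_returns_series(
        {"a": [2, 2, 2], "b": [1, 1, 1]}, {"a": [2], "b": [1]}
    )


if __name__ == "__main__":
    test_unique_agg_type_is_consistent()
```

Writing the expected values as 1-element lists (rather than scalars) keeps the comparison against arrays uniform across both cases.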
take
* TST: Added test for consistent type with unique agg #22558
* TST: Added test for consistent type with unique agg #22558
* TST: Moved and restructured test #22558
* TST: Added test for nested series #22400
* TST: Added equality test for nested series #22400

Co-authored-by: Steven Rotondo <[email protected]>

* TST: Added test for consistent type with unique agg #22558
* TST: Added test for consistent type with unique agg #22558
* TST: Moved and restructured test #22558
* TYP: Fixed mypy issues in frequencies
* TYP: Removed accidental inclusion

Co-authored-by: Steven Rotondo <[email protected]>

* TST: Added test for consistent type with unique agg #22558
* TST: Added test for consistent type with unique agg #22558
* TST: Moved and restructured test #22558
* TST: Moved test to different file #22558
* TST: Changed scalars to 1-element lists

Co-authored-by: Steven Rotondo <[email protected]>
Code Sample
Output
Problem description
When performing 'unique' aggregations on a dataframe, the results can be returned as different types in an unexpected manner.
Generally, when performing a 'unique' aggregation on several columns of a dataframe as done above, a pandas.Series of numpy arrays is returned, with one element per aggregation column. This, I think, is the expected behavior, and it is demonstrated in the first aggregation above.

However, there is a special case: when every aggregation column has exactly 1 unique element, a pandas.DataFrame with one row is returned instead. I'm pretty sure this is unintended behavior, and it forces special-case handling when doing such aggregations.

Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None