Failure in groupby-apply if aggregating timedelta and datetime columns #15562
Comments
What you are doing is completely non-performant and far less readable than this idiomatic version.
I'll mark it as a bug, though: it shouldn't error, since you are returning a Series (the inference logic is already quite complex). A PR to fix it would be welcome.
Oh yes, this is definitely horrible pandas code to write! It's a simplified example from a much more complex script doing things more complicated than min(). I found a workaround where I return a DataFrame from the applied function rather than a Series.
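A minimal sketch of that workaround, assuming a toy frame with hypothetical columns `date` and `clientid_age` (the reporter's real script is not shown in this thread). The applied function returns a one-row DataFrame instead of a Series, so pandas concatenates frames column by column rather than going through the Series-stacking path seen in the traceback.

```python
import pandas as pd

# Toy data (hypothetical): several rows per client.
df = pd.DataFrame({
    "clientid": ["A", "A", "B", "C"],
    "date": pd.to_datetime(["2017-02-01", "2017-03-01", "2017-02-01", "2017-02-01"]),
    "clientid_age": pd.to_timedelta(["0 days", "28 days", "0 days", "0 days"]),
})

def summarize(group):
    # Return a one-row DataFrame rather than a Series, so the datetime and
    # timedelta values stay in separate, correctly typed columns.
    return pd.DataFrame({
        "date": [group["date"].min()],
        "clientid_age": [group["clientid_age"].min()],
    })

# Select only the value columns so the grouping key is not passed to summarize.
result = (
    df.groupby("clientid")[["date", "clientid_age"]]
    .apply(summarize)
    .reset_index(level=1, drop=True)  # drop the inner 0-based index of each one-row frame
    .reset_index()                    # turn clientid back into a column
)
print(result)
```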
@field-cady my point is that you can compute
or you can be even more specific via
which is also quite idiomatic and readable.
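The snippets from that comment were not preserved. As a sketch only (not the commenter's exact code), for a plain min() reduction over the same hypothetical toy columns, the idiomatic forms are a direct groupby reduction, or `agg` with a dict when you want to pick the function per column:

```python
import pandas as pd

df = pd.DataFrame({
    "clientid": ["A", "A", "B", "C"],
    "date": pd.to_datetime(["2017-02-01", "2017-03-01", "2017-02-01", "2017-02-01"]),
    "clientid_age": pd.to_timedelta(["0 days", "28 days", "0 days", "0 days"]),
})

# Vectorized reduction over the selected columns; each column keeps its dtype.
by_min = df.groupby("clientid")[["date", "clientid_age"]].min().reset_index()

# More specific: name the reduction for each column explicitly.
by_agg = df.groupby("clientid").agg({"date": "min", "clientid_age": "min"}).reset_index()

print(by_min)
```

Both avoid calling a Python function per group, which is where the performance gap comes from.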
It seems that the error comes from the creation of … For example, here we have our initial …
So the purpose is to infer, from a collection of objects, whether they are datetimelikes (e.g. datetime, tz-aware datetime, or timedelta). It runs some checks to see whether that is possible, then tries to convert, backing away if the values are not fully convertible (i.e. there is a mix of different types that are not NaN/NaT). In other words, you get back a datetime64[ns], tz-aware datetime64, or timedelta64[ns] array, or what you started with. The problem is that I think I originally made this work regardless of the passed-in shape (see the ravel). This is wrong: it should preserve the shape and return a list of array-likes if ndim > 1, or a single array-like if ndim == 1, where each array-like is either the converted values or the original array if conversion fails. That would fix this issue (and another one; I'll find the reference). Happy to have you put up a patch!
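A minimal standalone sketch of the shape-preserving behaviour described above. This is not the pandas internal code: `convert_1d` and `infer_datetimelike` are hypothetical names, the isinstance checks are a simplification of pandas' real inference, and tz-aware values are only handled as far as `pd.to_datetime` handles them. Each column of a 2-D object array is converted independently and a list is returned; raveling first mixes the column types, so nothing converts.

```python
import datetime

import numpy as np
import pandas as pd

_DATETIME_TYPES = (datetime.datetime, np.datetime64)    # pd.Timestamp subclasses datetime.datetime
_TIMEDELTA_TYPES = (datetime.timedelta, np.timedelta64)  # pd.Timedelta subclasses datetime.timedelta


def convert_1d(values):
    """Convert a 1-D object array to datetime64/timedelta64 if all non-null
    values share one datetimelike kind; otherwise return it unchanged."""
    non_null = [v for v in values if not pd.isna(v)]
    if not non_null:
        return values
    try:
        if all(isinstance(v, _DATETIME_TYPES) for v in non_null):
            return pd.to_datetime(values)
        if all(isinstance(v, _TIMEDELTA_TYPES) for v in non_null):
            return pd.to_timedelta(values)
    except (TypeError, ValueError):
        pass  # back away: keep what we started with
    return values


def infer_datetimelike(values):
    """Hypothetical shape-preserving inference: ndim == 1 gives one
    array-like, ndim == 2 gives a list with one array-like per column."""
    values = np.asarray(values, dtype=object)
    if values.ndim == 1:
        return convert_1d(values)
    return [convert_1d(values[:, j]) for j in range(values.shape[1])]


# A 2-D object block with a string, a datetime and a timedelta column.
block = np.empty((3, 3), dtype=object)
block[:, 0] = ["A", "B", "C"]
block[:, 1] = pd.Timestamp("2017-02-01")
block[:, 2] = pd.Timedelta(0)

columns = infer_datetimelike(block)
print([c.dtype for c in columns])

# Flattening first (the ravel approach) mixes all three kinds, so the
# result is the original object array.
flat = infer_datetimelike(block.ravel())
print(flat.dtype)
```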
Looks to work on master. Could use a test:
Works with the released version (0.25.1), too. Can be closed.
Care to contribute a regression test @qudade?
@mroeschke Sure, will do!
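A regression test along those lines might look like the following sketch. The data and the applied function are a hypothetical reconstruction (the original code sample is not preserved here), chosen to produce the expected output shown below; the assertions compare values rather than exact dtypes to stay version-agnostic.

```python
import pandas as pd


def test_groupby_apply_series_with_datetime_and_timedelta():
    # Hypothetical reconstruction: the applied function returns a Series
    # mixing a datetime value and a timedelta value for each group.
    df = pd.DataFrame({
        "clientid": ["A", "B", "C"],
        "date": pd.to_datetime(["2017-02-01"] * 3),
        "clientid_age": pd.to_timedelta(["0 days"] * 3),
    })

    def f(group):
        return pd.Series({
            "date": group["date"].min(),
            "clientid_age": group["clientid_age"].min(),
        })

    # The reported failure on pandas 0.18.1 was "IndexError: tuple index out of range".
    result = df.groupby("clientid")[["date", "clientid_age"]].apply(f)

    assert list(result.index) == ["A", "B", "C"]
    assert list(result.columns) == ["date", "clientid_age"]
    assert (result["date"] == pd.Timestamp("2017-02-01")).all()
    assert (result["clientid_age"] == pd.Timedelta(0)).all()
```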
Code Sample, a copy-pastable example if possible
Problem description
The current behavior is that the groupby-apply line fails, with the error message indicated below.
Expected Output
  clientid                date clientid_age
0        A 2017-02-01 00:00:00       0 days
1        B 2017-02-01 00:00:00       0 days
2        C 2017-02-01 00:00:00       0 days
Output of pd.show_versions()
pandas: 0.18.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None
Traceback (most recent call last):
  File "", line 2, in <module>
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 651, in apply
    return self._python_apply_general(f)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 660, in _python_apply_general
    not_indexed_same=mutated or self.mutated)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 3343, in _wrap_applied_output
    axis=self.axis).unstack()
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2043, in unstack
    return unstack(self, level, fill_value)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/reshape.py", line 408, in unstack
    return unstacker.get_result()
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/reshape.py", line 169, in get_result
    return DataFrame(values, index=index, columns=columns)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 255, in __init__
    copy=copy)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 432, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3993, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "/Users/fieldc/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3967, in construction_error
    if block_shape[0] == 0:
IndexError: tuple index out of range