
PERF: vbench for mixed groupby with datetime (GH7555) #7560


Merged (1 commit) on Jun 24, 2014
2 changes: 1 addition & 1 deletion doc/source/v0.14.1.txt
@@ -143,7 +143,7 @@ Performance
- Improvements in dtype inference for numeric operations involving yielding performance gains for dtypes: ``int64``, ``timedelta64``, ``datetime64`` (:issue:`7223`)
- Improvements in Series.transform for significant performance gains (:issue:`6496`)
- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (:issue:`7383`)

- Regression in groupby aggregation of datetime64 dtypes (:issue:`7555`)



7 changes: 3 additions & 4 deletions pandas/core/groupby.py
@@ -2332,17 +2332,16 @@ def _cython_agg_blocks(self, how, numeric_only=True):
             data = data.get_numeric_data(copy=False)
 
         for block in data.blocks:
-            values = block.values
 
-            is_numeric = is_numeric_dtype(values.dtype)
+            values = block._try_operate(block.values)
 
-            if is_numeric:
+            if block.is_numeric:
                 values = com.ensure_float(values)
 
             result, _ = self.grouper.aggregate(values, how, axis=agg_axis)
 
             # see if we can cast the block back to the original dtype
-            result = block._try_cast_result(result)
+            result = block._try_coerce_and_cast_result(result)
 
             newb = make_block(result, placement=block.mgr_locs)
             new_blocks.append(newb)
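A minimal NumPy sketch of the per-block flow in the new loop, assuming a datetime64[ns] block that is aggregated on its int64 view and then viewed back as datetime64 afterwards, as `_try_operate` and the coerce step suggest. `first_per_group` and `aggregate_block` are hypothetical names for illustration only; they are not pandas internals, and the real 0.14 code paths may differ in detail.

```python
import numpy as np


def first_per_group(values, labels):
    # Hypothetical stand-in for the grouper's "first" aggregation.
    # np.unique returns the index of the first occurrence of each label,
    # ordered by label, so this picks the first value in each group.
    _, first_idx = np.unique(labels, return_index=True)
    return values[first_idx]


def aggregate_block(values, labels, is_numeric):
    # _try_operate analogue (assumption): aggregate datetime64[ns]
    # on its int64 view, which keeps the exact nanosecond integers.
    is_datetime = values.dtype.kind == "M"
    work = values.view("i8") if is_datetime else values

    # Only genuinely numeric blocks are converted to float64,
    # mirroring the `if block.is_numeric:` check in the diff.
    if is_numeric:
        work = work.astype("float64")

    result = first_per_group(work, labels)

    # _try_coerce_result analogue (assumption): view int64 back as datetime64.
    if is_datetime:
        result = result.view("M8[ns]")
    return result


stamps = np.array(["2011-01-01T00:00:00", "2011-01-01T00:00:01",
                   "2011-01-01T00:00:02"], dtype="M8[ns]")
labels = np.array([0, 1, 0])
print(aggregate_block(stamps, labels, is_numeric=False))
```

The datetime block never passes through float64 here; only the `is_numeric` branch pays for that conversion.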
8 changes: 6 additions & 2 deletions pandas/core/internals.py
@@ -412,6 +412,11 @@ def _try_coerce_result(self, result):
         """ reverse of try_coerce_args """
         return result
 
+    def _try_coerce_and_cast_result(self, result, dtype=None):
+        result = self._try_coerce_result(result)
+        result = self._try_cast_result(result, dtype=dtype)
+        return result
+
     def _try_fill(self, value):
         return value
 
@@ -513,8 +518,7 @@ def setitem(self, indexer, value):
                 dtype, _ = _infer_dtype_from_scalar(value)
             else:
                 dtype = 'infer'
-            values = self._try_coerce_result(values)
-            values = self._try_cast_result(values, dtype)
+            values = self._try_coerce_and_cast_result(values, dtype)
             return [make_block(transf(values),
                                ndim=self.ndim, placement=self.mgr_locs,
                                fastpath=True)]
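The new `_try_coerce_and_cast_result` helper runs `_try_coerce_result` and then `_try_cast_result`, so call sites such as `setitem` and `_cython_agg_blocks` make one call instead of two. Below is a minimal sketch of that composition on toy classes, assuming a datetime-like block whose coerce step views int64 results as datetime64[ns] and a cast step that only acts on an explicit dtype. `ToyBlock`, `ToyDatetimeBlock`, and their hook bodies are hypothetical; they are not the pandas `Block` hierarchy.

```python
import numpy as np


class ToyBlock:
    """Hypothetical base block: the coerce hook is a no-op by default."""

    def _try_coerce_result(self, result):
        return result

    def _try_cast_result(self, result, dtype=None):
        # Assumption: cast only when an explicit dtype is requested;
        # None and 'infer' leave the result untouched.
        if dtype is None or (isinstance(dtype, str) and dtype == "infer"):
            return result
        return result.astype(dtype)

    def _try_coerce_and_cast_result(self, result, dtype=None):
        # Same order as the diff: coerce first, then cast.
        result = self._try_coerce_result(result)
        result = self._try_cast_result(result, dtype=dtype)
        return result


class ToyDatetimeBlock(ToyBlock):
    def _try_coerce_result(self, result):
        # Results computed on the int64 view become datetime64[ns] again.
        if result.dtype == np.int64:
            return result.view("M8[ns]")
        return result


blk = ToyDatetimeBlock()
print(blk._try_coerce_and_cast_result(np.array([0, 10**9], dtype=np.int64)))
```

Keeping the order fixed inside one method means a subclass only overrides the individual hooks, and no call site can accidentally cast before coercing.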
7 changes: 7 additions & 0 deletions vb_suite/groupby.py
@@ -244,6 +244,13 @@ def f():
 groupby_last_float32 = Benchmark('data2.groupby(labels).last()', setup,
                                  start_date=datetime(2013, 1, 1))
 
+# with datetimes (GH7555)
+setup = common_setup + """
+df = DataFrame({'a' : date_range('1/1/2011',periods=100000,freq='s'),'b' : range(100000)})
+"""
+
+groupby_mixed_first = Benchmark('df.groupby("b").first()', setup,
+                                start_date=datetime(2013, 5, 1))
 
 #----------------------------------------------------------------------
 # groupby_indices replacement, chop up Series
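To try the `groupby_mixed_first` case outside vbench, here is a rough sketch using the standard-library `timeit` module on the same frame as the setup string. Timings depend on the machine and pandas version, so none are quoted here; the repeat count of 3 is an arbitrary choice, not part of the original benchmark.

```python
import timeit

import pandas as pd

# Same data as the vbench setup: a datetime column and a unique integer key.
df = pd.DataFrame({
    "a": pd.date_range("1/1/2011", periods=100000, freq="s"),
    "b": range(100000),
})


def mixed_first():
    # Every value of "b" is unique, so each group holds exactly one row.
    return df.groupby("b").first()


result = mixed_first()
# The datetime column should come back as datetime64, not float or object.
print(result["a"].dtype)

best = min(timeit.repeat(mixed_first, number=1, repeat=3))
print(f"best of 3: {best:.4f} s")
```

Checking the result dtype alongside the timing guards against a "fast" path that silently changes the column type.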