groupby/transform with NaNs in grouped column

What is the expected behavior when grouping on a column that contains `NaN` and then applying `transform`? For a `Series`, the call currently raises a `ValueError`:

```
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'a': range(10),
...     'b': [1, 1, 2, 3, np.nan, 4, 4, 5, 5, 5]})
>>> df.groupby(df.b)['a'].transform(max)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/groupby.py", line 2422, in transform
    return self._transform_fast(cyfunc)
  File "pandas/core/groupby.py", line 2463, in _transform_fast
    return self._set_result_index_ordered(Series(values))
  File "pandas/core/groupby.py", line 498, in _set_result_index_ordered
    result.index = self.obj.index
  File "pandas/core/generic.py", line 1997, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:41301)
    obj._set_axis(self.axis, value)
  File "pandas/core/series.py", line 273, in _set_axis
    self._data.set_axis(axis, labels)
  File "pandas/core/internals.py", line 2219, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 9 elements, new values have 10 elements
```

For a `DataFrame`, the row whose group key is missing gets filled in with what looks like an uninitialized value from `np.empty_like`:

```
>>> df.groupby(df.b).transform(max)
   a
0  1
1  1
2  2
3  3
4 -1
5  6
6  6
7  9
8  9
9  9
```

It seems like it should either fill in the missing values with `NaN` (which might require a change of dtype) or drop those rows from the result (which changes the shape). Either solution has the potential to surprise.
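To make the two options concrete, here is a rough sketch that emulates each one by hand with ordinary pandas calls. It is only an illustration of the trade-off, not a claim about what `transform` does or should do internally; the names `group_max`, `filled`, `valid` and `dropped` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': range(10),
    'b': [1, 1, 2, 3, np.nan, 4, 4, 5, 5, 5]})

# Option 1: keep the original shape and put NaN where the group key is missing.
# Aggregate per group, then map the result back through the key column.
# The NaN key matches no group, so that row becomes NaN and the dtype
# is upcast from integer to float.
group_max = df.groupby('b')['a'].max()
filled = df['b'].map(group_max).rename('a')

# Option 2: drop the rows whose key is missing before transforming.
# The dtype stays integer, but the result has 9 rows instead of 10
# and its index no longer lines up one-to-one with df.
valid = df[df['b'].notna()]
dropped = valid.groupby('b')['a'].transform('max')

print(filled)
print(dropped)
```

With option 1 the result can be assigned straight back as a new column of `df`; with option 2 the caller has to realign on the index, which is where the surprise would come from.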

groupby/transform with NaNs in grouped column #9941