BUG/QST: Series.transform with a dictionary #35811

Closed
rhshadrach opened this issue Aug 19, 2020 · 3 comments · Fixed by #35964

@rhshadrach
Member

rhshadrach commented Aug 19, 2020

What is the expected output of passing a dictionary to Series.transform? For example:

import pandas as pd

s = pd.Series([1, 2, 3])
result1 = s.transform({'a': lambda x: x + 1})
result2 = s.transform({'a': lambda x: x + 1, 'b': lambda x: x + 2})

The docs say that a dict of axis labels -> functions is accepted, but I can't find any example in the docs that describes or shows the output. Under the hood, Series.transform just calls Series.aggregate, which produces the following outputs for result1 and result2.

# result1
a  0    2
   1    3
   2    4
dtype: int64

# result2
a  0    2
   1    3
   2    4
b  0    3
   1    4
   2    5
dtype: int64

result1 is deemed acceptable (the length of the result equals the length of the input) and is returned, but result2 raises because its length differs from the input's, so it is not treated as a transformation.

I am wondering if a better return value would be a DataFrame whose column names are the dictionary keys ('a' and 'b' in this example).
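For illustration, here is a minimal sketch of what that proposed DataFrame could look like. It is built by hand from the dictionary rather than through Series.transform, so it does not depend on the behavior of any particular pandas version; the names `funcs` and `proposed` are hypothetical.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
funcs = {'a': lambda x: x + 1, 'b': lambda x: x + 2}

# Apply each function to the whole Series and use the dict keys as column names.
# This mimics the proposed output; it is not what Series.transform did at the time.
proposed = pd.DataFrame({key: func(s) for key, func in funcs.items()})

# Same index as the input, one column per dictionary key.
print(proposed)
```

With this shape the result has one row per input element no matter how many keys the dictionary has.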

@onshek
Contributor

onshek commented Aug 25, 2020

I think this is actually a bug: the docs for pd.Series.transform are the same as those for pd.DataFrame.transform, since both use:

    @doc(
        NDFrame.transform,
        klass=_shared_doc_kwargs["klass"],
        axis=_shared_doc_kwargs["axis"],
    )

And I get this:

In [3]: s = pd.Series([1, 2, 3]) 
   ...: result1 = s.transform({'a': lambda x: x + 1}) 
   ...: result2 = s.transform({'a': lambda x: x + 1, 'b': lambda x: x + 2})        
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-32dce75aad7a> in <module>
      1 s = pd.Series([1, 2, 3])
      2 result1 = s.transform({'a': lambda x: x + 1})
----> 3 result2 = s.transform({'a': lambda x: x + 1, 'b': lambda x: x + 2})

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in transform(self, func, axis, *args, **kwargs)
   3910         # Validate the axis parameter
   3911         self._get_axis_number(axis)
-> 3912         return super().transform(func, *args, **kwargs)
   3913 
   3914     def apply(self, func, convert_dtype=True, args=(), **kwds):

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in transform(self, func, *args, **kwargs)
  10812         result = self.agg(func, *args, **kwargs)
  10813         if is_scalar(result) or len(result) != len(self):
> 10814             raise ValueError("transforms cannot produce " "aggregated results")
  10815 
  10816         return result

ValueError: transforms cannot produce aggregated results
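As a rough sketch of why the length check in the traceback trips for two keys: the stacked result shown in the original post has one block of rows per key, so its length is a multiple of the input's. The stacked Series below is rebuilt by hand with pd.concat (an assumption made for illustration, not the code path pandas uses), and `stacked` is a hypothetical name.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Rebuild the key-stacked layout by hand: the dict keys become the outer
# index level, the original index becomes the inner level.
stacked = pd.concat({'a': s + 1, 'b': s + 2})

# The check in the traceback compares len(result) with len(self).
print(len(stacked), len(s))
print(len(stacked) != len(s))  # the condition that leads to the ValueError
```

With a single key the stacked result happens to have the same length as the input, which is why result1 passes the check.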

@rhshadrach
Member Author

@onshek - yes, agreed. The question then is: what should the output be? A DataFrame whose columns are the keys of the dictionary seems reasonable to me, but I can't find any documentation stating or showing this. I think that fits in well with DataFrame.transform, where the keys subset the columns (and thus are the column names in the result).

@rhshadrach rhshadrach added the Bug label Aug 25, 2020
@rhshadrach rhshadrach changed the title QST: Series.transform with a dictionary BUG/QST: Series.transform with a dictionary Aug 25, 2020
@onshek
Contributor

onshek commented Aug 26, 2020

@rhshadrach I think the main point is axis:

In [11]: s = pd.Series(range(3))

In [13]: s.transform([np.sqrt, np.exp], axis=0)
Out[13]:
       sqrt       exp
0  0.000000  1.000000
1  1.000000  2.718282
2  1.414214  7.389056

And s.transform([np.sqrt, np.exp], axis=1) could be designed to return the following (currently it fails):

sqrt  0    0.000000
      1    1.000000
      2    1.414214
exp   0    1.000000
      1    2.718282
      2    7.389056
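To make the two layouts concrete, here is a small sketch that builds the wide (axis=0 style) result by hand and reshapes it into the stacked layout suggested for axis=1. Nothing here calls Series.transform with axis=1; the names `wide` and `stacked` are hypothetical, and the reshaping via DataFrame.unstack is just one way to get that shape.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(3))

# Wide layout: one column per function, same index as the input.
wide = pd.DataFrame({'sqrt': np.sqrt(s), 'exp': np.exp(s)})

# Stacked layout: function name as the outer index level, original index inside.
# On a frame with a flat index, unstack() returns a Series indexed by (column, row).
stacked = wide.unstack()

print(wide)
print(stacked)
```

The wide form keeps the length of the input, while the stacked form has one block of rows per function.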
