Skip to content

Bug in Series.str.repeat with string dtype and sequence of repeats #31632

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Feb 3, 2020 · 2 comments · Fixed by #31684
Closed

Bug in Series.str.repeat with string dtype and sequence of repeats #31632

TomAugspurger opened this issue Feb 3, 2020 · 2 comments · Fixed by #31684
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays Strings String extension data type and string data
Milestone

Comments

@TomAugspurger
Copy link
Contributor

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s = pd.Series(['a', None], dtype="string")

In [3]: s.str.repeat([1, 2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/strings.py in rep(x, r)
    781             try:
--> 782                 return bytes.__mul__(x, r)
    783             except TypeError:

TypeError: descriptor '__mul__' requires a 'bytes' object but received a 'NAType'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-3-a01827562f7a> in <module>
----> 1 s.str.repeat([1, 2])

~/sandbox/pandas/pandas/core/strings.py in wrapper(self, *args, **kwargs)
   1950                 )
   1951                 raise TypeError(msg)
-> 1952             return func(self, *args, **kwargs)
   1953
   1954         wrapper.__name__ = func_name

~/sandbox/pandas/pandas/core/strings.py in repeat(self, repeats)
   2780     @forbid_nonstring_types(["bytes"])
   2781     def repeat(self, repeats):
-> 2782         result = str_repeat(self._parent, repeats)
   2783         return self._wrap_result(result)
   2784

~/sandbox/pandas/pandas/core/strings.py in str_repeat(arr, repeats)
    785
    786         repeats = np.asarray(repeats, dtype=object)
--> 787         result = libops.vec_binop(com.values_from_object(arr), repeats, rep)
    788         return result
    789

~/sandbox/pandas/pandas/_libs/ops.pyx in pandas._libs.ops.vec_binop()
    239                 result[i] = y
    240             else:
--> 241                 raise
    242
    243     return maybe_convert_bool(result.base)  # `.base` to access np.ndarray

~/sandbox/pandas/pandas/_libs/ops.pyx in pandas._libs.ops.vec_binop()
    232         y = right[i]
    233         try:
--> 234             result[i] = op(x, y)
    235         except TypeError:
    236             if x is None or is_nan(x):

~/sandbox/pandas/pandas/core/strings.py in rep(x, r)
    782                 return bytes.__mul__(x, r)
    783             except TypeError:
--> 784                 return str.__mul__(x, r)
    785
    786         repeats = np.asarray(repeats, dtype=object)

TypeError: descriptor '__mul__' requires a 'str' object but received a 'NAType'

Problem description

The str_repeat method correctly handles NA values when repeats is a scalar, but fails when its a sequence.

Expected Output

0       a
1    <NA>
dtype: string
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Feb 3, 2020
@TomAugspurger TomAugspurger added Strings String extension data type and string data NA - MaskedArrays Related to pd.NA and nullable extension arrays Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Feb 3, 2020
@prakhar987
Copy link
Contributor

will look into this

@prakhar987
Copy link
Contributor

Seems to me is_nan() is not able to identify <NA>.
In vec_binop of ops.pyx if I use type(x) is pandas._libs.missing.NAType as an additional condition then the code works. I will look into is_nan to fix this.

prakhar987 added a commit to prakhar987/pandas that referenced this issue Feb 5, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.0.2 Feb 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays Strings String extension data type and string data
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants