Skip to content

Series.str.cat(',') without "sep" keyword arg raises unintuitive TypeError: len() of unsized object #11334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hack-c opened this issue Oct 15, 2015 · 6 comments
Labels
Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data
Milestone

Comments

@hack-c
Copy link

hack-c commented Oct 15, 2015

I think this easy-to-make mistake should raise a more descriptive error... maybe something like

ValueError: Did you mean to supply a 'sep' keyword?

I might be missing the main usage of str.cat(), but all the uses I've found involve concatenating series together with a sep character.

In [1]: import pandas as pd

In [2]: s = pd.Series(['foo', 'bar', 'baz'])

In [3]: s.str.cat('|')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8bc5f2a1347b> in <module>()
----> 1 s.str.cat('|')

/home/charles.hack/anaconda/envs/seahydra/lib/python2.7/site-packages/pandas/core/strings.pyc in cat(self, others, sep, na_rep)
   1137     @copy(str_cat)
   1138     def cat(self, others=None, sep=None, na_rep=None):
-> 1139         result = str_cat(self.series, others=others, sep=sep, na_rep=na_rep)
   1140         return self._wrap_result(result)
   1141

/home/charles.hack/anaconda/envs/seahydra/lib/python2.7/site-packages/pandas/core/strings.pyc in str_cat(arr, others, sep, na_rep)
     71         arrays = _get_array_list(arr, others)
     72
---> 73         n = _length_check(arrays)
     74         masks = np.array([isnull(x) for x in arrays])
     75         cats = None

/home/charles.hack/anaconda/envs/seahydra/lib/python2.7/site-packages/pandas/core/strings.pyc in _length_check(others)
    111         if n is None:
    112             n = len(x)
--> 113         elif len(x) != n:
    114             raise ValueError('All arrays must be same length')
    115

TypeError: len() of unsized object

In [4]: s.str.cat(sep='|')
Out[4]: 'foo|bar|baz'
@jreback
Copy link
Contributor

jreback commented Oct 15, 2015

sounds reasonable. pull-request?

@jreback jreback added Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data labels Oct 15, 2015
@jreback jreback added this to the Next Major Release milestone Oct 15, 2015
@hack-c
Copy link
Author

hack-c commented Oct 16, 2015

Will do.

@hack-c
Copy link
Author

hack-c commented Jan 8, 2016

I propose to fix this by changing the order of the keyword args, like so:

def str_cat(arr, sep=None, others=None, na_rep=None):
    """
    Concatenate strings in the Series/Index with given separator.

    Parameters
    ----------
    sep : string or None, default None
    others : list-like, or list of list-likes
      If None, returns str concatenating strings of the Series
    na_rep : string or None, default None
        If None, an NA in any array will propagate

So that sep comes first.

This is an API change though and breaks some tests, because str_cat used to expect the others argument first. (I'm not sure why.)

@jreback, does that sound reasonable?

That way,

    >>> Series(['a', 'b', 'c']).str.cat('|')
    'a|b|c'

    >>> Series(['a','b',np.nan,'c']).str.cat(' ', na_rep='?')
    'a b ? c'

    >>> Series(['a', 'b', 'c']).str.cat(',', others=['A', 'B', 'C'])
    0    a,A
    1    b,B
    2    c,C
    dtype: object


    >>> Series(['a', 'b']).str.cat(others=[['x', 'y'], ['1', '2']], sep=',')
    0    a,x,1
    1    b,y,2
    dtype: object

@jreback
Copy link
Contributor

jreback commented Jan 8, 2016

@hack-c no, that doesn't make sense, why just catch and raise a nice error?

@hack-c
Copy link
Author

hack-c commented Jan 8, 2016

Ok. :) I just like that API design better but realize it's late in the game for a breaking change like that.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2016

I think you can check for this error conditiion and give a really helpful error (e.g. you can even say, did you mean 'sep')

@jreback jreback modified the milestones: 0.18.0, Next Major Release Feb 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants