Series.str.split() behavior on multi-character patterns (pandas 0.9.1, py2.7) #2513


Closed
gdraps opened this issue Dec 13, 2012 · 3 comments

@gdraps
Contributor

gdraps commented Dec 13, 2012

Not sure if this is an issue in my setup or the intended behavior, but str.split behavior with multi-character patterns changed between 0.9.0 and 0.9.1, most likely due to #2119. Here is the original behavior going back to 0.8.1 as best I can tell.

In [1]: pd.__version__
Out[1]: '0.9.1.dev-c252129'

In [2]: s = pd.Series(["D0->D2"])

In [3]: s.str.split("->")
Out[3]: 0    [D0, D2]

In 0.9.1:

In [1]: pd.__version__
Out[1]: '0.9.1'

In [2]: s = pd.Series(["D0->D2"])

In [3]: s.str.split("->")
Out[3]: 0    [D0->D2]

Setting n=0 restores the behavior, on my Python install at least.

In [5]: s.str.split("->", n=0)
Out[5]: 0    [D0, D2]

Reproducible in 0.10.0b1.

In [1]: pd.__version__
Out[1]: '0.10.0b1'

In [2]: s = pd.Series(["D0->D2"])

In [3]: s.str.split("->")
Out[3]: 0    [D0->D2]

Python 2.7.2+ (default, Jul 20 2012, 22:12:53)
[GCC 4.6.1] on linux2

Any thoughts on changing the default n for str.split back to 0?

Many thanks!

@wesm
Member

wesm commented Dec 13, 2012

Thanks for the report, looks like a bug. Will have a look.

@gdraps
Contributor Author

gdraps commented Dec 13, 2012

Dug a little and found that str.split and re.split treat maxsplit differently: str.split returns all splits when maxsplit is -1, while re.split does the same when maxsplit is 0.
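
For reference, the difference can be seen with the standard library alone. This is a minimal sketch that uses only the built-in `str.split` and `re.split`; the variable names are made up for illustration and nothing here is pandas code.

```python
import re

text = "D0->D2"

# str.split: maxsplit=-1 (its default) means "no limit on splits".
all_via_str = text.split("->", -1)

# str.split: maxsplit=0 means "do not split at all".
none_via_str = text.split("->", 0)

# re.split: maxsplit=0 (its default) means "no limit on splits".
all_via_re = re.split("->", text, maxsplit=0)

# Positive values mean the same thing in both functions.
one_via_str = "a->b->c".split("->", 1)
one_via_re = re.split("->", "a->b->c", maxsplit=1)

print(all_via_str, none_via_str, all_via_re, one_via_str, one_via_re)
```

So a "no limit" value that is right for one function is wrong for the other, which would explain why passing n=0 brings back the old result for a multi-character pattern.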

@changhiskhan
Contributor

n is None by default now and {None, 0, -1} will all have the same behavior
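
A rough sketch of what that kind of normalization could look like in plain Python. The helper name `split_with_limit` is hypothetical, and choosing between `str.split` and `re.split` by pattern length is an assumption made for illustration; this is not the actual pandas implementation.

```python
import re


def split_with_limit(text, pat, n=None):
    """Hypothetical helper: split text on pat, treating None, 0 and -1 as "no limit"."""
    if n is None or n <= 0:
        # Translate "no limit" into the value each backend expects.
        str_limit, re_limit = -1, 0
    else:
        str_limit = re_limit = n

    if len(pat) == 1:
        # Assumption: single-character patterns are split literally.
        return text.split(pat, str_limit)
    # Assumption: longer patterns are treated as regular expressions.
    return re.split(pat, text, maxsplit=re_limit)


for limit in (None, 0, -1):
    print(limit, split_with_limit("D0->D2", "->", n=limit))
```

With this, all three "no limit" spellings give the same result for "D0->D2", and a positive n still caps the number of splits.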
