Description
I'm working on some language analysis and using pandas to munge the data and grab some descriptive stats. This is just an illustrative example; I'm doing all kinds of slightly different things.
Suppose I have a series containing chunks of text, and I want to split each entry into multiple rows, preserving the index values. Here are the naive results:
In [53]: s=pd.Series(['This is text No 1.', 'and here is no. 2','and 3'],index=['Alice','Bob','Alice'])
...: s
Out[53]:
Alice This is text No 1.
Bob and here is no. 2
Alice and 3
dtype: object
In [54]: s.map(lambda x: x.split(' '))
Out[54]:
Alice [This, is, text, No, 1.]
Bob [and, here, is, no., 2]
Alice [and, 3]
dtype: object
In [55]: s.apply(lambda x: pd.Series(x.split(' ')))
Out[55]:
          0     1     2    3    4
Alice  This    is  text   No   1.
Bob     and  here    is  no.    2
Alice   and     3   NaN  NaN  NaN
What I'd like to be able to do is (made-up example):
In [67]: s.flatmap(lambda x: x.split(' '))
Out[67]:
Alice This
Alice is
Alice text
Alice No
Alice 1.
Bob and
Bob here
Bob is
Bob no.
Bob 2
Alice and
Alice 3
dtype: object
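For comparison, here is a minimal sketch of how the made-up `flatmap` above could be emulated with plain Python and existing pandas constructors. The helper name `flatmap_series` is hypothetical (it is not a pandas API), and the sketch assumes the function returns an iterable for every value.

```python
import pandas as pd


def flatmap_series(s, func):
    """Hypothetical helper, not part of pandas.

    Applies func to every value and emits one row per returned item,
    repeating the original index label for each item.
    """
    pairs = [(label, item) for label, value in s.items() for item in func(value)]
    index = [label for label, _ in pairs]
    values = [item for _, item in pairs]
    return pd.Series(values, index=index, dtype=object)


s = pd.Series(
    ['This is text No 1.', 'and here is no. 2', 'and 3'],
    index=['Alice', 'Bob', 'Alice'],
)
flat = flatmap_series(s, lambda x: x.split(' '))
print(flat)
```

On pandas 0.25 or later, `s.str.split(' ').explode()` should give an equivalent result without a custom helper.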
In general, I'd like to be able to explode a single row in a dataframe into multiple rows by transforming one column value into multiple values, each becoming a new row with the values of the other columns preserved. For example:
In [69]: df=pd.DataFrame([['2014-01-01','Alice',"A B"],['2014-01-02','Bob','C D']],columns=['dt','name','text'])
...: df
Out[69]:
           dt   name  text
0  2014-01-01  Alice   A B
1  2014-01-02    Bob   C D
In [70]: df.flatmap(lambda x: x.split(),on='text')
           dt   name  text
0  2014-01-01  Alice     A
1  2014-01-01  Alice     B
2  2014-01-02    Bob     C
3  2014-01-02    Bob     D
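The dataframe case can be approximated in the same spirit. The sketch below uses a hypothetical helper, `flatmap_frame` (again, not a pandas API), that copies each row once per item returned for the `on` column. It iterates row by row, so it is meant to illustrate the semantics rather than to be fast.

```python
import pandas as pd


def flatmap_frame(df, func, on):
    """Hypothetical helper, not part of pandas.

    For each row, applies func to the value in column `on` and emits one
    copy of the row per returned item, with `on` replaced by that item.
    """
    rows = []
    for _, row in df.iterrows():
        for item in func(row[on]):
            new_row = row.to_dict()
            new_row[on] = item
            rows.append(new_row)
    # A fresh 0..n-1 index is assigned, matching the made-up output above.
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame(
    [['2014-01-01', 'Alice', 'A B'], ['2014-01-02', 'Bob', 'C D']],
    columns=['dt', 'name', 'text'],
)
flat_df = flatmap_frame(df, lambda x: x.split(), on='text')
print(flat_df)
```

On pandas 0.25 or later, `df.assign(text=df['text'].str.split()).explode('text').reset_index(drop=True)` should produce the same rows.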
Perhaps there's another way to do this, but that's how my natural instinct suggests it should be done; flatmap is a fairly universal concept.
Groupby already does similar things based on return type, but this doesn't have to be limited to groupby.