Feature request: Series.flatmap, DataFrame.flatmap #8517

Closed
@kay1793

I'm working on some language analysis and using pandas to munge the data and grab some descriptive stats. This is just an illustrative example; I'm doing all kinds of slightly different things.

Suppose I have a series containing chunks of text, and I want to split each entry into multiple rows, preserving the index values. Here are the naive results:

In [53]: s=pd.Series(['This is text No 1.', 'and here is no. 2','and 3'],index=['Alice','Bob','Alice'])
    ...: s
Out[53]: 
Alice    This is text No 1.
Bob       and here is no. 2
Alice                 and 3
dtype: object

In [54]: s.map(lambda x: x.split(' '))
Out[54]: 
Alice    [This, is, text, No, 1.]
Bob       [and, here, is, no., 2]
Alice                    [and, 3]
dtype: object

In [55]: s.apply(lambda x: pd.Series(x.split(' ')))
Out[55]: 
          0     1     2    3    4
Alice  This    is  text   No   1.
Bob     and  here    is  no.    2
Alice   and     3   NaN  NaN  NaN

What I'd like to be able to do is this (made-up example):

In [67]: s.flatmap(lambda x: x.split(' '))
Out[67]: 
Alice    This
Alice      is
Alice    text
Alice      No
Alice      1.
Bob       and
Bob      here
Bob        is
Bob       no.
Bob         2
Alice     and
Alice       3
dtype: object
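
For comparison, here is a minimal sketch of one way to get this shape of result, assuming a pandas version that provides `Series.explode` (0.25 or later). The `flatmap` helper is a hypothetical name used for illustration, not a pandas API.

```python
import pandas as pd

s = pd.Series(
    ["This is text No 1.", "and here is no. 2", "and 3"],
    index=["Alice", "Bob", "Alice"],
)


def flatmap(series, func):
    """Hypothetical helper: apply func to each value, then flatten the lists."""
    # map() yields one list per row; explode() gives every list element its
    # own row and repeats the original index label for it.
    return series.map(func).explode()


flat = flatmap(s, lambda x: x.split(" "))
print(flat)
```

The index labels are repeated once per token, so duplicate labels such as `Alice` stay attached to the rows they came from.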

In general, I'd like to be able to explode a single row in a dataframe into multiple rows by transforming one column value into multiple values, each becoming a new row with the values of the other columns preserved. For example:

In [69]: df=pd.DataFrame([['2014-01-01','Alice',"A B"],['2014-01-02','Bob','C D']],columns=['dt','name','text'])
    ...: df
Out[69]: 
           dt   name text
0  2014-01-01  Alice  A B
1  2014-01-02    Bob  C D

In [70]: df.flatmap(lambda x: x.split(),on='text')
Out[70]: 
           dt   name text
0  2014-01-01  Alice    A
1  2014-01-01  Alice    B
2  2014-01-02    Bob    C
3  2014-01-02    Bob    D
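
A rough sketch of the dataframe case, again assuming `DataFrame.explode` is available (pandas 0.25 or later). `flatmap_column` and its `on` parameter are hypothetical names that mirror the made-up call above; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [["2014-01-01", "Alice", "A B"], ["2014-01-02", "Bob", "C D"]],
    columns=["dt", "name", "text"],
)


def flatmap_column(frame, func, on):
    """Hypothetical helper: expand column `on` into one row per element of func(value)."""
    # Replace the column with lists on a copy, leaving the input frame untouched.
    expanded = frame.assign(**{on: frame[on].map(func)})
    # explode() repeats the other columns for each list element; resetting the
    # index gives the fresh 0..n-1 numbering shown in the example.
    return expanded.explode(on).reset_index(drop=True)


out = flatmap_column(df, lambda x: x.split(), on="text")
print(out)
```

Dropping the `reset_index` call would instead keep the original row labels (0, 0, 1, 1), which matches the index-preserving behaviour asked for in the Series example.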

Perhaps there's another way to do this, but that's how my natural instinct suggests it should be done; flatmap is a fairly universal concept.
Groupby already does similar things based on the return type, but this doesn't have to be limited to groupby.
