ENH: Add FrozenList.union and .difference #23394

Merged
merged 1 commit into from
Nov 3, 2018
10 changes: 10 additions & 0 deletions doc/source/groupby.rst
@@ -125,6 +125,16 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
    grouped = df.groupby('A')
    grouped = df.groupby(['A', 'B'])

+.. versionadded:: 0.24
+
+If we also have a MultiIndex on columns ``A`` and ``B``, we can group by all
+but the specified columns:
+
+.. ipython:: python
+
+   df2 = df.set_index(['A', 'B'])
+   grouped = df2.groupby(level=df2.index.names.difference(['B']))
+
 These will split the DataFrame on its index (rows). We could also split by the
 columns:

5 changes: 2 additions & 3 deletions doc/source/whatsnew/v0.24.0.txt
@@ -13,10 +13,9 @@ v0.24.0 (Month XX, 2018)
 New features
 ~~~~~~~~~~~~
 - :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`)
-
-
 - ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
-
+- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby operations that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups

Member

This still uses FrozenList, a term that is not currently mentioned anywhere in the docs.

Personally, I think that also indicates a problem with this feature. We should either properly document FrozenList and its methods in the pandas API, or not give it additional methods compared to list.

+<groupby.split>` for more information (:issue:`15475`, :issue:`15506`)
 - :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing
   the user to override the engine's default behavior to include or omit the
   dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
42 changes: 37 additions & 5 deletions pandas/core/indexes/frozen.py
@@ -23,15 +23,47 @@ class FrozenList(PandasObject, list):
     because it's technically non-hashable, will be used
     for lookups, appropriately, etc.
     """
-    # Sidenote: This has to be of type list, otherwise it messes up PyTables
-    # typechecks
+    # Side note: This has to be of type list. Otherwise,
+    # it messes up PyTables type checks.

-    def __add__(self, other):
+    def union(self, other):
+        """
+        Returns a FrozenList with other concatenated to the end of self.
+
+        Parameters
+        ----------
+        other : array-like
+            The array-like whose elements we are concatenating.
+
+        Returns
+        -------
+        union : FrozenList
+            The concatenation of self and other.
+        """
         if isinstance(other, tuple):
             other = list(other)
-        return self.__class__(super(FrozenList, self).__add__(other))
+        return type(self)(super(FrozenList, self).__add__(other))

+    def difference(self, other):
+        """
+        Returns a FrozenList with elements from other removed from self.
+
+        Parameters
+        ----------
+        other : array-like
+            The array-like whose elements we are removing from self.
+
+        Returns
+        -------
+        diff : FrozenList
+            The collection difference between self and other.
+        """
+        other = set(other)
+        temp = [x for x in self if x not in other]
+        return type(self)(temp)
+
-    __iadd__ = __add__
+    # TODO: Consider deprecating these in favor of `union` (xref gh-15506)
+    __add__ = __iadd__ = union

     # Python 2 compat
     def __getslice__(self, i, j):
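
For readers skimming the diff, the semantics of the two new methods can be sketched in plain Python. This is a simplified stand-in, not the pandas class: `FrozenListSketch` is a hypothetical name, it does not model immutability or `PandasObject`, and it only mirrors the logic visible in the hunk above. Note that `union` concatenates and keeps duplicates, while `difference` removes every occurrence and preserves order.

```python
class FrozenListSketch(list):
    """Simplified stand-in for illustration only; not pandas' FrozenList."""

    def union(self, other):
        # Concatenation, not a set union: repeated elements are kept.
        return type(self)(list(self) + list(other))

    def difference(self, other):
        # Drop every element that appears in `other`, keeping self's order.
        # Elements of `other` must be hashable because they go into a set.
        excluded = set(other)
        return type(self)(x for x in self if x not in excluded)

    # As in the diff, `+` and `+=` are aliases of `union`.
    __add__ = __iadd__ = union


names = FrozenListSketch(["A", "B"])
all_but_b = names.difference(["B"])                       # levels other than "B"
with_dupes = FrozenListSketch([1, 2, 3]).union((2, 3))    # duplicates survive
no_twos = FrozenListSketch([1, 2, 3, 2]).difference([2])  # all 2s removed
```

The `names.difference(["B"])` call corresponds to the `df2.index.names.difference(['B'])` expression in the groupby documentation change.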
23 changes: 20 additions & 3 deletions pandas/tests/indexes/test_frozen.py
@@ -11,7 +11,7 @@ class TestFrozenList(CheckImmutable, CheckStringMixin):
     mutable_methods = ('extend', 'pop', 'remove', 'insert')
     unicode_container = FrozenList([u("\u05d0"), u("\u05d1"), "c"])

-    def setup_method(self, method):
+    def setup_method(self, _):
         self.lst = [1, 2, 3, 4, 5]
         self.container = FrozenList(self.lst)
         self.klass = FrozenList
@@ -25,13 +25,30 @@ def test_add(self):
         expected = FrozenList([1, 2, 3] + self.lst)
         self.check_result(result, expected)

-    def test_inplace(self):
+    def test_iadd(self):
         q = r = self.container
+
         q += [5]
         self.check_result(q, self.lst + [5])
-        # other shouldn't be mutated
+
+        # Other shouldn't be mutated.
         self.check_result(r, self.lst)

+    def test_union(self):
+        result = self.container.union((1, 2, 3))
+        expected = FrozenList(self.lst + [1, 2, 3])
+        self.check_result(result, expected)
+
+    def test_difference(self):
+        result = self.container.difference([2])
+        expected = FrozenList([1, 3, 4, 5])
+        self.check_result(result, expected)
+
+    def test_difference_dupe(self):
+        result = FrozenList([1, 2, 3, 2]).difference([2])
+        expected = FrozenList([1, 3])
+        self.check_result(result, expected)
+


 class TestFrozenNDArray(CheckImmutable, CheckStringMixin):
     mutable_methods = ('put', 'itemset', 'fill')
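
The "Other shouldn't be mutated" check in `test_iadd` matters because a plain `list` extends in place on `+=`, so every alias sees the change. A rough illustration follows, using a hypothetical `NonMutatingList` subclass (an assumption for this sketch, not pandas code) whose `__iadd__` returns a new object, which is the behavior the test expects of `FrozenList`.

```python
class NonMutatingList(list):
    """Hypothetical list subclass whose += builds a new object."""

    def __iadd__(self, other):
        # Return a fresh instance instead of extending self in place.
        return type(self)(list(self) + list(other))


# Built-in list: += mutates in place, so the alias `b` changes too.
a = b = [1, 2, 3]
a += [4]

# Overridden __iadd__: `q` is rebound to a new object, `r` is untouched.
q = r = NonMutatingList([1, 2, 3])
q += [4]
```

After this runs, `a` and `b` are still the same four-element list, while `q` holds four elements and `r` keeps the original three.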