ENH: MultiIndex.from_frame #23141

Merged
merged 52 commits into from
Dec 9, 2018
Changes from 10 commits
79bdecb
ENH - add from_frame method and accompanying squeeze method to multii…
sds9995 Oct 13, 2018
fa82618
ENH - guarantee that order of labels is preserved in multiindex to_fr…
sds9995 Oct 13, 2018
64b45d6
CLN - adhere to PEP8 line length
sds9995 Oct 13, 2018
64c7bb1
CLN - remove trailing whitespace
sds9995 Oct 13, 2018
3ee676c
ENH - raise TypeError on inappropriate input
sds9995 Oct 16, 2018
fd266f5
TST - add tests for mi.from_frame and mi.squeeze
sds9995 Oct 16, 2018
4bc8f5b
CLN - pep8 adherence in tests
sds9995 Oct 16, 2018
9d92b70
CLN - last missed pep8 fix
sds9995 Oct 16, 2018
45595ad
BUG - remove pd.DataFrame in favor of local import
ms7463 Oct 16, 2018
3530cd3
DOC - add more detailed docstrings for from_frame and squeeze
sds9995 Oct 18, 2018
1c22791
DOC - update MultiIndex.from_frame and squeeze doctests to comply wit…
sds9995 Oct 28, 2018
cf78780
CLN - cleanup docstrings and source
sds9995 Oct 28, 2018
64c2750
TST - reorganize some of the multiindex tests
sds9995 Oct 28, 2018
ede030b
CLN - adhere to pep8 line length
sds9995 Oct 28, 2018
190c341
BUG - ensure dtypes are preserved in from_frame and to_frame
sds9995 Nov 3, 2018
e0df632
TST - add tests for ensuring dtype fidelity and custom names for from…
sds9995 Nov 3, 2018
78ff5c2
CLN - pep8 adherence
sds9995 Nov 3, 2018
0252db9
DOC - add examples and change order of kwargs for from_frame
sds9995 Nov 3, 2018
d98c8a9
TST - parameterize tests
sds9995 Nov 3, 2018
8a1906e
CLN - pep8 adherence
sds9995 Nov 3, 2018
08c120f
CLN - pep8 adherence
sds9995 Nov 3, 2018
8353c3f
DOC/CLN - add versionadded tags, add to whatsnew page, and clean up i…
sds9995 Nov 4, 2018
9df3c11
CLN - squeeze -> _squeeze
sds9995 Nov 10, 2018
6d4915e
DOC - squeeze -> _squeeze in whatsnew
ms7463 Nov 10, 2018
b5df7b2
BUG - allow repeat column names in from_frame, and falsey column name…
sds9995 Nov 11, 2018
ab3259c
DOC - whatsnew formatting
sds9995 Nov 11, 2018
cf95261
TST - reorganize and add tests for more incompatible from_frame types
sds9995 Nov 11, 2018
63051d7
Merge branch 'enhancement/from_frame' of https://github.com/ArtinSarr…
sds9995 Nov 11, 2018
a75a4a5
CLN - remove squeeze tests
sds9995 Nov 12, 2018
8d23df9
CLN - remove squeeze parameter from from_frame
sds9995 Nov 12, 2018
c8d696d
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 12, 2018
7cf82d1
TST - remove callable name option
sds9995 Nov 12, 2018
1a282e5
ENH - from_data initial commit
sds9995 Nov 14, 2018
b3c6a90
DOC - reduce whatsnew entry for to_frame
sds9995 Nov 19, 2018
c760359
CLN/DOC - add examples to from_frame docstring and make code more rea…
sds9995 Nov 19, 2018
bb69314
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 19, 2018
9e11180
TST - use OrderedDict for dataframe construction
sds9995 Nov 20, 2018
96c6af3
Merge branch 'master' into enhancement/from_frame
sds9995 Nov 28, 2018
a5236bf
CLN - clean up code and use pytest.raises
sds9995 Dec 1, 2018
c78f364
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 1, 2018
14bfea8
DOC - move to_frame breaking changes to backwards incompatible sectio…
sds9995 Dec 2, 2018
6960804
Merge branch 'master' into enhancement/from_frame
ms7463 Dec 2, 2018
11c5947
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 4, 2018
904644a
Merge branch 'enhancement/from_frame' of https://github.com/ArtinSarr…
sds9995 Dec 4, 2018
30fe0df
DOC - add advanced.rst section
sds9995 Dec 5, 2018
ec60563
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 5, 2018
8fc6609
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 6, 2018
9b906c6
DOC/CLN - cleanup documentation
sds9995 Dec 6, 2018
e416122
CLN - fix linting error according to pandas-dev.pandas test
sds9995 Dec 6, 2018
4ef9ec4
DOC - fix docstrings
sds9995 Dec 7, 2018
4240a1e
CLN - fix import order with isort
sds9995 Dec 7, 2018
9159b2d
Merge branch 'master' into enhancement/from_frame
sds9995 Dec 7, 2018
71 changes: 66 additions & 5 deletions pandas/core/indexes/multi.py
@@ -1189,11 +1189,11 @@ def to_frame(self, index=True, name=None):
         else:
             idx_names = self.names

-        result = DataFrame({(name or level):
-                            self._get_level_values(level)
-                            for name, level in
-                            zip(idx_names, range(len(self.levels)))},
-                           copy=False)
+        result = DataFrame(
+            list(self),
+            columns=[n or i for i, n in enumerate(idx_names)]
+        )

         if index:
             result.index = self
         return result
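
The column-naming expression added in this hunk can be tried on its own in plain Python. This is only a sketch: the sample `idx_names` values are made up for illustration, and only the list comprehension is taken from the diff.

```python
# Hypothetical level names; only the comprehension below comes from the diff.
idx_names = ["number", None, "", 0]

# Any falsy name (None, "", 0) is replaced by the level's position.
columns = [n or i for i, n in enumerate(idx_names)]

print(columns)  # ['number', 1, 2, 3]
```

Because `or` tests truthiness rather than `is None`, a legitimate name such as `0` is also replaced by its position. That may be relevant to the later commit in this PR that mentions falsey column names.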
@@ -1412,6 +1412,45 @@ def from_product(cls, iterables, sortorder=None, names=None):
         labels = cartesian_product(labels)
         return MultiIndex(levels, labels, sortorder=sortorder, names=names)

+    @classmethod
+    def from_frame(cls, df, squeeze=True):
+        """
+        Make a MultiIndex from a dataframe
Member

Finish with period. Run ./scripts/validate_docstrings.py pandas.MultiIndex.from_frame to make sure the docstring follows all the standards we validate.

Use DataFrame instead of dataframe.


+        Parameters
+        ----------
+        df : pd.DataFrame
+            DataFrame to be converted to MultiIndex
Member

Can you finish with a period.

+        squeeze : bool
Member

Can you add the default.

+            If df is a single column, squeeze multiindex to be a regular index.
Member

multiindex --> MultiIndex


+        Returns
+        -------
+        index : MultiIndex
Member

no need for giving a name, just leave the type. Add a description in the next line (indented).


+        Examples
+        --------
+        >>> df = pd.DataFrame([[0, u'green'], [0, u'purple'], [1, u'green'],
+                               [1, u'purple'], [2, u'green'], [2, u'purple']],
+                              columns=[u'number', u'color'])
Member

this is missing the ... for multiline commands. Also show the content of df after creating it.

+        >>> pd.MultiIndex.from_frame(df)
Contributor

can you put a blank line between cases

Contributor

adding a comment on the case if it's not obvious

Contributor Author

removed other examples as I have removed the callable names /squeeze parameter

+        MultiIndex(levels=[[0, 1, 2], [u'green', u'purple']],
+                   labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
+                   names=[u'number', u'color'])
+
+        """
+        from pandas import DataFrame
+        # just let column level names be the tuple of the meta df columns
+        # since they're not required to be strings
+        if not isinstance(df, DataFrame):
+            raise TypeError("Input must be a DataFrame")
+        columns = list(df)
+        mi = cls.from_tuples(list(df.values), names=columns)
Contributor @TomAugspurger, Nov 2, 2018

Again, I would expect that the dtypes are preserved here, but it seems like they aren't.

Contributor Author

Will try out something along these lines when I’m at a computer.

pd.MultiIndex.from_arrays([df[x] for x in df])
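
The dtype concern in this thread can be seen with a small sketch. It assumes a reasonably recent pandas in which `MultiIndex.from_arrays` accepts a list of Series; the frame `df` and its column names are invented for illustration and are not part of the PR.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# Going through df.values upcasts every column to one common dtype.
print(df.values.dtype)  # float64

# Building from per-column arrays keeps each column's dtype in its level.
mi = pd.MultiIndex.from_arrays([df[x] for x in df], names=list(df))
print([level.dtype for level in mi.levels])
```

The design point is that `df.values` is a single 2-D array, so per-column dtype information is already gone before the index is built, whereas iterating over columns hands each Series over with its own dtype.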

+        if squeeze:
+            return mi.squeeze()
+        else:
+            return mi
Member @gfyoung, Oct 28, 2018

Personal nit: change the four lines of squeeze logic to one:

return mi.squeeze() if squeeze else mi


     def _sort_levels_monotonic(self):
         """
         .. versionadded:: 0.20.0
@@ -1474,6 +1513,28 @@ def _sort_levels_monotonic(self):
                           names=self.names, sortorder=self.sortorder,
                           verify_integrity=False)

+    def squeeze(self):
+        """
+        Squeeze a single level multiindex to be a regular Index instance. If
+        the MultiIndex is more than a single level, return a copy of the
+        MultiIndex.
Member

Can you run validate_docstrings.py for this docstring, and also read the contributing docstring documentation too. There are several issues.


Contributor

I don't think we need to make this a public method.

Contributor Author

changed squeeze -> _squeeze

Contributor

I am re-thinking if this should be public, see #22866

+        Returns
+        -------
+        index : Index | MultiIndex
+
+        Examples
+        --------
+        >>> mi = pd.MultiIndex.from_tuples([('a',), ('b',), ('c',)])
+        >>> mi.squeeze()
+        Index(['a', 'b', 'c'], dtype='object')
+
+        """
+        if len(self.levels) == 1:
+            return self.levels[0][self.labels[0]]
+        else:
+            return self.copy()
+
     def remove_unused_levels(self):
         """
         Create a new MultiIndex from the current that removes
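
According to the commit list, `squeeze` was first renamed to `_squeeze` and the `squeeze` parameter was later removed from `from_frame`. For readers who want the same effect, the sketch below uses only the existing public method `get_level_values`. The variable names are illustrative, and this is not presented as the code the PR ended up with.

```python
import pandas as pd

# A MultiIndex with a single level, as in the docstring example.
mi = pd.MultiIndex.from_tuples([("a",), ("b",), ("c",)], names=["L1"])

# Flatten a one-level MultiIndex; otherwise return a copy, as the diff does.
flat = mi.get_level_values(0) if mi.nlevels == 1 else mi.copy()

print(type(flat).__name__, list(flat))
```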
42 changes: 42 additions & 0 deletions pandas/tests/indexes/multi/test_constructor.py
@@ -472,3 +472,45 @@ def test_from_tuples_with_tuple_label():
     idx = pd.MultiIndex.from_tuples([(2, 1), (4, (1, 2))], names=('a', 'b'))
     result = pd.DataFrame([2, 3], columns=['c'], index=idx)
     tm.assert_frame_equal(expected, result)
+
+
+def test_from_frame():
+    expected = pd.MultiIndex.from_tuples([('a', 'a'), ('a', 'b'), ('a', 'c'),
+                                          ('b', 'a'), ('b', 'b'), ('c', 'a'),
+                                          ('c', 'b')],
+                                         names=['L1', 'L2'])
+    df = pd.DataFrame([['a', 'a'], ['a', 'b'], ['a', 'c'], ['b', 'a'],
+                       ['b', 'b'], ['c', 'a'], ['c', 'b']],
+                      columns=['L1', 'L2'])
+    result = pd.MultiIndex.from_frame(df)
+    tm.assert_index_equal(expected, result)
+
+
+def test_from_frame():
+    df = pd.DataFrame([['a', 'a'], ['a', 'b'], ['b', 'a'], ['b', 'b']],
+                      columns=['L1', 'L2'])
+    expected = pd.MultiIndex.from_tuples([('a', 'a'), ('a', 'b'),
+                                          ('b', 'a'), ('b', 'b')],
+                                         names=['L1', 'L2'])
+    result = pd.MultiIndex.from_frame(df)
+    tm.assert_index_equal(expected, result)
+
+
+def test_from_frame_with_squeeze():
+    df = pd.DataFrame([['a'], ['a'], ['b'], ['b']], columns=['L1'])
+    expected = pd.Index(['a', 'a', 'b', 'b'], name='L1')
+    result = pd.MultiIndex.from_frame(df)
+    tm.assert_index_equal(expected, result)
+
+
+def test_from_frame_with_no_squeeze():
+    df = pd.DataFrame([['a'], ['a'], ['b'], ['b']], columns=['L1'])
+    expected = pd.MultiIndex.from_tuples([('a',), ('a',), ('b',), ('b',)],
+                                         names=['L1'])
+    result = pd.MultiIndex.from_frame(df, squeeze=False)
+    tm.assert_index_equal(expected, result)
+
+
+def test_from_frame_non_frame():
+    with pytest.raises(TypeError):
+        pd.MultiIndex.from_frame([1, 2, 3, 4])
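
The behaviour these tests describe can be tried outside the test suite. The sketch below assumes a pandas release that includes the merged `MultiIndex.from_frame` (no `squeeze` argument). The frame contents mirror the second test above, while the names `back` and `raised` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([["a", "a"], ["a", "b"], ["b", "a"], ["b", "b"]],
                  columns=["L1", "L2"])

# Column names become level names; each row becomes one index entry.
mi = pd.MultiIndex.from_frame(df)

# to_frame(index=False) is the inverse direction touched by this PR.
back = mi.to_frame(index=False)

# Non-DataFrame input is rejected, as in test_from_frame_non_frame.
try:
    pd.MultiIndex.from_frame([1, 2, 3, 4])
    raised = False
except TypeError:
    raised = True
```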
15 changes: 15 additions & 0 deletions pandas/tests/indexes/multi/test_conversion.py
@@ -169,3 +169,18 @@ def test_to_series_with_arguments(idx):
     assert s.values is not idx.values
     assert s.index is not idx
     assert s.name != idx.name
+
+
+def test_squeeze():
+    mi = pd.MultiIndex.from_tuples([('a',), ('a',), ('b',), ('b',)],
+                                   names=['L1'])
+    expected = pd.Index(['a', 'a', 'b', 'b'], name='L1')
+    result = mi.squeeze()
+    tm.assert_index_equal(expected, result)
+
+    mi = pd.MultiIndex.from_tuples([('a', 'a'), ('a', 'b'), ('b', 'a'),
+                                    ('b', 'b')],
+                                   names=['L1', 'L2'])
+    expected = mi.copy()
+    result = mi.squeeze()
+    tm.assert_index_equal(expected, result)