[WIP] Test (and more fixes) for duplicate indices with concat #38745

ivirshup · 2020-12-28T07:16:22Z

closes #31308 (pd.concat inconsistent with non-unique index) and #36263 (pd.concat() crashes if dataframe contains duplicate indices but not df.join())
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Follow up to #38654, currently WIP.

It's a work in progress because it turns out union and intersection act differently for different index types. Intersection seems mostly normal (except for IntervalIndex, #38743), while union is sometimes a set union (for Index) and sometimes keeps duplicates (for many Index subclasses, though order sometimes matters).

This affects concat in get_objs_combined_axis. Once I figured out where the problems were coming from, I figured I could maybe sidestep this by adding checks to get_objs_combined_axis for equality and uniqueness. All reshape tests pass on my machine, but I'm done working on this for the day, so I'll rely on CI to check the rest of the codebase.
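A pure-Python sketch of the kind of equality and uniqueness checks described above, using plain lists as stand-ins for Index objects. `combine_axes` and `duplicated_labels` are hypothetical names for illustration, not pandas functions; the real get_objs_combined_axis works on Index objects and their own union/intersection methods. The error message also lists the offending labels, as the TODO list below asks for.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def duplicated_labels(labels: Sequence[Hashable]) -> List[Hashable]:
    """Return each label that occurs more than once, in first-seen order."""
    counts = Counter(labels)  # Counter keeps insertion order (Python 3.7+)
    return [label for label in counts if counts[label] > 1]


def combine_axes(axes: Sequence[Sequence[Hashable]], intersect: bool) -> List[Hashable]:
    """Hypothetical model of the proposed checks (not the pandas implementation)."""
    first = list(axes[0])
    # Equality check: if every axis is identical, duplicates are harmless.
    if all(list(ax) == first for ax in axes[1:]):
        return first
    # Uniqueness check: refuse to combine differing axes with duplicated labels.
    for ax in axes:
        dups = duplicated_labels(ax)
        if dups:
            raise ValueError(f"cannot combine axes with duplicate labels: {dups}")
    if intersect:
        common = set(first).intersection(*map(set, axes[1:]))
        return [label for label in first if label in common]
    # Order-preserving union of the (now known to be unique) labels.
    seen = {}
    for ax in axes:
        for label in ax:
            seen.setdefault(label, None)
    return list(seen)
```

For example, `combine_axes([[0, 1, 1, 2], [0, 2]], intersect=True)` raises and names label `1`, while two identical axes with duplicates pass through unchanged.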

This test could be expanded with:

More types of indexes (there is no object index currently)
Checks for commutativity, i.e. results should have set-equal columns regardless of order
Interactions with other arguments (sort, axis?)

TODO:

Move test for concat where duplicates don't error
Decide whether the error should be a DuplicateLabelError
Have the error report which duplicates are involved
Check that the error reports which duplicates are involved

pandas/core/indexes/api.py

pandas/tests/reshape/concat/test_concat.py

jreback · 2020-12-28T16:43:32Z

pandas/core/indexes/api.py

@@ -89,6 +89,19 @@ def get_objs_combined_axis(
    Index
    """
    obs_idxes = [obj._get_axis(axis) for obj in objs]
+    if all_indexes_same(obs_idxes):


This is not the place to do this at all; it should be in concat itself.

Ideally we want a nice error message if this happens (and the indices are not equal).

I agree I need to add the error message, but I do think this function is where the error should be thrown.

Every function which uses get_objs_combined_axis throws an error if there are duplicate indices. Here is a demonstration of this from pandas 1.1.5:

import numpy as np
import pandas as pd
import pandas._testing as tm


def list_of_series_constructor(*args):
    return pd.DataFrame(list(args))


def crosstab(*args):
    return pd.crosstab(*args)


def concat_series(*args):
    return pd.concat(list(args), axis=1)


def concat_dfs(*args):
    return pd.concat([pd.DataFrame(x) for x in args], axis=1)


def duplicated_indices_errors(func, index_makers):
    records = []
    for index_maker in index_makers:
        rec = {"index_maker": index_maker.__name__}
        uniq = index_maker(k=4)
        non_uniq = uniq[[0, 0, 1, 2]]
        s_uniq = pd.Series(np.ones(len(uniq)), index=uniq)
        s_non_uniq = pd.Series(np.ones(len(non_uniq)), index=non_uniq)
        try:
            func(s_uniq, s_non_uniq)
        except Exception as e:
            rec["error_type"] = str(type(e))
            rec["error_message"] = str(e)
        records.append(rec)
    return records


funcs = [list_of_series_constructor, crosstab, concat_series, concat_dfs]
index_makers = [
    tm.makeStringIndex,
    tm.makeIntIndex,
    tm.makeUIntIndex,
    tm.makeFloatIndex,
    tm.makeDateIndex,
    tm.makeTimedeltaIndex,
    tm.makePeriodIndex,
    tm.makeMultiIndex,
    tm.makeBoolIndex,
]

pd.concat(
    {
        f.__name__: pd.DataFrame(duplicated_indices_errors(f, index_makers))
        for f in funcs
    }
).to_markdown()

| | index_maker | error_type | error_message |
|:--|:--|:--|:--|
| ('list_of_series_constructor', 0) | makeStringIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 1) | makeIntIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 2) | makeUIntIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 3) | makeFloatIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 4) | makeDateIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 5) | makeTimedeltaIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 6) | makePeriodIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 7) | makeMultiIndex | <class 'ValueError'> | Reindexing only valid with uniquely valued Index objects |
| ('list_of_series_constructor', 8) | makeBoolIndex | <class 'pandas.errors.InvalidIndexError'> | Reindexing only valid with uniquely valued Index objects |
| ('crosstab', 0) | makeStringIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 1) | makeIntIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 2) | makeUIntIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 3) | makeFloatIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 4) | makeDateIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 5) | makeTimedeltaIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 6) | makePeriodIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('crosstab', 7) | makeMultiIndex | <class 'ValueError'> | cannot handle a non-unique multi-index! |
| ('crosstab', 8) | makeBoolIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 0) | makeStringIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 1) | makeIntIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 2) | makeUIntIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 3) | makeFloatIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 4) | makeDateIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 5) | makeTimedeltaIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 6) | makePeriodIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_series', 7) | makeMultiIndex | <class 'ValueError'> | cannot handle a non-unique multi-index! |
| ('concat_series', 8) | makeBoolIndex | <class 'ValueError'> | cannot reindex from a duplicate axis |
| ('concat_dfs', 0) | makeStringIndex | <class 'ValueError'> | Shape of passed values is (5, 2), indices imply (4, 2) |
| ('concat_dfs', 1) | makeIntIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 2) | makeUIntIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 3) | makeFloatIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 4) | makeDateIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 5) | makeTimedeltaIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 6) | makePeriodIndex | <class 'ValueError'> | Shape of passed values is (7, 2), indices imply (5, 2) |
| ('concat_dfs', 7) | makeMultiIndex | <class 'ValueError'> | cannot handle a non-unique multi-index! |
| ('concat_dfs', 8) | makeBoolIndex | <class 'ValueError'> | Shape of passed values is (4, 2), indices imply (2, 2) |

Instead of just giving concat a better error message, all these methods could get better (and more consistent) error messages. I think it's important for the rules about unions and intersections of indices to be defined in these more internal methods so behaviour isn't defined (and reimplemented) per top level function.

Additionally, this function essentially returns nonsense for unions when there are duplicated indices since different Index types have different definitions for unions with duplicates, so I think it's appropriate to throw errors here instead of passing on those nonsense results.

This has also made me realize there is one case where it's okay for there to be duplicates: when the indices contain duplicates, but their intersection would not.
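The condition above can be sketched in pure Python with lists standing in for indexes. `duplicates_in_intersection` is a hypothetical helper, not a pandas API: it returns the labels that are both duplicated somewhere and present in every axis, so an empty result means an inner join would still be unambiguous.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def duplicates_in_intersection(axes: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Return labels duplicated in some axis AND present in every axis.

    An empty result means the intersection can be used for indexing
    unambiguously, even though some inputs contain duplicates.
    """
    common = set(axes[0]).intersection(*map(set, axes[1:]))
    duplicated = set()
    for ax in axes:
        duplicated.update(label for label, n in Counter(ax).items() if n > 1)
    # Report in order of first appearance in the first axis.
    return [
        label
        for label in dict.fromkeys(axes[0])
        if label in common and label in duplicated
    ]
```

Under this rule `[0, 1, 1, 2]` combined with `[0, 2]` would be allowed (the duplicated `1` is not shared), while `[0, 1, 1, 2]` with `[1, 2]` would not.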


This got a bit more complicated. idx.get_indexer(keys) throws an error if idx is non-unique, regardless of whether keys have unique indices in idx.

Interestingly enough, it does not throw an error if idx is non-unique but keys is empty (which is the case I had initially seen).
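A pure-Python model contrasting the behaviour described above with the relaxation later proposed in #38773. Both functions are hypothetical stand-ins (plain lists instead of Index objects, -1 for missing keys); they are not pandas' actual get_indexer, only a sketch of the two rules.

```python
from typing import Dict, Hashable, List, Sequence


def label_positions(index: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each label to every position at which it occurs."""
    positions: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(index):
        positions.setdefault(label, []).append(i)
    return positions


def strict_get_indexer(index: Sequence[Hashable], keys: Sequence[Hashable]) -> List[int]:
    """Observed rule: any duplicate in `index` is an error, unless there is
    nothing to look up. Missing keys map to -1."""
    positions = label_positions(index)
    if len(keys) > 0 and any(len(p) > 1 for p in positions.values()):
        raise ValueError("cannot reindex from a duplicate axis")
    return [positions.get(k, [-1])[0] for k in keys]


def relaxed_get_indexer(index: Sequence[Hashable], keys: Sequence[Hashable]) -> List[int]:
    """Relaxed rule: only fail when a requested key is itself ambiguous."""
    positions = label_positions(index)
    ambiguous = [k for k in keys if len(positions.get(k, [])) > 1]
    if ambiguous:
        raise ValueError(f"ambiguous labels: {ambiguous}")
    return [positions.get(k, [-1])[0] for k in keys]
```

With `index = [0, 1, 1, 2]`, the strict rule rejects `keys = [0, 2]` even though both keys have a single position, while the relaxed rule returns `[0, 3]`.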

Examples using pandas 1.1.5:

pd.crosstab(
    pd.Series(np.arange(4), index=[0, 1, 1, 2]),
    pd.Series(np.arange(2), index=[0, 2]),
)
# ValueError: cannot reindex from a duplicate axis

pd.crosstab(
    pd.Series(np.arange(4), index=[0, 1, 1, 2]),
    pd.Series(np.arange(2), index=[3, 4]),
)
# Empty DataFrame
# Columns: []
# Index: []

Examples with pd.concat

This had the same behaviour despite not using .get_indexer directly in 1.1.5.

pd.concat(
    [
        pd.Series(np.arange(4), index=[0, 1, 1, 2]),
        pd.Series(np.arange(2), index=[0, 2]),
    ],
    axis=1,
    join="inner",
)
# ValueError: cannot reindex from a duplicate axis

pd.concat(
    [
        pd.Series(np.arange(4), index=[0, 1, 1, 2]),
        pd.Series(np.arange(2), index=[3, 4]),
    ],
    axis=1,
    join="inner",
)
# Empty DataFrame
# Columns: [0, 1]
# Index: []

Opened a more general issue for this: #38797

pep8speaks · 2020-12-29T06:05:27Z

Hello @ivirshup! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-30 03:49:30 UTC

This fixes a previous issue where `get_objs_combined_axis` would not throw an error for duplicated indices in all cases. Now we check that the duplicated values don't show up in the intersection. This allows safe indexing when there are duplicated values that don't appear in the intersection.

ivirshup · 2020-12-29T09:16:12Z

Should the errors here be DuplicateLabelErrors?

jreback

cc @jbrockmendel

jreback · 2020-12-29T15:19:05Z

pandas/tests/indexes/test_setops.py

@@ -463,3 +466,33 @@ def test_setop_with_categorical(index, sort, method):
    result = getattr(index, method)(other[:5], sort=sort)
    expected = getattr(index, method)(index[:5], sort=sort)
    tm.assert_index_equal(result, expected)
+
+
+@pytest.mark.parametrize("index_maker", tm.index_subclass_makers_generator())


use the index fixture instead

How should I handle cases where that fixture gives inappropriate input?

Should I define a separate fixture for this? Can I do something like a hypothesis.assume to make sure some of the cases don't get through and aren't marked as skipped?

The cases which won't work here are at least:

"empty"

"repeats"

it's kind of ugly, but in many of these tests we do something like

def test_foo(index_fixture):
    if whatever(index_fixture):
        pytest.skip()

(I'd be psyched to see a hypothesis-based implementation of some of our tests; we'd need a way to generate any valid index.)

I've switched to the fixture, but it adds a lot of skipped tests. It would be nice if I could filter these before collection, or use a fixture where I can specify some features of the construction (e.g. values are unique and the length is n).
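One way to filter before collection is to compute the parameter list up front, so unsuitable cases are never generated rather than generated and then skipped. This is a plain-Python sketch: `CANDIDATES` is a hypothetical stand-in for whatever backs the `index` fixture, and `select_params` is an illustrative helper, not part of pandas' test suite. The pytest wiring is shown only as a comment.

```python
from typing import Dict, Hashable, List

# Hypothetical stand-in for the named candidates behind the `index` fixture.
CANDIDATES: Dict[str, List[Hashable]] = {
    "string": ["a", "b", "c", "d"],
    "int": [0, 1, 2, 3],
    "empty": [],
    "repeats": [0, 0, 1, 1],
    "bool": [False, True],
}


def select_params(
    candidates: Dict[str, List[Hashable]], min_len: int, unique: bool
) -> List[str]:
    """Return the names of candidates meeting the requirements."""
    selected = []
    for name, values in candidates.items():
        if len(values) < min_len:
            continue  # e.g. drops "empty"
        if unique and len(set(values)) != len(values):
            continue  # e.g. drops "repeats"
        selected.append(name)
    return selected


# With pytest, the filtered names could parametrize a dedicated fixture:
#   @pytest.fixture(params=select_params(CANDIDATES, min_len=4, unique=True))
#   def unique_index(request): ...
```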

jreback · 2020-12-29T15:19:46Z

pandas/tests/indexes/test_setops.py

+
+    result = get_objs_combined_axis(series, intersect=True)
+
+    tm.assert_index_equal(full[[0, 2]], result, check_order=False)


Always use:

result = ...
expected = ...
tm.assert_index_equal(result, expected)

why is check_order=False?

check_order is False because the MultiIndex comes back in a different order.

More conceptually, I don't think this function promises any ordering unless you pass sort.
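If no ordering is promised, the comparison should treat labels as a multiset: order ignored, but the number of occurrences of each label still checked. A minimal pure-Python sketch of that idea follows; `assert_same_labels` is a hypothetical helper, not `tm.assert_index_equal` itself.

```python
from collections import Counter
from typing import Hashable, Sequence


def assert_same_labels(result: Sequence[Hashable], expected: Sequence[Hashable]) -> None:
    """Compare label sequences as multisets: order is ignored, but each
    label must occur the same number of times in both."""
    if Counter(result) != Counter(expected):
        raise AssertionError(f"labels differ: {list(result)!r} vs {list(expected)!r}")
```

Using a Counter rather than a set keeps the check strict about duplicates, which matters for exactly the cases this PR is about.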

pandas/tests/indexes/test_setops.py

pandas/tests/reshape/concat/test_concat.py

jreback · 2020-12-29T15:20:24Z

pandas/tests/reshape/concat/test_concat.py

+@pytest.mark.parametrize("join", ["inner", "outer"])
+def test_concat_duplicates_error(index_maker, join):
+    # if index_maker is tm.makeMultiIndex:
+    # TODO: This generator only makes Indexs of size 4


what is this about?

pandas/pandas/_testing/__init__.py

Lines 491 to 492 in 0976c4c

def makeMultiIndex(k=10, names=None, **kwargs):
    return MultiIndex.from_product((("foo", "bar"), (1, 2)), names=names, **kwargs)

This function only makes a MultiIndex of length 4, ignoring the k parameter.
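A sketch of label generation that does honour `k`, written as plain tuples so it runs without pandas. `make_multi_labels` is a hypothetical helper; with pandas installed its output could be passed to `MultiIndex.from_tuples`, but that wiring is an assumption, not how `tm.makeMultiIndex` is written.

```python
from itertools import product
from typing import List, Tuple


def make_multi_labels(k: int = 10) -> List[Tuple[str, int]]:
    """Return k distinct (outer, inner) tuples from ("foo", "bar") x range(n)."""
    n_inner = max(1, -(-k // 2))  # ceil(k / 2) inner values cover k labels
    labels = list(product(("foo", "bar"), range(n_inner)))
    return labels[:k]
```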

jreback · 2020-12-29T15:20:48Z

pandas/tests/reshape/concat/test_concat.py

+
+
+@pytest.mark.parametrize("index_maker", tm.index_subclass_makers_generator())
+@pytest.mark.xfail(reason="Not implemented")


is there an issue for this? what is this case?

I was expecting this to be allowed:

import pandas as pd

result = pd.concat(
    [
        pd.Series(0, index=[0, 0, 1, 2]),
        pd.Series(1, index=[1, 2]),
    ],
    axis=1,  # needed for the result to be a DataFrame with columns 0 and 1
    join="inner",
)
expected = pd.DataFrame({0: [0, 0], 1: [1, 1]}, index=[1, 2])
pd.testing.assert_frame_equal(result, expected)

Because the intersection of those indices is well defined. However, it turns out this does not work, and also doesn't work in 1.1.5. I sort of opened this issue here: #38773, but that was a more low-level issue.
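A toy model of why that intersection is well defined: each "column" is a list of (label, value) pairs, and only labels present in every column are kept; the join fails only when a kept label is duplicated. `inner_align` is hypothetical and much simpler than concat; it just illustrates the rule the example above expects.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Pairs = Sequence[Tuple[Hashable, object]]  # a toy Series: (label, value) pairs


def inner_align(columns: Sequence[Pairs]) -> Dict[Hashable, List[object]]:
    """Toy inner join across columns: rows for labels present in every column,
    raising only if one of those labels is duplicated in some column."""
    label_lists = [[label for label, _ in col] for col in columns]
    common = set(label_lists[0]).intersection(*map(set, label_lists[1:]))
    # Row order follows first appearance in the first column.
    rows: Dict[Hashable, List[object]] = {
        label: [] for label in label_lists[0] if label in common
    }
    for labels, col in zip(label_lists, columns):
        for label in common:
            if labels.count(label) > 1:
                raise ValueError(f"label {label!r} is duplicated and in the intersection")
        for label, value in col:
            if label in common:
                rows[label].append(value)
    return rows
```

For the example above this gives `{1: [0, 1], 2: [0, 1]}`, i.e. rows 1 and 2 with one value per input, matching `expected`.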

jreback · 2020-12-29T15:21:03Z

pandas/tests/reshape/concat/test_concat.py

+@pytest.mark.parametrize("index_maker", tm.index_subclass_makers_generator())
+@pytest.mark.parametrize("join", ["inner", "outer"])
+def test_concat_duplicates_error(index_maker, join):
+    # if index_maker is tm.makeMultiIndex:


add the issue number as a comment in all of the new tests

pandas/tests/reshape/concat/test_concat.py

jreback · 2020-12-29T15:21:32Z

pandas/core/indexes/api.py

@@ -135,13 +135,20 @@ def _get_combined_index(
    indexes = _get_distinct_objs(indexes)
    if len(indexes) == 0:
        index = Index([])
-    elif len(indexes) == 1:
+    elif len(indexes) == 1 or all_indexes_same(indexes):


add some comments here as it is non-obvious what is happening

How about this?

Suggested change:

-    elif len(indexes) == 1 or all_indexes_same(indexes):
+    # if unique by id or unique by value
+    elif len(indexes) == 1 or all_indexes_same(indexes):
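A pure-Python sketch of what "unique by id or unique by value" means for that branch. Both helpers are hypothetical models on plain lists: the first drops repeated references to the same object, and the second decides whether the combined axis can simply be the first input, in which case duplicates inside it are harmless.

```python
from typing import Hashable, List, Sequence


def distinct_by_id(indexes: Sequence[Sequence[Hashable]]) -> List[Sequence[Hashable]]:
    """Drop repeated references to the same object (identity, not equality)."""
    seen = set()
    out = []
    for idx in indexes:
        if id(idx) not in seen:
            seen.add(id(idx))
            out.append(idx)
    return out


def can_short_circuit(indexes: Sequence[Sequence[Hashable]]) -> bool:
    """True if only one distinct object remains ("unique by id") or all
    remaining indexes are equal element-wise ("unique by value")."""
    distinct = distinct_by_id(indexes)
    if len(distinct) <= 1:
        return True
    first = list(distinct[0])
    return all(list(idx) == first for idx in distinct[1:])
```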

jbrockmendel · 2020-12-30T03:23:58Z

because it turns out union and intersection act differently for different index types

@phofl has been working on this recently, might shed some light on the intended behavior medium-long term

phofl · 2020-12-30T09:06:12Z

union and intersection should both be unique, but as you mentioned this may not be the case for all Index types right now.

In addition to your issue, #38623 is also problematic. Do you have a list of Index classes where this is not the case for union?

phofl · 2020-12-30T22:47:19Z

Put up #38834 to fix the IntervalIndex.intersection bug

ivirshup · 2021-01-04T02:23:23Z

Do you have a list of Index classes where this is not the case for union?

@phofl, let me know if there is an open issue I should post this to. It seems like it's most of them except MultiIndex (but don't know if this depends on the "inner types"). For Index it depends on the order. RangeIndex is a bit of a different case since it should always be unique – I think.

Code used to check

import pandas as pd
import numpy as np

import pandas._testing as tm

index_makers = [
    tm.makeStringIndex,
    tm.makeIntIndex,
    tm.makeUIntIndex,
    tm.makeFloatIndex,
    tm.makeDateIndex,
    tm.makeTimedeltaIndex,
    tm.makePeriodIndex,
    tm.makeMultiIndex,
    tm.makeBoolIndex,
]

records = []

union = lambda x, y: x.union(y)
intersection = lambda x, y: x.intersection(y)

# union = lambda x, y: pd.core.indexes.api._get_combined_index([x, y], intersect=False)
# intersection = lambda x, y: pd.core.indexes.api._get_combined_index([x, y], intersect=True)

for index_maker in index_makers:
    idx1 = index_maker(k=4)
    idx2 = idx1[np.array([0, 0, 1, 2, 3])]
    assert type(idx1) == type(idx2)
    rec = {"index_type": type(idx1).__name__, "index_generator": index_maker.__name__}
    
    # idx_intersect = intersection(idx1, idx2)
    # rec["intersect_len"] = len(idx_intersect)
    # rec["intersect_commutative"] = idx_intersect.sort_values().equals(intersection(idx2, idx1).sort_values())

    idx_union = union(idx1, idx2)
    rec["union_len"] = len(idx_union)
    rec["union_commutative"] = idx_union.sort_values().equals(union(idx2, idx1).sort_values())

    rec["unique_len"] = len(idx_union.unique())
    records.append(rec)

print(pd.DataFrame.from_records(records).to_markdown(index=False))

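| index_type     | index_generator    |   union_len | union_commutative   |   unique_len |
|:---------------|:-------------------|------------:|:--------------------|-------------:|
| Index          | makeStringIndex    |           4 | False               |            4 |
| Int64Index     | makeIntIndex       |           5 | True                |            4 |
| UInt64Index    | makeUIntIndex      |           5 | True                |            4 |
| Float64Index   | makeFloatIndex     |           5 | True                |            4 |
| DatetimeIndex  | makeDateIndex      |           5 | True                |            4 |
| TimedeltaIndex | makeTimedeltaIndex |           5 | True                |            4 |
| PeriodIndex    | makePeriodIndex    |           5 | True                |            4 |
| MultiIndex     | makeMultiIndex     |           4 | True                |            4 |
| Index          | makeBoolIndex      |           4 | False               |            2 |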

This is using 1.3.0.dev0+218.g3a066feb08

phofl · 2021-01-04T12:39:39Z

Sorry, I said something wrong: intersection should be unique, union should not.
#36299 will make this consistent for every class except MultiIndex. I will look into that.

| index_type     | index_generator    |   union_len | union_commutative   |   unique_len |
|:---------------|:-------------------|------------:|:--------------------|-------------:|
| Index          | makeStringIndex    |           5 | True                |            4 |
| Int64Index     | makeIntIndex       |           5 | True                |            4 |
| UInt64Index    | makeUIntIndex      |           5 | True                |            4 |
| Float64Index   | makeFloatIndex     |           5 | True                |            4 |
| DatetimeIndex  | makeDateIndex      |           5 | True                |            4 |
| TimedeltaIndex | makeTimedeltaIndex |           5 | True                |            4 |
| PeriodIndex    | makePeriodIndex    |           5 | True                |            4 |
| MultiIndex     | makeMultiIndex     |           4 | True                |            4 |
| Index          | makeBoolIndex      |           7 | True                |            2 |

This is what I am getting on top of my pr.

The bool Index case might be fishy, but you get this on master too if the bool Index is sorted:

idx1 = Index([False, False, False, True])
idx2 = Index([False, False, False, False, True])

idx_union = idx1.union(idx2)
idx_union

returns

Index([False, False, False, False, False, False, True], dtype='object')

which is consistent with non-bool duplicates on both sides, like

idx1 = Index([0, 1, 1])
idx2 = Index([0, 1, 1, 2])

idx_union = idx1.union(idx2)
Int64Index([0, 1, 1, 1, 2], dtype='int64')

But it is not consistent if both are equal:

idx1 = Index([0, 1, 1])
idx2 = Index([0, 1, 1])

idx_union = idx1.union(idx2)
Int64Index([0, 1, 1], dtype='int64')

Probably worth looking into how to avoid this when performing the outer join in union.

ivirshup · 2021-01-05T01:26:29Z

Why wouldn't unions have unique values?

And related to the bools: I think – as you note – most of the issues here are from bugs dealing with the order of the values. I think it would be good to add a testing strategy which permutes how duplicates occur (e.g. at the start, at the end, in the middle, next to each other, combinations of these cases) to check this.
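A sketch of such a strategy in plain Python: starting from a unique label sequence, generate variants that duplicate labels at the start, the end, the middle, separated, and in combination. `duplicate_placements` is a hypothetical helper; each variant could then be fed to union/intersection alongside the original to check that results do not depend on where the duplicates sit.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def duplicate_placements(labels: Sequence[Hashable]) -> Iterator[Tuple[str, List[Hashable]]]:
    """Yield (description, variant) pairs placing duplicates at different positions."""
    labels = list(labels)
    if len(labels) < 3:
        raise ValueError("need at least three labels to vary the placement")
    half = len(labels) // 2
    first, mid, last = labels[0], labels[half], labels[-1]
    yield "start, adjacent", [first] + labels
    yield "end, adjacent", labels + [last]
    yield "middle, adjacent", labels[:half] + [mid] + labels[half:]
    yield "start and end, separated", labels + [first]
    yield "several duplicates", [first] + labels + [last]
```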

ivirshup · 2021-01-05T07:02:04Z

Currently running into issues with intersections of timestamp objects. They sometimes seem to lose their offset field; I'm not sure what's going on with this yet.

phofl · 2021-01-05T11:23:44Z

Among others the union case is discussed here: #31326

I think a logical interpretation of a union is that the union is the smallest index that contains both indexes.

Currently the union is handled differently depending on whether the indexes are monotonic or not. My PR aims to handle both cases the same way and sort afterwards, which would make this more consistent. I think the unwanted duplicates described above can be solved after that.
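The "smallest index that contains both indexes" interpretation corresponds to a multiset union, where each label appears max(count in left, count in right) times. A pure-Python sketch on lists (hypothetical helper, not pandas' union): for `[0, 1, 1]` and `[0, 1, 1, 2]` it gives `[0, 1, 1, 2]` rather than the `[0, 1, 1, 1, 2]` reported above, and for the bool example it gives 5 elements rather than 7.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def smallest_containing_union(left: Sequence[Hashable], right: Sequence[Hashable]) -> List[Hashable]:
    """Each label appears max(count in left, count in right) times.

    Labels are emitted in order of first appearance (left first), with
    repeats of the same label kept together.
    """
    counts = Counter(left) | Counter(right)  # Counter `|` keeps the maximum count
    order = dict.fromkeys(list(left) + list(right))
    return [label for label in order for _ in range(counts[label])]
```

Because only the counts matter, the result is the same multiset regardless of argument order, which also makes the commutativity check from the table above hold by construction.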

simonjayhawkins · 2021-01-06T13:21:41Z

This PR closes #36263, which is milestoned 1.2.1, whereas this PR is milestoned 1.3. What's the best way to resolve this inconsistency?

jreback · 2021-01-06T13:23:30Z

this is for 1.3

github-actions · 2021-02-06T00:12:24Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

simonjayhawkins · 2021-05-24T17:27:34Z

@ivirshup closing as stale. ping if you want to continue.

Initial test and fix (WIP)

c6f1677

arw2019 reviewed Dec 28, 2020


pandas/core/indexes/api.py

jreback requested changes Dec 28, 2020


jreback added the Error Reporting (Incorrect or improved errors from pandas) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 28, 2020

jreback added this to the 1.3 milestone Dec 28, 2020

ivirshup added 4 commits December 29, 2020 16:21

Move dataframe def out of error check

19c95f0

Formatting error

bb33098

Test allowing duplicate indices if they aren't in intersection (xfail)

11378a8

Better error messages + organization for duplicate errors

ca316ce

ivirshup mentioned this pull request Dec 29, 2020

API: idx.get_indexer(keys) fails if idx is non-unique, even if keys in idx are unique #38773

Open

jreback requested changes Dec 29, 2020


Fixes from review and CI

f176ad3

ivirshup mentioned this pull request Dec 30, 2020

API/ ENH: Unambiguous indexing should be allowed, even if duplicates are present #38797

Open

phofl mentioned this pull request Jan 5, 2021

BUG: MultiIndex.union dropping duplicates from result #38977

Merged


github-actions bot added the Stale label Feb 6, 2021

simonjayhawkins closed this May 24, 2021

simonjayhawkins removed this from the 1.3 milestone May 24, 2021
