BUG: Concat returns inconsistent MultiIndex #44786

Closed
2 of 3 tasks
pkohlmann opened this issue Dec 6, 2021 · 4 comments · Fixed by #45822
Assignees
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@pkohlmann

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df1 = pd.DataFrame({'col': ['a','b','c']}, index=['1','2','2'])
df2 = pd.concat([df1], keys=['X'])
print(df2.index)
print(f"Index of df2 has duplicates: {df2.index.has_duplicates}")
print(f"Index of df2 is unique: {df2.index.is_unique}")

Issue Description

DataFrame df1 has an index with duplicates, i.e. the index of df1 is not unique. When df1 is the only DataFrame passed to pandas's concat(), the resulting DataFrame has a MultiIndex that contains duplicates, but the metadata of this MultiIndex says it is unique and has no duplicates. When concat() is applied to a list of two or more DataFrames, the resulting MultiIndex is correct. The scenario with only one DataFrame takes a shortcut for creating the MultiIndex, which apparently has a flaw.
Calling concat() on a single DataFrame is a valid scenario which, for example, can occur in a GroupBy.apply(). The inconsistency in the MultiIndex makes other functionality such as Index.drop_duplicates() fail, because it relies on the metadata of the MultiIndex.
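
To make the inconsistency described above easier to spot, here is a minimal sketch that compares the flags an index reports with uniqueness recomputed from its actual values. The helper name `index_flags_consistent` is hypothetical and not part of pandas. It assumes the index entries are hashable and contain no NaN values.

```python
import pandas as pd


def index_flags_consistent(index: pd.Index) -> bool:
    """Hypothetical helper: do is_unique / has_duplicates match the real values?"""
    # Materialise the entries (tuples for a MultiIndex) and count distinct ones,
    # so the check does not depend on whatever the index has cached.
    values = list(index)
    truly_unique = len(set(values)) == len(values)
    return (
        index.is_unique == truly_unique
        and index.has_duplicates == (not truly_unique)
    )


df1 = pd.DataFrame({"col": ["a", "b", "c"]}, index=["1", "2", "2"])

# Single-frame case from the report, and the multi-frame case reported as correct.
single = pd.concat([df1], keys=["X"])
double = pd.concat([df1, df1], keys=["X", "Y"])

print("single frame consistent:", index_flags_consistent(single.index))
print("two frames consistent:  ", index_flags_consistent(double.index))
```

On an affected version the first line would be expected to print False. On a version containing the fix both lines should print True.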

Expected Behavior

concat() returns a DataFrame with a consistent MultiIndex when applied to a single DataFrame whose index has duplicates.

Installed Versions

2021-12-06T12:40:32.184 DEBUG matplotlib.wrapper (private) matplotlib data path: /Users/kohlmann/PycharmProjects/env3.7/lib/python3.7/site-packages/matplotlib/mpl-data
2021-12-06T12:40:32.185 DEBUG matplotlib.wrapper matplotlib data path: /Users/kohlmann/PycharmProjects/env3.7/lib/python3.7/site-packages/matplotlib/mpl-data
2021-12-06T12:40:32.197 DEBUG matplotlib.wrapper CONFIGDIR=/Users/kohlmann/.matplotlib
2021-12-06T12:40:32.203 DEBUG matplotlib. matplotlib version 3.3.2
2021-12-06T12:40:32.203 DEBUG matplotlib. interactive is False
2021-12-06T12:40:32.204 DEBUG matplotlib. platform is darwin
2021-12-06T12:40:32.204 DEBUG matplotlib. loaded modules: ['sys', 'builtins', '_frozen_importlib', '_imp', '_thread', '_warnings', '_weakref', 'zipimport', '_frozen_importlib_external', '_io', 'marshal', 'posix', 'encodings', 'codecs', '_codecs', 'encodings.aliases', 'encodings.utf_8', '_signal', 'main', 'encodings.latin_1', 'io', 'abc', '_abc', 'site', 'os', 'stat', '_stat', '_collections_abc', 'posixpath', 'genericpath', 'os.path', '_sitebuiltins', '_bootlocale', '_locale', '_virtualenv', 'functools', '_functools', 'collections', 'operator', '_operator', 'keyword', 'heapq', '_heapq', 'itertools', 'reprlib', '_collections', 'importlib', 'importlib._bootstrap', 'importlib._bootstrap_external', 'types', 'warnings', 'importlib.abc', 'importlib.machinery', 'importlib.util', 'contextlib', 'google', 'mpl_toolkits', 'wiotp', 'weakref', '_weakrefset', '_pydevd_bundle', '_pydevd_bundle.pydevd_collect_try_except_info', 'opcode', '_opcode', 'dis', 'atexit', 'traceback', 'linecache', 'tokenize', 're', 'enum', 'sre_compile', '_sre', 'sre_parse', 'sre_constants', 'copyreg', 'token', '_pydevd_bundle.pydevd_constants', 'future', 'platform', 'subprocess', 'time', 'signal', 'errno', '_posixsubprocess', 'select', 'selectors', 'collections.abc', 'math', 'threading', '_pydevd_bundle.pydevd_vm_type', '_pydev_imps', '_pydev_imps._pydev_saved_modules', 'socket', '_socket', 'queue', '_queue', 'xmlrpc', 'xmlrpc.client', 'base64', 'struct', '_struct', 'binascii', 'datetime', '_datetime', 'decimal', 'numbers', '_decimal', 'http', 'http.client', 'email', 'email.parser', 'email.feedparser', 'email.errors', 'email._policybase', 'email.header', 'email.quoprimime', 'string', '_string', 'email.base64mime', 'email.charset', 'email.encoders', 'quopri', 'email.utils', 'random', 'hashlib', '_hashlib', '_blake2', '_sha3', 'bisect', '_bisect', '_random', 'urllib', 'urllib.parse', 'email._parseaddr', 'calendar', 'locale', 'email.message', 'uu', 'email._encoded_words', 'email.iterators', 'ssl', '_ssl', 
'xml', 'xml.parsers', 'xml.parsers.expat', 'pyexpat.errors', 'pyexpat.model', 'pyexpat', 'xml.parsers.expat.model', 'xml.parsers.expat.errors', 'gzip', 'zlib', '_compression', 'xmlrpc.server', 'http.server', 'copy', 'html', 'html.entities', 'mimetypes', 'shutil', 'fnmatch', 'bz2', '_bz2', 'lzma', '_lzma', 'pwd', 'grp', 'socketserver', 'inspect', 'pydoc', 'pkgutil', 'sysconfig', '_sysconfigdata_m_darwin_darwin', '_osx_support', 'fcntl', '_pydev_bundle', '_pydev_bundle.fix_getpass', '_pydev_bundle.pydev_imports', '_pydev_imps._pydev_execfile', '_pydevd_bundle.pydevd_exec2', '_pydev_bundle.pydev_log', '_pydev_bundle._pydev_filesystem_encoding', '_pydev_bundle.pydev_is_thread_alive', '_pydevd_bundle.pydevd_io', 'pydevd_tracing', 'ctypes', '_ctypes', 'ctypes._endian', '_pydev_bundle.pydev_monkey', '_pydevd_bundle.pydevd_utils', 'pydevd_file_utils', '_pydevd_bundle.pydevd_comm_constants', 'json', 'json.decoder', 'json.scanner', '_json', 'json.encoder', '_pydevd_bundle.pydevd_vars', 'pickle', '_compat_pickle', '_pickle', '_pydevd_bundle.pydevd_custom_frames', '_pydevd_bundle.pydevd_xml', '_pydevd_bundle.pydevd_extension_utils', 'pydevd_plugins', 'pkg_resources', 'zipfile', 'plistlib', 'tempfile', 'textwrap', 'ntpath', 'pkg_resources.extern', 'pkg_resources._vendor', 'pkg_resources._vendor.six', 'pkg_resources.extern.six', 'pkg_resources._vendor.six.moves', 'pkg_resources.extern.six.moves', 'pkg_resources._vendor.appdirs', 'pkg_resources.extern.appdirs', 'pkg_resources._vendor.packaging', 'pkg_resources._vendor.packaging.about', 'pkg_resources.extern.packaging', 'pkg_resources.extern.packaging.version', 'pkg_resources.extern.packaging._structures', 'pkg_resources.extern.packaging.specifiers', 'pkg_resources.extern.packaging._compat', 'pkg_resources.extern.packaging.requirements', 'pkg_resources._vendor.pyparsing', 'pprint', 'pkg_resources.extern.pyparsing', 'pkg_resources._vendor.six.moves.urllib', 'pkg_resources.extern.six.moves.urllib', 
'pkg_resources.extern.packaging.markers', 'pkg_resources.py2_warn', 'pydevd_plugins.extensions', '_pydevd_bundle.pydevd_resolver', '_pydevd_bundle.pydevd_extension_api', '_pydevd_bundle.pydevd_save_locals', '_pydev_bundle.pydev_override', '_pydevd_bundle.pydevd_breakpoints', '_pydevd_bundle.pydevd_import_class', '_pydevd_bundle.pydevd_frame_utils', '_pydevd_bundle.pydevd_comm', '_pydevd_bundle.pydevd_console_integration', 'code', 'codeop', '_pydev_bundle.pydev_code_executor', '_pydev_bundle._pydev_calltip_util', '_pydev_bundle._pydev_imports_tipper', '_pydev_bundle._pydev_tipper_common', '_pydev_bundle.pydev_stdin', '_pydev_bundle.pydev_console_types', '_pydevd_bundle.pydevd_console_pytest', '_pydevd_bundle.pydevd_bytecode_utils', '_pydev_bundle._pydev_completer', '_pydevd_bundle.pydevd_tables', '_pydevd_bundle.pydevd_console', '_pydevd_bundle.pydevd_dont_trace_files', '_pydevd_bundle.pydevd_kill_all_pydevd_threads', '_pydevd_bundle.pydevd_trace_dispatch', '_pydevd_bundle.pydevd_cython_wrapper', '_pydevd_bundle.pydevd_additional_thread_info_regular', '_pydevd_bundle.pydevd_frame', '_pydevd_bundle.pydevd_dont_trace', '_pydevd_bundle.pydevd_signature', 'trace', 'gc', '_pydevd_bundle.pydevd_cython_darwin_37_64', '_cython_0_29_21', 'cython_runtime', '_pydevd_bundle.pydevd_cython', '_pydevd_frame_eval', '_pydevd_frame_eval.pydevd_frame_eval_main', '_pydevd_frame_eval.pydevd_frame_eval_cython_wrapper', '_pydevd_frame_eval.pydevd_frame_evaluator_common', '_pydevd_frame_eval.pydevd_frame_tracing', '_pydevd_frame_eval.pydevd_modify_bytecode', '_pydevd_bundle.pydevd_additional_thread_info', '_pydevd_frame_eval.pydevd_frame_evaluator_darwin_37_64', 'pydevd_concurrency_analyser', 'pydevd_concurrency_analyser.pydevd_concurrency_logger', 'pydevd_concurrency_analyser.pydevd_thread_wrappers', 'asyncio', 'asyncio.base_events', 'concurrent', 'concurrent.futures', 'concurrent.futures._base', 'logging', 'asyncio.constants', 'asyncio.coroutines', 'asyncio.base_futures', 
'asyncio.format_helpers', 'asyncio.log', 'asyncio.events', 'contextvars', '_contextvars', 'asyncio.base_tasks', '_asyncio', 'asyncio.futures', 'asyncio.protocols', 'asyncio.sslproto', 'asyncio.transports', 'asyncio.tasks', 'asyncio.locks', 'asyncio.runners', 'asyncio.queues', 'asyncio.streams', 'asyncio.subprocess', 'asyncio.unix_events', 'asyncio.base_subprocess', 'asyncio.selector_events', 'pydev_ipython', '_pydevd_bundle.pydevd_breakpointhook', '_pydevd_bundle.pydevd_plugin_utils', '_pydevd_bundle.pydevd_trace_api', 'pydevd_plugins.django_debug', 'pydevd_plugins.jinja2_debug', 'pydevd_plugins.extensions.types', 'pydevd_plugins.extensions.types.pydevd_plugin_numpy_types', 'pydevd_plugins.extensions.types.pydevd_helpers', 'pydevd_plugins.extensions.types.pydevd_plugins_django_form_str', '_pydevd_bundle.pydevd_command_line_handling', 'getpass', 'termios', '_pydevd_bundle.pydevd_process_net_command', '_pydevd_bundle.pydevd_traceproperty', '_pydev_bundle.pydev_monkey_qt', 'analytics_service_cloud_functions', 'pydevd', 'pydev_ipython.matplotlibtools', 'pydev_ipython.inputhook', 'runpy', 'numpy', 'numpy._globals', 'numpy.config', 'numpy.version', 'numpy._distributor_init', 'numpy.core', 'numpy.core.multiarray', 'numpy.core.overrides', 'numpy.core._multiarray_umath', 'numpy.compat', 'numpy.compat._inspect', 'numpy.compat.py3k', 'pathlib', 'numpy.core.umath', 'numpy.core.numerictypes', 'numpy.core._string_helpers', 'numpy.core._type_aliases', 'numpy.core._dtype', 'numpy.core.numeric', 'numpy.core.shape_base', 'numpy.core._asarray', 'numpy.core.fromnumeric', 'numpy.core._methods', 'numpy.core._exceptions', 'numpy.core._ufunc_config', 'numpy.core.arrayprint', 'numpy.core.defchararray', 'numpy.core.records', 'numpy.core.memmap', 'numpy.core.function_base', 'numpy.core.machar', 'numpy.core.getlimits', 'numpy.core.einsumfunc', 'numpy.core._add_newdocs', 'numpy.core._multiarray_tests', 'numpy.core._dtype_ctypes', 'numpy.core._internal', 'numpy._pytesttester', 'numpy.lib', 
'numpy.lib.mixins', 'numpy.lib.scimath', 'numpy.lib.type_check', 'numpy.lib.ufunclike', 'numpy.lib.index_tricks', 'numpy.matrixlib', 'numpy.matrixlib.defmatrix', 'ast', '_ast', 'numpy.linalg', 'numpy.linalg.linalg', 'numpy.lib.twodim_base', 'numpy.linalg.lapack_lite', 'numpy.linalg._umath_linalg', 'numpy.lib.function_base', 'numpy.lib.histograms', 'numpy.lib.stride_tricks', 'numpy.lib.nanfunctions', 'numpy.lib.shape_base', 'numpy.lib.polynomial', 'numpy.lib.utils', 'numpy.lib.arraysetops', 'numpy.lib.npyio', 'numpy.lib.format', 'numpy.lib._datasource', 'numpy.lib._iotools', 'numpy.lib.financial', 'numpy.lib.arrayterator', 'numpy.lib.arraypad', 'numpy.lib._version', 'numpy.fft', 'numpy.fft._pocketfft', 'numpy.fft._pocketfft_internal', 'numpy.fft.helper', 'numpy.polynomial', 'numpy.polynomial.polynomial', 'numpy.polynomial.polyutils', 'numpy.polynomial._polybase', 'numpy.polynomial.chebyshev', 'numpy.polynomial.legendre', 'numpy.polynomial.hermite', 'numpy.polynomial.hermite_e', 'numpy.polynomial.laguerre', 'numpy.random', 'numpy.random._pickle', 'numpy.random.mtrand', 'numpy.random._bit_generator', '_cython_0_29_19', 'numpy.random._common', 'secrets', 'hmac', 'numpy.random._bounded_integers', 'numpy.random._mt19937', 'numpy.random._philox', 'numpy.random._pcg64', 'numpy.random._sfc64', 'numpy.random._generator', 'numpy.ctypeslib', 'numpy.ma', 'numpy.ma.core', 'numpy.ma.extras', 'pandas', 'pytz', 'pytz.exceptions', 'pytz.lazy', 'pytz.tzinfo', 'pytz.tzfile', 'dateutil', 'dateutil._version', 'pandas.compat', 'pandas._typing', 'mmap', 'typing', 'typing.io', 'typing.re', 'pandas.compat.numpy', 'pandas.util', 'pandas.util._decorators', 'pandas._libs', 'pandas._libs.interval', '_cython_0_29_24', 'pandas._libs.hashtable', 'pandas._libs.missing', 'pandas._libs.tslibs', 'pandas._libs.tslibs.dtypes', 'pandas._libs.tslibs.conversion', 'pandas._libs.tslibs.base', 'pandas._libs.tslibs.nattype', 'pandas._libs.tslibs.np_datetime', 'pandas._libs.tslibs.timezones', 'dateutil.tz', 
'dateutil.tz.tz', 'six', 'six.moves', 'dateutil.tz._common', 'dateutil.tz._factories', 'pandas._libs.tslibs.tzconversion', 'pandas._libs.tslibs.ccalendar', 'pandas._libs.tslibs.parsing', 'pandas._libs.tslibs.offsets', 'pandas._libs.tslibs.timedeltas', 'pandas._libs.tslibs.fields', 'pandas._config', 'pandas._config.config', 'pandas._config.dates', 'pandas._config.display', 'pandas._config.localization', 'pandas._libs.tslibs.strptime', 'pandas._libs.tslibs.timestamps', 'dateutil.easter', 'dateutil.relativedelta', 'dateutil._common', 'pandas._libs.properties', 'dateutil.parser', 'dateutil.parser._parser', 'dateutil.parser.isoparser', 'pandas._libs.tslibs.period', 'pandas._libs.tslibs.vectorized', 'pandas._libs.ops_dispatch', 'pandas._libs.algos', 'pandas.core', 'pandas.core.util', 'pandas.core.util.hashing', 'pandas._libs.lib', 'pandas._libs.tslib', 'pandas._libs.hashing', 'pandas.core.dtypes', 'pandas.core.dtypes.common', 'pandas.core.dtypes.base', 'pandas.errors', 'pandas.core.dtypes.generic', 'pandas.core.dtypes.dtypes', 'pandas.core.dtypes.inference', 'pandas.util.version', 'pandas.compat.pyarrow', 'pyarrow', 'pyarrow._generated_version', 'pyarrow.lib', 'pyarrow.util', 'pyarrow.hdfs', 'pyarrow.filesystem', 'pyarrow.ipc', 'pyarrow.serialization', 'pyarrow.types', 'pandas.core.config_init', 'pandas.core.api', 'pandas.core.dtypes.missing', 'pandas.core.algorithms', 'pandas.core.dtypes.cast', 'pandas.util._exceptions', 'pandas.util._validators', 'pandas.core.array_algos', 'pandas.core.array_algos.take', 'pandas.core.construction', 'pandas.core.common', 'pandas.core.indexers', 'pandas.core.arrays', 'pandas.core.arrays.base', 'pandas.compat.numpy.function', 'pandas.core.missing', 'pandas.compat._optional', 'pandas.core.ops', 'pandas.core.roperator', 'pandas.core.ops.array_ops', 'pandas._libs.ops', 'pandas.core.computation', 'pandas.core.computation.expressions', 'pandas.core.computation.check', 'pandas.core.ops.missing', 'pandas.core.ops.dispatch', 
'pandas.core.ops.invalid', 'pandas.core.ops.common', 'pandas.core.ops.docstrings', 'pandas.core.ops.mask_ops', 'pandas.core.ops.methods', 'pandas.core.sorting', 'pandas.core.arrays.boolean', 'pandas.core.arrays.masked', 'pandas.core.nanops', 'pandas.core.array_algos.masked_reductions', 'pandas.core.arraylike', 'pandas.core.arrays.categorical', 'csv', '_csv', 'pandas._libs.arrays', 'pandas.core.accessor', 'pandas.core.arrays._mixins', 'pandas.core.array_algos.transforms', 'pandas.core.base', 'pandas.core.strings', 'pandas.core.strings.accessor', 'pandas.core.strings.base', 'pandas.core.strings.object_array', 'unicodedata', 'pandas.io', 'pandas.io.formats', 'pandas.io.formats.console', 'pandas.core.arrays.datetimes', 'pandas.core.arrays.datetimelike', 'pandas.tseries', 'pandas.tseries.frequencies', 'pandas.core.arrays._ranges', 'pandas.core.arrays.integer', 'pandas.core.arrays.numeric', 'pandas.core.tools', 'pandas.core.tools.numeric', 'pandas.tseries.offsets', 'pandas.core.arrays.floating', 'pandas.core.arrays.interval', 'pandas.core.indexes', 'pandas.core.indexes.base', 'pandas._libs.index', 'pandas.libs.join', 'pandas.core.dtypes.concat', 'pandas.core.arrays.sparse', 'pandas.core.arrays.sparse.accessor', 'pandas.core.arrays.sparse.array', 'pandas.libs.sparse', 'pandas.core.arrays.sparse.dtype', 'pandas.io.formats.printing', 'pandas.core.array_algos.putmask', 'pandas.core.indexes.frozen', 'pandas.core.arrays.numpy', 'pandas.core.arrays.period', 'pandas.core.arrays.string', 'pandas.core.arrays.string_arrow', 'pyarrow.compute', 'pyarrow._compute', 'pandas.core.arrays.timedeltas', 'pandas.core.flags', 'pandas.core.groupby', 'pandas.core.groupby.generic', 'pandas._libs.reduction', 'pandas.core.aggregation', 'pandas.core.indexes.api', 'pandas.core.indexes.category', 'pandas.core.indexes.extension', 'pandas.core.indexes.datetimes', 'pandas.core.indexes.datetimelike', 'pandas.core.indexes.numeric', 'pandas.core.tools.timedeltas', 'pandas.core.tools.times', 
'pandas.core.indexes.interval', 'pandas.core.indexes.multi', 'pandas.core.indexes.timedeltas', 'pandas.core.indexes.period', 'pandas.core.indexes.range', 'pandas.core.apply', 'pandas.core.frame', 'pandas.core.generic', 'pandas.core.indexing', 'pandas._libs.indexing', 'pandas.core.describe', 'pandas.core.reshape', 'pandas.core.reshape.concat', 'pandas.core.internals', 'pandas.core.internals.api', 'pandas._libs.internals', 'pandas.core.internals.blocks', 'pandas._libs.writers', 'pandas.core.array_algos.quantile', 'pandas.core.array_algos.replace', 'pandas.core.internals.array_manager', 'pandas.core.internals.base', 'pandas.core.internals.concat', 'pandas.core.internals.managers', 'pandas.core.internals.ops', 'pandas.io.formats.format', 'pandas.io.common', 'dataclasses', 'pandas.core.internals.construction', 'pandas.core.shared_docs', 'pandas.core.window', 'pandas.core.window.ewm', 'pandas._libs.window', 'pandas.libs.window.aggregations', 'pandas.core.util.numba', 'pandas.core.window.common', 'pandas.core.window.doc', 'pandas.core.window.indexers', 'pandas.libs.window.indexers', 'pandas.core.window.numba', 'pandas.core.window.online', 'pandas.core.window.rolling', 'pandas.core.window.expanding', 'pandas.core.reshape.melt', 'pandas.core.reshape.util', 'pandas.core.series', 'pandas._libs.reshape', 'pandas.core.indexes.accessors', 'pandas.core.tools.datetimes', 'pandas.arrays', 'pandas.plotting', 'pandas.plotting._core', 'pandas.plotting._misc', 'pandas.io.formats.info', 'pandas.core.groupby.base', 'pandas.core.groupby.groupby', 'pandas.libs.groupby', 'pandas.core.groupby.numba', 'pandas.core.groupby.ops', 'pandas.core.groupby.grouper', 'pandas.core.groupby.categorical', 'pandas.tseries.api', 'pandas.core.computation.api', 'pandas.core.computation.eval', 'pandas.core.computation.engines', 'pandas.core.computation.align', 'pandas.core.computation.common', 'pandas.core.computation.expr', 'pandas.core.computation.ops', 'pandas.core.computation.scope', 
'pandas.compat.chainmap', 'pandas.core.computation.parsing', 'pandas.core.reshape.api', 'pandas.core.reshape.merge', 'pandas.core.reshape.pivot', 'pandas.core.reshape.reshape', 'pandas.core.reshape.tile', 'pandas.api', 'pandas.api.extensions', 'pandas.api.indexers', 'pandas.api.types', 'pandas.core.dtypes.api', 'pandas.util._print_versions', 'pandas.io.api', 'pandas.io.clipboards', 'pandas.io.excel', 'pandas.io.excel._base', 'pandas._libs.parsers', 'pandas.io.excel._util', 'pandas.io.parsers', 'pandas.io.parsers.readers', 'pandas.io.parsers.base_parser', 'pandas.io.date_converters', 'pandas.io.parsers.c_parser_wrapper', 'pandas.io.parsers.python_parser', 'pandas.io.excel._odfreader', 'pandas.io.excel._openpyxl', 'pandas.io.excel._pyxlsb', 'pandas.io.excel._xlrd', 'pandas.io.excel._odswriter', 'pandas._libs.json', 'pandas.io.formats.excel', 'pandas.io.formats._color_data', 'pandas.io.formats.css', 'pandas.io.excel._xlsxwriter', 'pandas.io.excel._xlwt', 'pandas.io.feather_format', 'pandas.io.gbq', 'pandas.io.html', 'pandas.io.json', 'pandas.io.json._json', 'pandas.io.json._normalize', 'pandas.io.json._table_schema', 'pandas.io.orc', 'pandas.io.parquet', 'pandas.io.pickle', 'pandas.compat.pickle_compat', 'pandas.io.pytables', 'pandas.core.computation.pytables', 'pandas.io.sas', 'pandas.io.sas.sasreader', 'pandas.io.spss', 'pandas.io.sql', 'pandas.io.stata', 'pandas.io.xml', 'pandas.util._tester', 'pandas.testing', 'pandas._testing', 'pandas._testing._io', 'pandas._testing._random', 'pandas._testing.contexts', 'pandas._testing._warnings', 'pandas._testing.asserters', 'pandas._libs.testing', 'cmath', 'pandas._testing.compat', 'pandas._version', 'pyarrow.parquet', 'pyarrow._parquet', 'pyarrow.fs', 'pyarrow._fs', 'pyarrow._hdfs', 'pyarrow._s3fs', 'logging.config', 'logging.handlers', 'dill', 'dill.info', 'dill._dill', '_pyio', 'dill.settings', 'dill.source', 'dill.temp', 'dill.detect', 'dill.pointers', 'dill.objtypes', 'urllib3', 'urllib3.exceptions', 'urllib3.packages', 
'urllib3.packages.ssl_match_hostname', 'urllib3.packages.six', 'urllib3.packages.six.moves', 'urllib3.packages.six.moves.http_client', 'urllib3._version', 'urllib3.connectionpool', 'urllib3.connection', 'urllib3.util', 'urllib3.util.connection', 'urllib3.contrib', 'urllib3.contrib.appengine_environ', 'urllib3.util.wait', 'urllib3.util.request', 'urllib3.util.response', 'urllib3.util.retry', 'urllib3.util.ssl', 'urllib3.util.url', 'urllib3.util.ssltransport', 'urllib3.util.timeout', 'urllib3.util.proxy', 'urllib3._collections', 'urllib3.request', 'urllib3.filepost', 'urllib3.fields', 'urllib3.packages.six.moves.urllib', 'urllib3.packages.six.moves.urllib.parse', 'urllib3.response', 'urllib3.util.queue', 'urllib3.poolmanager', 'certifi', 'certifi.core', 'importlib.resources', 'tabulate', 'pip', 'setuptools', 'distutils', 'distutils.core', 'distutils.debug', 'distutils.errors', 'distutils.dist', 'distutils.fancy_getopt', 'getopt', 'gettext', 'distutils.util', 'distutils.dep_util', 'distutils.spawn', 'distutils.log', 'distutils.sysconfig', 'distutils.cmd', 'distutils.dir_util', 'distutils.file_util', 'distutils.archive_util', 'distutils.config', 'configparser', 'distutils.extension', 'distutils.filelist', 'setuptools._deprecation_warning', 'setuptools.extern', 'setuptools._vendor', 'setuptools._vendor.six', 'setuptools.extern.six', 'setuptools._vendor.six.moves', 'setuptools.extern.six.moves', 'setuptools.version', 'setuptools.extension', 'setuptools.monkey', 'setuptools.dist', 'distutils.version', 'setuptools._vendor.packaging', 'setuptools._vendor.packaging.about', 'setuptools.extern.packaging', 'setuptools._vendor.ordered_set', 'setuptools.extern.ordered_set', 'setuptools.windows_support', 'setuptools.config', 'setuptools.extern.packaging.version', 'setuptools.extern.packaging._structures', 'setuptools.extern.packaging.specifiers', 'setuptools.extern.packaging._compat', 'setuptools.depends', 'setuptools.py33compat', 'array', 'html.parser', '_markupbase', 
'setuptools.py27compat', 'setuptools._imp', 'setuptools.py34compat', 'setuptools.msvc', 'distutils.ccompiler', 'lxml', 'lxml.etree', '_cython_0_29_22', 'lxml._elementpath', 'psycopg2', 'psycopg2.errors', 'psycopg2._psycopg', 'psycopg2.tz', 'psycopg2.extensions', 'psycopg2._json', 'psycopg2.compat', 'psycopg2._range', 'matplotlib', 'matplotlib.cbook', 'shlex', 'matplotlib.cbook.deprecation', 'matplotlib.rcsetup', 'matplotlib.animation', 'uuid', '_uuid', 'matplotlib._animation_data', 'matplotlib.fontconfig_pattern', 'pyparsing', 'pyparsing.util', 'pyparsing.exceptions', 'pyparsing.unicode', 'pyparsing.actions', 'pyparsing.core', 'pyparsing.results', 'pyparsing.helpers', 'pyparsing.testing', 'pyparsing.common', 'matplotlib.colors', 'matplotlib.docstring', 'matplotlib._color_data', 'cycler', 'matplotlib._version', 'matplotlib.ft2font', 'kiwisolver']
INSTALLED VERSIONS

commit : 945c9ed
python : 3.7.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.1.0
Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.3.4
numpy : 1.18.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.1.2
setuptools : 47.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.17
tables : None
tabulate : 0.8.5
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1

@pkohlmann pkohlmann added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 6, 2021
@phofl
Member

phofl commented Dec 10, 2021

This is fixed on master. May need tests

MultiIndex([('X', '1'),
            ('X', '2'),
            ('X', '2')],
           )
Index of df2 has duplicates: True
Index of df2 is unique: False
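
Since the comment above notes that tests may still be needed, a regression test could look roughly like the following sketch. The test name and its exact assertions are assumptions for illustration and are not taken from the pandas test suite or from the linked fix.

```python
import pandas as pd


def test_concat_single_frame_keys_duplicate_index():
    # Hypothetical regression test built from the reproducible example.
    df = pd.DataFrame({"col": ["a", "b", "c"]}, index=["1", "2", "2"])

    result = pd.concat([df], keys=["X"])

    # The key is prepended as an outer level; the inner labels are unchanged.
    expected = pd.MultiIndex.from_tuples([("X", "1"), ("X", "2"), ("X", "2")])
    assert result.index.equals(expected)

    # The flags must reflect the duplicated ("X", "2") entry.
    assert not result.index.is_unique
    assert result.index.has_duplicates
```

This follows the output shown above for master. On a version still affected by the bug, the last two assertions would be expected to fail.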

@phofl phofl added good first issue Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 10, 2021
@Ganeshpadmanaban

Hi @phofl, can you please check whether the fix works in versions 1.3.4 and 1.3.5? I get the same results as you in 1.3.3, but the bug persists in later versions.

Also, I would like to pick it up as my first issue if it's okay

@sukriti1
Contributor

take

@simonjayhawkins
Member

removing milestone

@simonjayhawkins simonjayhawkins removed this from the 1.4 milestone Jan 20, 2022
@jreback jreback added this to the 1.5 milestone Feb 5, 2022
6 participants