Abbreviate MultiIndex representation #12423

Closed
tdoughty1 opened this issue Feb 23, 2016 · 9 comments · Fixed by #22511

Comments

@tdoughty1

We are using pandas to analyze a moderately large dataset (tens of millions of events). In the process, we create a MultiIndex to index the data. While debugging, I repeatedly ran into a problem: displaying the MultiIndex prints every one of its very long arrays in full. I've included code for a short test that confirms the behavior.

Ideally, I think the string representation of a MultiIndex should be abbreviated in a way similar to the Series and DataFrame objects it indexes.

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> index1 = range(100)
>>> index2 = range(100)
>>> testIndex = pd.MultiIndex.from_arrays([index1, index2])
>>> testIndex  # displaying the repr prints every level and label in full

Output (printed in full; an abbreviated representation was expected)

MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 18.0.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.4
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.2
lxml: 3.5.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: 1.4.1
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
Jinja2: None
@jorisvandenbossche
Member

@tdoughty1 Thanks for the report

A single-level Index follows the pd.options.display.max_seq_items option (by default, at most 100 items are shown), but apparently this setting is not used for a MultiIndex.
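
A quick way to see the option at work on a flat index is sketched below. The 200-element index and the limit of 10 are arbitrary values chosen for illustration, and the exact repr text may differ between pandas versions.

```python
import pandas as pd

# A flat index built from a list, long enough to exceed the chosen limit.
idx = pd.Index(list(range(200)))

# With a small limit, the repr elides the middle of the index with "...".
with pd.option_context("display.max_seq_items", 10):
    short_repr = repr(idx)

# With the limit disabled (None), every element is printed.
with pd.option_context("display.max_seq_items", None):
    full_repr = repr(idx)

print(short_repr)
print(len(short_repr), len(full_repr))
```

Using pd.option_context keeps the change local to the with block, so the global display settings are left untouched.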

@jorisvandenbossche jorisvandenbossche added the Output-Formatting __repr__ of pandas objects, to_string label Feb 23, 2016
@jorisvandenbossche jorisvandenbossche changed the title Abbreviate Index string representations Abbreviate MulitIndex representation Feb 23, 2016
@jorisvandenbossche jorisvandenbossche changed the title Abbreviate MulitIndex representation Abbreviate MultiIndex representation Feb 23, 2016
@jreback
Contributor

jreback commented Feb 23, 2016

Yes, this was noted in #9901.

It's a pretty straightforward fix to truncate these.

@tdoughty1
Author

I think I found a similar issue with a pandas.core.internals.BlockManager object that was printed out in our debugging scripts.

@jreback
Contributor

jreback commented Feb 23, 2016

The BlockManager doesn't have any special repr. I suppose you could add one, but in general people shouldn't be messing around in there :)

@jreback
Contributor

jreback commented Feb 23, 2016

The indexes have a nice, well-defined way of doing repr. MultiIndex was never 'fixed' to avoid printing long lines when I fixed that procedure.

@tdoughty1
Author

OK, thanks. Right now I'm trying to figure out why the BlockManager is being printed in the first place.

@jreback
Contributor

jreback commented Feb 23, 2016

Actually, I think this is a trivial change in pandas.indexes.multi.MultiIndex._format_attrs.
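
To illustrate the kind of truncation being discussed, here is a minimal pure-Python sketch. It is not the pandas implementation: `truncate_seq` and `format_attrs` are hypothetical helpers invented for this example, and the head/tail sizes are assumptions.

```python
def truncate_seq(values, max_seq_items=100, edge_items=10):
    """Format `values` as a list string, eliding the middle when too long."""
    values = list(values)
    if max_seq_items is None or len(values) <= max_seq_items:
        return "[" + ", ".join(map(repr, values)) + "]"
    # Keep a few items from each end; always keep at least one per side.
    n = max(1, min(max_seq_items // 2, edge_items))
    head = ", ".join(map(repr, values[:n]))
    tail = ", ".join(map(repr, values[-n:]))
    return "[" + head + ", ..., " + tail + "]"


def format_attrs(levels, labels, max_seq_items=100):
    """Return (name, text) pairs with each inner sequence truncated."""
    def fmt(seqs):
        return "[" + ", ".join(truncate_seq(s, max_seq_items) for s in seqs) + "]"
    return [("levels", fmt(levels)), ("labels", fmt(labels))]


attrs = format_attrs([range(100), range(100)], [range(100), range(100)],
                     max_seq_items=6)
print("MultiIndex(" + ",\n           ".join("%s=%s" % kv for kv in attrs) + ")")
```

With a limit of 6, each inner list is shown as its first three and last three items separated by "...", which keeps the whole representation to two short lines.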

@tdoughty1
Author

After a little checking, I found that the entire BlockManager is printed in full when the logging library is run in DEBUG mode. This may be the desired behavior.

@jreback
Contributor

jreback commented Feb 23, 2016

We don't have a DEBUG mode. Yeah, if you actually look at the BlockManager objects, they are simply printed out. Normally you shouldn't do this anyhow. However, it is easy to wrap the index displays for this.
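
On the caller's side, one possible way to keep DEBUG logs short is to wrap large objects before logging them. The sketch below uses only the standard library; `ShortRepr` is a hypothetical helper written for this example, not part of pandas or of the logging module, and the 40-character limit is arbitrary.

```python
import logging


class ShortRepr:
    """Wrap any object so its repr is cut to at most `maxlen` characters."""

    def __init__(self, obj, maxlen=200):
        self._obj = obj
        self._maxlen = maxlen

    def __repr__(self):
        # The full repr is still computed; only the logged text is shortened.
        text = repr(self._obj)
        if len(text) <= self._maxlen:
            return text
        hidden = len(text) - self._maxlen
        return text[: self._maxlen] + "...<%d more chars>" % hidden


logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

big = list(range(100000))
# %r formatting is deferred, so repr() only runs if the record is emitted.
log.debug("index: %r", ShortRepr(big, maxlen=40))
```

Because the wrapper only changes what is written to the log, the wrapped object itself is left untouched.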

jreback added a commit to jreback/pandas that referenced this issue Apr 26, 2016
@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016
@jreback jreback modified the milestones: 0.20.0, 0.19.0 Jul 13, 2016
@jreback jreback modified the milestones: Next Major Release, 0.20.0 Jul 13, 2016
topper-123 pushed a commit to topper-123/pandas that referenced this issue May 20, 2018
topper-123 pushed a commit to topper-123/pandas that referenced this issue May 21, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Sep 18, 2018
@jreback jreback modified the milestones: 0.24.0, 0.25.0 Jan 4, 2019
4 participants