TODO: more pprint improvements #3426

ghost · 2013-04-23T02:11:49Z

TODO

- dicts with many keys should be summarized (Update: I can no longer remember the use case for dicts printed via pprint_thing).
- summarization should match numpy: [a a a ... b b b], not [a a a ...]
- add an np.set_printoptions(edgeitems) analogue.

~~- [ ] options.display.max_seq_items should have a default value != None #3391, #5120 , #5629~~

via #5753
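To make the head-and-tail summarization item above concrete, here is a minimal pure-Python sketch. The `summarize` helper and its `edgeitems` parameter are hypothetical names chosen for illustration; this is not the pandas implementation, only the numpy-like shape of the output being asked for.

```python
def summarize(seq, edgeitems=3):
    """Hypothetical helper: show the first and last `edgeitems` items."""
    items = [str(x) for x in seq]
    if len(items) <= 2 * edgeitems:
        # Short sequences are printed in full.
        body = " ".join(items)
    else:
        # Long sequences keep both edges, numpy-style.
        body = " ".join(items[:edgeitems] + ["..."] + items[-edgeitems:])
    return "[" + body + "]"


print(summarize(range(10)))      # [0 1 2 ... 7 8 9]
print(summarize([1, 2]))         # [1 2]
```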

cpcloud · 2013-07-29T05:30:30Z

@y-p i can take this if you want

ghost · 2013-07-29T05:36:17Z

please.

d1manson · 2015-03-10T20:01:05Z

pprint_thing is REALLY slow when used with numpy arrays. For example, if I have a DataFrame with 100 100x100 masked arrays, it takes about 10 seconds to print it, which is really irritating.

import numpy as np
import pandas as pd

some_data = np.ma.array(np.random.rand(100, 100), mask=np.random.rand(100, 100) > 0.2)
df = pd.DataFrame(dict(example=[some_data] * 100))
print(df)  # slow: every cell is formatted via pprint_thing

Is there some scope for calling str(ndarray) somewhere in pprint_thing, and perhaps simply doing .replace('\n',',')? Obviously, it would need to work within the recursive framework of pprint_thing, but wouldn't implement any special recursion itself (i.e. a tuple of ndarrays should be iterated over by pprint_thing and have their numpy __str__ methods invoked).
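The idea in the previous paragraph might look roughly like the sketch below. `compact_repr` is a hypothetical stand-in for illustration, not the actual `pprint_thing` code: arrays are handed to numpy's own `str` (which already summarizes large arrays) and flattened to one line, while tuples and lists are still walked recursively.

```python
import numpy as np


def compact_repr(thing):
    """Hypothetical helper: one-line text for arrays and nested containers."""
    if isinstance(thing, np.ndarray):
        # Let numpy format (and summarize) the array, then flatten to one line.
        return str(thing).replace("\n", ",")
    if isinstance(thing, (tuple, list)):
        # Recurse into containers so nested arrays take the fast path too.
        inner = ", ".join(compact_repr(item) for item in thing)
        return f"({inner})" if isinstance(thing, tuple) else f"[{inner}]"
    return str(thing)


small = np.array([[1, 2], [3, 4]])
print(compact_repr(small))
print(compact_repr((small, "label")))
```

Masked arrays are `ndarray` subclasses, so they would follow the same branch under this sketch.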

I'm still pretty new to pandas, so maybe I've missed something.

(For reference, I'm doing neuroscience: I have two or three "levels" of analysis, the topmost of which, i.e. the most meta-level, I would like to be doing in pandas, but I would very much like to store some of the lower-level stuff in a DataFrame alongside the meta-stuff.)

jreback · 2015-03-10T22:41:59Z

What you are doing is extremely inefficient. Pandas (and numpy) are generally best used to hold a single scalar in a cell, which can be represented by a base type (e.g. a float).

Try this

In [9]: df = DataFrame(np.random.randn(100,100))

In [10]: df = df.where(df>0.2)

If you need multiple levels, simply add a multi-index.

Changing a printing routine to handle this use case is not likely to happen, as it would increase the code complexity (which is already pretty crazy).

d1manson · 2015-03-10T23:11:22Z

I think I explained that rather poorly.

Imagine my 100 arrays are a list of images of random shapes, with each image having a multiindex tuple of (person, day, hour). I compute a bunch of metrics for each image and put the results as columns in my dataframe, but I also put the images themselves in a column. I can then do my meta-analysis on groups of the raw images and/or on the scalar-valued columns... I need to be able to do both. I have already submitted a pull request which makes it easyish to render numpy arrays as images in base64 data within HTML img tags, but the default display mechanism is still this slow pprint call.

jreback · 2015-03-10T23:15:50Z

Well, if you really, really want to do this, I would simply wrap an object around the array and give it a custom printing method. Then you can do whatever you want. Pandas tries to do the right thing by printing nested things, but in this case you are putting something which pandas can render there.
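A minimal sketch of the wrapper suggestion, assuming a hypothetical `ArrayBlob` class (the name and the summary format are made up for illustration). Because the wrapper is not itself a sequence, printing it only needs its short `__repr__` and never has to walk the array's elements.

```python
import numpy as np


class ArrayBlob:
    """Hypothetical wrapper that prints a short summary instead of the data."""

    def __init__(self, array):
        self.array = array

    def __repr__(self):
        # Cheap to compute: only metadata is touched, not the elements.
        return f"<ArrayBlob shape={self.array.shape} dtype={self.array.dtype}>"


blob = ArrayBlob(np.zeros((100, 100)))
print(blob)  # <ArrayBlob shape=(100, 100) dtype=float64>
```

The underlying data stays available as `blob.array` for the lower-level analysis.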

d1manson · 2015-03-10T23:25:28Z

Yes, that had occurred to me, but it's not as convenient, and I'd hoped that this kind of blob-like usage was mainstream enough to merit a line or two in the right place!

jreback · 2015-03-10T23:29:13Z

This is not the right way to harness the power of pandas.

You are almost certainly better off keeping your images as numpy arrays or whatever (or frames) and simply having a reference (e.g. a string) to them in a particular column.

Or use the object solution.

You are trying to shoehorn a frame onto what you need, but it's not really suited for that.
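The "string reference" approach could be sketched as follows. The key format, the `images` dict, and the column names are assumptions made up for this example: the frame holds only scalar metrics plus a key, and the arrays live in an ordinary dict beside it.

```python
import numpy as np
import pandas as pd

# Hypothetical store: images live outside the frame, keyed by a string.
images = {
    "alice/day1/09h": np.random.rand(4, 6),
    "alice/day1/10h": np.random.rand(5, 5),
}

# The frame holds only scalars: the key and the per-image metrics.
df = pd.DataFrame(
    {
        "image_key": list(images),
        "mean": [img.mean() for img in images.values()],
    }
)

# Look an image up again from its row when the raw data is needed.
first = images[df.loc[0, "image_key"]]
print(df)
print(first.shape)
```

Printing `df` then only ever formats strings and floats, at the cost of an extra lookup step whenever the raw image is required.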

d1manson · 2015-03-10T23:56:10Z

Shoehorning maybe, but the result is actually quite good - I'd recommend it to anyone else doing a similar kind of analysis.
And, strictly speaking I think what is being stored in the DataFrame is a reference - it's a reference to the ndarray instance, right? It's not like the DataFrame has the ndarray's data stored contiguously within it?
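One way to check this claim is a small identity test, sketched below under the assumption that a list of arrays ends up in an object-dtype column, as in the example earlier in the thread. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

image = np.zeros((3, 3))
df = pd.DataFrame({"image": [image]})

# Pull the cell back out and compare identity, not just equality.
stored = df["image"].iloc[0]
print(df["image"].dtype)
print(stored is image)

# A change made through the original name is visible through the cell.
image[0, 0] = 42.0
print(stored[0, 0])
```

If the identity check holds, the column is storing a pointer to the same ndarray instance and not a contiguous copy of its data.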

WillAyd · 2018-07-06T22:16:11Z

Closed as ambiguous

ghost self-assigned this Apr 23, 2013

hayd mentioned this issue May 7, 2013

max_rows seems to have max value of 60 #3541

Closed

ghost mentioned this issue Jun 19, 2013

ENH: add last element to repring of sequences #3941

Closed

ghost assigned cpcloud Jul 29, 2013

ghost mentioned this issue Jan 3, 2014

Assign a default value for display max_seq_items #3391

Closed

ghost assigned ghost and cpcloud Jan 10, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

WillAyd closed this as completed Jul 6, 2018

WillAyd modified the milestones: Contributions Welcome, No action Jul 6, 2018
