TODO: more pprint improvements #3426

Closed
3 tasks
ghost opened this issue Apr 23, 2013 · 10 comments
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string

Comments

@ghost

ghost commented Apr 23, 2013

TODO

  • dicts with many keys should be summarized (Update: I can no longer remember the use case for dicts
    printed via pprint_thing).
  • summarization should match numpy, [a a a ... b b b], not [a a a ...]
  • add np.set_printoptions(edgeitems) analogue.
  • options.display.max_seq_items should have a default value other than None (#3391, #5120, #5629).

via #5753
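The summarization items in the list describe numpy's rule: keep a few "edge items" from each end and elide the middle, instead of only truncating the tail. A minimal sketch of that rule for a plain sequence, assuming a hypothetical helper `summarize_seq` (not a pandas function) whose `max_items` and `edgeitems` parameters mirror numpy's `threshold` and `edgeitems` print options:

```python
import numpy as np


def summarize_seq(items, max_items=6, edgeitems=3):
    """Hypothetical helper: render a sequence numpy-style.

    With more than `max_items` elements, show `edgeitems` from each end
    with "..." in between, like numpy's summarized array output.
    """
    items = list(items)
    if len(items) <= max_items:
        shown = [str(x) for x in items]
    else:
        head = [str(x) for x in items[:edgeitems]]
        # Slice from len - edgeitems so that edgeitems=0 yields an empty tail.
        tail = [str(x) for x in items[len(items) - edgeitems:]]
        shown = head + ["..."] + tail
    return "[" + " ".join(shown) + "]"


# numpy's own summarization, driven by the same two knobs.
print(np.array2string(np.arange(10), threshold=5, edgeitems=2))
print(summarize_seq(range(10), max_items=5, edgeitems=2))
```

Unlike numpy, this sketch does not pad elements to a common width; it only illustrates the head-plus-tail layout.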

@cpcloud
Member

cpcloud commented Jul 29, 2013

@y-p I can take this if you want.

@ghost
Author

ghost commented Jul 29, 2013

please.

ghost assigned cpcloud Jul 29, 2013
ghost assigned ghost and cpcloud Jan 10, 2014
jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015
@d1manson

pprint_thing is REALLY slow when used with numpy arrays. For example, if I have a DataFrame holding 100 masked arrays, each 100x100, it takes about 10 seconds to print it, which is really irritating.

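import numpy as np
import pandas as pd
# 100 rows, all referring to the same 100x100 masked array
some_data = np.ma.array(np.random.rand(100, 100), mask=np.random.rand(100, 100) > 0.2)
df = pd.DataFrame(dict(example=[some_data] * 100))
print(df)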

Is there some scope for calling str(ndarray) somewhere in pprint_thing, and perhaps simply doing .replace('\n',',')? Obviously, it would need to work within the recursive framework of pprint_thing, but wouldn't implement any special recursion itself (i.e. a tuple of ndarrays should be iterated over by pprint_thing and have their numpy __str__ methods invoked).
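The proposal, letting numpy's own `__str__` format array leaves while a recursive formatter only walks the containers, could look roughly like this. `format_thing` is a hypothetical stand-in, not pandas' `pprint_thing`; the sketch relies on numpy already summarizing arrays larger than its `threshold` print option (1000 elements by default), so a 100x100 array is rendered with "..." instead of all 10,000 values.

```python
import numpy as np


def format_thing(thing, max_seq_items=100):
    """Hypothetical recursive formatter (a sketch, not pandas' pprint_thing)."""
    # ndarray leaves (including np.ma.MaskedArray, a subclass) use numpy's
    # own str(), which summarizes large arrays; flatten it onto one line.
    if isinstance(thing, np.ndarray):
        return str(thing).replace("\n", ",")
    # Containers are walked recursively, showing at most max_seq_items items.
    if isinstance(thing, (list, tuple)):
        items = [format_thing(x, max_seq_items) for x in thing[:max_seq_items]]
        if len(thing) > max_seq_items:
            items.append("...")
        if isinstance(thing, tuple):
            trailing = "," if len(items) == 1 else ""
            return "(" + ", ".join(items) + trailing + ")"
        return "[" + ", ".join(items) + "]"
    # Anything else falls back to plain str().
    return str(thing)


some_data = np.ma.array(np.random.rand(100, 100), mask=np.random.rand(100, 100) > 0.2)
print(format_thing((some_data, [1, 2, 3])))
```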

I'm still pretty new to pandas, so maybe I've missed something.

(For reference, I'm doing neuroscience: I have two or three "levels" of analysis, the topmost of which, i.e. the most meta level, I would like to do in pandas, but I would very much like to store some of the lower-level stuff in a DataFrame alongside the meta stuff.)

@jreback
Contributor

jreback commented Mar 10, 2015

What you are doing is extremely inefficient. Pandas (and numpy) are generally best used to hold a single scalar in a cell, which can be represented by a base type (e.g. a float).

Try this

In [9]: df = DataFrame(np.random.randn(100,100))

In [10]: df = df.where(df>0.2)

If you need multiple levels, simply add a multi-index.
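A minimal sketch of this suggestion, keeping one scalar per cell and moving the grouping into a MultiIndex. The level names (person, day, hour) are borrowed from the use case described further down the thread; the level values and the metric columns `m1`..`m3` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical levels and metric names, for illustration only.
index = pd.MultiIndex.from_product(
    [["alice", "bob"], [1, 2], [9, 10]], names=["person", "day", "hour"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((len(index), 3)), index=index, columns=["m1", "m2", "m3"]
)

# Same idea as df.where(df > 0.2): values failing the test become NaN.
df = df.where(df > 0.2)

# Each level of the hierarchy can be aggregated directly.
per_person = df.groupby(level="person").mean()
print(per_person)
```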

Changing a printing routine to handle this use case is not likely to happen, as it would increase the code complexity (which is already pretty crazy).

@d1manson

I think I explained that rather poorly.

Imagine my 100 arrays are a list of images of random shapes, with each image having a MultiIndex tuple of (person, day, hour). I compute a bunch of metrics for each image and put the results as columns in my DataFrame, but I also put the images themselves in a column. I can then do my meta-analysis on groups of the raw images and/or on the scalar-valued columns... I need to be able to do both. I have already submitted a pull request which makes it easy-ish to render numpy arrays as images (base64 data within HTML img tags), but the default display mechanism is still this slow pprint call.

@jreback
Contributor

jreback commented Mar 10, 2015

Well, if you really, really want to do this, I would simply wrap an object around the array and give it a custom printing method. Then you can do whatever you want. Pandas tries to do the right thing by printing nested things, but in this case you are putting something there which pandas can't render.
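The wrapper suggestion can be sketched as follows. `ArrayCell` is a hypothetical name, and the sketch assumes pandas formats ordinary object cells via `str()`, so a cheap `__str__` avoids rendering the array's values. The class deliberately defines no `__len__` or `__iter__`, so pandas should not try to recurse into it as a sequence.

```python
import numpy as np
import pandas as pd


class ArrayCell:
    """Hypothetical wrapper that stores an array and prints a one-line summary."""

    __slots__ = ("arr",)

    def __init__(self, arr):
        self.arr = arr

    def __repr__(self):
        # Only metadata is formatted; the array's values are never touched.
        return f"<{type(self.arr).__name__} shape={self.arr.shape} dtype={self.arr.dtype}>"

    __str__ = __repr__


some_data = np.ma.array(np.random.rand(100, 100), mask=np.random.rand(100, 100) > 0.2)
df = pd.DataFrame(dict(example=[ArrayCell(some_data) for _ in range(100)]))
print(df.head())

# The underlying array is still one attribute access away.
first = df["example"].iloc[0].arr
```

The trade-off d1manson mentions next is real: every access needs the extra `.arr`, in exchange for a repr that costs the same regardless of array size.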

@d1manson

Yes, that had occurred to me, but it's not as convenient, and I'd hoped that this kind of blob-like usage was mainstream enough to merit a line or two in the right place!

@jreback
Contributor

jreback commented Mar 10, 2015

This is not the right way to harness the power of pandas.

You are almost certainly better off keeping your images as numpy arrays or whatever (or frames) and simply having references to them (e.g. strings) in a particular column.
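A minimal sketch of the "store a reference" approach, assuming a plain dict as the image store and string ids as keys; the names `images`, `image_id` and `mean` are illustrative, not from the thread.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical image store: arrays live outside the frame, keyed by string id.
images = {f"img{i:03d}": rng.random((100, 100)) for i in range(5)}

# The frame holds only the reference (the id) plus scalar metrics.
df = pd.DataFrame({"image_id": list(images)})
df["mean"] = [images[k].mean() for k in df["image_id"]]

# Arrays for any subset of rows are looked up through the reference column.
selected = [images[k] for k in df.loc[df["mean"] > 0.5, "image_id"]]
print(df)
```

Printing `df` here only formats short strings and floats, so it stays fast no matter how large the stored arrays are.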

Or use the object solution.

You are trying to shoehorn a frame onto what you need, but it's not really suited for that.

@d1manson

Shoehorning, maybe, but the result is actually quite good; I'd recommend it to anyone else doing a similar kind of analysis.
And strictly speaking, I think what is being stored in the DataFrame is a reference: it's a reference to the ndarray instance, right? It's not like the DataFrame has the ndarray's data stored contiguously within it?
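This reading can be checked directly: an object-dtype column stores pointers to Python objects, not the arrays' data. A small sketch reusing the constructor from the original report:

```python
import numpy as np
import pandas as pd

some_data = np.ma.array(np.random.rand(100, 100), mask=np.random.rand(100, 100) > 0.2)
df = pd.DataFrame(dict(example=[some_data] * 100))
col = df["example"]

# The column has object dtype: each cell is a reference to a Python object.
print(col.dtype)
# Every cell refers to the one array created above; its data was not copied in.
print(all(cell is some_data for cell in col))
```

So the cost is not storage but formatting: printing the frame still asks pprint_thing to render each referenced array.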

@WillAyd
Member

WillAyd commented Jul 6, 2018

Closed as ambiguous

WillAyd closed this as completed Jul 6, 2018
WillAyd modified the milestones: Contributions Welcome, No action Jul 6, 2018