Bloomberg Hackathon #8323
Most of these are doc/testing things; I looked through the "Good as first PR" label. Does anyone have issues to add that are not on that list?
More customization of Excel input/output could be great, e.g. making it easier to specify per-column colors/formatting, float formats, etc. The code base isn't too complicated there (just a mixture of the formatter and the ExcelWriter code), and you could make rapid progress because it's really easy to test and create samples. I think the result would be immediately rewarding (better-looking output, easier-to-make reports, etc.). Plus, for #4679 and #8272 you'd get a better sense of pandas internals too. List of PRs (ordered from most interesting/most impact to least interesting):
@jtratner thanks! I'll update!
One other great (but self-contained) project would be to convert a pandas DataFrame into a new BigQuery table when writing. I've been working with BigQuery quite a bit; it would be pretty simple to do and a nice way to dig into handling column metadata. I'll put up an issue right now with more details.
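The column-metadata part mostly comes down to deriving a table schema from the frame's dtypes. A minimal sketch, assuming a BigQuery-style schema as a list of `{"name", "type"}` dicts; `dataframe_to_bq_schema` and the dtype-to-type mapping are hypothetical, and nothing here talks to BigQuery itself:

```python
import pandas as pd

# Hypothetical mapping from numpy dtype "kind" codes to BigQuery column types.
_KIND_TO_BQ = {
    "i": "INTEGER",    # signed integers
    "u": "INTEGER",    # unsigned integers
    "f": "FLOAT",      # floating point
    "b": "BOOLEAN",    # booleans
    "M": "TIMESTAMP",  # datetime64 (naive or tz-aware)
}


def dataframe_to_bq_schema(df):
    """Hypothetical helper: build a BigQuery-style schema from a DataFrame's dtypes.

    Any dtype not in the mapping above (object, string, category, ...) falls back
    to STRING.
    """
    fields = []
    for name, dtype in df.dtypes.items():
        fields.append({"name": str(name), "type": _KIND_TO_BQ.get(dtype.kind, "STRING")})
    return fields


frame = pd.DataFrame(
    {
        "id": [1, 2],
        "price": [1.5, 2.0],
        "name": ["a", "b"],
        "when": pd.to_datetime(["2014-09-20", "2014-09-21"]),
        "flag": [True, False],
    }
)
schema = dataframe_to_bq_schema(frame)
```

Keying on `dtype.kind` rather than exact dtype names keeps int32/int64, float32/float64, etc. on the same BigQuery type without listing each one.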
@jtratner thanks! That would be great!
Thanks @TomAugspurger @rockg @jtratner
@jorisvandenbossche thanks!
I was also thinking that a utility function that can 'read' the printed output of a DataFrame back in would be nice (for simple situations, you can use
@jorisvandenbossche not sure what you mean. Except for the column index losing its name (for a regular Index, not a MultiIndex), CSV round-tripping preserves everything.
But I don't mean CSV round-tripping; I mean round-tripping the console print output.
Is there an easy way to read this in (the output as a string)?
Dealing with the MultiIndex, the sparsified index, index names, ... (or, to start with, at least not choking on those)
I think the clipboard is pretty robust (it's just
Yes, but what I mean is: if you have this output as a string, or you can copy it (e.g. from an example in the docs, from a question on Stack Overflow, ...), can you easily convert it to a DataFrame in a new session? And using
@jorisvandenbossche hmm, works for me on master. I usually just copy-paste from a question and do this:
FYI, I tried making this work from just a string (e.g.
Yep, that is what I do too, but I still usually have to adapt something in the original data to get it working. It would be nice if there were some utility that could read any such output.
Can you read this in with
@jorisvandenbossche I suppose you could have a wrapper that 'tries' various things, but it's non-trivial to simply guess. Well, you can, but there are so many edge cases that it's MUCH easier to just have the user specify it.
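For the simple case, a whitespace-separated `read_csv` already does most of the work, and the "user specifies it" approach can cover index names. A minimal sketch, assuming a flat (non-Multi) index, no column-axis name, and no spaces inside labels or values; `read_repr` and its `index_name_row` flag are hypothetical, not a pandas API:

```python
import io

import pandas as pd


def read_repr(text, index_name_row=False):
    """Hypothetical helper: parse DataFrame.to_string() output back into a DataFrame.

    The caller says whether the index name sits on its own line instead of the
    function trying to guess.
    """
    # Drop blank lines (e.g. from pasting a triple-quoted string).
    lines = [line for line in text.splitlines() if line.strip()]
    index_name = None
    if index_name_row:
        # to_string() prints a named index's name alone, below the column header.
        index_name = lines.pop(1).strip()
    # Whitespace-delimited parsing; because the header has one field fewer than
    # the data rows, read_csv uses the first column as the index.
    df = pd.read_csv(io.StringIO("\n".join(lines)), sep=r"\s+")
    df.index.name = index_name
    return df


original = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]},
    index=pd.Index(["x", "y", "z"], name="key"),
)
parsed = read_repr(original.to_string(), index_name_row=True)
plain = read_repr(pd.DataFrame({"a": [1, 2]}).to_string())
```

MultiIndex columns, sparsified row labels, and values containing spaces would all still need extra handling on top of this.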
Are there that many edge cases? The output of the pandas
Ahh, you are proposing a
@jorisvandenbossche I updated the Enhancements section.
Maybe it doesn't need to be top-level; another possibility is something like
Oh, #5563 would be a good one (Series HTML repr).
Nice posts you have here: tomaugspurger.github.io/blog/2014/09/04/practical-pandas-part-2-more-tidying-more-data-and-merging. About halfway down, you pass method='table' to to_hdf, which is ignored (the keyword is format='table'), so the default fixed format is used and you get a performance warning.
Would #8162 (allowing the index to be referenced by name, like a column) be doable? I would love to see something like this happen in SF!
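For reference, current pandas already accepts an index level name in place of a column name in a few places. A small illustration with made-up data (not a claim about how #8162 itself was resolved):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.Index(["a", "b", "a"], name="key"),
)

# DataFrame.query can refer to a named index as if it were a column.
subset = df.query("key == 'a'")

# groupby accepts index level names as well as column names.
totals = df.groupby("key")["value"].sum()

# Explicit fallback that works anywhere: pull the level out by name.
keys = df.index.get_level_values("key")
```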
Wow, this is great, everyone. Bravo!
Let's see how much gets done! Of course, the point of this list was to get as much dev time as possible (at the expense of other projects, of course) :)
In case a brave soul would like to venture into the land of missing-data support in NumPy: #8350
Nice-to-have: pandas + airspeed velocity demo: http://mdboom.github.io/astropy-benchmark/ (adding to the top list)
That looks sexy; can you create a new issue for asv (vbench-like)?
Yep.
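A rough idea of what a vbench-style benchmark looks like as an asv benchmark, here parametrized over group sizes. asv picks up methods prefixed with `time_` in the benchmarks directory when you run `asv run`; the class name, data sizes, and parameter values below are hypothetical:

```python
import numpy as np
import pandas as pd


class GroupBySum:
    """Hypothetical asv benchmark: time a groupby-sum for several group counts."""

    # asv runs the benchmark once per entry in `params`, passing the value to
    # setup() and to the timed method; `param_names` labels it in reports.
    params = [10, 1_000, 10_000]
    param_names = ["ngroups"]

    def setup(self, ngroups):
        # Data is built in setup(), which asv keeps out of the timed region.
        n = 100_000
        rng = np.random.default_rng(0)
        self.df = pd.DataFrame(
            {
                "key": np.arange(n) % ngroups,  # exactly `ngroups` distinct keys
                "value": rng.standard_normal(n),
            }
        )

    def time_groupby_sum(self, ngroups):
        # Only this call is timed.
        self.df.groupby("key")["value"].sum()
```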
Implementing a CategoricalIndex #7629?
@JanSchulz I think that's out of scope for a one-day event.
I should mention that Mark Wiebe (who knows a lot about NumPy internals) will be there. Additionally (re: airspeed velocity), Michael Droettboom will be there.
Is there a summary of this hackathon available online?
No summary; a few issues were worked on/closed.
@all

Contributing Guidelines / Help:
https://github.com/pydata/pandas/wiki

Dev Docs:
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html

Docs:
- Docs on IPython startup files: DOC: add section about using python/ipython startup files to set options to FAQ #5748
- GA docs: google analytics docs #3508

Perf:
- vbench on different group sizes: PERF: add vbenchs for groupby functions with different group sizes #6787

Tests:

Bugs:

Enhancements:
- read_csv (or maybe read_repr/string) to allow round-tripping of the repr (can also serve as a basis for read_clipboard)
- accept Period in DatetimeIndex for start/end: Cannot create DatetimeIndex using Period #6780
- to_dict orient param: DataFrame to_dict method should also provide orient parameter (like to_json) #7840
- level kw to any/all: API: add level kwarg for Series.any/.all #8302
- df.astype could accept a dict of {col: type} #7271
- clean up code by removing core/array.py: COMPAT/CLN: remove need for core/array.py #8359

IO:
- make to_latex work with a MultiIndex: to_latex with MI column and index names #8336
- Series.to_html not working so well: Series do not display HTML repr #5563

Excel Oriented:

SQL:
- to_sql per column: problem with to_sql with NA #8778

More advanced:
Collaborative Efforts:
@jorisvandenbossche @cpcloud @TomAugspurger @hayd
cc @shoyer
cc @immerrr