New data format: ROOT files #9378
Is this a format that the HDF5 files are set up in, kind of like a convention for naming and structure? |
It's a completely different format that was designed from the ground up for physics experiments. I think it might actually predate HDF5. Here's an info page on the file format: https://root.cern.ch/drupal/content/root-files-1 |
So I think a nice way to do this would be to create where we can put 'non-core' IO packages handling. So that an import would inject them into the pandas namespace but it would be optional (and updated externally to pandas), maybe something like
would work |
That sounds like the right place to put a package like this. |
@ibab if you'd be interested in this, then I can set up the repo and you can port things. |
@jreback Sure, thanks! I'll be happy to port it over. |
We should also make sure the new methods are |
This is an intriguing idea, and certainly would be a win for interactive use. And I really like supporting method chaining. But there are a few tradeoffs to consider here:
Considering these factors, it seems like creating stubs like |
Adding stubs directly to |
As discussed in pandas-dev#5487. This would also be a nice place to mention other packages that connect the pandas DataFrames with other file formats. Right now, the only one I can think of off-hand is `root_pandas` (pandas-dev#9378, CC @ibab), but I'm sure there are more.
Hi, |
as discussed above, this could be added to a |
+1. Is @ibab interested in help? |
Sure, I'd be glad to help. Does a Something like
should be doable. Instead of requiring a hard dependency on
would always trigger lots of imports, though.
would be better. What do you think? |
A pandas-io package does not currently exist. The standard way to handle hard dependency issues is to import packages inside the functions that call them, e.g., only import I don't think it's a good idea to monkey patch dataframes. I would rather suggest that you use functions, possibly with |
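The deferred-import pattern mentioned in the comment above could look roughly like the following. This is only a sketch: `import_optional` is a hypothetical helper, and the `read_root` wrapper assumes that `root_pandas` exposes a `read_root` function and is installed separately. Neither name is part of pandas itself.

```python
import importlib


def import_optional(name, purpose):
    """Import module `name` on demand (hypothetical helper).

    Raises an ImportError with a descriptive message if the optional
    package is not installed.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"{purpose} requires the optional package '{name}'; "
            "install it to use this feature"
        ) from err


def read_root(path):
    """Read a ROOT file into a DataFrame (illustrative wrapper).

    The backend is imported here, inside the function, so that merely
    importing this module never requires root_pandas to be installed.
    Assumption: root_pandas provides read_root(path).
    """
    backend = import_optional("root_pandas", "read_root")
    return backend.read_root(path)
```

With this layout, users who never call `read_root` pay no import cost and need no extra dependency, while those who do call it without the backend get an error that names the missing package.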
I've set up a repository at https://github.com/ibab/pandas-io. I've decided to go with from pandas.io.external import read_root
df = read_root('in.root')
if the package can't be found. I've removed the monkey patch on I'm very open to changing any aspect of the package, if someone has an idea how it should be changed or improved. |
I am not sure I see the benefit of having such a pandas-io package. The difference for users is not that big:
and
In any case, @ibab, thanks for exploring this! I just think we should discuss it more thoroughly first (and exploring it can help the discussion). Very quickly, some ideas: Pro:
Con:
If discoverability is the main reason, I think there are also other ways to handle this (promoting ecosystem packages better in the docs and on the website, adding them to the IO docs, ...) |
Closing and adding to a tracker issue #30407 for IO format requests, can re-open if interest is expressed. |
Hi,
I've recently written a tiny Python package for loading/saving ROOT files as pandas DataFrames: root_pandas.
ROOT is the main data format used by particle physicists.
Do you think it might be worth adding `read_root` and `to_root` functions to pandas itself?
This could convince a lot of physicists to give pandas a try.
The dependencies it would add could be handled like it's currently done with hdf5.
ROOT performs well in comparison with hdf5 and could be a useful addition to pandas, even ignoring the fact that it is very popular in physics.
If there's interest, I would polish my code and create a pull request.