ENH: Delta Lake file format support #35017
Comments
I'm pretty sure Delta Lake only supports access via Spark at the moment, not through any other tools or APIs. I'll post the link in the docs where it mentions it when I can find it, but I do remember reading that. EDIT: Well, their README does say this:
|
I doubt we would support that directly, but if there is an implementation we could link to it in the ecosystem docs. |
FYI Koalas reads/writes to delta lake (link), and has easy koalas <--> pandas methods |
Happy to take an addition to the ecosystem docs when a reader / writer is available. That won't be implemented in pandas, but perhaps in / using pyarrow. |
Happy to share that we now have native support for Delta Lake outside of the JVM and Spark. It's a full Delta Lake implementation in pure Rust. We also provide a thin Python wrapper for pandas integration; see https://github.com/delta-io/delta-rs/tree/main/python#usage. @jbrockmendel perhaps we can link this in the ecosystem docs? |
fine by me |
There is a delta-rs project that makes it easy to read Delta Lakes into pandas DataFrames, as mentioned by @houqp. See this snippet:

    from deltalake import DeltaTable

    dt = DeltaTable("resources/delta/1")
    df = dt.to_pandas()

Python write access doesn't exist yet, but hopefully it'll be added soon because it'd be an awesome addition for the pandas community! |
Adding to @MrPowers's comment: delta-rs now has write support in the Rust core. We're just waiting for someone to send us a PR that exposes the write API to the Python shim ;) |
Is your feature request related to a problem?
No
Describe the solution you'd like
I'd love it if Pandas could support Databricks' Delta Lake file format (https://github.com/delta-io/delta). It's a type of versioned parquet file format that supports updates/inserts/deletions.
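To make "versioned parquet" a bit more concrete, here is a rough sketch of how a reader could work out which parquet files belong to a given table version. It assumes the usual Delta layout: a `_delta_log` directory of zero-padded JSON commit files, one action per line, where `add` and `remove` actions carry a `path`. The names `active_files` and `write_commit` are made up for illustration, and the demo log is heavily simplified. Real logs also contain metadata, protocol, and commit-info actions plus parquet checkpoints, all of which this sketch ignores.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path, version=None):
    """Replay the JSON commits in _delta_log and return the live data files.

    Illustrative only: handles just `add` and `remove` actions.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit file names are zero-padded, so lexical order is version order.
    for commit in sorted(log_dir.glob("*.json")):
        if version is not None and int(commit.stem) > version:
            break
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_commit(table_path, version, actions):
    """Hypothetical helper that writes one simplified commit file for the demo."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    body = "\n".join(json.dumps(a) for a in actions)
    (log_dir / f"{version:020d}.json").write_text(body)


# Build a tiny fake table log: two inserts, then an update that
# replaces part-0 with a rewritten part-2.
demo_dir = tempfile.mkdtemp()
write_commit(demo_dir, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(demo_dir, 1, [{"add": {"path": "part-1.parquet"}}])
write_commit(
    demo_dir,
    2,
    [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-2.parquet"}}],
)

print(active_files(demo_dir))             # files in the latest version
print(active_files(demo_dir, version=0))  # files as of the first commit
```

The point is that updates and deletes never modify a parquet file in place. They add new files and mark old ones as removed in the log, which is why older versions stay readable.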
API breaking implications
None that I'm aware of
Describe alternatives you've considered
N/A
Additional context
N/A