Appending parquet file from pandas to s3 #20638
Comments
You would be better off asking on the fastparquet or pyarrow tracker; pandas just passes this through.
This is possible using fastparquet; it works like this:
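(The original snippet was not captured in this copy of the thread. A minimal sketch of fastparquet's own append API, assuming a hypothetical bucket name and already-configured AWS credentials, might look like this:)

```python
import pandas as pd
import s3fs
from fastparquet import write

# Hypothetical bucket/prefix; assumes AWS credentials are configured.
s3 = s3fs.S3FileSystem()
myopen = s3.open

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# First write creates the dataset. file_scheme='hive' stores a directory
# of part files plus metadata, which is what makes appending workable on
# S3, since S3 objects themselves cannot be appended to.
write('my-bucket/data', df, file_scheme='hive',
      open_with=myopen,
      mkdirs=lambda path: None)  # S3 has no real directories; no-op

# Subsequent writes with append=True add new row-groups to the dataset.
write('my-bucket/data', df, file_scheme='hive', append=True,
      open_with=myopen,
      mkdirs=lambda path: None)
```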
It would be nice to have the same in pandas.
I still need to try it, but it seems like it should be possible to use the same syntax as in fastparquet. After all, the `to_parquet` method has a `**kwargs` that passes parameters through to the fastparquet engine. In my case I use it as follows:
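(The exact call was not preserved here. A sketch of what such a call might look like, with a hypothetical bucket and key, is:)

```python
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem()
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Keyword arguments that to_parquet does not consume itself are passed
# straight through to the fastparquet engine; bucket/key are hypothetical.
df.to_parquet('s3://my-bucket/data/df.parquet',
              engine='fastparquet',
              compression='gzip',
              open_with=s3.open)
```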
So it seems feasible that you should be able to use something like this:
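(Untested sketch: whether this works depends on fastparquet honouring `append=True` for the target filesystem, not on pandas itself; the bucket and key are hypothetical.)

```python
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem()
df = pd.DataFrame({"a": [3, 4], "b": ["z", "w"]})

# Hypothetical: fastparquet's append flag forwarded through **kwargs.
df.to_parquet('s3://my-bucket/data/df.parquet',
              engine='fastparquet',
              append=True,
              open_with=s3.open)
```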
See https://fastparquet.readthedocs.io/en/latest/filesystems.html and https://github.com/dask/fastparquet/issues/327 for the reasoning behind this approach.
Here is my snippet in spark-shell:
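(The snippet itself was lost in this copy; the original was Scala in spark-shell. An equivalent PySpark sketch of the kind of append that works in Spark, with a hypothetical bucket and path, would be:)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "x"), (2, "y")], ["a", "b"])

# mode("append") adds new part files to an existing parquet directory;
# the s3a path and bucket name are hypothetical.
df.write.mode("append").parquet("s3a://my-bucket/output/")
```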
Problem description
Now I am trying to do the same thing in pandas. I see pandas supports `to_parquet` without any issue; however, as per #19429, writing to s3 is not supported yet and will be supported in 0.23.0. But I can't find a way to do `to_parquet` in append mode. As per https://stackoverflow.com/questions/47191675/pandas-write-dataframe-to-parquet-format-with-append, the client API doesn't support it yet. But how come it works in Spark? Could anyone clarify this and let me know whether this append is possible at all? Thanks.