-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CI: Fastparquet upgrade broke CI #42588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
can u report in that tracker as well seems to be lots of breaking changes in fastparuet lately |
Ok, investigating a bit more. The uint8 vs Uint8 change seems like a deliberate change in fastparquet, xref dask/fastparquet#623. In pandas, we make this opt in with the
|
Fastparquet uses ns precision to store timestamps as of that PR, which ought not to have changed the tz propagation - there are tests for that. I haven't looked at your tests, yet, though. |
Slightly unrelated, but perhaps fastparquet can set up a CI for the dev version of pandas? That way, we can catch bugs/issues before the release of a new pandas/fastparquet. I'd be happy to set that up for fastparquet, if you want. |
btw: I would love if there were a variant of assert_frame_equal that would make, for example, uint8 and UInt8 equivalent. |
@lithomas1 , yes that sounds very useful. We already test against dask, for example. |
I appreciate people here sticking with me, and hope that all the improvements described at https://fastparquet.readthedocs.io/en/latest/releasenotes.html are worth it! |
Thanks for developing fastparquet! :) |
Hmmm... There might be another inconsistency in between the fastparquet and pyarrow engines. We currently configure pyarrow to always go to nullable dtypes when the keyword
So, I think we need to decide whether cc @jorisvandenbossche (since last time, I think you had some reservations about making |
Since this
I had indeed reservations about doing this conditionally. IMO the keyword should give consistent behaviour, and is also meant to opt-in to all nullable dtypes, so that's how I originally implemented it. I would prefer to keep the behaviour that way. Nullable vs non-nullable dtypes can behave differently for certain operations, even if the original read data has no nulls (you can easily introduce nulls with a next operation). Yes, nullable dtypes can waste some space if there are no NAs (the mask), but there are other ways to resolve this (i.e. ensure we can not store any mask in such a case -> #30435). |
Fastparquet updated to 0.7 which is causing failure.
See these logs from database build for example.
The text was updated successfully, but these errors were encountered: