Revert "Pin fastparquet to leq 0.5.0" #41443

Merged
jreback merged 2 commits into pandas-dev:master from revert-41370-pin_fastparquet on May 14, 2021

Conversation

lithomas1
Member

@lithomas1 lithomas1 commented May 12, 2021

closes #41366
Reverts #41370
Looks like fastparquet released a new version.

@phofl
Member

phofl commented May 12, 2021

I think we need 0.6.2, which isn't on conda-forge yet
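
A quick way to confirm whether an environment already has the release mentioned above is to compare the installed version against the minimum. The following is a minimal sketch, not part of the PR: the helper names `parse_release`, `installed_version`, and `meets_minimum` are hypothetical, it assumes Python 3.8+ for `importlib.metadata`, and the version parsing is deliberately simplified (`packaging.version.Version` is the more robust choice for pre-release or local version strings).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '0.6.2' into (0, 6, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed: Optional[str], minimum: str) -> bool:
    """True when `installed` is present and at least `minimum`."""
    if installed is None:
        return False
    return parse_release(installed) >= parse_release(minimum)


# Prints True or False depending on what is installed locally.
print(meets_minimum(installed_version("fastparquet"), "0.6.2"))
```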

@lithomas1
Member Author

@phofl I can reproduce the failure locally with fastparquet 0.6.2.
Here is a reproducible example:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0, dtype="float64"),
        "e": [True, False, True],
        "f": pd.date_range("20130101", periods=3),
        "g": pd.date_range("20130101", periods=3, tz="US/Eastern"),
        "h": pd.Categorical(list("abc")),
        "i": pd.Categorical(list("abc"), ordered=True),
    }
)
df.to_parquet("example_fp.parquet", engine="fastparquet")
result = pd.read_parquet("example_fp.parquet", engine="fastparquet")

This is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/thomasli/pandas/pandas/io/parquet.py", line 502, in read_parquet
    **kwargs,
  File "/Users/thomasli/pandas/pandas/io/parquet.py", line 343, in read
    result = parquet_file.to_pandas(columns=columns, **kwargs)
  File "/Users/thomasli/opt/anaconda3/lib/python3.7/site-packages/fastparquet/api.py", line 383, in to_pandas
    df, views = self.pre_allocate(size, columns, categories, index)
  File "/Users/thomasli/opt/anaconda3/lib/python3.7/site-packages/fastparquet/api.py", line 407, in pre_allocate
    self._dtypes(categories), self.tz)
  File "/Users/thomasli/opt/anaconda3/lib/python3.7/site-packages/fastparquet/api.py", line 593, in _pre_allocate
    index_types=index_types, cats=cats, timezones=tz)
  File "/Users/thomasli/opt/anaconda3/lib/python3.7/site-packages/fastparquet/dataframe.py", line 187, in empty
    values = type(bvalues)._from_sequence(values, copy=False)
  File "/Users/thomasli/pandas/pandas/core/arrays/datetimes.py", line 337, in _from_sequence
    return cls._from_sequence_not_strict(scalars, dtype=dtype, copy=copy)
  File "/Users/thomasli/pandas/pandas/core/arrays/datetimes.py", line 363, in _from_sequence_not_strict
    ambiguous=ambiguous,
  File "/Users/thomasli/pandas/pandas/core/arrays/datetimes.py", line 2034, in sequence_to_dt64ns
    data, copy = maybe_convert_dtype(data, copy)
  File "/Users/thomasli/pandas/pandas/core/arrays/datetimes.py", line 2243, in maybe_convert_dtype
    raise TypeError(f"dtype {data.dtype} cannot be converted to datetime64[ns]")
TypeError: dtype bool cannot be converted to datetime64[ns]
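
Since the example frame mixes nine dtypes, one way to narrow down a failure like the one above is to round-trip each column on its own and record which ones raise. The sketch below is only an illustration of that approach: `find_failing_columns` and the `roundtrip` callable are hypothetical names, and the demo uses a stand-in callable rather than fastparquet so that it runs without any third-party package.

```python
from typing import Callable, Dict, Iterable


def find_failing_columns(
    columns: Iterable[str], roundtrip: Callable[[str], None]
) -> Dict[str, str]:
    """Call roundtrip(name) for every column and collect the ones that raise."""
    failures = {}
    for name in columns:
        try:
            roundtrip(name)
        except Exception as exc:  # broad on purpose: the goal is to localize the failure
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_roundtrip(name: str) -> None:
    """Stand-in for a real write/read cycle; fails only for the column 'bad'."""
    if name == "bad":
        raise TypeError("stand-in failure")


print(find_failing_columns(["ok", "bad"], fake_roundtrip))
```

With the `df` from the example, a real `roundtrip` could write `df[[name]]` with `to_parquet(..., engine="fastparquet")` and read the file back with `pd.read_parquet(..., engine="fastparquet")`; which columns that would flag is not stated in this thread.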

cc @martindurant

@lithomas1 lithomas1 closed this May 12, 2021
@lithomas1 lithomas1 reopened this May 12, 2021
@martindurant
Contributor

Passes for me with 0.6.2

@lithomas1
Member Author

@martindurant Are you testing against master? IIRC, this passes on pandas 1.2.4 but fails on master. I can reproduce the CI failures with master and fastparquet 0.6.2.
The failing Web and Docs build is the example above, and these are the failing tests for the Windows py38 and Linux Database builds:

FAILED pandas/tests/io/test_fsspec.py::test_fastparquet_options - ValueError:...
FAILED pandas/tests/io/test_fsspec.py::test_s3_parquet - ValueError: Opening ...
FAILED pandas/tests/io/test_parquet.py::test_cross_engine_pa_fp - TypeError: ...
FAILED pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_basic - ...
FAILED pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_s3_roundtrip
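
As a first pass over a truncated summary like this, the `FAILED` lines can be grouped by the exception type they report. This is a minimal sketch with a hypothetical helper, `group_failures`; it assumes the `FAILED <node id> - <ExceptionType>: ...` layout seen in the lines quoted here and files anything it cannot parse under `"unknown"`.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches "FAILED <node id>" with an optional " - <ExceptionType>" suffix.
FAILED_LINE = re.compile(r"^FAILED\s+(?P<node>\S+)(?:\s+-\s+(?P<error>\w+))?")


def group_failures(summary: str) -> Dict[str, List[str]]:
    """Group pytest node ids by the exception type named on each FAILED line."""
    groups = defaultdict(list)
    for line in summary.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            groups[match.group("error") or "unknown"].append(match.group("node"))
    return dict(groups)


summary = """\
FAILED pandas/tests/io/test_fsspec.py::test_fastparquet_options - ValueError:...
FAILED pandas/tests/io/test_fsspec.py::test_s3_parquet - ValueError: Opening ...
FAILED pandas/tests/io/test_parquet.py::test_cross_engine_pa_fp - TypeError: ...
FAILED pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_basic - ...
FAILED pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_s3_roundtrip
"""

for error, nodes in group_failures(summary).items():
    print(error, len(nodes))
```

The truncated messages only show the exception type, so the full tracebacks are still needed to tell whether the failures share a root cause.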

@martindurant
Contributor

No, I meant that it passes against released pandas, so something must (also) have changed on the pandas side. If you happen to have the tracebacks for those failures, it may save me the time of building a local main branch of pandas. It appears that they all fail in the same way.

@datapythonista datapythonista added the Dependencies Required and optional dependencies label May 13, 2021
@lithomas1 lithomas1 closed this May 14, 2021
@lithomas1 lithomas1 reopened this May 14, 2021
@jreback jreback added this to the 1.3 milestone May 14, 2021
@jreback jreback merged commit afcf180 into pandas-dev:master May 14, 2021
@jreback
Contributor

jreback commented May 14, 2021

thanks @lithomas1 and @martindurant for the patch!

@lithomas1
Member Author

Thanks @martindurant.

@lithomas1 lithomas1 deleted the revert-41370-pin_fastparquet branch May 15, 2021 21:20
@simonjayhawkins
Member

We have failures on 1.2.x. We could backport this patch or backport #41370 instead. I will start by trying this one.

@meeseeksdev backport 1.2.x

@simonjayhawkins
Member

@lithomas1 This PR includes a patch; is a release note needed? I think it is best not to backport this without a release note. I'll backport #41370 first to see whether CI goes green, and maybe backport this afterwards.

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

Successfully merging this pull request may close these issues.

CI: Fastparquet release broke ci
6 participants