REGR: to_parquet raising with bytes filename #48995

Merged
merged 5 commits into from
Oct 12, 2022

Conversation

phofl
Member

@phofl phofl commented Oct 7, 2022

@phofl phofl added Regression Functionality that used to work in a prior pandas version IO Parquet parquet, feather labels Oct 7, 2022
@phofl phofl added this to the 1.5.1 milestone Oct 7, 2022
@@ -84,6 +84,7 @@ Fixed regressions
- Fixed Regression in :meth:`DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)
- Fixed regression in :meth:`DataFrame.apply` when passing non-zero ``axis`` via keyword argument (:issue:`48656`)
- Fixed regression in :meth:`Series.groupby` and :meth:`DataFrame.groupby` when the grouper is a nullable data type (e.g. :class:`Int64`) or a PyArrow-backed string array, contains null values, and ``dropna=False`` (:issue:`48794`)
- Fixed regrssion in :meth:`DataFrame.to_parquet` raising when file name was specified as ``bytes`` (:issue:`48944`)
Member

Typo regression

Member Author

thx, fixed

df = pd.DataFrame(data={"A": [0, 1], "B": [1, 0]})
with tm.ensure_clean("test.parquet") as path:
    with open(path.encode(), "wb") as f:
        df.to_parquet(f)
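
For context, the handle in this test differs from an ordinary one only in its `name` attribute: opening a file with a bytes path yields a handle whose `name` is `bytes` rather than `str`. The sketch below shows that with the standard library alone. `handle_path` is a hypothetical helper used for illustration; it is not the code pandas uses.

```python
import os
import tempfile


def handle_path(handle):
    """Return the handle's name as str (hypothetical helper, not pandas API)."""
    # os.fsdecode leaves str unchanged and decodes bytes with the
    # filesystem encoding.
    return os.fsdecode(handle.name)


with tempfile.TemporaryDirectory() as tmp:
    str_path = os.path.join(tmp, "test.parquet")
    # Opening with a bytes path gives a handle whose .name is bytes.
    with open(str_path.encode(), "wb") as f:
        name_type = type(f.name)
        normalized = handle_path(f)
```

Any code that assumes `handle.name` is a `str` would need a normalization step of this kind before treating the name as a path.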
Member

Could you parameterize over the engine (pyarrow and fastparquet)?

Member Author

Done

Member

I was originally referring to engine in to_parquet, but thanks for also adding it to read_parquet

Member Author

@phofl phofl Oct 12, 2022

Sorry, I mixed that up. Parametrizing to_parquet over the engine does not work, because fastparquet does not support a BufferedWriter as the path

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2022.2/scratches/scratch.py", line 272, in <module>
    df.to_parquet(f, engine="fastparquet")
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/pandas/util/_decorators.py", line 207, in wrapper
    return func(*args, **kwargs)
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/pandas/core/frame.py", line 2835, in to_parquet
    return to_parquet(
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/pandas/io/parquet.py", line 420, in to_parquet
    impl.write(
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/pandas/io/parquet.py", line 301, in write
    self.api.write(
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/fastparquet/writer.py", line 1231, in write
    write_simple(filename, data, fmd,
  File "/Users/patrick/mambaforge/envs/random/lib/python3.10/site-packages/fastparquet/writer.py", line 880, in write_simple
    with open_with(fn, mode) as f:
TypeError: expected str, bytes or os.PathLike object, not BufferedWriter
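
The last frame of the traceback is a plain path-open call, so the same kind of error can be reproduced without fastparquet. This is a minimal sketch, assuming `open_with` behaves like the built-in `open` here (an assumption the traceback does not confirm): passing an already-open handle where a path is expected raises a `TypeError`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.parquet")
    with open(path, "wb") as f:
        try:
            # A BufferedWriter is neither str, bytes, nor os.PathLike,
            # so it cannot be used as a path.
            open(f, "wb")
        except TypeError as exc:
            error = exc
        else:
            error = None
```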

Member

Ah okay. Guess it's a limitation in fastparquet then.

@mroeschke mroeschke merged commit 56d82a9 into pandas-dev:main Oct 12, 2022
@mroeschke
Member

Thanks @phofl

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 12, 2022
phofl added a commit that referenced this pull request Oct 13, 2022
…es filename) (#49061)

Backport PR #48995: REGR: to_parquet raising with bytes filename

Co-authored-by: Patrick Hoefler <[email protected]>
@phofl phofl deleted the 48944 branch October 14, 2022 12:12
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* REGR: to_parquet raising with bytes filename

* Add check

* Fix typo

* Parametrize
Successfully merging this pull request may close these issues.

BUG: df.to_parquet() fails for a PyFilesystem2 file handle