ENH: DataFrame.to_parquet() returns bytes if path_or_buf not provided #37129
pandas/io/parquet.py
Outdated
if path is None:
    path = io.BytesIO()

return impl.write(
you also need to return path.getvalue(), no? (if path was None in the first place).
yes! fixed that
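For readers following the thread, the pattern being asked for can be sketched roughly as below. This is a stand-alone illustration, not the pandas implementation: `write_payload` is a hypothetical name, and the raw `data` bytes stand in for whatever the parquet engine would write.

```python
import io
from typing import BinaryIO, Optional, Union


def write_payload(
    data: bytes, path: Optional[Union[str, BinaryIO]] = None
) -> Optional[bytes]:
    """Hypothetical writer: return bytes only when no path was given."""
    # Remember whether the caller passed a destination before replacing it.
    path_is_none = path is None
    target = io.BytesIO() if path_is_none else path

    if isinstance(target, str):
        # A string is treated as a file-system path.
        with open(target, "wb") as handle:
            handle.write(data)
    else:
        # Anything else is assumed to be a file-like object with write().
        target.write(data)

    if path_is_none:
        # The in-memory buffer was created here, so hand its contents back.
        return target.getvalue()
    return None
```

Called without a destination, the function returns the written bytes; called with a buffer or a path, it writes there and returns None.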
pandas/tests/io/test_parquet.py
Outdated
def test_to_bytes_without_path_or_buf_provided(self):
    # GH 37105
    df = pd.DataFrame()
    df.to_parquet()
check one of the test dataframes, and test that the round-trip works as well; alternatively you can write it to a file and check that it's the same.
Done
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -194,6 +194,7 @@ Other enhancements
- Added :meth:`Rolling.sem()` and :meth:`Expanding.sem()` to compute the standard error of mean (:issue:`26476`).
- :meth:`Rolling.var()` and :meth:`Rolling.std()` use Kahan summation and Welfords Method to avoid numerical issues (:issue:`37051`)
- :meth:`DataFrame.plot` now recognizes ``xlabel`` and ``ylabel`` arguments for plots of type ``scatter`` and ``hexbin`` (:issue:`37001`)
- :meth:`DataFrame.to_parquet` now writes to ``io.Bytes`` when no ``path`` argument is passed (:issue:`37105`)
no, this returns a bytes object
right. Fixed
pandas/io/parquet.py
Outdated
If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function) or io.BytesIO. The engine
fastparquet does not accept file-like objects.
fastparquet does not accept file-like objects. If path is None,
frame is written to an io.BytesIO object and a bytes object with
revise: the io.Bytes is a detail that is not important; it's the return of bytes that is.
Fixed
pandas/io/parquet.py
Outdated
fastparquet does not accept file-like objects.
fastparquet does not accept file-like objects. If path is None,
frame is written to an io.BytesIO object and a bytes object with
the contents of the buffer is returned.
add a versionchanged 1.2 (return bytes)
Done both
pandas/tests/io/test_parquet.py
Outdated
# GH 37105

buf = df_full.to_parquet()
assert that buf is bytes
Done
1518ad8 to 0bd0f47
lgtm. can you merge master and ping on green just to make sure. @jorisvandenbossche if any comments.
Looks good! Small comment on the test
pandas/tests/io/test_parquet.py
Outdated
with tm.ensure_clean() as path:
    with open(path, "wb") as f:
        f.write(buf)
I would maybe rather test that you can directly read the bytes again (by wrapping it in a BytesIO?) instead of writing the bytes to a file.
Done
Thanks @arw2019!
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Not sure if this is an API-breaking change. Prior to this patch, path was a required positional argument, but it becomes optional here. It is now consistent with, for example, the csv writer.