
Commit 064781c

vvaidy (Vijay Vaidyanathan) and Vijay Vaidyanathan authored
DOC: Provide examples of using read_parquet #49739 (#54150)
* DOC: Provide examples of using read_parquet #49739
* DOC: Provide examples of using read_parquet #49739
* DOC: Provide examples of using read_parquet #49739 (with minor fixes)
* DOC: Provide examples of using read_parquet #49739. Fixed typos that were causing tests to fail. Oops.
* DOC: Provide examples of using read_parquet #49739 - fix formatting failed checks
* DOC: Provide examples of using read_parquet #49739 - removed read_parquet from code_checks.sh as requested by @mroeschke

---------

Co-authored-by: Vijay Vaidyanathan <[email protected]>
1 parent fd43d4b commit 064781c


2 files changed (+56, -1 lines)


ci/code_checks.sh (-1)
@@ -84,7 +84,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 pandas.NaT \
 pandas.read_feather \
 pandas.DataFrame.to_feather \
-pandas.read_parquet \
 pandas.read_orc \
 pandas.read_sas \
 pandas.read_spss \

pandas/io/parquet.py (+56)
@@ -556,7 +556,63 @@ def read_parquet(
 Returns
 -------
 DataFrame
+
+See Also
+--------
+DataFrame.to_parquet : Create a parquet object that serializes a DataFrame.
+
+Examples
+--------
+>>> original_df = pd.DataFrame(
+... {{"foo": range(5), "bar": range(5, 10)}}
+... )
+>>> original_df
+   foo  bar
+0    0    5
+1    1    6
+2    2    7
+3    3    8
+4    4    9
+>>> df_parquet_bytes = original_df.to_parquet()
+>>> from io import BytesIO
+>>> restored_df = pd.read_parquet(BytesIO(df_parquet_bytes))
+>>> restored_df
+   foo  bar
+0    0    5
+1    1    6
+2    2    7
+3    3    8
+4    4    9
+>>> restored_df.equals(original_df)
+True
+>>> restored_bar = pd.read_parquet(BytesIO(df_parquet_bytes), columns=["bar"])
+>>> restored_bar
+   bar
+0    5
+1    6
+2    7
+3    8
+4    9
+>>> restored_bar.equals(original_df[['bar']])
+True
+
+The function uses `kwargs` that are passed directly to the engine.
+In the following example, we use the `filters` argument of the pyarrow
+engine to filter the rows of the DataFrame.
+
+Since `pyarrow` is the default engine, we can omit the `engine` argument.
+Note that the `filters` argument is implemented by the `pyarrow` engine,
+which can benefit from multithreading and also potentially be more
+economical in terms of memory.
+
+>>> sel = [("foo", ">", 2)]
+>>> restored_part = pd.read_parquet(BytesIO(df_parquet_bytes), filters=sel)
+>>> restored_part
+   foo  bar
+0    3    8
+1    4    9
 """
+
 impl = get_engine(engine)

 if use_nullable_dtypes is not lib.no_default:
