From 40816dda4828661b4fb9e8aca021bddbb9d86ef7 Mon Sep 17 00:00:00 2001
From: RajatS Mukherjee
Date: Sat, 1 Jul 2023 07:02:12 +0000
Subject: [PATCH 1/5] Added note and example for StringDtype

---
 doc/source/user_guide/io.rst   | 18 ++++++++++++++++++
 doc/source/user_guide/text.rst |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 0084e885db2b5..33f8c135a8a31 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5464,6 +5464,24 @@ The above example creates a partitioned dataset that may look like:
    except OSError:
        pass
 
+.. note::
+
+   * The parquet representation of ``StringDtype`` is the same, regardless of the storage.
+   * The data will be read in accordance with the ``string_storage`` settings.
+
+.. ipython:: python
+   df1 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("pyarrow"))})
+   df2 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("python"))})
+   df1.to_parquet("test.parquet")
+   with pd.option_context("string_storage", "pyarrow"):
+       b = pd.read_parquet("test.parquet")
+       pd.testing.assert_frame_equal(b, a)
+   df2.to_parquet("test.parquet")
+   with pd.option_context("string_storage", "pyarrow"):
+       b = pd.read_parquet("test.parquet")
+       pd.testing.assert_frame_equal(b, a)
+
+
 .. _io.orc:
 
 ORC
diff --git a/doc/source/user_guide/text.rst b/doc/source/user_guide/text.rst
index c193df5118926..fa128ce983041 100644
--- a/doc/source/user_guide/text.rst
+++ b/doc/source/user_guide/text.rst
@@ -81,6 +81,11 @@ or convert from existing pandas data:
    s2
    type(s2[0])
 
+.. note::
+
+   * The parquet representation of ``StringDtype`` is the same, regardless of the storage.
+   * The data will be read in accordance with the ``string_storage`` settings.
+
 
 .. _text.differences:
 

From e61788eef337b98bf28e871f8936863214e1161a Mon Sep 17 00:00:00 2001
From: RajatS Mukherjee
Date: Sat, 1 Jul 2023 10:16:53 +0000
Subject: [PATCH 2/5] Fixed ipython snippet

---
 doc/source/user_guide/io.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 33f8c135a8a31..f540cb63e44d1 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5470,16 +5470,17 @@ The above example creates a partitioned dataset that may look like:
    * The data will be read in accordance with the ``string_storage`` settings.
 
 .. ipython:: python
+
    df1 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("pyarrow"))})
    df2 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("python"))})
    df1.to_parquet("test.parquet")
    with pd.option_context("string_storage", "pyarrow"):
-       b = pd.read_parquet("test.parquet")
-       pd.testing.assert_frame_equal(b, a)
+       df3 = pd.read_parquet("test.parquet")
+       pd.testing.assert_frame_equal(df3, df1)
    df2.to_parquet("test.parquet")
    with pd.option_context("string_storage", "pyarrow"):
-       b = pd.read_parquet("test.parquet")
-       pd.testing.assert_frame_equal(b, a)
+       df4 = pd.read_parquet("test.parquet")
+       pd.testing.assert_frame_equal(df4, df1)
 
 
 .. _io.orc:

From e769d413b84915f3010473d93ee2faf960c32575 Mon Sep 17 00:00:00 2001
From: RajatS Mukherjee
Date: Sat, 1 Jul 2023 11:45:40 +0000
Subject: [PATCH 3/5] suppressing warning

---
 doc/source/user_guide/io.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index f540cb63e44d1..19023011167a5 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5470,6 +5470,7 @@ The above example creates a partitioned dataset that may look like:
    * The data will be read in accordance with the ``string_storage`` settings.
 
 .. ipython:: python
+   :suppress:
 
    df1 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("pyarrow"))})
    df2 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("python"))})

From 938767ce0bb4b9b22c6cd83fbaed341dcad2c0f4 Mon Sep 17 00:00:00 2001
From: RajatS Mukherjee
Date: Sat, 1 Jul 2023 12:30:45 +0000
Subject: [PATCH 4/5] Handling warning during runtime

---
 doc/source/user_guide/io.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 19023011167a5..75e47ee95816c 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5470,7 +5470,7 @@ The above example creates a partitioned dataset that may look like:
    * The data will be read in accordance with the ``string_storage`` settings.
 
 .. ipython:: python
-   :suppress:
+   :okwarning:
 
    df1 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("pyarrow"))})
    df2 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("python"))})

From 0df9121c994b046b64445c51fb282b4c1d76db77 Mon Sep 17 00:00:00 2001
From: Rajat Subhra Mukherjee
Date: Tue, 4 Jul 2023 19:44:44 +0530
Subject: [PATCH 5/5] Removed `okwarning`

---
 doc/source/user_guide/io.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 75e47ee95816c..f540cb63e44d1 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -5470,7 +5470,6 @@ The above example creates a partitioned dataset that may look like:
    * The data will be read in accordance with the ``string_storage`` settings.
 
 .. ipython:: python
-   :okwarning:
 
    df1 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("pyarrow"))})
    df2 = pd.DataFrame({"A": pd.array(['a', 'b'], dtype=pd.StringDtype("python"))})