From 62d3e335265076b56627fe705911ae0ea6ba7d1d Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 7 Jan 2024 11:31:07 +0100
Subject: [PATCH 1/9] Update io.rst

Make consistent with other s3 bucket URL examples and avoid doc build error when problem with s3 url.
---
 doc/source/user_guide/io.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index b3ad23e0d4104..9510c686b6f27 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -3015,8 +3015,7 @@ Read in the content of the "books.xml" as instance of ``StringIO`` or
 Even read XML from AWS S3 buckets such as NIH NCBI PMC Article Datasets providing
 Biomedical and Life Science Jorurnals:
 
-.. ipython:: python
-   :okwarning:
+.. code-block:: python
 
    df = pd.read_xml(
        "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml",

From 98436eb1cedd3a14052696a287cb2906fd2fffc9 Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 7 Jan 2024 12:45:37 +0100
Subject: [PATCH 2/9] Update io.rst

Make example consistent with other code block examples
---
 doc/source/user_guide/io.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 9510c686b6f27..3be42596a0754 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -3017,11 +3017,10 @@ Biomedical and Life Science Jorurnals:
 
 .. code-block:: python
 
-   df = pd.read_xml(
+   pd.read_xml(
        "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml",
        xpath=".//journal-meta",
    )
-   df
 
 With `lxml`_ as default ``parser``, you access the full-featured XML library
 that extends Python's ElementTree API. One powerful tool is ability to query

From 34cea0516d75489b1946cf92e3a6160ec7ff4f16 Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 7 Jan 2024 15:20:54 +0100
Subject: [PATCH 3/9] Update v2.3.0.rst

---
 doc/source/whatsnew/v2.3.0.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 1f1b0c7d7195a..55a4ecfcb0eb4 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -206,6 +206,7 @@ Styler
 
 Other
 ^^^^^
+- Bug when building html documentation from ``doc\source\user_guide\io.rst`` no longer calls S3 bucket URL (:issue:`56592`)
 
 .. ***DO NOT USE THIS SECTION***
 

From e705ee450769c26a206cefe51f68a14f0175b3d3 Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 14 Jan 2024 14:23:29 +0100
Subject: [PATCH 4/9] immitating interactive mode

For each S3 bucket code block, ideally we show what the output would be, but without making an actual call. Unfortunately, for several of the S3 buckets, there are issues with the code, which we must fix in another commit or PR. For now, the two S3 examples that do work, we edit to make the code block show what the output would have been if it had run successfully.

Find details on issues in conversation on PR #56592
---
 doc/source/user_guide/io.rst   | 28 +++++++++++++++++++---------
 doc/source/whatsnew/v2.3.0.rst |  2 --
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 3be42596a0754..5dcb43f120d89 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1717,11 +1717,18 @@ data by specifying an anonymous connection, such as
 
 .. code-block:: python
 
-   pd.read_csv(
-       "s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
-       "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
-       storage_options={"anon": True},
-   )
+   >>> df = pd.read_csv(
+   ...    "s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
+   ...    "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
+   ...    storage_options={"anon": True},
+   ...)
+   >>> df.columns
+   Index(['Ping_index', ' Distance_gps', ' Distance_vl', ' Ping_date',
+          ' Ping_time', ' Ping_milliseconds', ' Latitude', ' Longitude',
+          ' Depth_start', ' Depth_stop', ' Range_start', ' Range_stop',
+          ' Sample_count'],
+         dtype='object')
+
 
 ``fsspec`` also allows complex URLs, for accessing data in compressed
 archives, local caching of files, and more. To locally cache the above
@@ -3017,10 +3024,13 @@ Biomedical and Life Science Jorurnals:
 
 .. code-block:: python
 
-   pd.read_xml(
-       "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml",
-       xpath=".//journal-meta",
-   )
+   >>> df = pd.read_xml(
+   ...    "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml",
+   ...    xpath=".//journal-meta",
+   ...)
+   >>> df.head(1)
+                 journal-id              journal-title       issn  publisher
+   0  Cardiovasc Ultrasound  Cardiovascular Ultrasound  1476-7120        NaN
 
 With `lxml`_ as default ``parser``, you access the full-featured XML library
 that extends Python's ElementTree API. One powerful tool is ability to query
diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 3b7cf5406c80f..ac9782b78c436 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -211,8 +211,6 @@ Styler
 
 Other
 ^^^^^
-- Bug when building html documentation from ``doc\source\user_guide\io.rst`` no longer calls S3 bucket URL (:issue:`56592`)
-
 .. ***DO NOT USE THIS SECTION***
 
 -

From c57db5795797c9523048668c1eff90c42aaf7f04 Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 14 Jan 2024 14:48:41 +0100
Subject: [PATCH 5/9] Update io.rst

Code still doesn't run, but at least unmatched } is no longer the issue.
---
 doc/source/user_guide/io.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 5dcb43f120d89..82433c9e734d7 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1704,7 +1704,7 @@ option parameter:
 
 .. code-block:: python
 
-   storage_options = {"client_kwargs": {"endpoint_url": "http://127.0.0.1:5555"}}}
+   storage_options = {"client_kwargs": {"endpoint_url": "http://127.0.0.1:5555"}}
    df = pd.read_json("s3://pandas-test/test-1", storage_options=storage_options)
 
 More sample configurations and documentation can be found at `S3Fs documentation

From 781216cae51e3303cfe47a79db5c09e9d6eb8626 Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 14 Jan 2024 15:46:37 +0100
Subject: [PATCH 6/9] Update v2.3.0.rst

avoids unnecessary file change in PR
---
 doc/source/whatsnew/v2.3.0.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index ac9782b78c436..e217e8c8557bb 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -211,6 +211,7 @@ Styler
 
 Other
 ^^^^^
+
 .. ***DO NOT USE THIS SECTION***
 
 -

From 36f45386eb181c039dac80ddb9f6772311d058ce Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 14 Jan 2024 17:38:40 +0100
Subject: [PATCH 7/9] Update io.rst

Rollback changes to one of the examples (out of scope)
---
 doc/source/user_guide/io.rst | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 82433c9e734d7..f3a7e2643c618 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1717,19 +1717,12 @@ data by specifying an anonymous connection, such as
 
 .. code-block:: python
 
-   >>> df = pd.read_csv(
-   ...    "s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
-   ...    "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
-   ...    storage_options={"anon": True},
-   ...)
-   >>> df.columns
-   Index(['Ping_index', ' Distance_gps', ' Distance_vl', ' Ping_date',
-          ' Ping_time', ' Ping_milliseconds', ' Latitude', ' Longitude',
-          ' Depth_start', ' Depth_stop', ' Range_start', ' Range_stop',
-          ' Sample_count'],
-         dtype='object')
-
-
+   pd.read_csv(
+       "s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
+       "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
+       storage_options={"anon": True},
+   )
+   
 ``fsspec`` also allows complex URLs, for accessing data in compressed
 archives, local caching of files, and more. To locally cache the above
 example, you would modify the call to

From a2d3d3cfbed8be56b85cd7ee5977f95f4df4601e Mon Sep 17 00:00:00 2001
From: JackCollins1991 <55454098+JackCollins1991@users.noreply.github.com>
Date: Sun, 14 Jan 2024 17:40:30 +0100
Subject: [PATCH 8/9] Update io.rst

---
 doc/source/user_guide/io.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index f3a7e2643c618..4de4a2f3d91ef 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1722,7 +1722,7 @@ data by specifying an anonymous connection, such as
        "-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
        storage_options={"anon": True},
    )
-   
+
 ``fsspec`` also allows complex URLs, for accessing data in compressed
 archives, local caching of files, and more. To locally cache the above
 example, you would modify the call to

From 4fecd577f6f6a5cf1cb3cdbd5735390ec7d88c67 Mon Sep 17 00:00:00 2001
From: John Collins
Date: Mon, 15 Jan 2024 09:14:40 +0100
Subject: [PATCH 9/9] Update io.rst

---
 doc/source/user_guide/io.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 4de4a2f3d91ef..bb5b4e056d527 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -3021,7 +3021,7 @@ Biomedical and Life Science Jorurnals:
    ...    "s3://pmc-oa-opendata/oa_comm/xml/all/PMC1236943.xml",
    ...    xpath=".//journal-meta",
    ...)
-   >>> df.head(1)
+   >>> df
                  journal-id              journal-title       issn  publisher
    0  Cardiovasc Ultrasound  Cardiovascular Ultrasound  1476-7120        NaN
 