DOC: s3fs is required when using `read_csv` with an S3 URI #35206

MartinThoma · 2020-07-10T06:25:50Z

Location of the documentation

Documentation problem

I've just noticed that s3fs is required when you read a URL from S3. While it is documented that you can read from S3, the need to install an extra package to do so is not documented.

Also, it would be nice if this was a pandas extra in setup.py (e.g. s3).
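To make the "pandas extra" suggestion concrete, here is a minimal sketch of what such a mapping could look like. The extra names `s3` and `gcs` are hypothetical, this is not pandas' actual setup.py, and the minimum versions are taken from the optional dependencies table quoted later in this thread. Listing fsspec alongside each package is an assumption based on a later comment here.

```python
# Hypothetical extras mapping for illustration only; not pandas' real setup.py.
# Minimum versions come from the optional dependencies table quoted in this thread.
EXTRAS_REQUIRE = {
    # Assumed extra name "s3": Amazon S3 access.
    "s3": ["fsspec", "s3fs>=0.4.0"],
    # Assumed extra name "gcs": Google Cloud Storage access.
    "gcs": ["fsspec", "gcsfs>=0.6.0"],
}

# In a setup.py this dict would be passed along as:
#     setup(..., extras_require=EXTRAS_REQUIRE)
```

With a mapping like this in place, `pip install "pandas[s3]"` would pull in the S3 dependencies in one step.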

jorisvandenbossche · 2020-07-10T10:17:59Z

The user guide mentions it: https://pandas.pydata.org/docs/user_guide/io.html#reading-remote-files, and the install guide as well: https://pandas.pydata.org/docs/getting_started/install.html#optional-dependencies

I think it would probably be too much to list all optional dependencies in the read_csv docstring as well (S3 is one, but e.g. Azure or Google Cloud need other optional deps), but maybe we should mention in general that additional deps might be needed and link to one of the other places where this is explained?
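As a rough illustration of "additional deps might be needed", a user could check up front which optional packages a given path requires. This is only a sketch: the helper name `missing_optional_deps` and the scheme-to-package mapping are assumptions for illustration, not part of pandas, and the fsspec entries follow a later comment in this thread.

```python
from importlib.util import find_spec
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the optional packages it needs.
# Not exhaustive; other remote filesystems need other packages.
OPTIONAL_DEPS_BY_SCHEME = {
    "s3": ("fsspec", "s3fs"),
    "gs": ("fsspec", "gcsfs"),
    "gcs": ("fsspec", "gcsfs"),
}


def missing_optional_deps(path):
    """Return the optional packages that `path` needs but that are not importable."""
    scheme = urlparse(str(path)).scheme.lower()
    required = OPTIONAL_DEPS_BY_SCHEME.get(scheme, ())
    # find_spec returns None when a top-level package cannot be found.
    return [name for name in required if find_spec(name) is None]
```

Calling `missing_optional_deps("s3://bucket/key.csv")` before `read_csv` gives a list of packages to install, which is empty when everything needed is already present.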

MartinThoma · 2020-07-10T14:14:06Z

I wasn't aware that there are even more 😱

> we should maybe mention it in general that additional deps might be needed

Sounds good! Should I make a PR?

jorisvandenbossche · 2020-07-11T18:50:39Z

Yes, PR very welcome!

abdoulayegk · 2020-07-24T15:23:55Z

Hello, can I make a PR, since nobody has made one yet?

MartinThoma · 2020-07-24T17:56:51Z

@abdoulayegk Oops, sorry, I forgot. Please go ahead if you want to take care of that :-)

alecglassford · 2020-07-30T01:23:22Z

Perhaps this has already been noted, but it looks like fsspec also needs to be installed in addition to s3fs or gcsfs (related PR: ENH: add fsspec support #34266). This is reflected in the optional dependencies list, but it's not necessarily obvious at first glance. It might be nice if the relevant rows noted this requirement, for example (my addition in bold):

| Dependency | Minimum Version | Notes |
|---|---|---|
| gcsfs | 0.6.0 | Google Cloud Storage access **(must be used with fsspec)** |
| s3fs | 0.4.0 | Amazon S3 access **(must be used with fsspec)** |

If you're adding a link in the read_csv docstring to the optional dependencies list, it likely makes sense to add an identical link to the docstrings of other pandas.read_{format} methods. I'm not sure it applies to all of them, but at least pandas.read_json and pandas.read_excel.

I couldn't find a list of all the supported filesystems anywhere; the most comprehensive listing I found is this release note. Given that fsspec supports many filesystems, maybe it's not feasible to list them all (and keep up with a potentially growing list); however, the reading remote files section of the IO doc could be updated to link to the fsspec documentation for users to learn about additional compatible filesystems. (Unfortunately, I couldn't find a more concise list of supported filesystems in the fsspec documentation than the source code that I just linked to.)

Sorry if these are beyond the scope of this issue! They seemed closely related, so I thought that I would note these gaps here rather than create a new issue.
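On the question of finding a list of supported filesystems, one option is to ask fsspec itself. The sketch below reads `fsspec.registry.known_implementations`, a dict keyed by protocol name; the helper name `known_fsspec_protocols` is hypothetical. Note that a protocol appearing in this list does not mean its backing package is installed, and it falls back to an empty list when fsspec is missing.

```python
def known_fsspec_protocols():
    """Return the protocol names fsspec knows about, or [] if fsspec is not installed."""
    try:
        # Imported lazily so the helper still works without fsspec.
        from fsspec.registry import known_implementations
    except ImportError:
        return []
    # known_implementations is a dict keyed by protocol name (e.g. "s3", "gcs").
    return sorted(known_implementations)
```

Printing the result of `known_fsspec_protocols()` shows which URL schemes fsspec can dispatch on, though each one may still require a separate package such as s3fs or gcsfs.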

MartinThoma added the Docs and Needs Triage labels Jul 10, 2020

simonjayhawkins removed the Needs Triage label Jul 10, 2020

jorisvandenbossche added this to the Contributions Welcome milestone Jul 11, 2020

person142 mentioned this issue Jan 14, 2021

ENH: add additional extras_require sections for optional dependencies #39164

Closed

mroeschke added the IO CSV and IO Network labels Aug 8, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

phofl mentioned this issue Apr 17, 2023

Small documentation improvements noatamir/pyladies-workshop#6

Open

12 tasks

phofl mentioned this issue Jun 23, 2024

Small documentation fixes phofl/pydata-yerevan-sprint#2

Open

8 tasks
