Read csv headers #37966

cdknox · 2020-11-20T00:02:47Z

closes ENH: Change Pandas User-Agent and add possibility to set custom http_headers to pd.read_* functions #36688
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2020-11-21T00:08:45Z

doc/source/user_guide/io.rst

doc/source/whatsnew/v1.2.0.rst

twoertwein

Great work! The only part missing is the documentation update in pandas/core/shared_docs.py and making sure that all functions that take storage_options work as expected with it. imho I would only add a new test for parquet, the other functions forward everything directly to get_handle.

doc/source/user_guide/io.rst

pandas/tests/io/test_common.py

pandas/core/shared_docs.py

jreback · 2020-11-22T18:08:55Z

we have tons of fixtures eg s3_resources

pls try to not reinvent the wheel

if u must mock then use monkeypatch

prefer not to mock at all
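A minimal sketch of the no-mocking approach suggested above, assuming a throwaway local HTTP server that echoes back the `User-Agent` it receives. The names `EchoUserAgentHandler` and `fetch_echoed_user_agent` are hypothetical and not taken from the pandas test suite; only the standard library is used.

```python
import http.server
import threading
import urllib.request


class EchoUserAgentHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the User-Agent header it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so test output stays clean.
        pass


def fetch_echoed_user_agent(user_agent):
    """Start a local server, send one real request, return the echoed header."""
    # Port 0 asks the OS for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return response.read().decode("utf-8")
    finally:
        # Stop the serve_forever loop and release the socket.
        server.shutdown()
        server.server_close()
        thread.join()


print(fetch_echoed_user_agent("custom_user_agent"))
```

Because the request travels over a real socket, nothing in the code under test needs to be patched; the server is shut down and closed in the `finally` block so no socket is left open.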

pandas/tests/io/test_common.py

pep8speaks · 2020-12-03T04:06:43Z

Hello @cdknox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-13 23:17:56 UTC

pandas/tests/io/test_common.py

cdknox · 2020-12-12T23:03:11Z

I'm sure folks are busy; I just wanted to make sure no one is waiting on me for anything. I think I've addressed the comments so far, but if I'm missing something or I need to do something else please let me know!

pandas/io/common.py

doc/source/user_guide/io.rst

doc/source/whatsnew/v1.2.0.rst

pandas/io/parquet.py

pandas/tests/io/test_common.py

pandas/io/common.py

pandas/tests/io/test_user_agent.py

jreback · 2020-12-15T00:35:05Z

thanks @cdknox very nice

cdknox · 2020-12-15T01:00:17Z

Glad to help as a long time pandas user! Just double checking, we're good without the change then from storage_options={"User-Agent": "custom_user_agent"} to storage_options={"headers": {"User-Agent": "custom_user_agent"}}? I'm good with it either way
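The two shapes being compared can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pandas implementation: `build_request` and its `nested` flag are hypothetical names, and the URL is a placeholder that is never fetched.

```python
import urllib.request


def build_request(url, storage_options=None, nested=False):
    """Hypothetical helper: turn storage_options into HTTP request headers."""
    options = storage_options or {}
    # Flat form: the dict itself is the header mapping.
    # Nested form: the headers live under a "headers" key.
    headers = options.get("headers", {}) if nested else options
    return urllib.request.Request(url, headers=headers)


url = "http://example.com/data.csv"  # placeholder, no request is sent
flat = build_request(url, {"User-Agent": "custom_user_agent"})
nested = build_request(
    url, {"headers": {"User-Agent": "custom_user_agent"}}, nested=True
)

# Request stores header names via str.capitalize(), hence "User-agent".
print(flat.get_header("User-agent"))
print(nested.get_header("User-agent"))
```

Both forms yield the same request. The nested form would leave room for other keys alongside `headers`, while the flat form is shorter to type.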

simonjayhawkins · 2021-06-07T11:55:53Z

pandas/io/parquet.py

 def read_parquet(
    path,
    engine: str = "auto",
    columns=None,
+    storage_options: StorageOptions = None,


This new parameter should maybe be added after use_nullable_dtypes
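One reason the position matters can be sketched with toy functions. They are hypothetical stand-ins, not the real `read_parquet` signature, and they assume `use_nullable_dtypes` was already accepted positionally after `columns`.

```python
def read_inserted(path, engine="auto", columns=None, storage_options=None,
                  use_nullable_dtypes=False):
    """Toy signature: new parameter placed before use_nullable_dtypes."""
    return {
        "storage_options": storage_options,
        "use_nullable_dtypes": use_nullable_dtypes,
    }


def read_appended(path, engine="auto", columns=None, use_nullable_dtypes=False,
                  storage_options=None):
    """Toy signature: new parameter placed after use_nullable_dtypes."""
    return {
        "storage_options": storage_options,
        "use_nullable_dtypes": use_nullable_dtypes,
    }


# A caller written against the old signature, passing everything positionally.
legacy_args = ("data.parquet", "auto", None, True)

inserted = read_inserted(*legacy_args)
appended = read_appended(*legacy_args)

print(inserted)  # True lands in storage_options
print(appended)  # True still reaches use_nullable_dtypes
```

Appending the new parameter keeps existing positional calls working unchanged, whereas inserting it in the middle silently shifts the meaning of later arguments.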

cdknox added 7 commits November 14, 2020 18:37

storage_options as headers and tests added

bb3e8e6

additional tests - gzip, test additional headers receipt

db51474

bailed on using threading for testing

6f901b8

clean up comments add json http tests

3af6a3d

Merge branch 'master' into read_csv_headers to update

bad5739

added documentation on storage_options for headers

8f5a0f1

DOC:Added doc for custom HTTP headers in read_csv and read_json

9fcc72a

jreback added the IO CSV read_csv, to_csv label Nov 21, 2020

jreback requested changes Nov 21, 2020

doc/source/user_guide/io.rst

doc/source/whatsnew/v1.2.0.rst (outdated)

DOC:Corrected versionadded tag and added issue number for reference

df6e539

twoertwein reviewed Nov 21, 2020

doc/source/user_guide/io.rst (outdated)

pandas/tests/io/test_common.py (outdated)

pandas/tests/io/test_common.py (outdated)

pandas/tests/io/test_common.py (outdated)

cdknox added 7 commits November 21, 2020 13:53

DOC:updated storage_options documentation

98db1c4

TST:updated with tm.assert_frame_equal

f28f36c

TST:fixed incorrect usage of tm.assert_frame_equal

dd3265f

CLN:reordered imports to fix pre-commit error

02fc840

DOC:changed whatsnew and added to shared_docs.py GH36688

da97f0a

ENH: read nonfsspec URL with headers built from storage_options GH36688

fce4b17

TST:Added additional tests parquet and other read methods GH36688

e0cfcb6

twoertwein reviewed Nov 22, 2020

pandas/core/shared_docs.py (outdated)

arw2019 mentioned this pull request Nov 27, 2020

ENH: Add headers paramater to read_json and read_csv #36754

Closed

5 tasks

ivanovmg reviewed Nov 30, 2020

pandas/tests/io/test_common.py (outdated)

cdknox and others added 3 commits December 2, 2020 21:34

TST:removed mocking in favor of threaded http server

33115b7

DOC:refined storage_options doscstring

5a1c64e

Merge branch 'master' into read_csv_headers

018a399

cdknox added 3 commits December 2, 2020 23:09

CLN:used the github editor and had pep8 issues

87d7dc6

CLN: leftover comment removed

64a0d19

TST:attempted to address test warning of unclosed socket GH36688

1724e9b

cdknox added 8 commits December 4, 2020 01:02

CLN:try to silence mypy error via renaming GH36688

8a5c5a3

TST:pytest.importorfail replaced with pytest.skip GH36688

978d94a

TST:content of dataframe on error made more useful GH36688

807eb25

CLN:fixed flake8 error GH36688

44c2869

TST: windows fastparquet error needs raised for troubleshooting GH36688

01ce3ae

CLN:fix for flake8 GH36688

13bc775

TST:changed compression used in to_parquet from 'snappy' to None GH36688

6915517

The default compression on to_parquet is not installed on one of the test environments. It should work just the same with no compression anyway.

TST:allowed exceptions to be raised via removing a try except block GH36688

186b0a4

twoertwein reviewed Dec 4, 2020

pandas/tests/io/test_common.py (outdated)

TST:replaced try except with pytest.importorskip GH36688

88e9600

stragu mentioned this pull request Dec 11, 2020

ENH: Change Pandas User-Agent and add possibility to set custom http_headers to pd.read_* functions #36688

Closed

twoertwein reviewed Dec 12, 2020

pandas/io/common.py (outdated)

twoertwein reviewed Dec 12, 2020

pandas/io/common.py

CLN:removed dict() in favor of {} GH36688

2a05d0f

jreback requested changes Dec 13, 2020

cdknox added 6 commits December 13, 2020 12:43

Merge branch 'master' into read_csv_headers

d38a813

DOC: changed potentially included version from 1.2.0 to 1.3.0 GH36688

268e06a

TST:user agent tests moved from test_common to their own file GH36688

565197f

TST: used fsspec instead of patching bytesio GH36688

842e594

TST: added importorskip for fsspec on FastParquet test GH36688

c0c3d34

TST:added missing importorskip to fsspec in another test GH36688

7025abb

jreback requested changes Dec 14, 2020

pandas/io/common.py

pandas/io/common.py

pandas/tests/io/test_user_agent.py

jreback added this to the 1.3 milestone Dec 14, 2020

jreback approved these changes Dec 15, 2020

jreback merged commit a226941 into pandas-dev:master Dec 15, 2020

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

Read csv headers (pandas-dev#37966)

2b847a5

simonjayhawkins reviewed Jun 7, 2021
