API: read_csv, to_csv line_terminator keyword inconsistency #35399
Closed
29 commits
30c9b83 add values.dtype.kind==f branch to array_with_unit_datetime (arw2019)
2f25460 merge with master (arw2019)
572363a revert pandas/_libs/tslib.pyx (arw2019)
b891030 merge with master (arw2019)
ecd8ce3 merge with master (arw2019)
ee55191 merge with master (arw2019)
292fcdc merge with master (arw2019)
9e4ac71 Merge remote-tracking branch 'upstream/master' (arw2019)
1d0ba61 merge with master (arw2019)
b59831e Merge branch 'master' of https://github.com/arw2019/pandas (arw2019)
b954874 Merge remote-tracking branch 'upstream/master' (arw2019)
ac0a7f1 merge with master (arw2019)
bc55716 added line_terminator arg to read_csv (arw2019)
ee69a76 added line_terminator, lineterminator args + tests (arw2019)
4d00fea merge with master (arw2019)
c015da5 Merge remote-tracking branch 'upstream/master' (arw2019)
73d6d11 fix csv api using kwargs (arw2019)
1a6497f TST: remove failing test - read_csv takes kwargs now (arw2019)
3a88ef0 add space between kwargs and colon in docstring (arw2019)
7fe8274 DOC: remove the semicolon after kwargs (arw2019)
1c27b2c added line_terminator arg to read_csv (arw2019)
1912aa2 added line_terminator, lineterminator args + tests (arw2019)
f54df81 fix csv api using kwargs (arw2019)
cea28d8 TST: remove failing test - read_csv takes kwargs now (arw2019)
85ddf44 add space between kwargs and colon in docstring (arw2019)
a28657c DOC: remove the semicolon after kwargs (arw2019)
2b1333f Merge branch 'csv-api' of https://github.com/arw2019/pandas into csv-api (arw2019)
0786617 merge with master (arw2019)
5e87bbc small changes to docstrings (arw2019)
@@ -27,7 +27,7 @@
     ParserError,
     ParserWarning,
 )
-from pandas.util._decorators import Appender
+from pandas.util._decorators import Appender, _get_alias_from_kwargs

 from pandas.core.dtypes.cast import astype_nansafe
 from pandas.core.dtypes.common import (
@@ -285,7 +285,7 @@
     Thousands separator.
 decimal : str, default '.'
     Character to recognize as decimal point (e.g. use ',' for European data).
-lineterminator : str (length 1), optional
+line_terminator : str (length 1), optional
     Character to break file into lines. Only valid with C parser.
 quotechar : str (length 1), optional
     The character used to denote the start and end of a quoted item. Quoted
@@ -346,6 +346,11 @@
     values. The options are `None` for the ordinary converter,
     `high` for the high-precision converter, and `round_trip` for the
     round-trip converter.
+kwargs
+    Additional keyword arguments passed to ``pd.read_csv`` for compatibility
+    with `csv` module. Include `lineterminator` (an alias of `line_terminator`).
+
+    .. versionadded:: 1.2.0

 Returns
 -------
@@ -580,7 +585,7 @@ def read_csv(
     compression="infer",
     thousands=None,
     decimal: str = ".",
-    lineterminator=None,
+    line_terminator=None,
     quotechar='"',
     quoting=csv.QUOTE_MINIMAL,
     doublequote=True,
Review comment on `line_terminator=None,`: again, in the signature, I think we should only have the one parameter and the compatibility keyword accepted through **kwargs.
@@ -597,6 +602,7 @@ def read_csv(
     memory_map=False,
     float_precision=None,
     storage_options=None,
+    **kwargs,
 ):
     # gh-23761
     #
@@ -634,6 +640,8 @@ def read_csv(
         engine = "c"
         engine_specified = False

+    kwargs.setdefault("lineterminator", line_terminator)
+
     kwds.update(
         delimiter=delimiter,
         engine=engine,
@@ -645,7 +653,6 @@ def read_csv(
         quotechar=quotechar,
         quoting=quoting,
         skipinitialspace=skipinitialspace,
-        lineterminator=lineterminator,
         header=header,
         index_col=index_col,
         names=names,
@@ -684,6 +691,7 @@ def read_csv(
         infer_datetime_format=infer_datetime_format,
         skip_blank_lines=skip_blank_lines,
         storage_options=storage_options,
+        **kwargs,
     )

     return _read(filepath_or_buffer, kwds)
I know that @simonjayhawkins and I have a different point of view here, but I am strongly against adding kwargs to read_csv. The signature is already massive and adding this only makes things worse.
I think we either stick to adding just one keyword arg or just leave this for now.
The other reason I'm not keen on adding the alias to the signature is that we also have doublequote, escapechar, quotechar and skipinitialspace as parameters to read_csv. So if we have both line_terminator and lineterminator, some bright spark will want snake case equivalents of the others.
Would you guys be happy with @jbrockmendel's solution? I think it doesn't add kwargs, and `lineterminator` won't appear in the signature.
A decorator would be a good solution, however some of our decorators 'lose' the signature in the docs e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_stata.html
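
As a rough illustration of the decorator idea, the sketch below accepts an alias keyword without adding it to the visible signature. `accept_alias` and `read_text` are hypothetical names made up for this example, not pandas helpers. It relies on `functools.wraps` setting `__wrapped__`, which `inspect.signature` follows; whether a given documentation toolchain also picks that up is a separate question that this sketch does not answer.

```python
import functools
import inspect


def accept_alias(alias, canonical):
    """Hypothetical decorator: remap `alias` to `canonical` before the call."""

    def decorator(func):
        @functools.wraps(func)  # copies __doc__/__name__ and sets __wrapped__
        def wrapper(*args, **kwargs):
            if alias in kwargs:
                if canonical in kwargs:
                    raise TypeError(f"pass only one of {alias!r} and {canonical!r}")
                kwargs[canonical] = kwargs.pop(alias)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@accept_alias("lineterminator", "line_terminator")
def read_text(text, line_terminator="\n"):
    """Split text on the given terminator."""
    return text.split(line_terminator)


# The alias works at call time but is absent from the reported signature.
print(read_text("a;b", lineterminator=";"))
print(list(inspect.signature(read_text).parameters))
```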
If the goal is to keep both ad infinitum, it may be possible to mostly use `deprecate_kwarg` and just eat the warning.
@simonjayhawkins The "solution" to that issue is probably for the wrapper to rewrite the docstring with an explicit signature, which can be created using `Signature`, within the docstring. For example, after post-processing, the docstring for [...] would become [...].
Of course, this would be for a different PR. The string variant is the method used to document the signature of Cython functions.
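
The comment's own before-and-after example did not survive on this page, so the following is only a sketch of the general idea, not a reconstruction of it. `document_signature` and `to_text` are hypothetical names. The wrapper builds the signature text with `inspect.signature` and writes it as the first docstring line, in the `name(args)` style the comment attributes to Cython functions.

```python
import functools
import inspect


def document_signature(func):
    """Hypothetical wrapper that writes the signature into the docstring."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    # Prepend "name(signature)" so tools that only read __doc__ still see it.
    body = inspect.getdoc(func) or ""
    wrapper.__doc__ = f"{func.__name__}{sig}\n\n{body}"
    return wrapper


@document_signature
def to_text(rows, line_terminator=";"):
    """Join rows with the given terminator."""
    return line_terminator.join(rows)


print(to_text.__doc__)
```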
@simonjayhawkins I tried adding this one to `read_csv` and compiling. Seems OK, but I can't say how robust that is.