I have noted the following clarity issues in the read_csv docstring description of the header parameter:
The behavior associated with header=None is not explicitly defined.
The default behavior is described in terms of header=0 and header=None, neither of which has been clearly explained at that point.
The relationship between file line numbers (which are conventionally numbered from 1) and row numbers/indices (which are indexed from 0) is not described explicitly (it is only implied by examples such as header=0 meaning the first line).
The description "if column names are passed explicitly" is vague because it does not say how they are passed (i.e. via the names parameter).
The detailed descriptions do not follow the order given in the initial list of accepted values.
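To illustrate the line-number point above, here is a minimal sketch. The file contents are made up for illustration, and the call assumes the comment parameter is set so that commented lines are ignored. It shows header=0 selecting the first parsed line rather than the first physical line of the file.

```python
import io

import pandas as pd

# Hypothetical file contents: a comment line, a blank line, then the header.
raw = "# generated by some tool\n\na,b\n1,2\n3,4\n"

# With comment="#" and the default skip_blank_lines=True, header=0 refers to
# the first remaining line ("a,b"), which is physical line 3 of the file.
df = pd.read_csv(io.StringIO(raw), comment="#", header=0)

print(list(df.columns))  # ['a', 'b']
print(df.shape)          # (2, 2)
```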
Suggested fix for documentation
Original docstring:
header : int, list of int, None, default 'infer'
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
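The interaction between names and header described in this docstring is easy to misread, so a small sketch may help. The sample data and the replacement labels x and y are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical file with its own header line "a,b".
raw = "a,b\n1,2\n3,4\n"

# names passed, header left at the default 'infer': this behaves like
# header=None, so the original header line stays in the data as a row.
kept = pd.read_csv(io.StringIO(raw), names=["x", "y"])

# names passed together with header=0: the original header line is consumed
# and its labels are replaced by the values in names.
replaced = pd.read_csv(io.StringIO(raw), names=["x", "y"], header=0)

print(kept.shape)               # (3, 2)
print(kept.iloc[0].tolist())    # ['a', 'b']
print(replaced.shape)           # (2, 2)
print(replaced["x"].tolist())   # [1, 3]
```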
Proposed change to address the issues:
header : int, list of int, None, default 'infer'
Index or indices corresponding to line number(s) in the CSV file that will be read as DataFrame column labels. Index 0 corresponds to the first line in the file (or the first non-blank, non-commented line if skip_blank_lines=True). The following arguments are valid:
Single int: denotes the line index at which column labels will be read.
List of int: denotes line indices at which column labels will be read as a multi-index. Note: intervening rows not specified in the list will be skipped (e.g., for header=[0,1,3], the line at index 2 will be skipped).
None: indicates that none of the lines in the file will be interpreted as headers and columns will instead be labelled by column index (or by values passed to the names parameter when provided). This is typically for files with no header. If the file has a header which the user intends to override with the names parameter, header should be assigned 0 instead of None.
'infer' (default): behaves as header=0 if no names were passed, otherwise as header=None.
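As a rough illustration of the four accepted values listed above, the following sketch reads the same made-up data with each of them. No names argument is passed here, so 'infer' is expected to behave as header=0.

```python
import io

import pandas as pd

# Hypothetical file: two candidate header lines followed by two data lines.
raw = "a,b\nc,d\n1,2\n3,4\n"

# Single int: labels come from the line at index 1; the line before it is dropped.
single = pd.read_csv(io.StringIO(raw), header=1)

# List of int: the lines at indices 0 and 1 form a two-level column MultiIndex.
multi = pd.read_csv(io.StringIO(raw), header=[0, 1])

# None: no line is treated as a header; columns are labelled 0, 1, ...
none = pd.read_csv(io.StringIO(raw), header=None)

# 'infer' (default) without names: same result as header=0.
inferred = pd.read_csv(io.StringIO(raw))

print(list(single.columns), single.shape)      # ['c', 'd'] (2, 2)
print(list(multi.columns), multi.shape)        # [('a', 'c'), ('b', 'd')] (2, 2)
print(list(none.columns), none.shape)          # [0, 1] (4, 2)
print(list(inferred.columns), inferred.shape)  # ['a', 'b'] (3, 2)
```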
Note: the proposed wording is more in line with how the read_excel function is described, which would also improve consistency between the two similar functions. I would like to propose further edits to other parameter descriptions in read_csv, but wanted to gauge support first by focusing on one specific example.
Pandas version checks: main
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html