-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Error when using usecols with names in read_csv #44147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think this is as expected, but the docs could be a bit clearer. If names is given, only the names are used, not the inferred column names from the file. Would you like to submit a PR to improve the documentation? |
Should this be the expected behavior though? Isn't that in contradiction with the second case?
I'd expect In fact, I tried the variations when using
|
Just to make @nancy-aggarwal examples clear: >>> import pandas as pd
>>>
>>> df = pd.DataFrame({'X': [1, 2, 3],
... 'V1': [34, 6, 5],
... 'V2': [4, 7, 3],
... 'V3': [ 5, 56, 2]})
>>>
>>> df.to_csv('df1.csv',index=False)
>>> df_4 = pd.read_csv('df1.csv'
... ,header = 0
... ,names = ['X','V1','V3'])
>>> print(df_4)
X V1 V3
1 34 4 5
2 6 7 56
3 5 3 2
>>> df_5 = pd.read_csv('df1.csv'
... ,header = 0
... ,names = ['T','y1','y3'])
>>> print(df_5)
T y1 y3
1 34 4 5
2 6 7 56
3 5 3 2 what is happening here, is that the first column is interpreted as index and names overrides the column names, just as it is stated in the documentation: |
Header=0 is in the call and typically headers are about rows and indexes are about columns. I understand what it's doing, but I don't understand why that's the expected behavior. Sorry :-/ For example, can you explain what would be a typical use case for the behavior in df_4, and why that's the default? And the behavior in df_4 and df_5 still don't explain (to me) why the line df_3 gives the error |
When using df = pd.DataFrame({'X': [1, 2, 3],
'V1': [34, 6, 5],
'V2': [4, 7, 3],
'V3': [ 5, 56, 2]})
df.to_csv('df1.csv',index=False)
## Now read.
df_4 = pd.read_csv('df1.csv' , header = 0, names = ['X','V1','V3'])
df_41 = pd.read_csv('df1.csv')
df_41.columns = ['X','V1','V3'] #same as df_4 |
Hi, I'd love to take this issue as my first contribution if that's alright :) I understand that it's a pretty small issue of minor ambiguity in the docs. I can just add a line in the docs to make this usage clear, and what's to be expected. (You can also suggest to me what should be added) |
Take |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
When importing a csv into a dataframe, I want to import certain columns and rename some of them. See example below.
When doing this, I get the error message that "ValueError: Usecols do not match columns, columns expected but not found: ['X']"
The code works when I don't use the names option.
The code also works when I supply the column indices to usecols instead of column names in the csv file. This is prone to human error, so I'd like to avoid it.
Expected Behavior
Expect df_3 to be created with new column names.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 1.3.4
numpy : 1.21.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : 7.28.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
The text was updated successfully, but these errors were encountered: