Strip whitespace from column names when usecols in read_csv #14480
Comments
This just boils down to the issue you already reported. Pandas does not strip whitespace from the columns, so your actual column names are |
yes. Do we want |
you can already do #14234, or post-strip with |
I think it is possibly useful, but that's already discussed in the other issue (#14460). So let's close this one.
ohh. my bad. i didn't know that we could pass a callable to |
@rahulporuri the callables are at the moment only in an enhancement PR, not in master or a released version.
Pretty much all software I use, including Excel, writes column names in CSV files with spaces: "a, b, c, ...". Pandas should be able to read these correctly, since the intended column names are a, b, c. That is the obvious behaviour, not a bug. Hence, if pandas thinks the names are "b ", "c ", etc., then that is a bug. People often forget that a bug is not only when something blows up in your face. A bug is also when something behaves differently from what is reasonable to expect. And this is exactly the case.
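The two workarounds mentioned in the comments (a callable for `usecols`, and stripping the names after the read) can be sketched roughly as follows. This assumes a pandas version in which `usecols` accepts a callable, which per the thread was not yet released at the time. The CSV text and the `wanted` set are hypothetical and only serve the illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV text whose header names carry a trailing space.
CSV_TEXT = "a ,b ,c \n1,2,3\n4,5,6\n"

# Hypothetical set of column names the caller actually wants.
wanted = {"a", "b"}

# The callable is evaluated against each raw header name, so the
# comparison can strip whitespace before matching.
df = pd.read_csv(StringIO(CSV_TEXT), usecols=lambda name: name.strip() in wanted)

# Post-strip: the parsed names still contain the trailing space,
# so clean them up once the frame has been built.
df.columns = df.columns.str.strip()

print(list(df.columns))
```

The callable only decides which columns are kept; it does not rename them, which is why the post-strip step is still needed.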
A small, complete example of the issue
When loading a file of this type, where the headers have trailing whitespace,
I would expect the following code to work and give the result
Expected Output
Actual Output
Neither the `c` nor the `python` engine produces the expected result. The tracebacks have been concatenated for brevity.
This is related to an earlier issue, #14460, on stripping whitespace from columns and column names.
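As a rough reproduction of the report, the sketch below uses made-up data (the original file is not shown here) in which every header name is followed by one space. It assumes the failure surfaces as a `ValueError`; the exact message differs between pandas versions, so the code only records it.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: each header name has one trailing space.
CSV_TEXT = "a ,b ,c \n1,2,3\n"

# Without usecols, the parsed names keep the trailing space.
raw_columns = list(pd.read_csv(StringIO(CSV_TEXT)).columns)
print(raw_columns)

# Asking for the stripped names fails under both engines,
# because "a" and "b" do not match "a " and "b ".
errors = {}
for engine in ("c", "python"):
    try:
        pd.read_csv(StringIO(CSV_TEXT), usecols=["a", "b"], engine=engine)
    except ValueError as exc:
        errors[engine] = str(exc)

print(sorted(errors))
```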
On a side note, if the file has column names with leading whitespace instead of trailing whitespace, adding the `skipinitialspace=True` kwarg to `pandas.read_table` produces the expected result.

Output of `pd.show_versions()`
commit: 794f792
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0+27.g794f792
nose: None
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4.1
patsy: None
dateutil: 2.5.2
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
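For the side note on `skipinitialspace=True` earlier in the report, a minimal sketch of the leading-whitespace case might look like this. The data is hypothetical, and `pd.read_csv` is used here instead of `pandas.read_table` only so that the comma delimiter does not have to be passed explicitly; both accept the same keyword.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: a space follows each delimiter, so the second
# and third header names start with a space.
CSV_TEXT = "a, b, c\n1, 2, 3\n"

# Default parsing keeps the leading space in the names.
plain = pd.read_csv(StringIO(CSV_TEXT))
print(list(plain.columns))

# skipinitialspace=True drops whitespace after the delimiter,
# so the stripped names can be used in usecols.
df = pd.read_csv(StringIO(CSV_TEXT), usecols=["a", "b"], skipinitialspace=True)
print(list(df.columns))
```

This only helps with whitespace that follows the delimiter; trailing whitespace in a name is left untouched, which is the case the report is about.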