Strip columns/column names in data frame of white spaces #14460
Comments
The main problem is exacerbated when you have duplicated column names. The current pandas behaviour is hard to work with. Having to either write code to fix the mangling or write code to do our own header processing is far from optimal, as it just duplicates what pandas does in a slightly different way.
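To make the duplicate-name problem concrete, here is a minimal sketch using a hypothetical two-column header (the data and variable names are assumptions for illustration, not taken from the thread). pandas only mangles names that are exactly equal, so a padded duplicate slips through and becomes a true duplicate once the names are stripped afterwards.

```python
import io

import pandas as pd

# Hypothetical input: the second "a" carries a leading space.
padded = pd.read_csv(io.StringIO("a, a\n1,2\n"))
# Same header without the padding.
exact = pd.read_csv(io.StringIO("a,a\n1,2\n"))

# "a" and " a" differ as strings, so pandas does not mangle them.
padded_names = list(padded.columns)
# Exact duplicates are mangled by read_csv (second column becomes "a.1").
exact_names = list(exact.columns)

# Stripping after the fact leaves two columns both named "a".
stripped = padded.rename(columns=str.strip)
has_duplicates = not stripped.columns.is_unique
```

Stripping before the duplicate check would avoid this, which is the behaviour the comment above argues for.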
In general, you can use:
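One common way to strip column names after loading is sketched below. This is an illustration with a made-up DataFrame, not necessarily the exact snippet from this comment.

```python
import pandas as pd

# Hypothetical frame whose column names carry surrounding whitespace.
df = pd.DataFrame({" a ": [1, 2], " b ": [3, 4]})

# rename() accepts a function that is applied to every column label.
# The isinstance guard leaves non-string labels (e.g. integers) untouched.
df = df.rename(columns=lambda name: name.strip() if isinstance(name, str) else name)

# Equivalent when all labels are strings:
# df.columns = df.columns.str.strip()
```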
Regarding the
@jorisvandenbossche do you have a specific use case where stripping the column name before mangling would not be desired? That is a simple change and would provide what I think is the appropriate behaviour in the current implementation. I'll see if we can look into opening a PR that allows the mangling to be disabled.
@dpinte No, I cannot think of one (apart from the case where the spaces do mean something to you, but I have never dealt with such data). But stripping the whitespace when mangling, and not in other cases, would also introduce an inconsistency.
See also #14367 for a similar recent issue
@jorisvandenbossche thanks for the feedback. In the very short term, @rahulporuri can probably work on a PR to add stripping to column names for all the cases and to expose an option to turn it on/off.
More than a year later, this is still an issue in '0.23.0'. I cannot believe it... This is a solution
@liangbright it's not a question of belief, it's a question of someone doing it. A pull request to add this option is always welcome.
@jorisvandenbossche
This is more of a question than a bug.
A small, complete example of the issue
While opening a data file similar to
using
Observed Output
Index([u' a ', u' b ', u' c ', u' d '], dtype='object')
Expected Output
Index([u'a', u'b', u'c', u'd'], dtype='object')
We expected the column names to be stripped of white space.
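The behaviour can be reproduced with a sketch like the following, which assumes a comma-separated file with padded header cells (the in-memory data here is hypothetical, standing in for the original file). On Python 3 the labels print without the `u` prefix shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the data file: header cells padded with spaces.
raw = " a , b , c , d \n1,2,3,4\n"

df = pd.read_csv(io.StringIO(raw))
observed = list(df.columns)  # the padding is kept as part of each name

# Looking up the unpadded name fails because no column is called "a".
try:
    df["a"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The output the report expected, obtained by stripping explicitly.
expected = list(df.columns.str.strip())
```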
Apologies for the noise if this has already been reported or is being addressed.
Output of
pd.show_versions()
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None