read_fwf does not use comment character if colspecs argument does not include first column #14135
Comments
Kind of surprised that the comment works at all; it isn't tested, and disambiguating the comment vs. the colspecs is certainly not tested. That said, comment determination works just fine, as FWF inherits from the PythonParser. I'll mark it; pull requests with tests welcome.
This appears to work now on master. Could use a test.
take
Specs:
I think the bug still exists but on a deeper level. If I understand the problem correctly,
from io import StringIO
import pandas as pd
from pandas.io.parsers import read_fwf

data = [
    "#\n123\n456",
    "# \n123\n456",
    "#abc\n123\n456",
    "# abc\n123\n456",
]

colspecs = [(0, 1), (1, 2)]
print(f"colspecs: {colspecs}")
for s in data:
    df = read_fwf(StringIO(s), comment="#", colspecs=colspecs, header=None)
    print(f"{df}\n")

colspecs = [(1, 2), (2, 3)]
print(f"colspecs: {colspecs}")
for s in data:
    df = read_fwf(StringIO(s), comment="#", colspecs=colspecs, header=None)
    print(f"{df}\n")
Output:
The last two results are the interesting ones. In short, if you set
Please let me know if I made a mistake somewhere.
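To make the two colspecs cases above easier to compare, here is a toy model in plain Python. It is only an illustration of how a comment marker in column 0 becomes invisible once a line has been cut into fields that skip that column. The helper names slice_fields and comment_visible are hypothetical, and this is not a description of how pandas actually parses the file.

```python
def slice_fields(line, colspecs):
    """Cut one line into fields using half-open (start, stop) column specs."""
    return [line[start:stop] for start, stop in colspecs]


def comment_visible(fields, comment="#"):
    """Return True if the comment character survives in any sliced field."""
    return any(comment in field for field in fields)


line = "# abc"

# Column 0 is covered, so the "#" is still present after slicing.
with_first = slice_fields(line, [(0, 1), (1, 2)])

# Column 0 is skipped, so the "#" is gone after slicing.
without_first = slice_fields(line, [(1, 2), (2, 3)])

print(with_first, comment_visible(with_first))
print(without_first, comment_visible(without_first))
```

If the comment check ran only on the sliced fields, the second case would be treated as data, which matches the symptom in the issue title. Whether that is the real cause is an assumption here.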
Here is the test code I made:
from io import StringIO

import pytest

from pandas import DataFrame, read_fwf
import pandas._testing as tm


@pytest.mark.parametrize(
    "data",
    [
        "#\n123\n456",
        "# \n123\n456",
        "#abc\n123\n456",
        "# abc\n123\n456",
    ],
)
@pytest.mark.parametrize(
    "colspecs, expected",
    [
        ([(0, 1), (1, 2)], DataFrame([[1, 2], [4, 5]])),
        ([(1, 2), (2, 3)], DataFrame([[2, 3], [5, 6]])),
    ],
)
def test_comment_no_colspecs(data, colspecs, expected):
    # The leading comment line should be skipped for every colspecs choice.
    result = read_fwf(StringIO(data), comment="#", colspecs=colspecs, header=None)
    tm.assert_frame_equal(result, expected)
When reading fixed-width files using the read_fwf function, it is possible to specify a comment character using the comment argument. I expected that all lines beginning with the comment character would be ignored. However, if you do not specify the first column in the file in any column in colspecs, the comment character does not appear to be used.
Code Sample
Expected Output
I expect that the final line should have the same result as the other three, that is:
Should result in
I first reported this on StackOverflow.
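For anyone hitting this in the meantime, a possible workaround is to drop the comment lines yourself before handing the text to read_fwf, so the result no longer depends on whether colspecs covers the first column. This is a minimal sketch, not a fix for the parser: the helper name read_fwf_skipping_comments is made up for illustration, and it assumes a comment line is one whose very first character is the comment character.

```python
from io import StringIO

import pandas as pd


def read_fwf_skipping_comments(text, colspecs, comment="#"):
    """Filter out lines that start with `comment`, then parse with read_fwf."""
    kept = [line for line in text.splitlines() if not line.startswith(comment)]
    # No `comment=` argument is passed on: the comment lines are already gone.
    return pd.read_fwf(StringIO("\n".join(kept)), colspecs=colspecs, header=None)


data = "# abc\n123\n456"

# colspecs deliberately skips column 0, the case described in this issue.
df = read_fwf_skipping_comments(data, colspecs=[(1, 2), (2, 3)])
print(df)
```

For a large file the same filter could be applied lazily while reading, instead of building the full list of lines in memory.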
output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.6.5-x86_64-linode71
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 3.3
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.14
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: None
boto: None
pandas_datareader: None