read_fwf does not take into account skiprows when inferring column width #11256
Comments
can you post a copy-pastable example which repros this?
Here is a minimal example:
import pandas as pd
with open('testfwf.txt', 'w') as f:
    f.write("Text contained in the file header\n")
    f.write("\n")
    f.write("DataCol1 DataCol2\n")
    f.write(" 0.0 1.0\n")
    f.write(" 101.6 956.1\n")
skiprows = 2
print("Using the read_fwf function simply:")
df = pd.read_fwf('testfwf.txt', skiprows = skiprows)
print(list(df))
print("Manually skipping the text lines before using read_fwf:")
with open('testfwf.txt', 'r') as f:
    for i in range(skiprows):
        f.readline()
    df = pd.read_fwf(f, encoding="utf8")
print(list(df))
It gives the following result:
I'm using Python 3.4.3 and pandas 0.16.2
ok, looks like a bug. pull-requests are welcome to fix!
dsm054 added a commit to dsm054/pandas that referenced this issue Aug 18, 2016
dsm054 added a commit to dsm054/pandas that referenced this issue Jan 10, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
Fix the fact that we don't skip the rows when inferring colspecs by passing skiprows down the chain until it's needed.
- [X] closes pandas-dev#11256
- [X] 3 tests added / passed
- [X] passes `git diff upstream/master | flake8 --diff`
- [X] whatsnew entry
Author: D.S. McNeil <[email protected]>
Closes pandas-dev#14028 from dsm054/bugfix/fwf_skiprows and squashes the following commits:
b5b3e66 [D.S. McNeil] BUG: read_fwf inference should respect skiprows (pandas-dev#11256)
read_fwf can automatically detect column widths, but the detection does not seem to take the skiprows parameter into account: the rows that should be skipped are still used for inference.
So, if the file contains text before the data, column detection fails. This could be solved by using only the actual data lines (and the header) to infer column widths.
I worked around the issue by opening the text file, calling readline as many times as necessary (= skiprows), and then calling read_fwf on that buffer (with skiprows=0).
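To make the suggestion concrete, here is a minimal pure-Python sketch of inferring column boundaries from only the lines that remain after skipping. It is not the pandas implementation. The helper names `infer_colspecs` and `infer_colspecs_after_skip` are hypothetical, and the spacing in `sample` is an assumption, since the exact whitespace of the original file is not preserved above.

```python
import io


def infer_colspecs(lines, delimiters=" \t"):
    """Return half-open (start, end) spans that are non-blank in at least one line.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Drop line endings and ignore completely blank lines.
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # occupied[i] is True if any row has a non-delimiter character at position i.
    occupied = [False] * width
    for row in rows:
        for i, ch in enumerate(row):
            if ch not in delimiters:
                occupied[i] = True
    # Turn consecutive runs of occupied positions into (start, end) spans.
    specs = []
    start = None
    for i, used in enumerate(occupied):
        if used and start is None:
            start = i
        elif not used and start is not None:
            specs.append((start, i))
            start = None
    if start is not None:
        specs.append((start, width))
    return specs


def infer_colspecs_after_skip(buffer, skiprows=0):
    """Consume `skiprows` lines first, then infer from the remaining lines only."""
    for _ in range(skiprows):
        buffer.readline()
    return infer_colspecs(buffer.readlines())


# Assumed layout: two right-aligned 8-character columns separated by two spaces.
sample = (
    "Text contained in the file header\n"
    "\n"
    "DataCol1  DataCol2\n"
    "     0.0       1.0\n"
    "   101.6     956.1\n"
)

# Skipping the two leading lines yields the two data columns.
print(infer_colspecs_after_skip(io.StringIO(sample), skiprows=2))
# Without skipping, the header text blurs the column boundaries.
print(infer_colspecs_after_skip(io.StringIO(sample), skiprows=0))
```

With the assumed layout, the first call returns `[(0, 8), (10, 18)]`, while the second returns different spans because the free-text header fills the gap between the two columns.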