ignore newline while reading files? #4917

Closed
jseabold opened this issue Sep 21, 2013 · 10 comments

@jseabold
Contributor

Apparently this [1] is a thing. Is there a way to tell read_table or read_fwf to read a fixed number of variables per record? I was hoping something like this might work. I'm not positive I've got the regex right yet, but the idea is to match any run of consecutive spaces, optionally followed by a newline character. I don't know if it's worth supporting, but it seems like maybe it should be.

import pandas as pd

url = "http://cameron.econ.ucdavis.edu/mmabook/patr7079.asc"
names = ["CUSIP", "ARDSSIC", "SCISECT", "LOGK", "SUMPAT", "LOGR70",
         "LOGR71", "LOGR72", "LOGR73", "", "LOGR74", "LOGR75", "LOGR76",
         "LOGR77", "LOGR78", "LOGR79", "PAT70", "PAT71", "PAT72", "PAT73",
         "PAT74", "PAT75", "PAT76", "PAT77", "PAT78", "PAT79"]
dta = pd.read_table(url, sep=r" *(?=\n)?", names=names)

[1] http://cameron.econ.ucdavis.edu/mmabook/patr7079.asc

@jreback
Contributor

jreback commented Sep 21, 2013

shouldn't you use read_fwf?

see also: #4488
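For reference, a minimal sketch of the read_fwf suggestion on toy inline data (the widths, column subset, and values here are made up for illustration; the real file has 26 columns):

```python
import io

import pandas as pd

# Toy fixed-width sample standing in for patr7079.asc (hypothetical
# widths and values; only the first four variables are shown).
data = io.StringIO(
    "  800  3  1  1.25\n"
    "  801  5  0  2.50\n"
)
df = pd.read_fwf(data, widths=[5, 3, 3, 6],
                 names=["CUSIP", "ARDSSIC", "SCISECT", "LOGK"])
print(df)
```

With explicit `widths` there is nothing to detect; the issue linked above is about inferring those widths automatically.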

@jseabold
Contributor Author

Yep, that's what I want. Funny timing. So I guess this is in the pipeline. Good to hear. Closing.

I suppose, though, that it doesn't have to be fixed width...

@jreback
Contributor

jreback commented Sep 21, 2013

@jseabold you can do read_fwf now!

the issue above is about detecting the widths properly

@jseabold
Contributor Author

I was hoping to avoid counting / scraping the format too.

@jreback
Contributor

jreback commented Sep 21, 2013

@jseabold yep....and \s+ doesn't work? (as there are fields that don't have a marker)

@jseabold
Contributor Author

The "rows" are split across line terminators.

@jreback
Contributor

jreback commented Sep 21, 2013

why don't you read them in as normal, then just separate out after, e.g. df.iloc[::2] and df.iloc[1::2] (assuming 2 lines per row)?
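The even/odd slicing idea can be sketched like this (toy data assumed, with each logical record split over two physical lines):

```python
import io

import pandas as pd

# Toy stand-in for the file: each record spans two lines, and the
# second line of each pair carries fewer fields.
raw = io.StringIO(
    "1 2 3\n"
    "4 5\n"
    "6 7 8\n"
    "9 10\n"
)
df = pd.read_table(raw, sep=r"\s+", header=None, names=list("abc"))
first = df.iloc[::2].reset_index(drop=True)    # lines 1, 3, ...
second = df.iloc[1::2].reset_index(drop=True)  # lines 2, 4, ...
wide = pd.concat([first, second.add_prefix("r2_")], axis=1)
print(wide)
```

The `r2_` prefix is just a placeholder; with the real file you would assign the actual variable names after stitching the halves together.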

@jseabold
Contributor Author

That's what I ended up doing, but it'd be nice to have this in one command, which is what that other issue is proposing, AFAICT. E.g., in Stata it's

infile varlist using url

@jseabold
Contributor Author

It's also messy because the lines are not split into the same number of variables across rows. It's mainly just a pain, and I think it could be easier.

@jreback
Contributor

jreback commented Sep 21, 2013

you could read in with normal seps,
then unstack and chop at the boundaries

weird way to format the file
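A sketch of that unstack-and-chop approach, which also handles an uneven number of fields per line (toy data and a hypothetical 3 variables per record assumed):

```python
import io

import pandas as pd

# Toy stand-in: 9 values for 3 records of 3 variables, but spread
# unevenly across physical lines, as in the file being discussed.
raw = io.StringIO(
    "1 2 3 4\n"
    "5 6\n"
    "7 8 9\n"
)
df = pd.read_table(raw, sep=r"\s+", header=None, names=list("abcd"))
vals = df.to_numpy().ravel()
vals = vals[~pd.isna(vals)]      # drop the NaN padding from short lines
records = vals.reshape(-1, 3)    # chop at the known record width (3 here)
tidy = pd.DataFrame(records, columns=["x", "y", "z"])
print(tidy)
```

Because the values are flattened before being chopped, it doesn't matter how the original lines were broken, only that the total count is a multiple of the record width.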
