
ENH: Add a Parser to read fixed-width ASCII data using a data description file or dictionary file #7030


Closed
AllenDowney opened this issue May 3, 2014 · 5 comments

Comments

@AllenDowney
Contributor

As an example, I would like to be able to read a dataset like this one:

http://www.cdc.gov/nchs/nsfg/nsfg_cycle6.htm

The data files themselves are ASCII with fixed-width fields. The variable names, types, and column positions are in a separate data description file, available for SAS, SPSS, and Stata. I would like to add a parser that reads at least one of these description files and then parses the data file. Since two files are used, it might require changes in the Parser API.

I am happy to write a parser that reads the dictionary file and then the data file. I could use help with either setting up the new parser ahead of time or (after the fact) integrating my code with the existing structure.

Also, is there a preference for the SAS, SPSS, or Stata format?

@jreback
Contributor

jreback commented May 5, 2014

Does pd.read_fwf(...) do a credible job with this (without the columns being specified, so that they are inferred)?

@jreback jreback added the Data IO label May 5, 2014
@AllenDowney
Contributor Author

Yes, once I have parsed the dictionary, it looks like I will be able to use pd.read_fwf to parse the data file.

But for the data files I am working with, I don't think it would be possible to infer the breaks between columns.

Thanks!
Allen
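
To illustrate the point above, here is a minimal sketch of calling `pd.read_fwf` with explicit column boundaries when the fields are packed together with no whitespace, so the breaks cannot be inferred. The records and the `layout` list are hypothetical and are not taken from the NSFG files; only `pd.read_fwf` and its `colspecs`, `names`, and `header` parameters are real pandas API.

```python
import io

import pandas as pd

# Hypothetical records: fields are packed with no separators,
# so column breaks cannot be guessed from whitespace.
raw = io.StringIO(
    "1000127113\n"
    "1000242211\n"
)

# Hypothetical layout: (name, 1-based start column, width),
# the kind of information a description file would supply.
layout = [("caseid", 1, 5), ("age", 6, 2), ("parity", 8, 1), ("region", 9, 2)]

names = [name for name, _, _ in layout]
# read_fwf expects 0-based, half-open [start, end) intervals.
colspecs = [(start - 1, start - 1 + width) for _, start, width in layout]

df = pd.read_fwf(raw, colspecs=colspecs, names=names, header=None)
print(df)
```

The only conversion needed is from the 1-based start positions that description files typically use to the 0-based half-open intervals that `colspecs` expects.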

@jreback jreback added this to the Someday milestone Mar 8, 2015
@tyler-abbot

@AllenDowney I am thinking of working on this project. Did you ever get anywhere with it? Or do you have any sample data that you would like to see treated by a function?

@AllenDowney
Contributor Author

Yes, I have a dataset and a hacky solution that might be a good example. It's all in this repo:

https://github.com/AllenDowney/MarriageNSFG

In thinkstats2.py, you'll see a function called ReadStataDct that returns a FixedWidthVariables object, which provides ReadFixedWidth.

I am using it to read data from the NSFG; the data is also in the repo. You can see an example in marriage.py, specifically the function ReadFemResp.

Please let me know if I can help.
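
For readers who want a feel for the two-step approach described above, here is a rough sketch of parsing a Stata-style dictionary and passing the result to `pd.read_fwf`. It is not the code from thinkstats2.py. The names `parse_dct` and `read_with_dct`, the sample dictionary text, and the sample record are all hypothetical, and the sketch assumes that each field's width can be read from its display format (for example `%12s` means 12 characters), which may not hold for every dictionary file.

```python
import io
import re

import pandas as pd

# Hypothetical excerpt in the style of a Stata dictionary (.dct) file.
DCT_TEXT = '''infile dictionary {
    _column(1)   str12  caseid    %12s  "RESPONDENT ID NUMBER"
    _column(13)  byte   pregordr  %2f   "PREGNANCY ORDER (NUMBER)"
    _column(15)  int    agepreg   %4f   "AGE AT PREGNANCY OUTCOME"
}
'''

# Captures: start column, storage type, name, display width, description.
LINE_RE = re.compile(
    r'_column\((\d+)\)\s+(\S+)\s+(\S+)\s+%(\d+)\S*\s+"([^"]*)"'
)


def parse_dct(text):
    """Return one row per variable found in the dictionary text."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip the header, the closing brace, blank lines
        start, vtype, name, width, desc = m.groups()
        rows.append({
            "name": name.lower(),
            "start": int(start),   # 1-based, as written in the file
            "width": int(width),   # assumption: width comes from the format
            "type": vtype,
            "desc": desc,
        })
    return pd.DataFrame(rows)


def read_with_dct(data, variables):
    """Read fixed-width data using the parsed variable table."""
    starts = variables["start"].tolist()
    widths = variables["width"].tolist()
    # Convert 1-based starts to 0-based half-open intervals.
    colspecs = [(s - 1, s - 1 + w) for s, w in zip(starts, widths)]
    names = variables["name"].tolist()
    return pd.read_fwf(data, colspecs=colspecs, names=names, header=None)


variables = parse_dct(DCT_TEXT)
# One hypothetical record, right-justified to the widths above.
record = io.StringIO(f"{1:>12}{1:>2}{3316:>4}\n")
df = read_with_dct(record, variables)
print(df)
```

An alternative to trusting the display format is to take each field's end from the next field's start column, which needs separate handling for the last field.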

@jbrockmendel jbrockmendel added the IO Fixed Width (read_fwf) label and removed the IO Data (IO issues that don't fit into a more specific label) label Dec 11, 2019
@jbrockmendel
Member

Closing and adding to a tracker issue #30407 for IO format requests, can re-open if interest is expressed.
