BUG: None should be a valid colspec? #7079

Closed
AllenDowney opened this issue May 8, 2014 · 10 comments · Fixed by #7081
Labels
Bug IO CSV read_csv, to_csv

@AllenDowney
Contributor

In https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py, the __init__ method of FixedWidthReader checks whether the colspecs are valid. Since None is a valid slice index, should it be allowed as a colspec?

This would help me because I don't know the width of the last column until I start reading the file.

class FixedWidthReader(object):
    """
    A reader of fixed-width lines.
    """
    def __init__(self, f, colspecs, delimiter, comment):
        self.f = f
        self.buffer = None
        self.delimiter = '\r\n' + delimiter if delimiter else '\n\r\t '
        self.comment = comment
        if colspecs == 'infer':
            self.colspecs = self.detect_colspecs()
        else:
            self.colspecs = colspecs

        if not isinstance(self.colspecs, (tuple, list)):
            raise TypeError("column specifications must be a list or tuple, "
                            "input was a %r" % type(colspecs).__name__)

        for colspec in self.colspecs:
            if not (isinstance(colspec, (tuple, list)) and
                    len(colspec) == 2 and
                    isinstance(colspec[0], (int, np.integer)) and
                    isinstance(colspec[1], (int, np.integer))):
                raise TypeError('Each column specification must be '
                                '2 element tuple or list of integers')

@jreback
Contributor

jreback commented May 8, 2014

This is not called directly by a user; rather, you call read_fwf, which validates the arguments, e.g. colspecs and widths cannot both be None.

@AllenDowney
Contributor Author

Sorry, let me clarify: the issue is not that colspecs is None, but that
colspecs is a list of index pairs, and the index pairs should be allowed to
be None.

So these lines

            isinstance(colspec[0], (int, np.integer)) and
            isinstance(colspec[1], (int, np.integer))):

should be

            isinstance(colspec[0], (int, np.integer, type(None))) and
            isinstance(colspec[1], (int, np.integer, type(None)))):
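
Here is a minimal sketch of the relaxed check, written as a standalone function rather than as a patch to pandas. The name validate_colspecs is hypothetical, and plain int stands in for the (int, np.integer) pair so that the snippet has no dependencies. It uses type(None) because NoneType is not a builtin name.

```python
def validate_colspecs(colspecs):
    """Return colspecs unchanged if every entry is a 2-element tuple or
    list whose items are integers or None; otherwise raise TypeError.

    Hypothetical helper for illustration, not the pandas implementation.
    """
    if not isinstance(colspecs, (tuple, list)):
        raise TypeError("column specifications must be a list or tuple, "
                        "input was a %r" % type(colspecs).__name__)

    # Assumption: plain int only; the pandas check also accepts np.integer.
    index_types = (int, type(None))

    for colspec in colspecs:
        if not (isinstance(colspec, (tuple, list)) and
                len(colspec) == 2 and
                isinstance(colspec[0], index_types) and
                isinstance(colspec[1], index_types)):
            raise TypeError('Each column specification must be '
                            '2 element tuple or list of integers or None')
    return colspecs
```

With this check, [(0, 3), (3, None)] and [(None, None)] pass, while a spec such as (0, "3") is still rejected.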


@jreback
Contributor

jreback commented May 8, 2014

ok, can you put up an example that fails (that you think should work)

what does a None columnspec mean?

@AllenDowney
Contributor Author

colspecs = [(None, None)]
frame = pd.read_fwf(dat_file, colspecs=colspecs)

None as a colspec should mean the same as None as a slice index. As the
first index, it is the same as 0; as the second index, it includes
everything to the end of the line.

So the example above should create a frame with a single column that
contains the entire line from the file.
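
The slice analogy can be checked without pandas. The sketch below uses a hypothetical helper, split_fixed_width, that applies each (start, end) pair as a plain Python slice. It only illustrates the intended semantics and is not how read_fwf is implemented. Stripping the padding from each field is an assumption made for readability.

```python
def split_fixed_width(line, colspecs):
    """Cut one line into fields, treating each (start, end) pair as a slice.

    None as start means "from the beginning of the line";
    None as end means "through the end of the line".
    """
    return [line[start:end].strip() for start, end in colspecs]


line = "id001  Alice   free-form trailing notes"

# A single (None, None) spec keeps the whole line as one field.
whole = split_fixed_width(line, [(None, None)])

# An open-ended last column: its width does not need to be known up front.
fields = split_fixed_width(line, [(0, 5), (7, 15), (15, None)])

print(whole)
print(fields)
```

The last spec, (15, None), covers the case where the width of the final column is not known until the file is read.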


@jreback
Contributor

jreback commented May 8, 2014

what I mean is a copy-pastable example, that shows the input and output

@AllenDowney
Contributor Author

import pandas as pd
from io import StringIO

# this works
file_like = StringIO('foo')
colspecs = [(0, 3)]
frame = pd.read_fwf(file_like, colspecs=colspecs, header=None)
print(frame)

# this should work, and do the same thing, but it throws a failed assertion
file_like = StringIO('foo')
colspecs = [(None, None)]
frame = pd.read_fwf(file_like, colspecs=colspecs, header=None)
print(frame)


@TomAugspurger
Contributor

Slicing until the end of the line seems correct. @AllenDowney I can submit a pull request to fix this if you want.

@AllenDowney
Contributor Author

Yes, please. In retrospect I could have done that myself.

OTOH, I am running a stable release, so I am not really set up to modify
and test pandas.


@TomAugspurger
Contributor

I think your time is more valuable writing great books :)

Thanks for the report!

@AllenDowney
Contributor Author

Thank you!

