BUG: parser can handle a common_format multi-column index (no row index cols), (GH4702) #5298

Merged
merged 1 commit into from
Oct 24, 2013

Conversation

jreback
Contributor

@jreback jreback commented Oct 22, 2013

closes #4702, should help with #5254

Will handle the format generated by to_csv (which has an 'extra' line for the index names)
as well as the more 'common' format, which has no such line:

In [4]:         data = """,a,a,a,b,c,c
   ...: ,q,r,s,t,u,v
   ...: one,1,2,3,4,5,6
   ...: two,7,8,9,10,11,12"""

In [6]: read_csv(StringIO(data),header=[0,1],index_col=0)
Out[6]: 
     a         b   c    
     q  r  s   t   u   v
one  1  2  3   4   5   6
two  7  8  9  10  11  12

equivalent to

df = DataFrame([[1,2,3,4,5,6],[7,8,9,10,11,12]],
                       index=['one','two'],
                       columns=MultiIndex.from_tuples([('a','q'),('a','r'),('a','s'),
                                                  ('b','t'),('c','u'),('c','v')]))
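
For anyone who wants to check the equivalence themselves, here is a minimal self-contained sketch. It assumes a pandas version that includes this change; the variable names (`parsed`, `expected`, `same_*`) are only illustrative and are not part of the PR.

```python
from io import StringIO

import pandas as pd

# "Common" format: two header rows, first column holds the row labels,
# and there is no extra line for index names.
data = """,a,a,a,b,c,c
,q,r,s,t,u,v
one,1,2,3,4,5,6
two,7,8,9,10,11,12"""

parsed = pd.read_csv(StringIO(data), header=[0, 1], index_col=0)

# The frame the PR description says the parse is equivalent to.
expected = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    index=["one", "two"],
    columns=pd.MultiIndex.from_tuples(
        [("a", "q"), ("a", "r"), ("a", "s"), ("b", "t"), ("c", "u"), ("c", "v")]
    ),
)

# Compare labels and values separately so a mismatch is easy to locate.
same_columns = list(parsed.columns) == list(expected.columns)
same_index = list(parsed.index) == list(expected.index)
same_values = parsed.values.tolist() == expected.values.tolist()
print(same_columns, same_index, same_values)
```

Comparing labels and values piecewise, rather than with a single frame-equality call, keeps the check independent of dtype details that may differ between pandas versions.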

no index col

In [1]: data = """a,a,a,b,c,c
   ...: q,r,s,t,u,v
   ...: 1,2,3,4,5,6
   ...: 7,8,9,10,11,12"""

In [2]: read_csv(StringIO(data),header=[0,1],index_col=None)
Out[2]: 
   a         b   c    
   q  r  s   t   u   v
0  1  2  3   4   5   6
1  7  8  9  10  11  12
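
A small sketch of how the no-index-column result can be used once parsed, again assuming a pandas version with this change. The names `parsed`, `a_block` and `b_t` are made up for the illustration.

```python
from io import StringIO

import pandas as pd

# Two header rows and no row-label column at all.
data = """a,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""

parsed = pd.read_csv(StringIO(data), header=[0, 1], index_col=None)

# With no index column, the rows get a default integer index.
row_labels = list(parsed.index)

# Selecting a top-level label returns the sub-frame of its second-level columns.
a_block = parsed["a"]

# Selecting a full (level 0, level 1) tuple returns a single column.
b_t = parsed[("b", "t")]

print(row_labels)
print(list(a_block.columns))
print(b_t.tolist())
```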

@jtratner
Contributor

This is great! What is the expected behavior if the first column has a label in
the first two rows (say I explicitly tell it that 0 + 1 are headers and 0
is the index col)?

@jreback
Contributor Author

jreback commented Oct 23, 2013

that is the example that is up there

@jtratner
Contributor

@jreback your example starts out with a comma, so doesn't that mean the cell in the first two rows first column is empty?

@jreback
Contributor Author

jreback commented Oct 23, 2013

@jtratner I put another example up, no index col specified

@jreback
Contributor Author

jreback commented Oct 24, 2013

@jtratner is this what you were looking for?

@jtratner
Contributor

I'm guessing this works, but I mean ambiguous-ish cases:

In [1]: data = """a,a,a,b,c,c
   ...: q,r,s,t,u,v
   ...: 1,2,3,4,5,6
   ...: 7,8,9,10,11,12"""

read_csv(StringIO(data),header=[0,1],index_col=0)

What should this do? Raise? Assume that a and q are labels for MI levels?

Similar question here

In [1]: data = """,a,a,b,c,c
   ...: q,r,s,t,u,v
   ...: 1,2,3,4,5,6
   ...: 7,8,9,10,11,12"""

read_csv(StringIO(data),header=[0,1],index_col=0)

(it takes my old mac a long time to compile, so I'm trying to pull down your branch now).

And similarly what happens with MI header and MI index - 😄

In [1]: data = """,a,a,b,c,c
   ...: q,r,s,t,u,v
   ...: 1,2,3,4,5,6
   ...: 7,8,9,10,11,12"""

read_csv(StringIO(data),header=[0,1],index_col=[0, 1])

@jreback
Contributor Author

jreback commented Oct 24, 2013

@jtratner your examples

First one is user error

In [7]: data = """a,a,a,b,c,c
   ...: q,r,s,t,u,v
   ...: 1,2,3,4,5,6
   ...: 7,8,9,10,11,12"""

In [13]: read_csv(StringIO(data),header=[0,1],index_col=0)
Out[13]: 
a  a      b   c    
q  r  s   t   u   v
1  2  3   4   5   6
7  8  9  10  11  12

In [14]: read_csv(StringIO(data),header=[0,1])
Out[14]: 
   a         b   c    
   q  r  s   t   u   v
0  1  2  3   4   5   6
1  7  8  9  10  11  12

2nd is user error, but the data is corrupt (missing first label in top-level).
Maybe we can detect it, though.

In [15]: data = """,a,a,b,c,c
   ....: q,r,s,t,u,v
   ....: 1,2,3,4,5,6
   ....: 7,8,9,10,11,12"""

In [16]: read_csv(StringIO(data),header=[0,1],index_col=0)
Out[16]: 
   a      b   c    
q  r  s   t   u   v
1  2  3   4   5   6
7  8  9  10  11  12

In [17]: read_csv(StringIO(data),header=[0,1])
Out[17]: 
   Unnamed: 0_level_0  a      b   c    
                    q  r  s   t   u   v
0                   1  2  3   4   5   6
1                   7  8  9  10  11  12

3rd works fine (it's the wrong specification, but could be valid)

In [18]: data = """,a,a,b,c,c
   ....: q,r,s,t,u,v
   ....: 1,2,3,4,5,6
   ....: 7,8,9,10,11,12"""

In [19]: read_csv(StringIO(data),header=[0,1],index_col=[0, 1])
Out[19]: 
     a   b   c    
q    s   t   u   v
1 2  3   4   5   6
7 8  9  10  11  12
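
To make the second case easier to inspect than the printed frames allow, here is a hedged sketch that prints the labels directly. It assumes a pandas version with this change; `with_index` and `without_index` are illustrative names. In the pandas versions I am aware of, the `q` cell above the index column ends up as the name of the second column level, but treat that as something to confirm on your own install rather than a guarantee.

```python
from io import StringIO

import pandas as pd

# First header row has an empty leading cell, second header row does not.
data = """,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""

# Column 0 is consumed as the row index, leaving five data columns.
with_index = pd.read_csv(StringIO(data), header=[0, 1], index_col=0)

# No index column: the empty top-level cell gets a generated "Unnamed" label.
without_index = pd.read_csv(StringIO(data), header=[0, 1])

print(list(with_index.index))
print(list(with_index.columns))
print(list(with_index.columns.names))
print(list(without_index.columns)[0])
```

Printing `columns.names` next to the index is a quick way to see where a header cell such as `q` was attached.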

@jtratner
Contributor

The second one is what you'd expect from something with two level
hierarchical columns and a single index column, no?

@jreback
Contributor Author

jreback commented Oct 24, 2013

@jtratner I think the 2nd one is exactly what you got: 2-level column mi, and 1-level index named q. Maybe not what the user expects, but it parses correctly according to the rules.

@jtratner
Contributor

Okay yeah that's what I'd expect. I was confused because you called it
malformed.

+1 to this!

@jreback
Contributor Author

jreback commented Oct 24, 2013

ok...I mean malformed as in possibly not what the user expects

…ex cols), (GH4702)

TST: addtl mi malformed test cases

DOC: update io.rst docs for multi-index for columns
jreback added a commit that referenced this pull request Oct 24, 2013
BUG: parser can handle a common_format multi-column index (no row index cols), (GH4702)
@jreback jreback merged commit e067b61 into pandas-dev:master Oct 24, 2013
@jtratner
Contributor

Thanks @jreback!

Development

Successfully merging this pull request may close these issues.

Bug in read_csv when passing header kwarg?
2 participants