- The docstring of the (non-member) `pivot()` function, http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot.html#pandas.pivot, says:

  > Produce ‘pivot’ table based on 3 columns of this DataFrame. Uses unique values from index / columns and fills with values.

  But there is no DataFrame argument, and so no "this DataFrame". Is this an internal function that shouldn't be exposed? Or is the docstring wrong?
- While the (non-member) `pivot_table()` supports specifying multiple columns for `columns`, so that the resulting table has multi-index columns, `DataFrame.pivot()` does not. Any reason it doesn't? I would have expected the two functions to behave similarly. Granted, the docstring for `DataFrame.pivot()` doesn't claim that it supports multiple columns for `columns`, so this isn't a bug, but it does seem inconsistent (and restrictive) vs. `pivot_table()`.
In [2]: from pandas import DataFrame, pivot_table
In [3]: df = DataFrame([['foo', 'ABC', 'A', 1],
...: ['foo', 'ABC', 'B', 2],
...: ['foo', 'XYZ', 'X', 3],
...: ['foo', 'XYZ', 'Y', 4],
...: ['bar', 'ABC', 'B', 5],
...: ['bar', 'XYZ', 'X', 6]],
...: columns=['FooBar', 'TLA', 'Letter', 'Number'])
In [4]: df
Out[4]:
  FooBar  TLA Letter  Number
0    foo  ABC      A       1
1    foo  ABC      B       2
2    foo  XYZ      X       3
3    foo  XYZ      Y       4
4    bar  ABC      B       5
5    bar  XYZ      X       6
In [11]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[11]:
TLA     ABC     XYZ
Letter    A  B    X   Y
FooBar
bar     NaN  5    6 NaN
foo       1  2    3   4
In [13]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-8585f7e09b0c> in <module>()
----> 1 df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
C:\Python34\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
3264 """
3265 from pandas.core.reshape import pivot
-> 3266 return pivot(self, index=index, columns=columns, values=values)
3267
3268 def stack(self, level=-1, dropna=True):
C:\Python34\lib\site-packages\pandas\core\reshape.py in pivot(self, index, columns, values)
357 indexed = Series(self[values].values,
358 index=MultiIndex.from_arrays([self[index],
--> 359 self[columns]]))
360 return indexed.unstack(columns)
361
C:\Python34\lib\site-packages\pandas\core\index.py in from_arrays(cls, arrays, sortorder, names)
2795 return Index(arrays[0], name=name)
2796
-> 2797 cats = [Categorical.from_array(arr) for arr in arrays]
2798 levels = [c.levels for c in cats]
2799 labels = [c.labels for c in cats]
C:\Python34\lib\site-packages\pandas\core\index.py in <listcomp>(.0)
2795 return Index(arrays[0], name=name)
2796
-> 2797 cats = [Categorical.from_array(arr) for arr in arrays]
2798 levels = [c.levels for c in cats]
2799 labels = [c.labels for c in cats]
C:\Python34\lib\site-packages\pandas\core\categorical.py in from_array(cls, data)
101 the unique values of `data`.
102 """
--> 103 return Categorical(data)
104
105 _levels = None
C:\Python34\lib\site-packages\pandas\core\categorical.py in __init__(self, labels, levels, name)
82 name = getattr(labels, 'name', None)
83 try:
---> 84 labels, levels = factorize(labels, sort=True)
85 except TypeError:
86 labels, levels = factorize(labels, sort=False)
C:\Python34\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, order, na_sentinel)
128 table = hash_klass(len(vals))
129 uniques = vec_klass()
--> 130 labels = table.get_labels(vals, uniques, 0, na_sentinel)
131
132 labels = com._ensure_platform_int(labels)
C:\Python34\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:13534)()
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
This is with Pandas v0.14.1.
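
For comparison, here is a minimal sketch of one way to get the multi-index columns without going through `pivot_table()` (and so without any aggregation). It rests on an assumption drawn from the traceback above, where `pivot()` ends with `indexed.unstack(columns)`: the same reshape can be spelled out by hand with `set_index()` followed by `unstack()`. The variable name `wide` is only illustrative, and this is a possible workaround, not a description of how `DataFrame.pivot()` is meant to behave.

```python
from pandas import DataFrame

# Same sample data as in the session above.
df = DataFrame([['foo', 'ABC', 'A', 1],
                ['foo', 'ABC', 'B', 2],
                ['foo', 'XYZ', 'X', 3],
                ['foo', 'XYZ', 'Y', 4],
                ['bar', 'ABC', 'B', 5],
                ['bar', 'XYZ', 'X', 6]],
               columns=['FooBar', 'TLA', 'Letter', 'Number'])

# Move the row key and both column keys into the index, select the values,
# then unstack the two column keys so they become a two-level column index.
wide = (df.set_index(['FooBar', 'TLA', 'Letter'])['Number']
          .unstack(['TLA', 'Letter']))

print(wide)
```

Unlike `pivot_table()`, this raises an error if a (`FooBar`, `TLA`, `Letter`) combination appears more than once, which is the behaviour I would expect from a multi-column `pivot()`.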