DOC/BUG? pivot functionality confusing/inconsistent #8160

Closed
@seth-p

Description

  1. The docstring of the (non-member) pivot() function, http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot.html#pandas.pivot, says: "Produce 'pivot' table based on 3 columns of this DataFrame. Uses unique values from index / columns and fills with values." But the function takes no DataFrame argument, so there is no "this DataFrame". Is this an internal function that shouldn't be exposed, or is the docstring wrong?
  2. The (non-member) pivot_table() accepts a list of columns for its columns argument, producing a result with multi-index columns, but DataFrame.pivot() does not. Is there a reason for that? I would have expected the two functions to behave similarly. Granted, the docstring for DataFrame.pivot() doesn't claim to support multiple columns for columns, so this isn't strictly a bug, but it does seem inconsistent (and restrictive) compared with pivot_table(). Example:

In [2]: from pandas import DataFrame, pivot_table

In [3]: df = DataFrame([['foo', 'ABC', 'A', 1],
   ...:                 ['foo', 'ABC', 'B', 2],
   ...:                 ['foo', 'XYZ', 'X', 3],
   ...:                 ['foo', 'XYZ', 'Y', 4],
   ...:                 ['bar', 'ABC', 'B', 5],
   ...:                 ['bar', 'XYZ', 'X', 6]],
   ...:                columns=['FooBar', 'TLA', 'Letter', 'Number'])

In [4]: df
Out[4]:
  FooBar  TLA Letter  Number
0    foo  ABC      A       1
1    foo  ABC      B       2
2    foo  XYZ      X       3
3    foo  XYZ      Y       4
4    bar  ABC      B       5
5    bar  XYZ      X       6

In [11]: pivot_table(df, index='FooBar', columns=['TLA', 'Letter'], values='Number')
Out[11]:
TLA     ABC     XYZ
Letter    A  B    X   Y
FooBar
bar     NaN  5    6 NaN
foo       1  2    3   4

In [13]: df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-8585f7e09b0c> in <module>()
----> 1 df.pivot(index='FooBar', columns=['TLA', 'Letter'], values='Number')

C:\Python34\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   3264         """
   3265         from pandas.core.reshape import pivot
-> 3266         return pivot(self, index=index, columns=columns, values=values)
   3267
   3268     def stack(self, level=-1, dropna=True):

C:\Python34\lib\site-packages\pandas\core\reshape.py in pivot(self, index, columns, values)
    357         indexed = Series(self[values].values,
    358                          index=MultiIndex.from_arrays([self[index],
--> 359                                                        self[columns]]))
    360         return indexed.unstack(columns)
    361

C:\Python34\lib\site-packages\pandas\core\index.py in from_arrays(cls, arrays, sortorder, names)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\index.py in <listcomp>(.0)
   2795             return Index(arrays[0], name=name)
   2796
-> 2797         cats = [Categorical.from_array(arr) for arr in arrays]
   2798         levels = [c.levels for c in cats]
   2799         labels = [c.labels for c in cats]

C:\Python34\lib\site-packages\pandas\core\categorical.py in from_array(cls, data)
    101             the unique values of `data`.
    102         """
--> 103         return Categorical(data)
    104
    105     _levels = None

C:\Python34\lib\site-packages\pandas\core\categorical.py in __init__(self, labels, levels, name)
     82                 name = getattr(labels, 'name', None)
     83             try:
---> 84                 labels, levels = factorize(labels, sort=True)
     85             except TypeError:
     86                 labels, levels = factorize(labels, sort=False)

C:\Python34\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, order, na_sentinel)
    128     table = hash_klass(len(vals))
    129     uniques = vec_klass()
--> 130     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    131
    132     labels = com._ensure_platform_int(labels)

C:\Python34\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:13534)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This is with Pandas v0.14.1.
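
For what it's worth, a possible workaround while DataFrame.pivot() rejects a list for columns is to build the multi-index by hand with set_index() and then unstack() the levels that should become columns. This is only a sketch using the example frame from above; the variable name `wide` is mine, and it assumes each (FooBar, TLA, Letter) combination occurs at most once, since unstack() cannot reshape duplicate entries.

```python
import pandas as pd

# Same example data as in the session above.
df = pd.DataFrame([['foo', 'ABC', 'A', 1],
                   ['foo', 'ABC', 'B', 2],
                   ['foo', 'XYZ', 'X', 3],
                   ['foo', 'XYZ', 'Y', 4],
                   ['bar', 'ABC', 'B', 5],
                   ['bar', 'XYZ', 'X', 6]],
                  columns=['FooBar', 'TLA', 'Letter', 'Number'])

# Move all three key columns into the index, select the value column,
# then unstack the two levels that should end up as column levels.
# Assumption: no duplicate (FooBar, TLA, Letter) rows.
wide = (df.set_index(['FooBar', 'TLA', 'Letter'])['Number']
          .unstack(['TLA', 'Letter']))

print(wide)
```

Unlike pivot_table(), this does no aggregation, so it is closer in spirit to what I expected DataFrame.pivot() to do with a list of columns.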
