Commit 682ca25

ENH: implement passed quantile array to qcut and document that plus factors, close #1407
1 parent 8f94009 commit 682ca25

5 files changed: +71 -7 lines changed

doc/source/basics.rst

Lines changed: 41 additions & 0 deletions
@@ -369,6 +369,47 @@ index labels with the minimum and maximum corresponding values:
    df1.idxmin(axis=0)
    df1.idxmax(axis=1)
 
+Value counts (histogramming)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``value_counts`` Series method and top-level function computes a histogram
+of a 1D array of values. It can also be used as a function on regular arrays:
+
+.. ipython:: python
+
+   data = np.random.randint(0, 7, size=50)
+   data
+   s = Series(data)
+   s.value_counts()
+   value_counts(data)
+
+
+Discretization and quantiling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Continuous values can be discretized using the ``cut`` (bins based on values)
+and ``qcut`` (bins based on sample quantiles) functions:
+
+.. ipython:: python
+
+   arr = np.random.randn(20)
+   factor = cut(arr, 4)
+   factor
+
+   factor = cut(arr, [-5, -1, 0, 1, 5])
+   factor
+
+``qcut`` computes sample quantiles. For example, we could slice up some
+normally distributed data into equal-size quartiles like so:
+
+.. ipython:: python
+
+   arr = np.random.randn(30)
+   factor = qcut(arr, [0, .25, .5, .75, 1])
+   factor
+   value_counts(factor)
+
+
 .. _basics.apply:
 
 Function application
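
The documentation added above describes counting distinct values and binning continuous values into equal-width intervals. A rough standard-library analogue of those two ideas is sketched below. It is not pandas's implementation: the helper names `value_counts` and `equal_width_bin` are written here for illustration only, the bins are left-closed for simplicity, and the input is assumed to contain at least two distinct values.

```python
from collections import Counter


def value_counts(values):
    """Count occurrences of each distinct value, most frequent first."""
    return Counter(values).most_common()


def equal_width_bin(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width intervals.

    Illustrative only: intervals are left-closed here, and max(values)
    must be greater than min(values) so the width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum lands in the last bin rather than one past it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


data = [3, 1, 3, 6, 1, 3, 0]
counts = value_counts(data)

samples = [0.0, 1.0, 2.5, 4.0, 7.9, 8.0]
bins = equal_width_bin(samples, 4)
```

Here `counts` pairs each distinct value with its frequency, and `bins` holds one bin index per sample. Quantile-based binning differs only in how the edges are chosen.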

doc/source/groupby.rst

Lines changed: 14 additions & 0 deletions
@@ -590,3 +590,17 @@ If there are any NaN values in the grouping key, these will be automatically
 excluded. So there will never be an "NA group". This was not the case in older
 versions of pandas, but users were generally discarding the NA group anyway
 (and supporting it was an implementation headache).
+
+Grouping with ordered factors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Categorical variables represented as instance of pandas's ``Factor`` class can
+be used as group keys. If so, the order of the levels will be preserved:
+
+.. ipython:: python
+
+   data = Series(np.random.randn(100))
+
+   factor = qcut(data, [0, .25, .5, .75, 1.])
+
+   data.groupby(factor).mean()
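
The added section says that grouping by an ordered factor preserves the order of its levels. The idea can be sketched without pandas as follows. The function `group_means` and the level names are hypothetical, chosen only to show results reported in level order rather than first-seen order.

```python
from collections import defaultdict
from statistics import mean


def group_means(values, labels, level_order):
    """Mean of values per label, reported in level_order.

    Levels that never occur in labels are skipped.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return [(lev, mean(groups[lev])) for lev in level_order if lev in groups]


values = [10.0, 1.0, 5.0, 2.0, 8.0, 6.0]
labels = ["high", "low", "mid", "low", "high", "mid"]
result = group_means(values, labels, ["low", "mid", "high"])
```

Although "high" appears first in `labels`, `result` lists "low", then "mid", then "high", following the declared level order.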

doc/source/timeseries.rst

Lines changed: 1 addition & 1 deletion
@@ -859,7 +859,7 @@ time zone:
 
    ts = Series(randn(len(rng)), rng)
 
-   ts_utc = ts.tz_convert('UTC')
+   ts_utc = ts.tz_localize('UTC')
    ts_utc
 
    ts_utc.tz_convert('US/Eastern')
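
The one-line change above swaps `tz_convert` for `tz_localize` on `ts`. The general distinction between the two operations can be sketched with the standard library alone. This is an analogy, not the pandas API: the fixed `-04:00` offset named `eastern` is an assumption made for the demo and does not model daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2012, 6, 1, 12, 0)

# "Localize": attach a zone to a naive timestamp.
# The wall-clock reading stays the same (still 12:00).
as_utc = naive.replace(tzinfo=timezone.utc)

# "Convert": re-express an aware timestamp in another zone.
# The instant stays the same; only the wall-clock reading changes.
eastern = timezone(timedelta(hours=-4), "EDT")  # fixed offset, demo assumption
as_eastern = as_utc.astimezone(eastern)
```

Localizing keeps the hour at 12 and adds zone information, while converting keeps the instant and shifts the displayed hour to 8.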

pandas/tools/tests/test_tile.py

Lines changed: 7 additions & 0 deletions
@@ -109,6 +109,13 @@ def test_qcut_bounds(self):
         factor = qcut(arr, 10, labels=False)
         self.assert_(len(np.unique(factor)) == 10)
 
+    def test_qcut_specify_quantiles(self):
+        arr = np.random.randn(100)
+
+        factor = qcut(arr, [0, .25, .5, .75, 1.])
+        expected = qcut(arr, 4)
+        self.assert_(factor.equals(expected))
+
     def test_cut_out_of_bounds(self):
         np.random.seed(12345)

pandas/tools/tile.py

Lines changed: 8 additions & 6 deletions
@@ -111,7 +111,8 @@ def qcut(x, q=4, labels=None, retbins=False, precision=3):
     x : ndarray or Series
     q : integer or array of quantiles
         Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately
-        array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles
+        array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles. Array of
+        quantiles must span [0, 1]
     labels : array or boolean, default None
         Labels to use for bin edges, or False to return integer bin labels
     retbins : bool, optional
@@ -129,12 +130,13 @@ def qcut(x, q=4, labels=None, retbins=False, precision=3):
     """
     if com.is_integer(q):
         quantiles = np.linspace(0, 1, q + 1)
-        bins = algos.quantile(x, quantiles)
-        bins[0] -= 0.001 * (x.max() - x.min())
-        return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
-                             precision=precision)
     else:
-        raise NotImplementedError
+        quantiles = q
+    bins = algos.quantile(x, quantiles)
+    bins[0] -= 0.001 * (x.max() - x.min())
+
+    return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
+                         precision=precision)
 
 
 def _bins_to_cuts(x, bins, right=True, labels=None, retbins=False,
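
The control flow of the patched `qcut` (an integer `q` expands to evenly spaced quantiles, an array `q` is used as given, then edges are computed and the lowest edge is nudged down) can be mirrored in a small pure-Python sketch. The names `quantile` and `qcut_sketch` are hypothetical stand-ins, linear interpolation is assumed for the sample quantiles, and the right-closed bin assignment is a simplification that does not reproduce `_bins_to_cuts`.

```python
from bisect import bisect_left


def quantile(sorted_vals, p):
    """Linearly interpolated sample quantile of pre-sorted data, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qcut_sketch(values, q=4):
    """Return (bin_index_per_value, bin_edges) for quantile-based bins."""
    if isinstance(q, int):
        # Evenly spaced quantiles, like np.linspace(0, 1, q + 1).
        quantiles = [i / q for i in range(q + 1)]
    else:
        # Caller-supplied quantiles; they should span [0, 1].
        quantiles = list(q)
    ordered = sorted(values)
    edges = [quantile(ordered, p) for p in quantiles]
    # Nudge the lowest edge down so the minimum falls inside the first bin.
    edges[0] -= 0.001 * (ordered[-1] - ordered[0])
    # Right-closed bins (edges[i], edges[i + 1]]:
    # bisect_left returns the index of the first edge >= v.
    codes = [bisect_left(edges, v) - 1 for v in values]
    return codes, edges


values = list(range(1, 9))
codes_int, edges_int = qcut_sketch(values, 4)
codes_arr, edges_arr = qcut_sketch(values, [0, .25, .5, .75, 1.])
```

As in `test_qcut_specify_quantiles`, passing `4` and passing the explicit quartile array yield the same bins in this sketch.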
