Commit 41793ea

DOC: added sorting examples to 10min

BUG: fixed multi-index selection via loc, back to using some of ix code (but still do validation if not mi)
ENH: add xs to Series for compatibility, create _xs functions in all objects
DOC: added several sub-sections to 10min, fixed some references in basics.rst
1 parent 643e1cb commit 41793ea

9 files changed: +201 -36 lines changed

RELEASE.rst

+1 -1
@@ -59,6 +59,7 @@ pandas 0.11.0
   - Add ``format`` option to ``pandas.to_datetime`` with faster conversion of
     strings that can be parsed with datetime.strptime
   - Add ``axes`` property to ``Series`` for compatibility
+  - Add ``xs`` function to ``Series`` for compatibility
 
 **API Changes**
 
@@ -135,7 +136,6 @@ pandas 0.11.0
   - Bug on in-place putmasking on an ``integer`` series that needs to be converted to ``float`` (GH2746_)
   - Bug in argsort of ``datetime64[ns]`` Series with ``NaT`` (GH2967_)
   - Bug in idxmin/idxmax of ``datetime64[ns]`` Series with ``NaT`` (GH2982__)
-  - ``icol`` with negative indicies was return ``nan`` (see GH2922_)
   - Bug in ``icol`` with negative indicies was incorrect producing incorrect return values (see GH2922_)
 
 .. _GH622: https://github.com/pydata/pandas/issues/622

doc/source/10min.rst

+111 -7
@@ -67,7 +67,7 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s
                      'E' : 'foo' })
    df2
 
-Having specific dtypes
+Having specific :ref:`dtypes <basics.dtypes>`
 
 .. ipython:: python
 
@@ -83,7 +83,7 @@ See the top & bottom rows of the frame
 .. ipython:: python
 
    df.head()
-   df.tail()
+   df.tail(3)
 
 Display the index,columns, and the underlying numpy data
 
@@ -99,6 +99,24 @@ Describe shows a quick statistic summary of your data
 
    df.describe()
 
+Transposing your data
+
+.. ipython:: python
+
+   df.T
+
+Sorting by an axis
+
+.. ipython:: python
+
+   df.sort_index(axis=1, ascending=False)
+
+Sorting by values
+
+.. ipython:: python
+
+   df.sort(columns='B')
+
 Selection
 ---------
 
@@ -112,6 +130,7 @@ Selecting a single column, which yields a ``Series``
 
 .. ipython:: python
 
+   # equivalently ``df.A``
    df['A']
 
 Selecting via ``[]``, which slices the rows.
@@ -167,7 +186,6 @@ Select via the position of the passed integers
 
 .. ipython:: python
 
-   # this is a cross-section of the object
    df.iloc[3]
 
 By integer slices, acting similar to numpy/python
@@ -220,7 +238,7 @@ Pandas will detect this and raise ``IndexError``, rather than return an empty st
 
 ::
 
-   >>> df.iloc[:,3:6]
+   >>> df.iloc[:,8:10]
    IndexError: out-of-bounds on slice (end)
 
 Boolean Indexing
@@ -232,7 +250,7 @@ Using a single column's values to select data.
 
    df[df.A > 0]
 
-A ``where`` operation.
+A ``where`` operation for getting.
 
 .. ipython:: python
 
@@ -270,6 +288,14 @@ Setting by assigning with a numpy array
    df.loc[:,'D'] = np.array([5] * len(df))
    df
 
+A ``where`` operation with setting.
+
+.. ipython:: python
+
+   df2 = df.copy()
+   df2[df2 > 0] = -df2
+   df2
+
 Missing Data
 ------------
 
@@ -297,6 +323,12 @@ Filling missing data
 
    df1.fillna(value=5)
 
+To get the boolean mask where values are ``nan``
+
+.. ipython:: python
+
+   pd.isnull(df1)
+
 
 Operations
 ----------
@@ -306,6 +338,8 @@ See the :ref:`Basic section on Binary Ops <basics.binop>`
 Stats
 ~~~~~
 
+Operations in general *exclude* missing data.
+
 Performing a descriptive statistic
 
 .. ipython:: python
@@ -318,11 +352,15 @@ Same operation on the other axis
 
    df.mean(1)
 
-Operations on missing data, exclude the data
+Operating with objects that have different dimensionality and need alignment.
+In addition, pandas automatically broadcasts along the specified dimension.
 
 .. ipython:: python
 
-   df1.mean()
+   s = pd.Series([1,3,5,np.nan,6,8],index=dates).shift(2)
+   s
+   df.sub(s,axis='index')
+
 
 Apply
 ~~~~~
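
Note on the Stats hunk above: the new text says operations exclude missing data and that pandas aligns and broadcasts along the given axis. A minimal sketch of both behaviours follows, assuming a current pandas install; the frame contents and the `dates` range are illustrative stand-ins, not the data used in 10min.rst.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption; the tutorial uses random values).
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

# shift(2) moves the values down two rows and leaves NaN in the first two.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# Reductions skip NaN by default: only 1, 3 and 5 contribute to the mean.
s_mean = s.mean()

# Subtract s from every column, matching rows by index label.
result = df.sub(s, axis="index")

# Rows where s is NaN come out as NaN in every column.
nan_rows = result.isna().all(axis=1)
```

Arithmetic such as `sub` propagates NaN, while reductions such as `mean` skip it, which is why the two results differ in how the missing entries show up.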
@@ -334,6 +372,27 @@ Applying functions to the data
    df.apply(np.cumsum)
    df.apply(lambda x: x.max() - x.min())
 
+Histogramming
+~~~~~~~~~~~~~
+
+See more at :ref:`Histogramming and Discretization <basics.discretization>`
+
+.. ipython:: python
+
+   s = Series(np.random.randint(0,7,size=10))
+   s
+   s.value_counts()
+
+String Methods
+~~~~~~~~~~~~~~
+
+See more at :ref:`Vectorized String Methods <basics.string_methods>`
+
+.. ipython:: python
+
+   s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+   s.str.lower()
+
 Merge
 -----
 
@@ -425,6 +484,9 @@ Reshaping
 See the section on :ref:`Hierarchical Indexing <indexing.hierarchical>` and
 see the section on :ref:`Reshaping <reshaping.stacking>`).
 
+Stack
+~~~~~
+
 .. ipython:: python
 
    tuples = zip(*[['bar', 'bar', 'baz', 'baz',
@@ -453,6 +515,26 @@ unstacks the **last level**:
    stacked.unstack(1)
    stacked.unstack(0)
 
+Pivot Tables
+~~~~~~~~~~~~
+See the section on :ref:`Pivot Tables <reshaping.pivot>`).
+
+.. ipython:: python
+
+   df = DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
+                   'B' : ['A', 'B', 'C'] * 4,
+                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
+                   'D' : np.random.randn(12),
+                   'E' : np.random.randn(12)})
+   df
+
+We can produce pivot tables from this data very easily:
+
+.. ipython:: python
+
+   pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
+
+
 Time Series
 -----------
 
@@ -581,3 +663,25 @@ Reading from a HDF5 Store
    store.close()
    os.remove('foo.h5')
 
+Excel
+~~~~~
+
+Reading and writing to :ref:`MS Excel <io.excel>`
+
+Writing to an excel file
+
+.. ipython:: python
+
+   df.to_excel('foo.xlsx', sheet_name='sheet1')
+
+Reading from an excel file
+
+.. ipython:: python
+
+   xls = ExcelFile('foo.xlsx')
+   xls.parse('sheet1', index_col=None, na_values=['NA'])
+
+.. ipython:: python
+   :suppress:
+
+   os.remove('foo.xlsx')
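
Note for readers trying the 10min.rst additions on a current pandas: the sorting and pivot examples in this commit use the 0.11-era spellings `df.sort(columns=...)` and `pivot_table(..., rows=..., cols=...)`. Later releases renamed these to `sort_values(by=...)` and to the `index`/`columns` keywords. The sketch below shows the later spellings; column `D` holds deterministic values instead of `np.random.randn` (an assumption made so the result is reproducible).

```python
import numpy as np
import pandas as pd

# Same layout as the pivot example in the diff, with fixed values in D.
df = pd.DataFrame({"A": ["one", "one", "two", "three"] * 3,
                   "B": ["A", "B", "C"] * 4,
                   "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
                   "D": np.arange(12, dtype=float)})

# Sorting by an axis: reorder the columns by label, descending.
by_axis = df.sort_index(axis=1, ascending=False)

# Sorting by values: the later spelling of df.sort(columns='D').
by_value = df.sort_values(by="D", ascending=False)

# Pivot table: rows= / cols= became index= / columns=.
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
```

The resulting `table` has a two-level row index built from `A` and `B` and one column per distinct value of `C`.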

doc/source/basics.rst

+7 -5
@@ -9,9 +9,9 @@
    randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
 
-*****************************
-Essential Basic Functionality
-*****************************
+==============================
+Essential Basic Functionality
+==============================
 
 Here we discuss a lot of the essential functionality common to the pandas data
 structures. Here's how to create some of the objects used in the examples from
@@ -374,6 +374,8 @@ value, ``idxmin`` and ``idxmax`` return the first matching index:
    df3
    df3['A'].idxmin()
 
+.. _basics.discretization:
+
 Value counts (histogramming)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -976,11 +978,11 @@ To be clear, no pandas methods have the side effect of modifying your data;
 almost all methods return new objects, leaving the original object
 untouched. If data is modified, it is because you did so explicitly.
 
+.. _basics.dtypes:
+
 dtypes
 ------
 
-.. _basics.dtypes:
-
 The main types stored in pandas objects are ``float``, ``int``, ``bool``, ``datetime64[ns]``, ``timedelta[ns]``,
 and ``object``. In addition these dtypes have item sizes, e.g. ``int64`` and ``int32``. A convenient ``dtypes``
 attribute for DataFrames returns a Series with the data type of each column.
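
Note on the dtypes paragraph above: a small sketch of the `dtypes` attribute, assuming a current pandas install. The frame and its column names are made up for illustration. The exact datetime resolution reported for column `D` may differ between pandas versions, so the sketch only relies on it being a datetime64 dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column per kind of dtype.
df = pd.DataFrame({"A": np.array([1, 2, 3], dtype="int32"),
                   "B": [1.0, 2.0, 3.0],
                   "C": [True, False, True],
                   "D": pd.Timestamp("2013-01-02")})

# dtypes is a Series indexed by column name.
dtypes = df.dtypes

# The item size is part of the dtype: A stays int32 rather than widening.
a_is_int32 = dtypes["A"] == np.dtype("int32")
```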

doc/source/io.rst

+2
@@ -906,6 +906,8 @@ And then import the data directly to a DataFrame by calling:
    clipdf
 
 
+.. _io.excel:
+
 Excel files
 -----------
 

pandas/core/frame.py

+2
@@ -2343,6 +2343,8 @@ def xs(self, key, axis=0, level=None, copy=True):
             result.index = new_index
         return result
 
+    _xs = xs
+
     def lookup(self, row_labels, col_labels):
         """
         Label-based "fancy indexing" function for DataFrame. Given equal-length

pandas/core/indexing.py

+18 -14
@@ -55,9 +55,9 @@ def _get_label(self, label, axis=0):
             raise IndexingError('no slices here')
 
         try:
-            return self.obj.xs(label, axis=axis, copy=False)
+            return self.obj._xs(label, axis=axis, copy=False)
         except Exception:
-            return self.obj.xs(label, axis=axis, copy=True)
+            return self.obj._xs(label, axis=axis, copy=True)
 
     def _get_loc(self, key, axis=0):
         return self.obj._ixs(key, axis=axis)
@@ -86,6 +86,9 @@ def __setitem__(self, key, value):
 
         self._setitem_with_indexer(indexer, value)
 
+    def _has_valid_tuple(self, key):
+        pass
+
     def _convert_tuple(self, key):
         keyidx = []
         for i, k in enumerate(key):
@@ -224,6 +227,9 @@ def _getitem_tuple(self, tup):
         if self._multi_take_opportunity(tup):
             return self._multi_take(tup)
 
+        # no multi-index, so validate all of the indexers
+        self._has_valid_tuple(tup)
+
         # no shortcut needed
         retval = self.obj
         for i, key in enumerate(tup):
@@ -616,15 +622,16 @@ class _LocationIndexer(_NDFrameIndexer):
     def _has_valid_type(self, k, axis):
         raise NotImplementedError()
 
+    def _has_valid_tuple(self, key):
+        """ check the key for valid keys across my indexer """
+        for i, k in enumerate(key):
+            if i >= self.obj.ndim:
+                raise ValueError('Too many indexers')
+            if not self._has_valid_type(k,i):
+                raise ValueError("Location based indexing can only have [%s] types" % self._valid_types)
+
     def __getitem__(self, key):
         if type(key) is tuple:
-
-            for i, k in enumerate(key):
-                if i >= self.obj.ndim:
-                    raise ValueError('Too many indexers')
-                if not self._has_valid_type(k,i):
-                    raise ValueError("Location based indexing can only have [%s] types" % self._valid_types)
-
             return self._getitem_tuple(key)
         else:
             return self._getitem_axis(key, axis=0)
@@ -707,11 +714,7 @@ def _getitem_axis(self, key, axis=0):
 
             return self._getitem_iterable(key, axis=axis)
         else:
-            indexer = labels.get_loc(key)
-            return self._get_loc(indexer, axis=axis)
-
-    def _get_loc(self, key, axis=0):
-        return self.obj._ixs(key, axis=axis)
+            return self._get_label(key, axis=axis)
 
 class _iLocIndexer(_LocationIndexer):
     """ purely integer based location based indexing """
@@ -723,6 +726,7 @@ def _has_valid_type(self, key, axis):
 
     def _getitem_tuple(self, tup):
 
+        self._has_valid_tuple(tup)
         retval = self.obj
         for i, key in enumerate(tup):
             if _is_null_slice(key):
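
Note on the indexing.py change: validation moves out of `__getitem__` into a `_has_valid_tuple` hook that is a no-op on the base indexer and is overridden by the location-based indexers. The sketch below illustrates that hook pattern with toy classes. `BaseIndexer`, `IntegerIndexer` and `getitem_tuple` are hypothetical names for illustration, not pandas code, and the multi-index shortcut that precedes validation in the real `_getitem_tuple` is left out.

```python
class BaseIndexer:
    """Toy stand-in for the general indexer: accepts any tuple key."""

    def __init__(self, ndim):
        self.ndim = ndim

    def _has_valid_tuple(self, key):
        # No-op hook; subclasses override it to add checks.
        pass

    def getitem_tuple(self, key):
        # Validate first, then (in this toy) hand the key straight back.
        self._has_valid_tuple(key)
        return key


class IntegerIndexer(BaseIndexer):
    """Toy stand-in for a strict, location-based indexer."""

    _valid_types = "integer, integer slice"

    def _has_valid_type(self, k, axis):
        return isinstance(k, (int, slice))

    def _has_valid_tuple(self, key):
        # Same two checks as the diff: tuple length, then per-axis type.
        for i, k in enumerate(key):
            if i >= self.ndim:
                raise ValueError("Too many indexers")
            if not self._has_valid_type(k, i):
                raise ValueError(
                    "Location based indexing can only have [%s] types"
                    % self._valid_types)
```

With this layout the shared tuple-handling code calls the hook unconditionally, and each indexer class decides how strict to be.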

pandas/core/panel.py

+2
@@ -1065,6 +1065,8 @@ def xs(self, key, axis=1, copy=True):
         new_data = self._data.xs(key, axis=axis_number, copy=copy)
         return self._constructor_sliced(new_data)
 
+    _xs = xs
+
     def _ixs(self, i, axis=0):
         # for compatibility with .ix indexing
         # Won't work with hierarchical indexing yet

pandas/core/series.py

+3
@@ -559,6 +559,9 @@ def ix(self):
 
         return self._ix
 
+    def _xs(self, key, axis=0, level=None, copy=True):
+        return self.__getitem__(key)
+
     def _ixs(self, i, axis=0):
         """
         Return the i-th value or values in the Series by location
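
Note on the `_xs` additions in frame.py, panel.py and series.py: the indexer now calls `obj._xs(...)`, so every container needs that name. Two of them alias it to the public `xs`, while Series delegates to `__getitem__`. A toy sketch of the pattern follows. `Frame`, `OneDim`, `LoudFrame` and `get_label` are hypothetical stand-ins, not the pandas classes.

```python
class Frame:
    """Toy 2-D container: rows stored in a dict keyed by label."""

    def __init__(self, rows):
        self.rows = rows

    def xs(self, key, axis=0, copy=True):
        return self.rows[key]

    # Private alias, bound to the function when the class body runs.
    _xs = xs


class OneDim:
    """Toy 1-D container: a cross-section is just item access."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return self.values[key]

    def _xs(self, key, axis=0, level=None, copy=True):
        # axis, level and copy are accepted only for signature compatibility.
        return self.__getitem__(key)


class LoudFrame(Frame):
    """Subclass that overrides xs but not the alias."""

    def xs(self, key, axis=0, copy=True):
        return ("overridden", self.rows[key])


def get_label(obj, label, axis=0):
    # The caller needs a single private entry point, as in _get_label.
    return obj._xs(label, axis=axis, copy=False)
```

One consequence of a class-level alias such as `_xs = xs` is that it keeps pointing at the original function, so a subclass that overrides `xs` does not change what `_xs` does unless it rebinds the alias too.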
