forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathshared_docs.py
327 lines (273 loc) · 9.05 KB
/
shared_docs.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
from typing import Dict
_shared_docs: Dict[str, str] = dict()
_shared_docs[
"aggregate"
] = """\
Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either
work when passed a {klass} or when passed to {klass}.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- dict of axis labels -> functions, function names or list of such.
{axis}
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
scalar, Series or DataFrame
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
Return scalar, Series or DataFrame.
{see_also}
Notes
-----
`agg` is an alias for `aggregate`. Use the alias.
A passed user-defined-function will be passed a Series for evaluation.
{examples}"""
_shared_docs[
"compare"
] = """\
Compare to another %(klass)s and show the differences.
.. versionadded:: 1.1.0
Parameters
----------
other : %(klass)s
Object to compare with.
align_axis : {0 or 'index', 1 or 'columns'}, default 1
Determine which axis to align the comparison on.
* 0, or 'index' : Resulting differences are stacked vertically
with rows drawn alternately from self and other.
* 1, or 'columns' : Resulting differences are aligned horizontally
with columns drawn alternately from self and other.
keep_shape : bool, default False
If true, all rows and columns are kept.
Otherwise, only the ones with different values are kept.
keep_equal : bool, default False
If true, the result keeps values that are equal.
Otherwise, equal values are shown as NaNs.
"""
_shared_docs[
"groupby"
] = """\
Group %(klass)s using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
Parameters
----------
by : mapping, function, label, or list of labels
Used to determine the groups for the groupby.
If ``by`` is a function, it's called on each value of the object's
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series' values are first
aligned; see ``.align()`` method). If an ndarray is passed, the
values are used as-is determine the groups. A label or list of
labels may be passed to group by the columns in ``self``. Notice
that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels.
as_index : bool, default True
For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively "SQL-style" grouped output.
sort : bool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group.
group_keys : bool, default True
When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False
Reduce the dimensionality of the return type if possible,
otherwise return a consistent type.
.. deprecated:: 1.1.0
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
dropna : bool, default True
If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups
.. versionadded:: 1.1.0
Returns
-------
%(klass)sGroupBy
Returns a groupby object that contains information about the groups.
See Also
--------
resample : Convenience method for frequency conversion and resampling
of time series.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/groupby.html>`_ for more.
"""
_shared_docs[
"melt"
] = """\
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
Parameters
----------
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
If True, original index is ignored. If False, the original index is retained.
Index labels will be repeated as necessary.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
Unpivoted DataFrame.
See Also
--------
%(other)s : Identical method.
pivot_table : Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot : Return reshaped DataFrame organized
by given index / column values.
DataFrame.explode : Explode a DataFrame from list-like
columns to long format.
Examples
--------
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> %(caller)sid_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> %(caller)sid_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of 'variable' and 'value' columns can be customized:
>>> %(caller)sid_vars=['A'], value_vars=['B'],
... var_name='myVarname', value_name='myValname')
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
Original index values can be kept around:
>>> %(caller)sid_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
A variable value
0 a B 1
1 b B 3
2 c B 5
0 a C 2
1 b C 4
2 c C 6
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> %(caller)scol_level=0, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> %(caller)sid_vars=[('A', 'D')], value_vars=[('B', 'E')])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
"""
_shared_docs[
"transform"
] = """\
Call ``func`` on self producing a {klass} with transformed values.
Produced {klass} will have same axis length as self.
Parameters
----------
func : function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either
work when passed a {klass} or when passed to {klass}.apply. If func
is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
- function
- string function name
- list-like of functions and/or function names, e.g. ``[np.exp, 'sqrt']``
- dict-like of axis labels -> functions, function names or list-like of such.
{axis}
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
{klass}
A {klass} that must have the same length as self.
Raises
------
ValueError : If the returned {klass} has a different length than self.
See Also
--------
{klass}.agg : Only perform aggregating type operations.
{klass}.apply : Invoke function on a {klass}.
Examples
--------
>>> df = pd.DataFrame({{'A': range(3), 'B': range(1, 4)}})
>>> df
A B
0 0 1
1 1 2
2 2 3
>>> df.transform(lambda x: x + 1)
A B
0 1 2
1 2 3
2 3 4
Even though the resulting {klass} must have the same length as the
input {klass}, it is possible to provide several input functions:
>>> s = pd.Series(range(3))
>>> s
0 0
1 1
2 2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
sqrt exp
0 0.000000 1.000000
1 1.000000 2.718282
2 1.414214 7.389056
"""