Commit 5781c3c

Merge tag 'v0.15.2' into debian
Version 0.15.2

* tag 'v0.15.2': (64 commits)
  RLS: 0.15.2 final
  DOC: fix-up docs for 0.15.2 release
  DOC: update release notes
  DOC: v0.15.2 editiing, removing several duplicated issues
  TST: period-like test for GH9012
  BUG: fix PeriodConverter issue when given a list of integers (GH9012)
  TST: fix related to dateutil test failure in test_series.py
  Return from to_timedelta is forced to dtype timedelta64[ns]. (Fixes pydata/pandas pandas-dev#9011)
  TST: dateutil fixes (GH8639)
  Fix timedelta json on windows
  DOC: expand docs on sql type conversion
  ENH: Infer dtype from non-nulls when pushing to SQL
  COMPAT: windows dtype compat w.r.t. GH9019
  COMPAT: dateutil fixups for 2.3 (GH9021, GH8639)
  DOC: fix categorical comparison example (GH8946)
  Clean up style a bit
  Fix timedeltas to work with to_json
  BUG: Fix plots showing 2 sets of axis labels when the index is a timeseries.
  API: update NDFrame __setattr__ to match behavior of __getattr__ (GH8994)
  Make Timestamp('now') equivalent to Timestamp.now() and Timestamp('today') equivalent to Timestamp.today() and pass tz to today().
  ...
2 parents 3859345 + 18ea1d8 commit 5781c3c

82 files changed, +2140 -530 lines changed

.travis.yml (+16)

```diff
@@ -59,6 +59,14 @@ matrix:
 - CLIPBOARD=xsel
 - BUILD_TYPE=conda
 - JOB_NAME: "34_nslow"
+- python: 3.4
+env:
+- NOSE_ARGS="slow and not network and not disabled"
+- FULL_DEPS=true
+- JOB_TAG=_SLOW
+- CLIPBOARD=xsel
+- BUILD_TYPE=conda
+- JOB_NAME: "34_slow"
 - python: 3.2
 env:
 - NOSE_ARGS="not slow and not network and not disabled"
@@ -90,6 +98,14 @@ matrix:
 - JOB_TAG=_SLOW
 - BUILD_TYPE=conda
 - JOB_NAME: "27_slow"
+- python: 3.4
+env:
+- NOSE_ARGS="slow and not network and not disabled"
+- FULL_DEPS=true
+- JOB_TAG=_SLOW
+- CLIPBOARD=xsel
+- BUILD_TYPE=conda
+- JOB_NAME: "34_slow"
 - python: 2.7
 env:
 - EXPERIMENTAL=true
```

ci/requirements-3.4_SLOW.txt (+19)

```diff
@@ -0,0 +1,19 @@
+dateutil
+pytz
+openpyxl
+xlsxwriter
+xlrd
+html5lib
+patsy
+beautiful-soup
+numpy
+cython
+scipy
+numexpr
+pytables
+matplotlib=1.3.1
+lxml
+sqlalchemy
+bottleneck
+pymysql
+psycopg2
```

doc/source/categorical.rst (+54 -15)

```diff
@@ -51,7 +51,7 @@ The categorical data type is useful in the following cases:
 variable to a categorical variable will save some memory, see :ref:`here <categorical.memory>`.
 * The lexical order of a variable is not the same as the logical order ("one", "two", "three").
 By converting to a categorical and specifying an order on the categories, sorting and
-min/max will use the logical order instead of the lexical order.
+min/max will use the logical order instead of the lexical order, see :ref:`here <categorical.sort>`.
 * As a signal to other python libraries that this column should be treated as a categorical
 variable (e.g. to use suitable statistical methods or plot types).
 
@@ -265,9 +265,11 @@ or simply set the categories to a predefined scale, use :func:`Categorical.set_c
 intentionally or because it is misspelled or (under Python3) due to a type difference (e.g.,
 numpys S1 dtype and python strings). This can result in surprising behaviour!
 
-Ordered or not...
+Sorting and Order
 -----------------
 
+.. _categorical.sort:
+
 If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
 meaning and certain operations are possible. If the categorical is unordered, a `TypeError` is
 raised.
@@ -296,9 +298,14 @@ This is even true for strings and numeric data:
 s
 s.min(), s.max()
 
+
+Reordering
+~~~~~~~~~~
+
 Reordering the categories is possible via the :func:`Categorical.reorder_categories` and
 the :func:`Categorical.set_categories` methods. For :func:`Categorical.reorder_categories`, all
-old categories must be included in the new categories and no new categories are allowed.
+old categories must be included in the new categories and no new categories are allowed. This will
+necessarily make the sort order the same as the categories order.
 
 .. ipython:: python
 
@@ -324,17 +331,45 @@ old categories must be included in the new categories and no new categories are
 (e.g.``Series.median()``, which would need to compute the mean between two values if the length
 of an array is even) do not work and raise a `TypeError`.
 
+Multi Column Sorting
+~~~~~~~~~~~~~~~~~~~~
+
+A categorical dtyped column will partcipate in a multi-column sort in a similar manner to other columns.
+The ordering of the categorical is determined by the ``categories`` of that columns.
+
+.. ipython:: python
+
+dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b']),
+'B' : [1,2,1,2,2,1,2,1] })
+dfs.sort(['A','B'])
+
+Reordering the ``categories``, changes a future sort.
+
+.. ipython:: python
+
+dfs['A'] = dfs['A'].cat.reorder_categories(['a','b','e'])
+dfs.sort(['A','B'])
 
 Comparisons
 -----------
 
-Comparing `Categoricals` with other objects is possible in two cases:
+Comparing categorical data with other objects is possible in three cases:
+
+* comparing equality (``==`` and ``!=``) to a list-like object (list, Series, array,
+...) of the same length as the categorical data.
+* all comparisons (``==``, ``!=``, ``>``, ``>=``, ``<``, and ``<=``) of categorical data to
+another categorical Series, when ``ordered==True`` and the `categories` are the same.
+* all comparisons of a categorical data to a scalar.
 
-* comparing a categorical Series to another categorical Series, when `categories` and `ordered` is
-the same or
-* comparing a categorical Series to a scalar.
+All other comparisons, especially "non-equality" comparisons of two categoricals with different
+categories or a categorical with any list-like object, will raise a TypeError.
 
-All other comparisons will raise a TypeError.
+.. note::
+
+Any "non-equality" comparisons of categorical data with a `Series`, `np.array`, `list` or
+categorical data with different categories or ordering will raise an `TypeError` because custom
+categories ordering could be interpreted in two ways: one with taking in account the
+ordering and one without.
 
 .. ipython:: python
 
```
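A runnable sketch of the sorting rules documented in this diff, assuming a current pandas rather than 0.15.2: `DataFrame.sort`, used in the ipython examples above, was later removed in favour of `DataFrame.sort_values`, and `ordered=True` is passed explicitly so the behaviour does not depend on any particular version's default. The `dfs` data mirrors the docs; the Series `s` is hypothetical.

```python
import pandas as pd

# Hypothetical ordered Series: min/max follow the category order c < b < a.
s = pd.Series(pd.Categorical(['a', 'b', 'c', 'a'],
                             categories=['c', 'b', 'a'], ordered=True))
print(s.min(), s.max())  # c a

# An unordered categorical has no meaningful min/max, so pandas raises TypeError.
unordered_min_raised = False
try:
    pd.Series(pd.Categorical(['a', 'b'])).min()
except TypeError as exc:
    unordered_min_raised = True
    print('TypeError:', exc)

# Multi-column sort: column A sorts by its categories e < a < b, then by B.
dfs = pd.DataFrame({
    'A': pd.Categorical(list('bbeebbaa'), categories=['e', 'a', 'b'], ordered=True),
    'B': [1, 2, 1, 2, 2, 1, 2, 1],
})
first = dfs.sort_values(['A', 'B'])
print(first)

# Reordering the categories changes the next sort: now a < b < e.
dfs['A'] = dfs['A'].cat.reorder_categories(['a', 'b', 'e'])
second = dfs.sort_values(['A', 'B'])
print(second)
```

Because `reorder_categories` keeps the `ordered` flag, the second sort places `'a'` first and `'e'` last, matching the new category order rather than the lexical one.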

```diff
@@ -353,6 +388,14 @@ Comparing to a categorical with the same categories and ordering or to a scalar
 cat > cat_base
 cat > 2
 
+Equality comparisons work with any list-like object of same length and scalars:
+
+.. ipython:: python
+
+cat == cat_base
+cat == np.array([1,2,3])
+cat == 2
+
 This doesn't work because the categories are not the same:
 
 .. ipython:: python
@@ -362,13 +405,9 @@ This doesn't work because the categories are not the same:
 except TypeError as e:
 print("TypeError: " + str(e))
 
-.. note::
-
-Comparisons with `Series`, `np.array` or a `Categorical` with different categories or ordering
-will raise an `TypeError` because custom categories ordering could be interpreted in two ways:
-one with taking in account the ordering and one without. If you want to compare a categorical
-series with such a type, you need to be explicit and convert the categorical data back to the
-original values:
+If you want to do a "non-equality" comparison of a categorical series with a list-like object
+which is not categorical data, you need to be explicit and convert the categorical data back to
+the original values:
 
 .. ipython:: python
 
```
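The permitted comparison cases listed above can be checked directly. This is a sketch against a current pandas; `cat` and `cat_base` reuse the names from the docs, but their definitions here (ordered, categories `[3, 2, 1]`) and the extra Series `cat_other` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ordered categories where 3 < 2 < 1.
order = pd.CategoricalDtype(categories=[3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(order)
cat_base = pd.Series([2, 2, 2]).astype(order)
# Same values as cat_base, but its categories are inferred as [2] only.
cat_other = pd.Series([2, 2, 2]).astype(pd.CategoricalDtype(ordered=True))

# Ordered comparisons follow the category order, not the numeric order.
gt_base = cat > cat_base
gt_scalar = cat > 2

# Equality works against a list-like of the same length and against scalars.
eq_array = cat == np.array([1, 2, 3])
eq_scalar = cat == 2

# Different categories: a "non-equality" comparison raises TypeError.
different_categories_raised = False
try:
    cat > cat_other
except TypeError as exc:
    different_categories_raised = True
    print('TypeError:', exc)

# Explicit conversion back to the original values gives a plain numeric comparison.
gt_values = np.asarray(cat) > np.array([2, 2, 2])
print(list(gt_base), list(gt_values))
```

Note how `cat > cat_base` and `np.asarray(cat) > [2, 2, 2]` disagree: the first uses the custom category order, the second compares the raw numbers, which is exactly the ambiguity the note in the docs describes.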

doc/source/cookbook.rst (+4 -4)

```diff
@@ -489,10 +489,10 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
 .. ipython:: python
 
 def GrowUp(x):
-avg_weight = sum(x[x.size == 'S'].weight * 1.5)
-avg_weight += sum(x[x.size == 'M'].weight * 1.25)
-avg_weight += sum(x[x.size == 'L'].weight)
-avg_weight = avg_weight / len(x)
+avg_weight = sum(x[x['size'] == 'S'].weight * 1.5)
+avg_weight += sum(x[x['size'] == 'M'].weight * 1.25)
+avg_weight += sum(x[x['size'] == 'L'].weight)
+avg_weight /= len(x)
 return pd.Series(['L',avg_weight,True], index=['size', 'weight', 'adult'])
 
 expected_df = gb.apply(GrowUp)
```
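The cookbook fix swaps `x.size` for `x['size']` because attribute access resolves to the built-in `DataFrame.size` property (the number of elements), not to the column named `size`. A small sketch with hypothetical data shows the collision and runs the corrected function on a single frame instead of through `groupby`:

```python
import pandas as pd

# Hypothetical data; the column name 'size' is shadowed by DataFrame.size.
df = pd.DataFrame({'size': ['S', 'M', 'L'], 'weight': [8.0, 10.0, 11.0]})

print(df.size)     # 6: rows * columns, not the 'size' column
print(df['size'])  # item access returns the column itself

def grow_up(x):
    # Same weighting as the cookbook's GrowUp, using item access for 'size'.
    avg_weight = sum(x[x['size'] == 'S'].weight * 1.5)
    avg_weight += sum(x[x['size'] == 'M'].weight * 1.25)
    avg_weight += sum(x[x['size'] == 'L'].weight)
    avg_weight /= len(x)
    return pd.Series(['L', avg_weight, True], index=['size', 'weight', 'adult'])

result = grow_up(df)
print(result)
```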

doc/source/indexing.rst (+8 -4)

```diff
@@ -84,14 +84,17 @@ of multi-axis indexing.
 
 See more at :ref:`Selection by Label <indexing.label>`
 
-- ``.iloc`` is strictly integer position based (from ``0`` to ``length-1`` of
-the axis), will raise ``IndexError`` if an indexer is requested and it
-is out-of-bounds, except *slice* indexers which allow out-of-bounds indexing.
-(this conforms with python/numpy *slice* semantics). Allowed inputs are:
+- ``.iloc`` is primarily integer position based (from ``0`` to
+``length-1`` of the axis), but may also be used with a boolean
+array. ``.iloc`` will raise ``IndexError`` if a requested
+indexer is out-of-bounds, except *slice* indexers which allow
+out-of-bounds indexing. (this conforms with python/numpy *slice*
+semantics). Allowed inputs are:
 
 - An integer e.g. ``5``
 - A list or array of integers ``[4, 3, 0]``
 - A slice object with ints ``1:7``
+- A boolean array
 
 See more at :ref:`Selection by Position <indexing.integer>`
 
@@ -368,6 +371,7 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
 - An integer e.g. ``5``
 - A list or array of integers ``[4, 3, 0]``
 - A slice object with ints ``1:7``
+- A boolean array
 
 .. ipython:: python
 
```
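A short sketch of the `.iloc` inputs listed above, including the newly documented boolean array and the out-of-bounds rules (slices are clipped, a single integer raises `IndexError`). The Series `s` is hypothetical and the sketch assumes a current pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40])  # hypothetical data

by_position = s.iloc[[3, 0]]                             # list of integers
by_mask = s.iloc[np.array([True, False, True, False])]   # boolean array, same length as s
by_slice = s.iloc[2:100]                                 # out-of-bounds slice is clipped

out_of_bounds_raised = False
try:
    s.iloc[10]                                           # out-of-bounds integer indexer
except IndexError:
    out_of_bounds_raised = True

print(list(by_position), list(by_mask), list(by_slice), out_of_bounds_raised)
```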

doc/source/io.rst (+35 -6)

```diff
@@ -2348,7 +2348,7 @@ Closing a Store, Context Manager
 
 # Working with, and automatically closing the store with the context
 # manager
-with get_store('store.h5') as store:
+with HDFStore('store.h5') as store:
 store.keys()
 
 .. ipython:: python
@@ -3393,12 +3393,34 @@ the database using :func:`~pandas.DataFrame.to_sql`.
 
 data.to_sql('data', engine)
 
-With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the ``chunksize`` parameter when calling ``to_sql``. For example, the following writes ``data`` to the database in batches of 1000 rows at a time:
+With some databases, writing large DataFrames can result in errors due to
+packet size limitations being exceeded. This can be avoided by setting the
+``chunksize`` parameter when calling ``to_sql``. For example, the following
+writes ``data`` to the database in batches of 1000 rows at a time:
 
 .. ipython:: python
 
 data.to_sql('data_chunked', engine, chunksize=1000)
 
+SQL data types
++++++++++++++++
+
+:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
+SQL data type based on the dtype of the data. When you have columns of dtype
+``object``, pandas will try to infer the data type.
+
+You can always override the default type by specifying the desired SQL type of
+any of the columns by using the ``dtype`` argument. This argument needs a
+dictionary mapping column names to SQLAlchemy types (or strings for the sqlite3
+fallback mode).
+For example, specifying to use the sqlalchemy ``String`` type instead of the
+default ``Text`` type for string columns:
+
+.. ipython:: python
+
+from sqlalchemy.types import String
+data.to_sql('data_dtype', engine, dtype={'Col_1': String})
+
 .. note::
 
 Due to the limited support for timedelta's in the different database
@@ -3413,7 +3435,6 @@ With some databases, writing large DataFrames can result in errors due to packet
 Because of this, reading the database table back in does **not** generate
 a categorical.
 
-
 Reading Tables
 ~~~~~~~~~~~~~~
 
```
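The `chunksize` and `dtype` arguments can be tried without a database server through the sqlite3 fallback mode mentioned above, where `dtype` takes SQL type strings instead of SQLAlchemy types. This sketch assumes a current pandas that accepts a plain `sqlite3` connection; the table names, column names and the `VARCHAR(10)` override are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
data = pd.DataFrame({'Col_1': ['a', 'b', 'c'], 'Col_2': [1.0, 2.0, 3.0]})

# Write in batches of 2 rows; the table still ends up with all 3 rows.
data.to_sql('data_chunked', conn, chunksize=2, index=False)
n_rows = conn.execute('SELECT COUNT(*) FROM data_chunked').fetchone()[0]

# Default type mapping vs. an explicit override for Col_1 (a string in sqlite3 mode).
data.to_sql('data_default', conn, index=False)
data.to_sql('data_dtype', conn, dtype={'Col_1': 'VARCHAR(10)'}, index=False)

def column_types(table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info({table})')}

default_types = column_types('data_default')
override_types = column_types('data_dtype')
print(n_rows, default_types, override_types)
```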


```diff
@@ -3643,6 +3664,14 @@ data quickly, but it is not a direct replacement for a transactional database.
 You can access the management console to determine project id's by:
 <https://code.google.com/apis/console/b/0/?noredirect>
 
+As of 0.15.2, the gbq module has a function ``generate_bq_schema`` which
+will produce the dictionary representation of the schema.
+
+.. code-block:: python
+
+df = pandas.DataFrame({'A': [1.0]})
+gbq.generate_bq_schema(df, default_type='STRING')
+
 .. warning::
 
 To use this module, you will need a valid BigQuery account. See
@@ -3766,13 +3795,13 @@ is lost when exporting.
 
 *Stata* only supports string value labels, and so ``str`` is called on the
 categories when exporting data. Exporting ``Categorical`` variables with
-non-string categories produces a warning, and can result a loss of
+non-string categories produces a warning, and can result a loss of
 information if the ``str`` representations of the categories are not unique.
 
 Labeled data can similarly be imported from *Stata* data files as ``Categorical``
-variables using the keyword argument ``convert_categoricals`` (``True`` by default).
+variables using the keyword argument ``convert_categoricals`` (``True`` by default).
 The keyword argument ``order_categoricals`` (``True`` by default) determines
-whether imported ``Categorical`` variables are ordered.
+whether imported ``Categorical`` variables are ordered.
 
 .. note::
 
```
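The gbq function itself is not reproduced here. As a hypothetical illustration of what "the dictionary representation of the schema" can look like, the sketch below builds a `{'fields': [...]}` dictionary by mapping each column's NumPy dtype kind to a BigQuery type name and falling back to `default_type`. The function name `sketch_bq_schema` and the kind-to-type mapping are assumptions for illustration, not the gbq module's code.

```python
import pandas as pd

# Hypothetical dtype-kind -> BigQuery type mapping (an assumption, not gbq's table).
_KIND_TO_BQ = {
    'i': 'INTEGER',
    'u': 'INTEGER',
    'f': 'FLOAT',
    'b': 'BOOLEAN',
    'O': 'STRING',
    'M': 'TIMESTAMP',
}

def sketch_bq_schema(df, default_type='STRING'):
    """Hypothetical schema builder: one {'name', 'type'} field per column."""
    fields = [
        {'name': name, 'type': _KIND_TO_BQ.get(dtype.kind, default_type)}
        for name, dtype in df.dtypes.items()
    ]
    return {'fields': fields}

# Column C is complex (kind 'c'), which is not in the mapping, so it gets default_type.
schema = sketch_bq_schema(pd.DataFrame({'A': [1.0], 'B': ['x'], 'C': [1 + 2j]}))
print(schema)
```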

doc/source/release.rst (+53 -3)

```diff
@@ -48,17 +48,67 @@ analysis / manipulation tool available in any language.
 pandas 0.15.2
 -------------
 
-**Release date:** (December ??, 2014)
+**Release date:** (December 12, 2014)
 
-This is a minor release from 0.15.1 and includes a small number of API changes, several new features, enhancements, and
-performance improvements along with a large number of bug fixes.
+This is a minor release from 0.15.1 and includes a large number of bug fixes
+along with several new features, enhancements, and performance improvements.
+A small number of API changes were necessary to fix existing bugs.
 
 See the :ref:`v0.15.2 Whatsnew <whatsnew_0152>` overview for an extensive list
 of all API changes, enhancements and bugs that have been fixed in 0.15.2.
 
 Thanks
 ~~~~~~
 
+- Aaron Staple
+- Angelos Evripiotis
+- Artemy Kolchinsky
+- Benoit Pointet
+- Brian Jacobowski
+- Charalampos Papaloizou
+- Chris Warth
+- David Stephens
+- Fabio Zanini
+- Francesc Via
+- Henry Kleynhans
+- Jake VanderPlas
+- Jan Schulz
+- Jeff Reback
+- Jeff Tratner
+- Joris Van den Bossche
+- Kevin Sheppard
+- Matt Suggit
+- Matthew Brett
+- Phillip Cloud
+- Rupert Thompson
+- Scott E Lasley
+- Stephan Hoyer
+- Stephen Simmons
+- Sylvain Corlay
+- Thomas Grainger
+- Tiago Antao
+- Trent Hauck
+- Victor Chaves
+- Victor Salgado
+- Vikram Bhandoh
+- WANG Aiyong
+- Will Holmgren
+- behzad nouri
+- broessli
+- charalampos papaloizou
+- immerrr
+- jnmclarty
+- jreback
+- mgilbert
+- onesandzeroes
+- peadarcoyle
+- rockg
+- seth-p
+- sinhrks
+- unutbu
+- wavedatalab
+- Åsmund Hjulstad
+
 pandas 0.15.1
 -------------
 
```
