forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathv0.17.1.txt
204 lines (157 loc) · 11.5 KB
/
v0.17.1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
.. _whatsnew_0171:
v0.17.1 (November 21, 2015)
---------------------------
.. note::
We are proud to announce that *pandas* has become a sponsored project of the (`NUMFocus organization`_). This will help ensure the success of development of *pandas* as a world-class open-source project.
.. _numfocus organization: http://www.numfocus.org/blog/numfocus-announces-new-fiscally-sponsored-project-pandas
This is a minor bug-fix release from 0.17.0 and includes a large number of
bug fixes along several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.
Highlights include:
- Support for Conditional HTML Formatting, see :ref:`here <whatsnew_0171.style>`
- Releasing the GIL on the csv reader & other ops, see :ref:`here <whatsnew_0171.performance>`
- Fixed regression in ``DataFrame.drop_duplicates`` from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
.. contents:: What's new in v0.17.1
:local:
:backlinks: none
New features
~~~~~~~~~~~~
.. _whatsnew_0171.style:
Conditional HTML Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. warning::
This is a new feature and is under active development.
We'll be adding features an possibly making breaking changes in future
releases. Feedback is welcome_.
.. _welcome: https://github.com/pandas-dev/pandas/issues/11610
We've added *experimental* support for conditional HTML formatting:
the visual styling of a DataFrame based on the data.
The styling is accomplished with HTML and CSS.
Acesses the styler class with the :attr:`pandas.DataFrame.style`, attribute,
an instance of :class:`~pandas.core.style.Styler` with your data attached.
Here's a quick example:
.. ipython:: python
np.random.seed(123)
df = DataFrame(np.random.randn(10, 5), columns=list('abcde'))
html = df.style.background_gradient(cmap='viridis', low=.5)
We can render the HTML to get the following table.
.. raw:: html
:file: whatsnew_0171_html_table.html
:class:`~pandas.core.style.Styler` interacts nicely with the Jupyter Notebook.
See the :ref:`documentation <style>` for more.
.. _whatsnew_0171.enhancements:
Enhancements
~~~~~~~~~~~~
- ``DatetimeIndex`` now supports conversion to strings with ``astype(str)`` (:issue:`10442`)
- Support for ``compression`` (gzip/bz2) in :meth:`pandas.DataFrame.to_csv` (:issue:`7615`)
- ``pd.read_*`` functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath`
objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)
- The ``DataFrame`` and ``Series`` functions ``.to_csv()``, ``.to_html()`` and ``.to_latex()`` can now handle paths beginning with tildes (e.g. ``~/Documents/``) (:issue:`11438`)
- ``DataFrame`` now uses the fields of a ``namedtuple`` as columns, if columns are not supplied (:issue:`11181`)
- ``DataFrame.itertuples()`` now returns ``namedtuple`` objects, when possible. (:issue:`11269`, :issue:`11625`)
- Added ``axvlines_kwds`` to parallel coordinates plot (:issue:`10709`)
- Option to ``.info()`` and ``.memory_usage()`` to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:`11595`)
.. ipython:: python
df = DataFrame({'A' : ['foo']*1000})
df['B'] = df['A'].astype('category')
# shows the '+' as we have object dtypes
df.info()
# we have an accurate memory assessment (but can be expensive to compute this)
df.info(memory_usage='deep')
- ``Index`` now has a ``fillna`` method (:issue:`10089`)
.. ipython:: python
pd.Index([1, np.nan, 3]).fillna(2)
- Series of type ``category`` now make ``.str.<...>`` and ``.dt.<...>`` accessor methods / properties available, if the categories are of that type. (:issue:`10661`)
.. ipython:: python
s = pd.Series(list('aabb')).astype('category')
s
s.str.contains("a")
date = pd.Series(pd.date_range('1/1/2015', periods=5)).astype('category')
date
date.dt.day
- ``pivot_table`` now has a ``margins_name`` argument so you can use something other than the default of 'All' (:issue:`3335`)
- Implement export of ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`11411`)
- Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax (``{x, y}``) instead of
Legacy Python syntax (``set([x, y])``) (:issue:`11215`)
- Improve the error message in :func:`pandas.io.gbq.to_gbq` when a streaming insert fails (:issue:`11285`)
and when the DataFrame does not match the schema of the destination table (:issue:`11359`)
.. _whatsnew_0171.api:
API changes
~~~~~~~~~~~
- raise ``NotImplementedError`` in ``Index.shift`` for non-supported index types (:issue:`8038`)
- ``min`` and ``max`` reductions on ``datetime64`` and ``timedelta64`` dtyped series now
result in ``NaT`` and not ``nan`` (:issue:`11245`).
- Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
- ``Series.ptp`` will now ignore missing values by default (:issue:`11163`)
.. _whatsnew_0171.deprecations:
Deprecations
^^^^^^^^^^^^
- The ``pandas.io.ga`` module which implements ``google-analytics`` support is deprecated and will be removed in a future version (:issue:`11308`)
- Deprecate the ``engine`` keyword in ``.to_csv()``, which will be removed in a future version (:issue:`11274`)
.. _whatsnew_0171.performance:
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Checking monotonic-ness before sorting on an index (:issue:`11080`)
- ``Series.dropna`` performance improvement when its dtype can't contain ``NaN`` (:issue:`11159`)
- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
- Release the GIL on some rolling algos: ``rolling_median``, ``rolling_mean``, ``rolling_max``, ``rolling_min``, ``rolling_var``, ``rolling_kurt``, ``rolling_skew`` (:issue:`11450`)
- Release the GIL when reading and parsing text files in ``read_csv``, ``read_table`` (:issue:`11272`)
- Improved performance of ``rolling_median`` (:issue:`11450`)
- Improved performance of ``to_excel`` (:issue:`11352`)
- Performance bug in repr of ``Categorical`` categories, which was rendering the strings before chopping them for display (:issue:`11305`)
- Performance improvement in ``Categorical.remove_unused_categories``, (:issue:`11643`).
- Improved performance of ``Series`` constructor with no data and ``DatetimeIndex`` (:issue:`11433`)
- Improved performance of ``shift``, ``cumprod``, and ``cumsum`` with groupby (:issue:`4095`)
.. _whatsnew_0171.bug_fixes:
Bug Fixes
~~~~~~~~~
- ``SparseArray.__iter__()`` now does not cause ``PendingDeprecationWarning`` in Python 3.5 (:issue:`11622`)
- Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
- ``Series.sort_index()`` now correctly handles the ``inplace`` option (:issue:`11402`)
- Incorrectly distributed .c file in the build on ``PyPi`` when reading a csv of floats and passing ``na_values=<a scalar>`` would show an exception (:issue:`11374`)
- Bug in ``.to_latex()`` output broken when the index has a name (:issue:`10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)
- Bug in merging ``datetime64[ns, tz]`` dtypes (:issue:`11405`)
- Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
- Bug in using ``DataFrame.ix`` with a multi-index indexer (:issue:`11372`)
- Bug in ``date_range`` with ambigous endpoints (:issue:`11626`)
- Prevent adding new attributes to the accessors ``.str``, ``.dt`` and ``.cat``. Retrieving such
a value was not possible, so error out on setting it. (:issue:`10673`)
- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)
- Bug in output formatting when using an index of ambiguous times (:issue:`11619`)
- Bug in comparisons of Series vs list-likes (:issue:`11339`)
- Bug in ``DataFrame.replace`` with a ``datetime64[ns, tz]`` and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
- Bug in ``isnull`` where ``numpy.datetime64('NaT')`` in a ``numpy.array`` was not determined to be null(:issue:`11206`)
- Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
- Bug in ``pivot_table`` with ``margins=True`` when indexes are of ``Categorical`` dtype (:issue:`10993`)
- Bug in ``DataFrame.plot`` cannot use hex strings colors (:issue:`10299`)
- Regression in ``DataFrame.drop_duplicates`` from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
- Bug in ``pd.eval`` where unary ops in a list error (:issue:`11235`)
- Bug in ``squeeze()`` with zero length arrays (:issue:`11230`, :issue:`8999`)
- Bug in ``describe()`` dropping column names for hierarchical indexes (:issue:`11517`)
- Bug in ``DataFrame.pct_change()`` not propagating ``axis`` keyword on ``.fillna`` method (:issue:`11150`)
- Bug in ``.to_csv()`` when a mix of integer and string column names are passed as the ``columns`` parameter (:issue:`11637`)
- Bug in indexing with a ``range``, (:issue:`11652`)
- Bug in inference of numpy scalars and preserving dtype when setting columns (:issue:`11638`)
- Bug in ``to_sql`` using unicode column names giving UnicodeEncodeError with (:issue:`11431`).
- Fix regression in setting of ``xticks`` in ``plot`` (:issue:`11529`).
- Bug in ``holiday.dates`` where observance rules could not be applied to holiday and doc enhancement (:issue:`11477`, :issue:`11533`)
- Fix plotting issues when having plain ``Axes`` instances instead of ``SubplotAxes`` (:issue:`11520`, :issue:`11556`).
- Bug in ``DataFrame.to_latex()`` produces an extra rule when ``header=False`` (:issue:`7124`)
- Bug in ``df.groupby(...).apply(func)`` when a func returns a ``Series`` containing a new datetimelike column (:issue:`11324`)
- Bug in ``pandas.json`` when file to load is big (:issue:`11344`)
- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
- Fixed a bug that prevented the construction of an empty series of dtype ``datetime64[ns, tz]`` (:issue:`11245`).
- Bug in ``read_excel`` with multi-index containing integers (:issue:`11317`)
- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)
- Bug in ``DataFrame.to_dict()`` produces a ``np.datetime64`` object instead of ``Timestamp`` when only datetime is present in data (:issue:`11327`)
- Bug in ``DataFrame.corr()`` raises exception when computes Kendall correlation for DataFrames with boolean and not boolean columns (:issue:`11560`)
- Bug in the link-time error caused by C ``inline`` functions on FreeBSD 10+ (with ``clang``) (:issue:`10510`)
- Bug in ``DataFrame.to_csv`` in passing through arguments for formatting ``MultiIndexes``, including ``date_format`` (:issue:`7791`)
- Bug in ``DataFrame.join()`` with ``how='right'`` producing a ``TypeError`` (:issue:`11519`)
- Bug in ``Series.quantile`` with empty list results has ``Index`` with ``object`` dtype (:issue:`11588`)
- Bug in ``pd.merge`` results in empty ``Int64Index`` rather than ``Index(dtype=object)`` when the merge result is empty (:issue:`11588`)
- Bug in ``Categorical.remove_unused_categories`` when having ``NaN`` values (:issue:`11599`)
- Bug in ``DataFrame.to_sparse()`` loses column names for MultiIndexes (:issue:`11600`)
- Bug in ``DataFrame.round()`` with non-unique column index producing a Fatal Python error (:issue:`11611`)
- Bug in ``DataFrame.round()`` with ``decimals`` being a non-unique indexed Series producing extra columns (:issue:`11618`)