Commit 8529343

Allow non-default indexes in to_parquet.
...when supported by the underlying engine. Fixes pandas-dev#18581
1 parent fdba133 commit 8529343

7 files changed: +206 -205 lines changed

ci/requirements-2.7.sh

+1-1
@@ -4,4 +4,4 @@ source activate pandas

 echo "install 27"

-conda install -n pandas -c conda-forge feather-format pyarrow=0.4.1 fastparquet
+conda install -n pandas -c conda-forge feather-format pyarrow=0.7.0 fastparquet

ci/requirements-3.5.sh

+1-1
@@ -8,4 +8,4 @@ echo "install 35"
 conda remove -n pandas python-dateutil --force
 pip install python-dateutil

-conda install -n pandas -c conda-forge feather-format pyarrow=0.5.0
+conda install -n pandas -c conda-forge feather-format pyarrow=0.7.0

doc/source/install.rst

+1-1
@@ -233,7 +233,7 @@ Optional Dependencies
 * `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
 * `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage, version 0.3.1 or higher.
-* `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.4.1) or `fastparquet <https://fastparquet.readthedocs.io/en/latest/necessary>`__ (>= 0.0.6) for parquet-based storage. The `snappy <https://pypi.python.org/pypi/python-snappy>`__ and `brotli <https://pypi.python.org/pypi/brotlipy>`__ are available for compression support.
+* `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.7.0) or `fastparquet <https://fastparquet.readthedocs.io/en/latest/necessary>`__ (>= 0.1.0) for parquet-based storage. The `snappy <https://pypi.python.org/pypi/python-snappy>`__ and `brotli <https://pypi.python.org/pypi/brotlipy>`__ are available for compression support.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:

 * `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL

doc/source/io.rst

+1-4
@@ -4504,11 +4504,8 @@ dtypes, including extension dtypes such as datetime with tz.

 Several caveats.

-- The format will NOT write an ``Index``, or ``MultiIndex`` for the
-``DataFrame`` and will raise an error if a non-default one is provided. You
-can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
-ignore it.
 - Duplicate column names and non-string columns names are not supported
+- Index level names, if specified, must be strings
 - Categorical dtypes can be serialized to parquet, but will de-serialize as ``object`` dtype.
 - Non supported types include ``Period`` and actual python object types. These will raise a helpful error message
 on an attempt at serialization.
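
As a rough illustration of what the caveat change above means for users, the sketch below writes a frame with a named, non-default index and reads it back. It is not part of the commit: the frame contents, the ``roundtrip`` helper, and the file name are hypothetical, and whether the index survives the round trip depends on the installed engine and its version.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a named, non-default string index.
# The index level name is a string, as the caveat list requires.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.Index(["a", "b", "c"], name="key"),
)


def roundtrip(frame):
    """Write ``frame`` to a temporary parquet file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        frame.to_parquet(path)  # engine defaults to whichever is installed
        return pd.read_parquet(path)


try:
    result = roundtrip(df)
except ImportError:
    # Neither pyarrow nor fastparquet is available in this environment.
    result = None

if result is not None:
    print(result.index)
```

Before this change, the removed caveat says the same call would raise for a non-default index, and ``.reset_index()`` was the suggested workaround.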

doc/source/whatsnew/v0.22.0.txt

+26-52
@@ -65,23 +65,15 @@ levels <merging.merge_on_columns_and_levels>` documentation section.

 .. _whatsnew_0220.enhancements.ran_inf:

-handle ``inf`` values properly when ``NaN`` are present
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+``.rank()`` handles ``inf`` values when ``NaN`` are present
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-In previous version, ``inf`` elements were assigned ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)
+In previous versions, ``.rank()`` would assign ``inf`` elements ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)

 .. ipython:: python

-In [9]: s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])
-
-In [10]: s
-Out[10]:
-0 -inf
-1 0.000000
-2 1.000000
-3 NaN
-4 inf
-dtype: float64
+s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])
+s

 Previous Behavior:

@@ -100,28 +92,14 @@ Current Behavior

 .. ipython:: python

-In [4]: s.rank()
-Out[4]:
-0 1.0
-1 2.0
-2 3.0
-3 NaN
-4 4.0
-dtype: float64
+s.rank()

 Furthermore, previously if you rank ``inf`` or ``-inf`` values together with ``NaN`` values, the calculation won't distinguish ``NaN`` from infinity when using 'top' or 'bottom' argument.

 .. ipython:: python

-In [14]: s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
-
-In [15]: s
-Out[15]:
-0 NaN
-1 NaN
-2 -inf
-3 -inf
-dtype: float64
+s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
+s

 Previous Behavior:

@@ -139,14 +117,7 @@ Current Behavior

 .. ipython:: python

-In [4]: s.rank(na_option='top')
-Out[4]:
-0 1.5
-1 1.5
-2 3.5
-3 3.5
-dtype: float64
-
+s.rank(na_option='top')

 .. _whatsnew_0220.enhancements.other:

@@ -163,6 +134,7 @@ Other Enhancements
 - :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
 - Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)
 - :func:`DataFrame.corrwith` now silently drops non-numeric columns when passed a Series. Before, an exception was raised (:issue:`18570`).
+- :class:`IntervalIndex` now supports time zone aware ``Interval`` objects (:issue:`18537`, :issue:`18538`)


 .. _whatsnew_0220.api_breaking:
@@ -187,6 +159,9 @@ If installed, we now require:
 +-----------------+-----------------+----------+


+- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
+- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
+
 .. _whatsnew_0220.api:

 Other API Changes
@@ -200,7 +175,7 @@ Other API Changes
 - :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
 - :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
 - :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
-- `tseries.frequencies.get_freq_group()` and `tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)
+- ``pandas.tseries.frequencies.get_freq_group()`` and ``pandas.tseries.frequencies.DAYS`` are removed from the public API (:issue:`18034`)
 - :func:`Series.truncate` and :func:`DataFrame.truncate` will raise a ``ValueError`` if the index is not sorted instead of an unhelpful ``KeyError`` (:issue:`17935`)
 - :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
 - :func:`Dataframe.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
@@ -209,11 +184,10 @@ Other API Changes
 - Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
 - :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
 - When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
-- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
 - :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
 - :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
 - The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
-- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
+- Refactored ``setup.py`` to use ``find_packages`` instead of explicitly listing out all subpackages (:issue:`18535`)

 .. _whatsnew_0220.deprecations:

@@ -222,7 +196,7 @@ Deprecations

 - ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
 - ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
-- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use '.astype(object)' instead (:issue:`18572`)
+- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use ``.astype(object)`` instead (:issue:`18572`)

 .. _whatsnew_0220.prior_deprecations:

@@ -237,8 +211,8 @@ Removal of prior version deprecations/changes
 - The ``SparseList`` class has been removed (:issue:`14007`)
 - The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
 - ``Categorical.from_array`` has been removed (:issue:`13854`)
-- The ``freq`` parameter has been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
-and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601)
+- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
+and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601 & :issue:18668)
 - ``DatetimeIndex.to_datetime``, ``Timestamp.to_datetime``, ``PeriodIndex.to_datetime``, and ``Index.to_datetime`` have been removed (:issue:`8254`, :issue:`14096`, :issue:`14113`)

 .. _whatsnew_0220.performance:
@@ -254,7 +228,7 @@ Performance Improvements
 - The overriden ``Timedelta`` properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (:issue:`18242`)
 - ``Series`` construction will reduce the number of copies made of the input data in certain cases (:issue:`17449`)
 - Improved performance of :func:`Series.dt.date` and :func:`DatetimeIndex.date` (:issue:`18058`)
-- Improved performance of ``IntervalIndex.symmetric_difference()`` (:issue:`18475`)
+- Improved performance of :func:`IntervalIndex.symmetric_difference()` (:issue:`18475`)
 - Improved performance of ``DatetimeIndex`` and ``Series`` arithmetic operations with Business-Month and Business-Quarter frequencies (:issue:`18489`)

 .. _whatsnew_0220.docs:
@@ -275,7 +249,7 @@ Bug Fixes
 Conversion
 ^^^^^^^^^^

-- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
+- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
 - Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
 - Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
 - Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
@@ -287,13 +261,13 @@ Indexing

 - Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
 - Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
-- Bug in :func:`MultiIndex.remove_unused_levels`` which would fill nan values (:issue:`18417`)
+- Bug in :func:`MultiIndex.remove_unused_levels` which would fill nan values (:issue:`18417`)
 - Bug in :func:`MultiIndex.from_tuples`` which would fail to take zipped tuples in python3 (:issue:`18434`)
-- Bug in :class:`Index`` construction from list of mixed type tuples (:issue:`18505`)
+- Bug in :class:`Index` construction from list of mixed type tuples (:issue:`18505`)
 - Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
-- Bug in ``IntervalIndex.symmetric_difference()`` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
+- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
 - Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
-
+- Bug in tz-aware :class:`DatetimeIndex` where addition/subtraction with a :class:`TimedeltaIndex` or array with ``dtype='timedelta64[ns]'`` was incorrect (:issue:`17558`)

 I/O
 ^^^
@@ -351,5 +325,5 @@ Categorical
 Other
 ^^^^^

-- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
--
+- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
+-
