Allow non-default indexes in to_parquet.

dhirschfeld · dhirschfeld · commit 8529343f31bd · 2017-12-09T17:33:12.000+10:00
...when supported by the underlying engine. Fixes pandas-dev#18581
diff --git a/ci/requirements-2.7.sh b/ci/requirements-2.7.sh
@@ -4,4 +4,4 @@ source activate pandas
 
 echo "install 27"
 
-conda install -n pandas -c conda-forge feather-format pyarrow=0.4.1 fastparquet
+conda install -n pandas -c conda-forge feather-format pyarrow=0.7.0 fastparquet
diff --git a/ci/requirements-3.5.sh b/ci/requirements-3.5.sh
@@ -8,4 +8,4 @@ echo "install 35"
 conda remove -n pandas python-dateutil --force
 pip install python-dateutil
 
-conda install -n pandas -c conda-forge feather-format pyarrow=0.5.0
+conda install -n pandas -c conda-forge feather-format pyarrow=0.7.0
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -233,7 +233,7 @@ Optional Dependencies
 * `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
 * `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage, version 0.3.1 or higher.
-* `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.4.1) or `fastparquet <https://fastparquet.readthedocs.io/en/latest/necessary>`__ (>= 0.0.6) for parquet-based storage. The `snappy <https://pypi.python.org/pypi/python-snappy>`__ and `brotli <https://pypi.python.org/pypi/brotlipy>`__ are available for compression support.
+* `Apache Parquet <https://parquet.apache.org/>`__, either `pyarrow <http://arrow.apache.org/docs/python/>`__ (>= 0.7.0) or `fastparquet <https://fastparquet.readthedocs.io/en/latest/necessary>`__ (>= 0.1.0) for parquet-based storage. The `snappy <https://pypi.python.org/pypi/python-snappy>`__ and `brotli <https://pypi.python.org/pypi/brotlipy>`__ are available for compression support.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:
 
   * `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -4504,11 +4504,8 @@ dtypes, including extension dtypes such as datetime with tz.
 
 Several caveats.
 
-- The format will NOT write an ``Index``, or ``MultiIndex`` for the
-  ``DataFrame`` and will raise an error if a non-default one is provided. You
-  can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
-  ignore it.
 - Duplicate column names and non-string columns names are not supported
+- Index level names, if specified, must be strings
 - Categorical dtypes can be serialized to parquet, but will de-serialize as ``object`` dtype.
 - Non supported types include ``Period`` and actual python object types. These will raise a helpful error message
   on an attempt at serialization.
diff --git a/doc/source/whatsnew/v0.22.0.txt b/doc/source/whatsnew/v0.22.0.txt
@@ -65,23 +65,15 @@ levels <merging.merge_on_columns_and_levels>` documentation section.
 
 .. _whatsnew_0220.enhancements.ran_inf:
 
-handle ``inf`` values properly when ``NaN`` are present
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+``.rank()`` handles ``inf`` values when ``NaN`` are present
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-In previous version, ``inf`` elements were assigned ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)
+In previous versions, ``.rank()`` would assign ``inf`` elements ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)
 
 .. ipython:: python
 
-    In [9]: s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])
-
-    In [10]: s
-    Out[10]:
-    0        -inf
-    1    0.000000
-    2    1.000000
-    3         NaN
-    4         inf
-    dtype: float64
+    s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])
+    s
 
 Previous Behavior:
 
@@ -100,28 +92,14 @@ Current Behavior
 
 .. ipython:: python
 
-    In [4]: s.rank()
-    Out[4]:
-    0    1.0
-    1    2.0
-    2    3.0
-    3    NaN
-    4    4.0
-    dtype: float64
+    s.rank()
 
 Furthermore, previously if you rank ``inf`` or ``-inf`` values together with ``NaN`` values, the calculation won't distinguish ``NaN`` from infinity when using 'top' or 'bottom' argument.
 
 .. ipython:: python
 
-    In [14]: s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
-
-    In [15]: s
-    Out[15]:
-    0    NaN
-    1    NaN
-    2   -inf
-    3   -inf
-    dtype: float64
+    s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
+    s
 
 Previous Behavior:
 
@@ -139,14 +117,7 @@ Current Behavior
 
 .. ipython:: python
 
-    In [4]: s.rank(na_option='top')
-    Out[4]:
-    0    1.5
-    1    1.5
-    2    3.5
-    3    3.5
-    dtype: float64
-
+    s.rank(na_option='top')
 
 .. _whatsnew_0220.enhancements.other:
 
@@ -163,6 +134,7 @@ Other Enhancements
 - :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
 - Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)
 - :func:`DataFrame.corrwith` now silently drops non-numeric columns when passed a Series. Before, an exception was raised (:issue:`18570`).
+- :class:`IntervalIndex` now supports time zone aware ``Interval`` objects (:issue:`18537`, :issue:`18538`)
 
 
 .. _whatsnew_0220.api_breaking:
@@ -187,6 +159,9 @@ If installed, we now require:
    +-----------------+-----------------+----------+
 
 
+- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
+- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
+
 .. _whatsnew_0220.api:
 
 Other API Changes
@@ -200,7 +175,7 @@ Other API Changes
 - :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
 - :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
 - :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
-- `tseries.frequencies.get_freq_group()` and `tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)
+- ``pandas.tseries.frequencies.get_freq_group()`` and ``pandas.tseries.frequencies.DAYS`` are removed from the public API (:issue:`18034`)
 - :func:`Series.truncate` and :func:`DataFrame.truncate` will raise a ``ValueError`` if the index is not sorted instead of an unhelpful ``KeyError`` (:issue:`17935`)
 - :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
 - :func:`Dataframe.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
@@ -209,11 +184,10 @@ Other API Changes
 - Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
 - :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
 - When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
-- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
 - :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
 - :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
 - The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
-- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
+- Refactored ``setup.py`` to use ``find_packages`` instead of explicitly listing out all subpackages (:issue:`18535`)
 
 .. _whatsnew_0220.deprecations:
 
@@ -222,7 +196,7 @@ Deprecations
 
 - ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
 - ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
-- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use '.astype(object)' instead (:issue:`18572`)
+- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use ``.astype(object)`` instead (:issue:`18572`)
 
 .. _whatsnew_0220.prior_deprecations:
 
@@ -237,8 +211,8 @@ Removal of prior version deprecations/changes
 - The ``SparseList`` class has been removed (:issue:`14007`)
 - The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
 - ``Categorical.from_array`` has been removed (:issue:`13854`)
-- The ``freq`` parameter has been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
-  and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601)
+- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
+  and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601 & :issue:18668)
 - ``DatetimeIndex.to_datetime``, ``Timestamp.to_datetime``, ``PeriodIndex.to_datetime``, and ``Index.to_datetime`` have been removed (:issue:`8254`, :issue:`14096`, :issue:`14113`)
 
 .. _whatsnew_0220.performance:
@@ -254,7 +228,7 @@ Performance Improvements
 - The overriden ``Timedelta`` properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (:issue:`18242`)
 - ``Series`` construction will reduce the number of copies made of the input data in certain cases (:issue:`17449`)
 - Improved performance of :func:`Series.dt.date` and :func:`DatetimeIndex.date` (:issue:`18058`)
-- Improved performance of ``IntervalIndex.symmetric_difference()`` (:issue:`18475`)
+- Improved performance of :func:`IntervalIndex.symmetric_difference()` (:issue:`18475`)
 - Improved performance of ``DatetimeIndex`` and ``Series`` arithmetic operations with Business-Month and Business-Quarter frequencies (:issue:`18489`)
 
 .. _whatsnew_0220.docs:
@@ -275,7 +249,7 @@ Bug Fixes
 Conversion
 ^^^^^^^^^^
 
-- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
+- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
 - Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
 - Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
 - Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
@@ -287,13 +261,13 @@ Indexing
 
 - Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
 - Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
-- Bug in :func:`MultiIndex.remove_unused_levels`` which would fill nan values (:issue:`18417`)
+- Bug in :func:`MultiIndex.remove_unused_levels` which would fill nan values (:issue:`18417`)
 - Bug in :func:`MultiIndex.from_tuples`` which would fail to take zipped tuples in python3 (:issue:`18434`)
-- Bug in :class:`Index`` construction from list of mixed type tuples (:issue:`18505`)
+- Bug in :class:`Index` construction from list of mixed type tuples (:issue:`18505`)
 - Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
-- Bug in ``IntervalIndex.symmetric_difference()`` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
+- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
 - Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
-
+- Bug in tz-aware :class:`DatetimeIndex` where addition/subtraction with a :class:`TimedeltaIndex` or array with ``dtype='timedelta64[ns]'`` was incorrect (:issue:`17558`)
 
 I/O
 ^^^
@@ -351,5 +325,5 @@ Categorical
 Other
 ^^^^^
 
-- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
--
+- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
+-
diff --git a/pandas/io/parquet.py b/pandas/io/parquet.py
diff --git a/pandas/tests/io/test_parquet.py b/pandas/tests/io/test_parquet.py
