Commit eecfa88

ENH: feather support in the pandas IO api
closes #13092

Author: Jeff Reback <[email protected]>

Closes #14383 from jreback/feather and squashes the following commits:

3ede160 [Jeff Reback] ENH: feather support in the pandas IO api
1 parent 72786cc commit eecfa88

19 files changed: +348 -10 lines changed

appveyor.yml

+1
@@ -80,6 +80,7 @@ install:
 - cmd: conda config --set ssl_verify false
 
 # add the pandas channel *before* defaults to have defaults take priority
+- cmd: conda config --add channels conda-forge
 - cmd: conda config --add channels pandas
 - cmd: conda config --remove channels defaults
 - cmd: conda config --add channels defaults

ci/install_travis.sh

+2 -1
@@ -71,7 +71,8 @@ else
 conda config --set always_yes true --set changeps1 false || exit 1
 conda update -q conda
 
-# add the pandas channel *before* defaults to have defaults take priority
+# add the pandas channel to take priority
+# to add extra packages
 echo "add channels"
 conda config --add channels pandas || exit 1
 conda config --remove channels defaults || exit 1

ci/requirements-2.7-64.run

+1 -1
@@ -3,7 +3,7 @@ pytz
 numpy=1.10*
 xlwt
 numexpr
-pytables
+pytables==3.2.2
 matplotlib
 openpyxl
 xlrd

ci/requirements-2.7.sh

+7
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+source activate pandas
+
+echo "install 27"
+
+conda install -n pandas -c conda-forge feather-format

ci/requirements-3.5-64.run

+2 -1
@@ -1,11 +1,12 @@
 python-dateutil
 pytz
-numpy=1.10*
+numpy
 openpyxl
 xlsxwriter
 xlrd
 xlwt
 scipy
+feather-format
 numexpr
 pytables
 matplotlib

ci/requirements-3.5.run

+1 -3
@@ -18,6 +18,4 @@ pymysql
 psycopg2
 xarray
 s3fs
-
-# incompat with conda ATM
-# beautiful-soup
+beautifulsoup4

ci/requirements-3.5.sh

+7
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+source activate pandas
+
+echo "install 35"
+
+conda install -n pandas -c conda-forge feather-format

ci/requirements-3.5_OSX.run

+1 -3
@@ -13,6 +13,4 @@ jinja2
 bottleneck
 xarray
 s3fs
-
-# incompat with conda ATM
-# beautiful-soup
+beautifulsoup4

ci/requirements-3.5_OSX.sh

+7
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+source activate pandas
+
+echo "install 35_OSX"
+
+conda install -n pandas -c conda-forge feather-format

doc/source/api.rst

+9
@@ -83,6 +83,14 @@ HDFStore: PyTables (HDF5)
    HDFStore.get
    HDFStore.select
 
+Feather
+~~~~~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   read_feather
+
 SAS
 ~~~
 
@@ -1015,6 +1023,7 @@ Serialization / IO / Conversion
    DataFrame.to_excel
    DataFrame.to_json
    DataFrame.to_html
+   DataFrame.to_feather
    DataFrame.to_latex
    DataFrame.to_stata
    DataFrame.to_msgpack

doc/source/install.rst

+1
@@ -247,6 +247,7 @@ Optional Dependencies
 * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
 * `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
+* `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage, version 0.3.1 or higher.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:
 
   - `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL
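
Note: the 0.3.1 minimum stated for Feather Format can be checked at import time. The following is a minimal sketch using only the standard library; `version_tuple` and `require_min_version` are hypothetical helper names, not part of pandas or feather, and the parsing deliberately ignores suffixes such as `rc1`.

```python
import re


def version_tuple(version):
    """Convert a dotted version string such as '0.3.1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # stop at the first component with no leading digits, e.g. 'dev'
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_min_version(installed, minimum="0.3.1"):
    """Raise ImportError when `installed` is older than `minimum`."""
    if version_tuple(installed) < version_tuple(minimum):
        raise ImportError(
            "feather-format must be >= {}, found {}".format(minimum, installed)
        )
    return installed
```

Comparing tuples of integers avoids the string-ordering trap in which '0.10.0' sorts before '0.3.1'.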

doc/source/io.rst

+64
@@ -34,6 +34,7 @@ object.
 * :ref:`read_csv<io.read_csv_table>`
 * :ref:`read_excel<io.excel_reader>`
 * :ref:`read_hdf<io.hdf5>`
+* :ref:`read_feather<io.feather>`
 * :ref:`read_sql<io.sql>`
 * :ref:`read_json<io.json_reader>`
 * :ref:`read_msgpack<io.msgpack>` (experimental)
@@ -49,6 +50,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
 * :ref:`to_csv<io.store_in_csv>`
 * :ref:`to_excel<io.excel_writer>`
 * :ref:`to_hdf<io.hdf5>`
+* :ref:`to_feather<io.feather>`
 * :ref:`to_sql<io.sql>`
 * :ref:`to_json<io.json_writer>`
 * :ref:`to_msgpack<io.msgpack>` (experimental)
@@ -4152,6 +4154,68 @@ object). This cannot be changed after table creation.
    os.remove('store.h5')
 
 
+.. _io.feather:
+
+Feather
+-------
+
+.. versionadded:: 0.20.0
+
+Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data
+frames efficient, and to make sharing data across data analysis languages easy.
+
+Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
+dtypes, including extension dtypes such as categorical and datetime with tz.
+
+Several caveats:
+
+- This is a newer library, and the format, though stable, is not guaranteed to be backward compatible
+  to the earlier versions.
+- The format will NOT write an ``Index``, or ``MultiIndex`` for the ``DataFrame`` and will raise an
+  error if a non-default one is provided. You can simply ``.reset_index()`` in order to store the index.
+- Duplicate column names and non-string column names are not supported.
+- Unsupported types include ``Period`` and actual Python object types. These will raise a helpful error message
+  on an attempt at serialization.
+
+See the `Full Documentation <https://github.com/wesm/feather>`__.
+
+.. ipython:: python
+
+   df = pd.DataFrame({'a': list('abc'),
+                      'b': list(range(1, 4)),
+                      'c': np.arange(3, 6).astype('u1'),
+                      'd': np.arange(4.0, 7.0, dtype='float64'),
+                      'e': [True, False, True],
+                      'f': pd.Categorical(list('abc')),
+                      'g': pd.date_range('20130101', periods=3),
+                      'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
+                      'i': pd.date_range('20130101', periods=3, freq='ns')})
+
+   df
+   df.dtypes
+
+Write to a feather file.
+
+.. ipython:: python
+
+   df.to_feather('example.fth')
+
+Read from a feather file.
+
+.. ipython:: python
+
+   result = pd.read_feather('example.fth')
+   result
+
+   # we preserve dtypes
+   result.dtypes
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove('example.fth')
+
 .. _io.sql:
 
 SQL Queries
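
Note: the index caveat in the Feather section means a frame with a meaningful index has to be flattened before writing and rebuilt after reading. Below is a minimal sketch of that round trip, shown without the file I/O so that it runs with pandas alone. The frame contents and the `key` index name are made up for illustration.

```python
import pandas as pd

# a frame whose index carries information (hypothetical data)
df = pd.DataFrame({"value": [1.5, 2.5, 3.5]},
                  index=pd.Index(["x", "y", "z"], name="key"))

# move the index into an ordinary column, leaving a default 0..n-1 index;
# this is the shape that to_feather accepts
flat = df.reset_index()

# after reading the file back, restore the original index from the column
restored = flat.set_index("key")
```

A call to `flat.to_feather(...)` followed by `pd.read_feather(...)` would sit between the last two steps.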

doc/source/whatsnew/v0.20.0.txt

+3
@@ -22,6 +22,9 @@ Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations
 New features
 ~~~~~~~~~~~~
 
+- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
+
+
 
 .. _whatsnew_0200.enhancements.dataio_dtype:
 

pandas/api/tests/test_api.py

+1 -1
@@ -95,7 +95,7 @@ class TestPDApi(Base, tm.TestCase):
                   'read_gbq', 'read_hdf', 'read_html', 'read_json',
                   'read_msgpack', 'read_pickle', 'read_sas', 'read_sql',
                   'read_sql_query', 'read_sql_table', 'read_stata',
-                  'read_table']
+                  'read_table', 'read_feather']
 
     # top-level to_* funcs
     funcs_to = ['to_datetime', 'to_msgpack',

pandas/core/frame.py

+15
@@ -1477,6 +1477,21 @@ def to_stata(self, fname, convert_dates=None, write_index=True,
                              variable_labels=variable_labels)
         writer.write_file()
 
+    def to_feather(self, fname):
+        """
+        write out the binary feather-format for DataFrames
+
+        .. versionadded:: 0.20.0
+
+        Parameters
+        ----------
+        fname : str
+            string file path
+
+        """
+        from pandas.io.feather_format import to_feather
+        to_feather(self, fname)
+
     @Appender(fmt.docstring_to_string, indents=1)
     def to_string(self, buf=None, columns=None, col_space=None, header=True,
                   index=True, na_rep='NaN', formatters=None, float_format=None,

pandas/io/api.py

+1
@@ -12,6 +12,7 @@
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
 from pandas.io.sas.sasreader import read_sas
+from pandas.io.feather_format import read_feather
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack
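
Note: `read_feather` is imported whenever `pandas.io.api` loads, so the optional feather dependency itself has to be imported lazily, inside the function that needs it. Below is a generic sketch of that pattern using `importlib`. `import_optional` is a hypothetical helper shown for illustration, not the `_try_import` function added in this commit.

```python
import importlib


def import_optional(name, install_hint):
    """Import `name` on first use, raising a readable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            "the {} library is not installed\n{}".format(name, install_hint)
        )


# the import happens at call time, not when the defining module is loaded;
# 'json' stands in here for an optional dependency
json_module = import_optional("json", "it is part of the standard library")
```

Deferring the import this way also avoids a circular import when the optional library itself depends on the importing package.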

pandas/io/feather_format.py

+101
@@ -0,0 +1,101 @@
+""" feather-format compat """
+
+from distutils.version import LooseVersion
+from pandas import DataFrame, RangeIndex, Int64Index
+from pandas.compat import range
+
+
+def _try_import():
+    # since pandas is a dependency of feather
+    # we need to import on first use
+
+    try:
+        import feather
+    except ImportError:
+
+        # give a nice error message
+        raise ImportError("the feather-format library is not installed\n"
+                          "you can install via conda\n"
+                          "conda install feather-format -c conda-forge\n"
+                          "or via pip\n"
+                          "pip install feather-format\n")
+
+    # builds that lack __version__ are treated as too old
+    version = getattr(feather, '__version__', None)
+    if version is None or LooseVersion(version) < LooseVersion('0.3.1'):
+        raise ImportError("the feather-format library must be >= "
+                          "version 0.3.1\n"
+                          "you can install via conda\n"
+                          "conda install feather-format -c conda-forge\n"
+                          "or via pip\n"
+                          "pip install feather-format\n")
+
+    return feather
+
+
+def to_feather(df, path):
+    """
+    Write a DataFrame to the feather-format
+
+    Parameters
+    ----------
+    df : DataFrame
+    path : string
+        File path
+    """
+    if not isinstance(df, DataFrame):
+        raise ValueError("feather only supports IO with DataFrames")
+
+    feather = _try_import()
+    valid_types = {'string', 'unicode'}
+
+    # validate index
+    # --------------
+
+    # validate that we have only a default index
+    # raise on anything else as we don't serialize the index
+
+    if not isinstance(df.index, Int64Index):
+        raise ValueError("feather does not support serializing {} "
+                         "for the index; you can .reset_index() "
+                         "to make the index into column(s)".format(
+                             type(df.index)))
+
+    if not df.index.equals(RangeIndex.from_range(range(len(df)))):
+        raise ValueError("feather does not support serializing a "
+                         "non-default index; you can .reset_index() "
+                         "to make the index into column(s)")
+
+    if df.index.name is not None:
+        raise ValueError("feather does not serialize index meta-data on a "
+                         "default index")
+
+    # validate columns
+    # ----------------
+
+    # must have valid column names (strings only)
+    if df.columns.inferred_type not in valid_types:
+        raise ValueError("feather must have string column names")
+
+    feather.write_dataframe(df, path)
+
+
+def read_feather(path):
+    """
+    Load a feather-format object from the file path
+
+    .. versionadded:: 0.20.0
+
+    Parameters
+    ----------
+    path : string
+        File path
+
+    Returns
+    -------
+    type of object stored in file
+
+    """
+
+    feather = _try_import()
+    return feather.read_dataframe(path)
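
Note: the checks that `to_feather` performs before writing can be approximated by a standalone function that reports problems instead of raising. Below is a minimal sketch against current pandas. `feather_problems` is a hypothetical name, and the duplicate-column check is taken from the documented caveats, not from the code in this commit.

```python
import pandas as pd


def feather_problems(df):
    """Return a list of reasons `df` would be rejected; empty if none."""
    problems = []
    # only a default 0..n-1 index with no name can be written
    if not df.index.equals(pd.RangeIndex(len(df))):
        problems.append("non-default index; call .reset_index() first")
    elif df.index.name is not None:
        problems.append("default index carries a name")
    # column labels must all be strings
    if df.columns.inferred_type not in {"string", "unicode"}:
        problems.append("column names must be strings")
    # column labels must not repeat
    if not df.columns.is_unique:
        problems.append("duplicate column names")
    return problems
```

Collecting every problem in one pass lets a caller fix the frame once before writing, where raising would stop at the first failure.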
