Commit ebcf0c1

ENH: feather support in the pandas IO api
closes pandas-dev#13092
1 parent 33e11ad commit ebcf0c1

16 files changed

+306 -2 lines changed

appveyor.yml

+1
@@ -81,6 +81,7 @@ install:
 
   # add the pandas channel *before* defaults to have defaults take priority
   - cmd: conda config --add channels pandas
+  - cmd: conda config --add channels conda-forge
   - cmd: conda config --remove channels defaults
   - cmd: conda config --add channels defaults
   - cmd: conda install anaconda-client

ci/install_travis.sh

+4 -1
@@ -71,8 +71,11 @@ else
     conda config --set always_yes true --set changeps1 false || exit 1
     conda update -q conda
 
-    # add the pandas channel *before* defaults to have defaults take priority
+    # add the pandas channel to take priority
+    # add the conda-forge channel *before* defaults
+    # to add extra packages
     echo "add channels"
+    conda config --add channels conda-forge || exit 1
     conda config --add channels pandas || exit 1
     conda config --remove channels defaults || exit 1
     conda config --add channels defaults || exit 1
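The resulting channel priority is easy to misread from the `conda config` calls alone. The Python sketch below replays the calls from `ci/install_travis.sh` and `appveyor.yml`, assuming that `conda config --add` prepends a channel (moving it to the front if already present), that `--remove` deletes it, that earlier entries take priority, and that the list starts as `['defaults']`. The helper names are hypothetical; this is a model, not conda's code.

```python
# Hypothetical model of how `conda config` edits the `channels` list.
# Assumptions: --add prepends (moving an existing entry to the front),
# --remove deletes, and earlier entries take priority.

def conda_add(channels, name):
    """Model `conda config --add channels NAME`."""
    return [name] + [c for c in channels if c != name]


def conda_remove(channels, name):
    """Model `conda config --remove channels NAME`."""
    return [c for c in channels if c != name]


def apply_steps(steps, start=("defaults",)):
    """Replay a sequence of ("add" | "remove", channel) steps."""
    channels = list(start)
    for op, name in steps:
        if op == "add":
            channels = conda_add(channels, name)
        else:
            channels = conda_remove(channels, name)
    return channels


# Order of calls in ci/install_travis.sh after this commit
travis = [("add", "conda-forge"), ("add", "pandas"),
          ("remove", "defaults"), ("add", "defaults")]
# Order of calls in appveyor.yml after this commit
appveyor = [("add", "pandas"), ("add", "conda-forge"),
            ("remove", "defaults"), ("add", "defaults")]

print(apply_steps(travis))    # ['defaults', 'pandas', 'conda-forge']
print(apply_steps(appveyor))  # ['defaults', 'conda-forge', 'pandas']
```

Under these assumptions `defaults` ends up with the highest priority in both scripts, while `pandas` and `conda-forge` end up in opposite relative order on Travis and AppVeyor.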

ci/requirements-2.7.run

+1
@@ -9,6 +9,7 @@ openpyxl=1.6.2
 xlrd=0.9.2
 sqlalchemy=0.9.6
 lxml=3.2.1
+feather-format
 scipy
 xlsxwriter=0.4.6
 boto=2.36.0

ci/requirements-3.5-64.run

+1
@@ -6,6 +6,7 @@ xlsxwriter
 xlrd
 xlwt
 scipy
+feather-format
 numexpr
 pytables
 matplotlib

ci/requirements-3.5.run

+1
@@ -9,6 +9,7 @@ scipy
 numexpr
 pytables
 html5lib
+feather-format
 lxml
 matplotlib
 jinja2

ci/requirements-3.5_OSX.run

+1
@@ -5,6 +5,7 @@ xlsxwriter
 xlrd
 xlwt
 numexpr
+feather-format
 pytables
 html5lib
 lxml

doc/source/api.rst

+8
@@ -83,6 +83,14 @@ HDFStore: PyTables (HDF5)
    HDFStore.get
    HDFStore.select
 
+Feather
+~~~~~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   read_feather
+
 SAS
 ~~~
 

doc/source/install.rst

+1
@@ -247,6 +247,7 @@ Optional Dependencies
 * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
 * `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
+* `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage. Version 0.3.1 or higher required.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:
 
   - `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL

doc/source/io.rst

+59
@@ -34,6 +34,7 @@ object.
     * :ref:`read_csv<io.read_csv_table>`
     * :ref:`read_excel<io.excel_reader>`
     * :ref:`read_hdf<io.hdf5>`
+    * :ref:`read_feather<io.feather>`
     * :ref:`read_sql<io.sql>`
     * :ref:`read_json<io.json_reader>`
     * :ref:`read_msgpack<io.msgpack>` (experimental)
@@ -49,6 +50,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
     * :ref:`to_csv<io.store_in_csv>`
     * :ref:`to_excel<io.excel_writer>`
     * :ref:`to_hdf<io.hdf5>`
+    * :ref:`to_feather<io.feather>`
     * :ref:`to_sql<io.sql>`
     * :ref:`to_json<io.json_writer>`
     * :ref:`to_msgpack<io.msgpack>` (experimental)
@@ -4135,6 +4137,63 @@ object). This cannot be changed after table creation.
    os.remove('store.h5')
 
 
+.. _io.feather:
+
+Feather
+-------
+
+.. versionadded:: 0.20.0
+
+Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data
+frames efficient, and to make sharing data across data analysis languages easy.
+
+Feather is designed to faithfully serialize and de-serialize DataFrames, supporting most of the pandas
+dtypes, including extension dtypes such as categorical and datetime with tz.
+
+Several caveats:
+
+- This is a newer library, and the format, though stable, is not guaranteed to be backward compatible
+  with earlier versions.
+- The format will NOT write an ``Index`` or ``MultiIndex`` for the ``DataFrame``, and will raise an
+  error if a non-default one is provided. Use ``.reset_index()`` to store the index as column(s).
+- Unsupported types include ``Period`` and actual Python object types. These will raise a helpful
+  error message on an attempt at serialization.
+
+See the `Full Documentation <https://github.com/wesm/feather>`__.
+
+.. ipython:: python
+
+   df = pd.DataFrame({'a': list('abc'),
+                      'b': list(range(1, 4)),
+                      'c': np.arange(3, 6).astype('u1'),
+                      'd': np.arange(4.0, 7.0, dtype='float64'),
+                      'e': [True, False, True],
+                      'f': pd.Categorical(list('abc')),
+                      'g': pd.date_range('20130101', periods=3),
+                      'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
+                      'i': pd.date_range('20130101', periods=3, freq='ns')})
+
+   df
+   df.dtypes
+
+Write to a feather file.
+
+.. ipython:: python
+
+   df.to_feather('example.fth')
+
+Read from a feather file.
+
+.. ipython:: python
+
+   pd.read_feather('example.fth')
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   os.remove('example.fth')
+
 .. _io.sql:
 
 SQL Queries
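The index and column restrictions described in the caveats above can be sketched in plain Python. This is a simplified stand-in for the checks in `pandas/io/feather_format.py`, not the pandas implementation: plain lists replace pandas `Index` objects so the sketch runs without pandas, and `check_feather_compatible` and `reset_index` are hypothetical helpers, not pandas API.

```python
# Simplified sketch of the validation to_feather performs before writing.
# Plain lists stand in for pandas Index objects; the function names are
# hypothetical and are not part of pandas.

def check_feather_compatible(index, columns):
    """Raise ValueError unless the frame could be written to feather."""
    # feather does not store the index, so only the default 0..n-1 index
    # can be dropped without losing information.
    if list(index) != list(range(len(index))):
        raise ValueError("feather does not support serializing a non-default "
                         "index; you can .reset_index() to make the index "
                         "into column(s)")
    # Column names must be strings.
    if not all(isinstance(name, str) for name in columns):
        raise ValueError("feather must have string column names")


def reset_index(index, columns):
    """Mimic DataFrame.reset_index() on the labels only: the old index
    becomes a leading 'index' column and a default index is assigned."""
    return list(range(len(index))), ["index"] + list(columns)


check_feather_compatible([0, 1, 2], ["a", "b"])       # passes silently
new_index, new_columns = reset_index([10, 20, 30], ["a"])
check_feather_compatible(new_index, new_columns)      # passes after reset
```

A frame indexed by `[10, 20, 30]` fails the first check until its index is moved into a column, which is the workflow the caveat recommends.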

doc/source/whatsnew/v0.20.0.txt

+3
@@ -21,6 +21,9 @@ Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations
 
 New features
 ~~~~~~~~~~~~
+
+- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` function and a ``DataFrame.to_feather()`` method; see :ref:`here <io.feather>`.
+
 
 
 ``dtype`` keyword for data io

pandas/api/tests/test_api.py

+1 -1
@@ -95,7 +95,7 @@ class TestPDApi(Base, tm.TestCase):
                   'read_gbq', 'read_hdf', 'read_html', 'read_json',
                   'read_msgpack', 'read_pickle', 'read_sas', 'read_sql',
                   'read_sql_query', 'read_sql_table', 'read_stata',
-                  'read_table']
+                  'read_table', 'read_feather']
 
     # top-level to_* funcs
     funcs_to = ['to_datetime', 'to_msgpack',

pandas/core/frame.py

+15
@@ -1477,6 +1477,21 @@ def to_stata(self, fname, convert_dates=None, write_index=True,
                              variable_labels=variable_labels)
         writer.write_file()
 
+    def to_feather(self, fname):
+        """
+        Write out the DataFrame in the binary feather format.
+
+        .. versionadded:: 0.20.0
+
+        Parameters
+        ----------
+        fname : str
+            string file path
+
+        """
+        from pandas.io.feather_format import to_feather
+        to_feather(self, fname)
+
     @Appender(fmt.docstring_to_string, indents=1)
     def to_string(self, buf=None, columns=None, col_space=None, header=True,
                   index=True, na_rep='NaN', formatters=None, float_format=None,

pandas/io/api.py

+1
@@ -12,6 +12,7 @@
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
 from pandas.io.sas.sasreader import read_sas
+from pandas.io.feather_format import read_feather
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack

pandas/io/feather_format.py

+90
@@ -0,0 +1,90 @@
+""" feather-format compat """
+
+from distutils.version import LooseVersion
+from pandas import DataFrame, RangeIndex, Int64Index
+from pandas.compat import range
+
+
+def _try_import():
+    # since pandas is a dependency of feather
+    # we need to import on first use
+
+    try:
+        import feather
+    except ImportError:
+
+        # give a nice error message
+        raise ImportError("the feather-format library is not installed\n"
+                          "you can install via conda\n"
+                          "conda install feather-format -c conda-forge")
+
+    # feather < 0.3.1 may not define __version__
+    version = getattr(feather, '__version__', None)
+    if version is None or LooseVersion(version) < LooseVersion('0.3.1'):
+        raise ImportError("the feather-format library must be >= "
+                          "version 0.3.1\n"
+                          "you can install via conda\n"
+                          "conda install feather-format -c conda-forge")
+
+    return feather
+
+
+def to_feather(df, path):
+    """
+    Write a DataFrame to the feather-format
+
+    Parameters
+    ----------
+    df : DataFrame
+    path : string
+        File path
+    """
+    if not isinstance(df, DataFrame):
+        raise ValueError("feather only supports IO with DataFrames")
+
+    feather = _try_import()
+    valid_types = {'string', 'unicode'}
+
+    # validate index
+    # --------------
+
+    # validate that we have only a default index
+    # raise on anything else as we don't serialize the index
+
+    if not isinstance(df.index, Int64Index):
+        raise ValueError("feather does not support serializing {} "
+                         "for the index; you can .reset_index() "
+                         "to make the index into column(s)".format(
+                             type(df.index)))
+
+    if not df.index.equals(RangeIndex.from_range(range(len(df)))):
+        raise ValueError("feather does not support serializing a "
+                         "non-default index; you can .reset_index() "
+                         "to make the index into column(s)")
+
+    # validate columns
+    # ----------------
+
+    # must have valid column names (strings only)
+    if df.columns.inferred_type not in valid_types:
+        raise ValueError("feather must have string column names")
+
+    feather.write_dataframe(df, path)
+
+
+def read_feather(path):
+    """
+    Load a feather-format object from the file path
+
+    Parameters
+    ----------
+    path : string
+        File path
+
+    Returns
+    -------
+    DataFrame
+    """
+
+    feather = _try_import()
+    return feather.read_dataframe(path)
