ENH: feather support in the pandas IO api #14383
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ pytz | |
numpy=1.10* | ||
xlwt | ||
numexpr | ||
pytables | ||
pytables==3.2.2 | ||
matplotlib | ||
openpyxl | ||
xlrd | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/bin/bash | ||
|
||
source activate pandas | ||
|
||
echo "install 27" | ||
|
||
conda install -n pandas -c conda-forge feather-format |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,4 @@ pymysql | |
psycopg2 | ||
xarray | ||
s3fs | ||
|
||
# incompat with conda ATM | ||
# beautiful-soup | ||
beautifulsoup4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/bin/bash | ||
|
||
source activate pandas | ||
|
||
echo "install 35" | ||
|
||
conda install -n pandas -c conda-forge feather-format |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,4 @@ jinja2 | |
bottleneck | ||
xarray | ||
s3fs | ||
|
||
# incompat with conda ATM | ||
# beautiful-soup | ||
beautifulsoup4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/bin/bash | ||
|
||
source activate pandas | ||
|
||
echo "install 35_OSX" | ||
|
||
conda install -n pandas -c conda-forge feather-format |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,7 @@ object. | |
* :ref:`read_csv<io.read_csv_table>` | ||
* :ref:`read_excel<io.excel_reader>` | ||
* :ref:`read_hdf<io.hdf5>` | ||
* :ref:`read_feather<io.feather>` | ||
* :ref:`read_sql<io.sql>` | ||
* :ref:`read_json<io.json_reader>` | ||
* :ref:`read_msgpack<io.msgpack>` (experimental) | ||
|
@@ -49,6 +50,7 @@ The corresponding ``writer`` functions are object methods that are accessed like | |
* :ref:`to_csv<io.store_in_csv>` | ||
* :ref:`to_excel<io.excel_writer>` | ||
* :ref:`to_hdf<io.hdf5>` | ||
* :ref:`to_feather<io.feather>` | ||
* :ref:`to_sql<io.sql>` | ||
* :ref:`to_json<io.json_writer>` | ||
* :ref:`to_msgpack<io.msgpack>` (experimental) | ||
|
@@ -4152,6 +4154,68 @@ object). This cannot be changed after table creation. | |
os.remove('store.h5') | ||
|
||
|
||
.. _io.feather: | ||
|
||
Feather | ||
------- | ||
|
||
.. versionadded:: 0.20.0 | ||
|
||
Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data | ||
frames efficient, and to make sharing data across data analysis languages easy. | ||
|
||
Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas | ||
dtypes, including extension dtypes such as categorical and datetime with tz. | ||
|
||
Several caveats. | ||
|
||
- This is a newer library, and the format, though stable, is not guaranteed to be backward compatible | ||
with earlier versions. | ||
- The format will NOT write an ``Index``, or ``MultiIndex`` for the ``DataFrame`` and will raise an | ||
error if a non-default one is provided. You can simply ``.reset_index()`` in order to store the index. | ||
Review comment: Additional point: non-string column names?
Reply: and duplicate column names
Reply: yep, these are raised automatically by feather now (as of 3.1) |
||
- Duplicate column names and non-string column names are not supported | ||
- Unsupported types include ``Period`` and actual Python object types. These will raise a helpful error message | ||
on an attempt at serialization. | ||
|
||
See the `Full Documentation <https://github.com/wesm/feather>`__ | ||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({'a': list('abc'), | ||
'b': list(range(1, 4)), | ||
'c': np.arange(3, 6).astype('u1'), | ||
'd': np.arange(4.0, 7.0, dtype='float64'), | ||
'e': [True, False, True], | ||
'f': pd.Categorical(list('abc')), | ||
'g': pd.date_range('20130101', periods=3), | ||
'h': pd.date_range('20130101', periods=3, tz='US/Eastern'), | ||
'i': pd.date_range('20130101', periods=3, freq='ns')}) | ||
|
||
df | ||
df.dtypes | ||
|
||
Write to a feather file. | ||
|
||
.. ipython:: python | ||
|
||
df.to_feather('example.fth') | ||
|
||
Read from a feather file. | ||
|
||
.. ipython:: python | ||
|
||
result = pd.read_feather('example.fth') | ||
result | ||
|
||
# we preserve dtypes | ||
result.dtypes | ||
|
||
Review comment: Maybe also show the dtypes? (so you see it is preserved automatically) |
||
.. ipython:: python | ||
:suppress: | ||
|
||
import os | ||
os.remove('example.fth') | ||
|
||
.. _io.sql: | ||
|
||
SQL Queries | ||
|
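The index and column-name caveats in the Feather docs above can be handled on the caller's side before writing. The following is a minimal sketch, assuming a hypothetical frame with a named string index and one integer column label (neither appears in the PR). It only prepares the frame and does not call feather itself.

```python
import pandas as pd

# Hypothetical frame: a named, non-default index and an integer column label.
df = pd.DataFrame({"value": [1.5, 2.5, 3.5], 0: ["x", "y", "z"]},
                  index=pd.Index(["a", "b", "c"], name="key"))

# Move the index into an ordinary column; the result has a default 0..n-1 index.
flat = df.reset_index()

# Feather requires string column names, so convert any non-string labels.
flat.columns = [str(c) for c in flat.columns]

print(flat.index.equals(pd.RangeIndex(len(flat))))
print(list(flat.columns))
```

After these two steps the frame should satisfy the index and column-name checks described in the caveats, so `flat.to_feather(path)` would be expected to succeed, provided feather-format is installed and the column dtypes are supported.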
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
""" feather-format compat """ | ||
Review comment: Very minor, but can we call this file just
Reply: ah, missed that the package is imported like that, confused by the feather-format package name
Reply: yeah that is an annoying 'feature' in python! |
||
|
||
from distutils.version import LooseVersion | ||
from pandas import DataFrame, RangeIndex, Int64Index | ||
from pandas.compat import range | ||
|
||
|
||
def _try_import(): | ||
# since pandas is a dependency of feather | ||
# we need to import on first use | ||
|
||
try: | ||
import feather | ||
except ImportError: | ||
|
||
# give a nice error message | ||
raise ImportError("the feather-format library is not installed\n" | ||
"you can install via conda\n" | ||
"conda install feather-format -c conda-forge\n" | ||
"or via pip\n" | ||
"pip install feather-format\n") | ||
|
||
try: | ||
feather.__version__ >= LooseVersion('0.3.1') | ||
except AttributeError: | ||
raise ImportError("the feather-format library must be >= " | ||
"version 0.3.1\n" | ||
"you can install via conda\n" | ||
"conda install feather-format -c conda-forge\n" | ||
"or via pip\n" | ||
"pip install feather-format\n") | ||
|
||
return feather | ||
|
||
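Note on the version guard in `_try_import` above: the comparison on `feather.__version__` is evaluated but its result is discarded, so only a missing `__version__` attribute triggers the `ImportError`. Below is a sketch of an explicit minimum-version check. `parse_version` and `check_min_version` are hypothetical helpers, not part of this PR, and the parser is a deliberate simplification that uses only the standard library.

```python
import re
import types


def parse_version(text):
    """Turn a dotted version such as '0.3.1' into a tuple of ints.

    Only the leading digits of each component are used, so this is a
    simplification, not a full version parser.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_min_version(module, minimum):
    """Raise ImportError if `module` has no __version__ or is older than `minimum`."""
    found = getattr(module, "__version__", None)
    if found is None or parse_version(found) < parse_version(minimum):
        raise ImportError(
            "the {} library must be >= version {}".format(module.__name__, minimum))
    return module


# Stand-in modules so the sketch runs without feather installed.
old = types.ModuleType("feather")
new = types.ModuleType("feather")
new.__version__ = "0.3.1"

check_min_version(new, "0.3.1")      # accepted
try:
    check_min_version(old, "0.3.1")  # no __version__, so rejected
except ImportError as exc:
    print(exc)
```

Comparing integer tuples rather than raw strings keeps an ordering such as 0.10.0 > 0.3.1 correct, which a plain string comparison would get wrong.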
|
||
def to_feather(df, path): | ||
""" | ||
Write a DataFrame to the feather-format | ||
|
||
Parameters | ||
---------- | ||
df : DataFrame | ||
path : string | ||
File path | ||
""" | ||
if not isinstance(df, DataFrame): | ||
raise ValueError("feather only supports IO with DataFrames") | ||
|
||
feather = _try_import() | ||
valid_types = {'string', 'unicode'} | ||
|
||
# validate index | ||
# -------------- | ||
|
||
# validate that we have only a default index | ||
# raise on anything else as we don't serialize the index | ||
|
||
if not isinstance(df.index, Int64Index): | ||
raise ValueError("feather does not serializing {} " | ||
Review comment: either "does not support serializing" or "does not serialize" |
||
"for the index; you can .reset_index() " | ||
"to make the index into column(s)".format( | ||
type(df.index))) | ||
|
||
if not df.index.equals(RangeIndex.from_range(range(len(df)))): | ||
raise ValueError("feather does not serializing a non-default index " | ||
Review comment: the same here |
||
"for the index; you can .reset_index() " | ||
"to make the index into column(s)") | ||
|
||
if df.index.name is not None: | ||
raise ValueError("feather does not serialize index meta-data on a " | ||
"default index") | ||
|
||
# validate columns | ||
# ---------------- | ||
|
||
# must have valid column names (strings only) | ||
if df.columns.inferred_type not in valid_types: | ||
raise ValueError("feather must have string column names") | ||
|
||
feather.write_dataframe(df, path) | ||
|
||
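The index and column checks in `to_feather` above can be tried out without feather installed. This is a sketch only: `validate_for_feather` is a hypothetical helper, not part of this PR. It swaps the `Int64Index` test for an integer-dtype check because recent pandas releases no longer ship `Int64Index`, and it uses the wording suggested in review for the error messages.

```python
import pandas as pd


def validate_for_feather(df):
    """Hypothetical helper mirroring the checks in this diff's to_feather."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError("feather only supports IO with DataFrames")

    # The index must be integer-typed and equal to the default 0..n-1 range.
    if not pd.api.types.is_integer_dtype(df.index.dtype):
        raise ValueError("feather does not support serializing {} for the "
                         "index; you can .reset_index() to make the index "
                         "into column(s)".format(type(df.index)))
    if not df.index.equals(pd.RangeIndex(len(df))):
        raise ValueError("feather does not support serializing a non-default "
                         "index; you can .reset_index() to make the index "
                         "into column(s)")
    if df.index.name is not None:
        raise ValueError("feather does not serialize index meta-data on a "
                         "default index")

    # Column labels must all be strings.
    if df.columns.inferred_type not in {"string", "unicode"}:
        raise ValueError("feather must have string column names")
    return df


ok = pd.DataFrame({"a": [1, 2]})
validate_for_feather(ok)

try:
    validate_for_feather(pd.DataFrame({"a": [1, 2]}, index=["x", "y"]))
except ValueError as exc:
    print(exc)
```

Running the checks up front like this means an unsupported frame fails with a pandas-level message before any file is written.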
|
||
def read_feather(path): | ||
""" | ||
Load a feather-format object from the file path | ||
|
||
.. versionadded:: 0.20.0 | ||
|
||
Parameters | ||
---------- | ||
path : string | ||
File path | ||
|
||
Returns | ||
------- | ||
type of object stored in file | ||
|
||
""" | ||
|
||
feather = _try_import() | ||
return feather.read_dataframe(path) |
Review comment: also to_feather in the dataframe section?