
Default values for dropna to "False" (issue 9382) #9484


Closed
wants to merge 10 commits into from
27 changes: 27 additions & 0 deletions doc/source/whatsnew/v0.16.0.txt
@@ -130,6 +130,33 @@ methods (:issue:`9088`).
d 7
dtype: int64

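- The default behavior for HDF write functions in table format is now to keep rows
  whose values are all missing (NaN), with only the index present. Such rows can
  still be dropped by passing ``dropna=True`` or by setting the option
  ``io.hdf.dropna_table``. (:issue:`9382`)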

Previously,

.. ipython:: python

   In [1]: myFile = pd.HDFStore('file.hdf')
      ...: seriesWithMissingRow = pd.Series([0, np.nan, 2], index=['user1', 'user2', 'user3'])
      ...: myFile.append('fileKey', seriesWithMissingRow, append=False)
      ...: myFile['fileKey']
      ...:
   Out[1]:
   user1    0
   user3    2
   dtype: float64

Review comment (Contributor) on this example:

  • show these examples with a DataFrame with a row that has all NaNs, and one that doesn't.
  • don't use camel case.
  • use to_hdf/read_hdf (because, for example, you are not closing the store).
  • this needs to be a code-block.

New behavior:

.. ipython:: python

   In [2]: myFile = pd.HDFStore('file.hdf')
      ...: seriesWithMissingRow = pd.Series([0, np.nan, 2], index=['user1', 'user2', 'user3'])
      ...: myFile.append('fileKey', seriesWithMissingRow, append=False)
      ...: myFile['fileKey']
      ...:
   Out[2]:
   user1      0
   user2    NaN
   user3      2
   dtype: float64

Review comment (Contributor) on this example: should mirror the above example, but don't use the IPython prompts; this should be runnable code.
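The change can be illustrated without PyTables. The sketch below is a simplified, hypothetical model of which rows a table-format write keeps (the helper `rows_to_write` is illustrative and not part of pandas). It uses the same values as the example above: a row counts as "all missing" when every value is NaN, the index label is never considered, and the row is skipped only when `dropna` is true.

```python
import math


# Hypothetical helper, not pandas code: models which rows a table-format
# HDF write would store for a given ``dropna`` setting.
def rows_to_write(index, values, dropna=False):
    """Return the (label, row) pairs that would be written.

    ``values`` holds one tuple of floats per index label. A row is "all
    missing" when every value in it is NaN; the label itself is ignored.
    """
    kept = []
    for label, row in zip(index, values):
        all_missing = all(math.isnan(v) for v in row)
        if dropna and all_missing:
            continue  # old default: the all-NaN row is silently skipped
        kept.append((label, row))
    return kept


index = ['user1', 'user2', 'user3']
values = [(0.0,), (math.nan,), (2.0,)]  # one value per row, as in the Series example

old = rows_to_write(index, values, dropna=True)  # previous default
new = rows_to_write(index, values)               # new default, dropna=False
print([label for label, _ in old])  # ['user1', 'user3']
print([label for label, _ in new])  # ['user1', 'user2', 'user3']
```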


Deprecations
18 changes: 9 additions & 9 deletions pandas/io/pytables.py
@@ -219,7 +219,7 @@ class DuplicateWarning(Warning):
 """

 with config.config_prefix('io.hdf'):
-    config.register_option('dropna_table', True, dropna_doc,
+    config.register_option('dropna_table', False, dropna_doc,
                            validator=config.is_bool)
     config.register_option(
         'default_format', None, format_doc,
@@ -801,8 +801,8 @@ def put(self, key, value, format=None, append=False, **kwargs):
             This will force Table format, append the input data to the
             existing.
         encoding : default None, provide an encoding for strings
-        dropna : boolean, default True, do not write an ALL nan row to
-            the store settable by the option 'io.hdf.dropna_table'
+        dropna : boolean, default False. If True, do not write an all-NaN
+            row to the store. Settable by the option 'io.hdf.dropna_table'
         """
         if format is None:
             format = get_option("io.hdf.default_format") or 'fixed'
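The docstrings say the default is settable through the `io.hdf.dropna_table` option. A minimal sketch of that lookup pattern, assuming (as a simplification) that an omitted `dropna` arrives as `None` and then falls back to the option; `OPTIONS`, `lookup_option` and `resolve_dropna` are hypothetical stand-ins, not pandas internals.

```python
# Hypothetical stand-in for pandas' option registry; after this change the
# registered default for 'io.hdf.dropna_table' is False.
OPTIONS = {'io.hdf.dropna_table': False}


def lookup_option(key):
    """Look up an option value (simplified: no prefixes, no validation)."""
    return OPTIONS[key]


def resolve_dropna(dropna=None):
    """Return the effective dropna flag for one write.

    An explicit True/False from the caller wins; None means "use the option".
    """
    if dropna is None:
        dropna = lookup_option('io.hdf.dropna_table')
    return bool(dropna)


print(resolve_dropna())      # False: the new default
print(resolve_dropna(True))  # True: explicit per-call override
OPTIONS['io.hdf.dropna_table'] = True
print(resolve_dropna())      # True: old behavior restored via the option
```

In pandas itself the option would be changed with `pd.set_option('io.hdf.dropna_table', True)` rather than by mutating a dict.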
@@ -883,8 +883,8 @@ def append(self, key, value, format=None, append=True, columns=None,
         chunksize : size to chunk the writing
         expectedrows : expected TOTAL row size of this table
         encoding : default None, provide an encoding for strings
-        dropna : boolean, default True, do not write an ALL nan row to
-            the store settable by the option 'io.hdf.dropna_table'
+        dropna : boolean, default False. If True, do not write an all-NaN
+            row to the store. Settable by the option 'io.hdf.dropna_table'
         Notes
         -----
         Does *not* check if data being appended overlaps with existing
@@ -903,7 +903,7 @@ def append(self, key, value, format=None, append=True, columns=None,
                                **kwargs)

     def append_to_multiple(self, d, value, selector, data_columns=None,
-                           axes=None, dropna=True, **kwargs):
+                           axes=None, dropna=False, **kwargs):
         """
         Append to multiple tables

@@ -918,7 +918,7 @@ def append_to_multiple(self, d, value, selector, data_columns=None,
         data_columns : list of columns to create as data columns, or True to
             use all columns
         dropna : if evaluates to True, drop rows from all tables if any single
-            row in each table has all NaN
+            row in each table has all NaN. Default False.

         Notes
         -----
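The `append_to_multiple` docstring is terse about what "drop rows from all tables" means. The sketch below assumes one reading: with `dropna=True`, a row is removed from every table when its values are all NaN within the columns of any single table, so the tables stay aligned row for row. `synchronized_rows` and the plain-dict data layout are hypothetical, chosen only to make that rule concrete.

```python
import math


# Hypothetical model of append_to_multiple(..., dropna=True): ``d`` maps each
# table name to the columns it stores; rows are dicts of column -> float.
def synchronized_rows(d, rows, dropna=False):
    """Return the row labels written to every table.

    With dropna=True a row is dropped from *all* tables when its values are
    all NaN within the columns of *any* single table.
    """
    if not dropna:
        return list(rows)
    keep = []
    for label, row in rows.items():
        all_nan_in_some_table = any(
            all(math.isnan(row[c]) for c in cols) for cols in d.values()
        )
        if not all_nan_in_some_table:
            keep.append(label)
    return keep


d = {'t1': ['a', 'b'], 't2': ['c']}
rows = {
    'r0': {'a': 1.0, 'b': 2.0, 'c': 3.0},
    'r1': {'a': math.nan, 'b': math.nan, 'c': 3.0},  # all NaN within t1
    'r2': {'a': 1.0, 'b': math.nan, 'c': math.nan},  # all NaN within t2
}
print(synchronized_rows(d, rows))               # ['r0', 'r1', 'r2']
print(synchronized_rows(d, rows, dropna=True))  # ['r0']
```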
@@ -3740,7 +3740,7 @@ class AppendableTable(LegacyTable):

     def write(self, obj, axes=None, append=False, complib=None,
               complevel=None, fletcher32=None, min_itemsize=None,
-              chunksize=None, expectedrows=None, dropna=True, **kwargs):
+              chunksize=None, expectedrows=None, dropna=False, **kwargs):

         if not append and self.is_exists:
             self._handle.remove_node(self.group, 'table')
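The `if not append and self.is_exists` branch explains the `append=False` in the whatsnew example: an existing table under the key is removed before writing, so each run of the example starts from a fresh table. A toy in-memory model of that rule (the class `ToyStore` is hypothetical and stores plain tuples, not HDF nodes):

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a store of table nodes."""

    def __init__(self):
        self._tables = {}

    def append(self, key, rows, append=True):
        # Mirrors the branch above: without append, remove any existing
        # table under ``key`` before writing the new rows.
        if not append and key in self._tables:
            del self._tables[key]
        self._tables.setdefault(key, []).extend(rows)

    def __getitem__(self, key):
        return list(self._tables[key])


store = ToyStore()
store.append('fileKey', [('user1', 0.0)])
store.append('fileKey', [('user3', 2.0)])  # appended to the existing table
print(store['fileKey'])  # [('user1', 0.0), ('user3', 2.0)]
store.append('fileKey', [('user2', float('nan'))], append=False)  # table replaced
print(store['fileKey'])  # [('user2', nan)]
```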
@@ -3777,7 +3777,7 @@ def write(self, obj, axes=None, append=False, complib=None,
         # add the rows
         self.write_data(chunksize, dropna=dropna)

-    def write_data(self, chunksize, dropna=True):
+    def write_data(self, chunksize, dropna=False):
         """ we form the data into a 2-d including indexes,values,mask
             write chunk-by-chunk """
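The `write_data` docstring mentions a mask that is applied while writing chunk by chunk. The sketch below is a simplified model using plain lists, loosely modelled on that description and not the PyTables writer. It assumes that, when `dropna` is true, one all-NaN mask is computed per value block and the masks are combined with a logical AND, so a row is skipped only if it is all NaN in every block. `write_data_sketch` and the block layout are hypothetical.

```python
import math


def write_data_sketch(index, blocks, chunksize, dropna=False):
    """Return the index labels that would be written, chunk by chunk.

    ``blocks`` is a list of value blocks; each block holds one tuple per
    index label. The mask is only built when ``dropna`` is true.
    """
    mask = None
    if dropna:
        mask = [True] * len(index)
        for block in blocks:
            block_mask = [all(math.isnan(v) for v in row) for row in block]
            # AND the masks: all-missing only if all NaN in every block
            mask = [m and b for m, b in zip(mask, block_mask)]

    written = []
    for start in range(0, len(index), chunksize):
        stop = min(start + chunksize, len(index))
        for i in range(start, stop):
            if mask is not None and mask[i]:
                continue  # skip an all-NaN row (only reachable with dropna=True)
            written.append(index[i])
    return written


nan = math.nan
index = ['r0', 'r1', 'r2', 'r3']
blocks = [
    [(1.0, nan), (nan, nan), (nan, nan), (nan, 2.0)],  # first value block
    [(nan,), (5.0,), (nan,), (nan,)],                  # second value block
]
print(write_data_sketch(index, blocks, chunksize=3))               # ['r0', 'r1', 'r2', 'r3']
print(write_data_sketch(index, blocks, chunksize=3, dropna=True))  # ['r0', 'r1', 'r3']
```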
