Default values for dropna to "False" (issue 9382) #9484

nickeubank · 2015-02-13T19:40:00Z

PLEASE REVIEW: This is my commit to a major project, and I would appreciate a quick once-over!

As per the discussion in Issue 9382, this changes all HDF functions so that they no longer default to dropping rows in which all non-index columns are NA.

closes #9382

jreback · 2015-02-13T21:48:40Z

all updates should be in this PR.

jreback · 2015-02-13T21:49:28Z

odd that no tests fail. pls come up with a test that fails w/o your change and succeeds with it. I guess this was not tested.

nickeubank · 2015-02-13T22:07:28Z

OK, both on this branch.

Will look into testing -- I'm not really sure what that means, sorry.

nickeubank · 2015-02-13T22:13:16Z

Sorry Jeff, I think testing may be beyond my pay-grade. I thought I knew enough to contribute, but the testing wiki ( https://github.com/pydata/pandas/wiki/Testing ) is all Greek to me. This is obviously not a big commit, so please feel free to just close the pull request and issue and ignore all this. Sorry. :/

bashtage · 2015-02-13T23:00:33Z

@nickeubank

Just write a function that matches the pattern

def test_all_missing_values

It should save and reload an array that would lose rows under the previous default and won't lose them now.

Test should be here:

https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_pytables.py#L4680

nickeubank · 2015-02-13T23:04:38Z

Thanks @bashtage -- and where would I place this function?

And it sounds like it needs to raise an AssertionError, with a description of the source of the problem, if the calls lead to rows being dropped when they shouldn't be?

bashtage · 2015-02-13T23:08:27Z

The test should have the following properties:

1. The df should satisfy the condition that some rows would be dropped if saved with master.
2. You should save the df with the code in your PR, using the default options for saving.
3. Verify that all rows are present using assert_frame_equal(loaded_df, original_df).

Item 3 would fail on master since the loaded df will be missing the all nan row in the original df. This will, however, not fail with your PR.
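
A minimal sketch of a test with those three properties, written against a current pandas rather than the 2015 code base. The helper name `roundtrip_hdf`, the column names, and the use of `pandas.testing` and `format="table"` are assumptions for illustration, not the code in this PR; it also assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def roundtrip_hdf(df, key="df"):
    """Write df to a temporary HDF5 file with default options, then read it back.

    Hypothetical helper: format="table" is chosen because the row-dropping
    behaviour under discussion applies to appendable tables.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "roundtrip.h5")
        df.to_hdf(path, key=key, format="table")  # dropna left at its default
        return pd.read_hdf(path, key)


def test_all_missing_values():
    # Property 1: the middle row is NaN in every column, so a writer that
    # drops all-NaN rows by default would lose it.
    original_df = pd.DataFrame(
        {"col1": [0.0, np.nan, 2.0], "col2": [1.0, np.nan, 3.0]}
    )
    # Property 2: save with the default options.
    loaded_df = roundtrip_hdf(original_df)
    # Property 3: every row, including the all-NaN one, must come back.
    assert_frame_equal(loaded_df, original_df)
```

If the all-NaN row were dropped on write, `assert_frame_equal` would raise an AssertionError describing the shape mismatch, which is the failure the reviewers are asking for on master.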

See my link about where your function goes. If you want to check that it passes, you can run

nosetests pandas.io.tests.test_pytables

Assuming you are using python setup.py develop to install your local pandas.

nickeubank · 2015-02-13T23:09:35Z

Ah, ok great. And do I just stick this into pytables.py, or is there a special place for these?

(Again, so sorry to need the hand holding!)

bashtage · 2015-02-13T23:10:35Z

> Ah, ok great. And do I just stick this into pytables.py, or is there a special place for these?

pandas/io/tests/test_pytables.py

There is almost always a test folder and a corresponding file named test_xxx for every .py file.

nickeubank · 2015-02-13T23:10:49Z

Excellent, thanks. Will do now.

jreback · 2015-02-16T13:01:40Z

doc/source/whatsnew/v0.16.0.txt

+ Previously,
+
+  .. ipython:: python
+    In [1]: myFile = HDFStore('file.hdf')


show these examples with a DataFrame with a row that has all-nans, and one that doesn't.

don't use camel case.

use to_hdf/read_hdf (because for example you are not closing the store)

this needs to be a code-block
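
For illustration, the point about closing the store can be sketched as follows on a current pandas, assuming PyTables is installed. The function names, the key, and the paths passed in are hypothetical; this is not the text that went into the whatsnew file.

```python
import numpy as np
import pandas as pd


def write_with_store(df, path):
    # An explicitly opened HDFStore has to be closed again; using it as a
    # context manager closes it when the block exits.
    with pd.HDFStore(path) as store:
        store.append("df_with_missing", df)
    return store.is_open  # expected to be False after the with-block


def write_with_to_hdf(df, path):
    # to_hdf/read_hdf open and close the file internally, so there is no
    # store object left open in the example.
    df.to_hdf(path, key="df_with_missing", format="table")
    return pd.read_hdf(path, "df_with_missing")
```

Calling `write_with_to_hdf(df, "file.h5")` with a frame that has one all-NaN row and one complete row gives the kind of example requested in the review.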

nickeubank · 2015-02-17T17:14:38Z

OK -- I've made all the suggested updates, and I have a build failure. I'm not sure how to check if that's due to the test I added, but hopefully that's what we were looking for?

bashtage · 2015-02-17T19:53:47Z

Your code. pd is not imported. Just use read_hdf directly.

ERROR: test_all_missing_values (pandas.io.tests.test_pytables.TestHDFStore)
----------------------------------------------------------------------

Traceback (most recent call last):

File "/home/travis/build/pydata/pandas/pandas/io/tests/test_pytables.py", line 4682, in test_all_missing_values

df_with_missing = pd.DataFrame({'col1':[np.nan]})

NameError: global name 'pd' is not defined

bashtage · 2015-02-17T19:56:11Z

You should probably also include a test with a slightly more complicated layout, e.g. with a full row, a row with some nulls and a row with all null.
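
A frame with that layout might look like the sketch below. As a rough stand-in for the old default, `DataFrame.dropna(how="all")` removes exactly the rows in which every column is missing; the column names are made up, and nothing here touches HDF.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 has some nulls, row 2 is null in every column.
df = pd.DataFrame(
    {
        "col1": [0.0, np.nan, np.nan],
        "col2": [1.0, 2.0, np.nan],
    }
)

# Boolean mask of the rows that are entirely missing.
all_null = df.isna().all(axis=1)

# Only rows that are entirely missing are removed; the partially null
# row is kept.
survivors = df.dropna(how="all")
```

Here only the last row is flagged in `all_null`, so a round-trip test on this frame checks that the partially null row and the all-null row are both preserved by the new default.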

nickeubank · 2015-02-17T20:05:35Z

@bashtage - thanks. Just found the traceback in Travis -- will check next build myself!

jreback · 2015-02-18T00:44:48Z

doc/source/whatsnew/v0.16.0.txt

+     df_without_missing = pd.DataFrame({'col1':[0, -1, 2], 'col2':[1, -1, 3]})
+     df_without_missing.to_hdf('file.h5', 'df_without_missing')
+
+     print(pd.read_hdf('file.h5', 'df_with_missing'))


you don't need print statements, paste the actual ipython output (which will have numbered In/Outs)

jreback · 2015-05-09T16:00:02Z

closing pls reopen if/when updated

nickeubank · 2015-05-09T22:43:04Z

Sorry, forgot about this. Now updated and rebased. How do I reopen? Or do I need to start a new PR?

bashtage · 2015-06-02T17:54:42Z

Seems straightforward and ready.

nickeubank · 2015-06-03T00:09:00Z

@bashtage moved to #10097 -- updating for a @jreback comment now.

Default values for dropna to "False" (issue 9382)

9ab8c23

nickeubank mentioned this pull request Feb 13, 2015

Don't make dropping missing rows a default behavior for HDF append()? #9382

Closed

jreback added the API Design and IO HDF5 (read_hdf, HDFStore) labels Feb 13, 2015

Update v0.16.0.txt

1d7808c

jreback reviewed Feb 16, 2015

nickeubank added 2 commits February 16, 2015 19:19

Update v0.16.0.txt

66dfc6b

Test for change of default setting for dropna

3e2a718

nickeubank added 2 commits February 17, 2015 11:56

dropped pd. prefix for pandas operations.

de022a9

More complicated data frame object.

5f2eae8

jreback reviewed Feb 18, 2015

nickeubank and others added 4 commits February 17, 2015 16:54

add issue number in comment

892835b

Updated to reflect suggested changes by Jeff

137c4c0

resolved merge conflict in favor of master

e137c73

moved docs to whatsnew 16.1 from 16.0

1a119d2

jreback closed this May 9, 2015

nickeubank mentioned this pull request May 9, 2015

Default values for dropna to "False" (issue 9382) #10097

Merged
