Skip to content

Add tests in methods.py #23261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 30, 2018
Merged

Conversation

thoo
Copy link
Contributor

@thoo thoo commented Oct 21, 2018

Closes #23241

@pep8speaks
Copy link

pep8speaks commented Oct 21, 2018

Hello @thoo! Thanks for updating the PR.

Comment last updated on October 21, 2018 at 04:29 Hours UTC

@@ -105,6 +105,13 @@ def test_factorize_equivalence(self, data_for_grouping, na_sentinel):
tm.assert_numpy_array_equal(l1, l2)
self.assert_extension_array_equal(u1, u2)

def test_fillna_copy(self, data_for_fillna):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to make a new fixture for this? you can just use data and the fill value can be the first value of the non-nulls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working on fillna returns a copy when there are no missing value . data has np.nan values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you re-use data_missing and just do

arr = data.take([1, 1])

in the test to get an array with two valid values.

@jreback jreback added ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Oct 23, 2018
filled_val = df.iloc[0, 0]

result = df.fillna(filled_val)
assert df.values.base is not result.values.base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that asserting df.A._values is data_for_fillna is needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to test this for Series.fillna as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Sure I will. I just want to make sure I am on the right track.

@thoo thoo force-pushed the Add_ExtensionArray_tests branch from 3b95a39 to 00bca3f Compare October 25, 2018 04:12
@thoo
Copy link
Contributor Author

thoo commented Oct 25, 2018

@TomAugspurger I am not sure how to work on fillna raises when passed an array of the wrong length. Is it passing an array to fill like df.fillna(array)?? Dataframe and Series won't take array but ExtensionArray take array. Please let me know where I should start. Thanks.

@TomAugspurger
Copy link
Contributor

Something like

with tm.assert_raises_regex(ValueError, msg):
    data_missing.fillna(data_missing.take([1])

That will try filling a length-2 EA with a length-1 EA, which should raise.

@thoo thoo force-pushed the Add_ExtensionArray_tests branch from f8fe87a to 7ab9458 Compare October 25, 2018 18:06
@codecov
Copy link

codecov bot commented Oct 25, 2018

Codecov Report

Merging #23261 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23261      +/-   ##
==========================================
+ Coverage   92.18%   92.18%   +<.01%     
==========================================
  Files         161      161              
  Lines       51184    51184              
==========================================
+ Hits        47185    47186       +1     
+ Misses       3999     3998       -1
Flag Coverage Δ
#multiple 90.62% <ø> (ø) ⬆️
#single 42.23% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 98.01% <0%> (+0.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2e5994...3760dce. Read the comment docs.

@thoo
Copy link
Contributor Author

thoo commented Oct 25, 2018

Something like

with tm.assert_raises_regex(ValueError, msg):
    data_missing.fillna(data_missing.take([1])

That will try filling a length-2 EA with a length-1 EA, which should raise.

This is not working on categories, sparse array and Interval as value" parameter must be a scalar, dict or Series,. Should I skip these tests? I have four of them failing. Thanks.

@TomAugspurger
Copy link
Contributor

Yes, I think those don't implement fillna(array). Check for other skips in those files.

result = df.fillna(filled_val)
assert df.values.base is not result.values.base

if isinstance(arr, pd.SparseArray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than write like this, I would just override these tests for SparseArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

filled_val = df.iloc[0, 0]
result = df.fillna(filled_val)

assert df.values.base is not result.values.base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. I believe that for DataFrames with EAs, df.values will all return a new ndarray, so this would be impossible to fail.

Rather, I think the best we can do is assert df.A.values is not result.A.values.

result = df.fillna(filled_val)

assert df.values.base is not result.values.base
assert df.A._values is arr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed I think. That's asserting something different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thanks.

thoo added 2 commits October 30, 2018 01:08
…y_tests

* repo_org/master: (52 commits)
  ENH: Allow rename_axis to specify index and columns arguments  (pandas-dev#20046)
  STY: proposed isort settings [ci skip] [skip ci] [ciskip] [skipci] (pandas-dev#23366)
  MAINT: Remove extraneous test.parquet file
  CLN: Follow-up comments to pandas-devgh-23392 (pandas-dev#23401)
  BUG GH23282 calling min on series of NaT returns NaT (pandas-dev#23289)
  unpin openpyxl (pandas-dev#23361)
  REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods (pandas-dev#23060)
  CLN: Remove pandas.tools module (pandas-dev#23376)
  CLN: Remove some dtype methods from API (pandas-dev#23390)
  CLN: Cleanup toplevel namespace shims (pandas-dev#23386)
  DOC: fixup whatsnew note for GH21394 (pandas-dev#23355)
  Fix import format at pandas/tests/extension directory (pandas-dev#23365)
  DOC: Remove Series.sortlevel from api.rst (pandas-dev#23395)
  API: Disallow dtypes w/o frequency when casting (pandas-dev#23392)
  BUG/TST/REF: Datetimelike Arithmetic Methods (pandas-dev#23215)
  STYLE: lint
  add np.nan* funcs to cython_table (pandas-dev#22109)
  Run Isort on tests/util single PR (pandas-dev#23347)
  BUG: Fix date_range overflow (pandas-dev#23345)
  Run Isort on tests/arrays single PR (pandas-dev#23346)
  ...
@jreback jreback added this to the 0.24.0 milestone Oct 30, 2018
@jreback jreback merged commit ea17df5 into pandas-dev:master Oct 30, 2018
@jreback
Copy link
Contributor

jreback commented Oct 30, 2018

thanks @thoo

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
@thoo thoo deleted the Add_ExtensionArray_tests branch January 2, 2019 20:26
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants