-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Small improvements to str_cat #12000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 4 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
6eb9a54
ENH DOC added some new doc examples to str_cat and catch an error to …
d15c0f4
ENH DOC added some new doc examples to str_cat and catch an error to …
d544301
Merge branch 'str-cat-error' of https://github.com/hack-c/pandas into…
311dd03
TST added test to make sure improved error fires
1109211
add comment with issue #
81dfb28
Merge pull request #12111 from jreback/matrix
jreback 79c5a7f
CLN: cleaned RangeIndex._min_fitting_element
kawochen 2842a43
CLN: fix flake8 warnings in pandas/stats
wesm db0094d
CLN: grab bag of flake8 fixes
wesm 8a98ff8
ENH DOC added some new doc examples to str_cat and catch an error to …
9b1be94
TST added test to make sure improved error fires
e1ebbad
add comment with issue #
fb1274c
ENH changed behavior of str_cat to drop NaNs rather than returning Na…
56acd73
merge conflict
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -126,6 +126,8 @@ def test_cat(self): | |
exp = ['aa', NA, 'bb', 'bd', 'cfoo', NA] | ||
tm.assert_almost_equal(result, exp) | ||
|
||
|
||
|
||
def test_count(self): | ||
values = ['foo', 'foofoo', NA, 'foooofooofommmfoo'] | ||
|
||
|
@@ -2057,6 +2059,17 @@ def test_method_on_bytes(self): | |
'S2').astype(object)) | ||
tm.assert_series_equal(result, expected) | ||
|
||
def test_str_cat_raises_intuitive_error(self): | ||
s = Series(['a','b','c','d']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add the issue number as a comment here |
||
message = "Did you mean to supply a `sep` keyword?" | ||
with tm.assertRaisesRegexp(ValueError, message): | ||
s.str.cat('|') | ||
with tm.assertRaisesRegexp(ValueError, message): | ||
s.str.cat(' ') | ||
|
||
|
||
|
||
|
||
|
||
if __name__ == '__main__': | ||
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to actually ignore nans (as indicated in the issue), rather than just document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to just treat them as
''
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or as if they weren't present in the original array?
i.e. do a fillna then cat, or dropna then cat?
@jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
.dropna
(which doesn't exist onIndex
) - its pretty easy (there is an issue about it).so you can emulate so it will just work (for Series or Index). If you already have the values then this would work as well.
si[pd.notnull(si))]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then you're changing the array length and what if someone supplies
others
w/ length equal to the original array (beforedropna
)...?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
fillna
way amounts to changing the default value ofna_rep
to''
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in that bad case for
others
you just throw aValueError
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you bring up a good point. doing
.fillna(na_rep)
, THEN processing might just work and do the right thing.yes if
others
is then inconsistent you need to raise.pls add as many test cases as you think are needed to represent all behavior.