Skip to content

DOC/TST: pandas.Series.str.get with dict elements supports passing hashable label #47911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task done
BeRT2me opened this issue Aug 1, 2022 · 12 comments · Fixed by #47918
Closed
1 task done

DOC/TST: pandas.Series.str.get with dict elements supports passing hashable label #47911

BeRT2me opened this issue Aug 1, 2022 · 12 comments · Fixed by #47918
Labels
Docs Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). Strings String extension data type and string data

Comments

@BeRT2me
Copy link

BeRT2me commented Aug 1, 2022

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.Series.str.get.html

Documentation problem

The fact that Series.str.get also works with Non-Numerical dict types isn't clear in the documentation.

Suggested fix for documentation

Parameters should be int or str and an example using a dictionary should be added.

@BeRT2me BeRT2me added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 1, 2022
@mroeschke
Copy link
Member

I find it odd that arguments other than int are allowed in get since the value should indicate the position. I think instead, a ValueError should be raised if the value is not an int.

@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data and removed Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 1, 2022
@mroeschke mroeschke changed the title DOC: pandas.Series.str.get missing dict support ERR: pandas.Series.str.get should raise if int isn't passed Aug 1, 2022
@phofl
Copy link
Member

phofl commented Aug 1, 2022

This does not work anyway, this returns an all nan series

s = pd.Series(["String",
              (1, 2, 3),
              ["a", "b", "c"],
              123,
              -456,
              {1: "Hello", "2": "World"}])

print(s.str.get({1: 1}))

@xr-chen
Copy link
Contributor

xr-chen commented Aug 1, 2022

I find it odd that arguments other than int are allowed in get since the value should indicate the position. I think instead, a ValueError should be raised if the value is not an int.

Would TypeError be more suitable than ValueError?

@mroeschke
Copy link
Member

Sure, TypeError makes sense here

@BeRT2me
Copy link
Author

BeRT2me commented Aug 1, 2022

This does not work anyway, this returns an all nan series

s = pd.Series(["String",
              (1, 2, 3),
              ["a", "b", "c"],
              123,
              -456,
              {1: "Hello", "2": "World"}])

print(s.str.get({1: 1}))

I'm not sure where you managed to get that passing a dictionary is what I was referring to...

This is what works just fine:

s = pd.Series([{"name": "Hello", "value": "World"},
              {"name": "Goodbye", "value": "Planet"}])

print(s.str.get('name'))

Raising an error would destroy functionality...

@phofl
Copy link
Member

phofl commented Aug 1, 2022

You should have made this significantly clearer in your initial post.

Anyway, I don’t think this should be supported

@BeRT2me
Copy link
Author

BeRT2me commented Aug 1, 2022

There's literally an example of it being used with a dictionary already in the existing documentation.
And I was clear, you just didn't read the whole post.

Parameters should be int or str...

What is one good reason for NOT supporting this functionality? Would you prefer apply and lambda have to be used every time you want to access dict values in a series?? Intuitively, having .get work this way makes perfect sense, it's mirroring the existing dict.get functionality.

@xr-chen
Copy link
Contributor

xr-chen commented Aug 1, 2022

s = pd.Series([{"name": "Hello", "value": "World"},
              {"name": "Goodbye", "value": "Planet"}])

print(s.str.get('name'))

Raising an error would destroy functionality...

I just tried s.str['name'] works well even if we raise an error when non-integer type passed in Series.str.get.

@BeRT2me
Copy link
Author

BeRT2me commented Aug 1, 2022

What is the point of Series.str.get then? s.str[0] works just as well.
If anything, I think it should better mirror dict.get and allow a default value to be specified.

@BeRT2me
Copy link
Author

BeRT2me commented Aug 1, 2022

This is the current implementation, it's obviously not meant to be int only...

    def _str_get(self, i):
        def f(x):
            if isinstance(x, dict):
                return x.get(i)
            elif len(x) > i >= -len(x):
                return x[i]
            return self._str_na_value

        return self._str_map(f)

If anything, it should be:

    def _str_get(self, i, default=None):
        def f(x):
            if isinstance(x, dict):
                return x.get(i, default)
            elif len(x) > i >= -len(x):
                return x[i]
            return default if default else self._str_na_value

        return self._str_map(f)

@mroeschke
Copy link
Member

Thanks @BeRT2me. Although I find this "odd usage" being under str.get, I suppose this should be supported and have better documentation.

@mroeschke mroeschke added Docs and removed Error Reporting Incorrect or improved errors from pandas labels Aug 1, 2022
@mroeschke mroeschke changed the title ERR: pandas.Series.str.get should raise if int isn't passed DOC/TST: pandas.Series.str.get with dict elements supports passing hashable label Aug 1, 2022
@mroeschke mroeschke added the Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). label Aug 1, 2022
@BeRT2me
Copy link
Author

BeRT2me commented Aug 1, 2022

I agree that it's odd usage, but how Series.str works for anything (list, tuple, dict, set, etc.) that isn't a string hasn't made much sense logically for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). Strings String extension data type and string data
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants