ENH: Series.reset_index() should support "names" argument just like DataFrame.reset_index() does #55225

Open
1 of 3 tasks
toobaz opened this issue Sep 21, 2023 · 7 comments
Labels
Enhancement · Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@toobaz
Member

toobaz commented Sep 21, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])
df.reset_index(names='former_index')

often comes in handy. The same would hold for

s = pd.Series([1, 2])
s.reset_index(names='former_index')

... except it raises a TypeError
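For anyone landing here, a minimal sketch of the failure and of one way around it. It assumes a pandas version in which Series.reset_index does not accept names (true when this issue was opened); the rename_axis workaround is just one option and not a proposed implementation.

```python
import pandas as pd

s = pd.Series([1, 2])

try:
    # Assumption: on current pandas this raises TypeError because
    # Series.reset_index has no "names" keyword.
    result = s.reset_index(names="former_index")
except TypeError as exc:
    print(f"TypeError: {exc}")
    # Workaround: name the index first, then reset it.
    result = s.rename_axis("former_index").reset_index()

print(result.columns.tolist())
```

Either way, the former index ends up in a column called former_index, which is the behaviour requested here.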

Feature Description

Just implement names in Series.reset_index as it is implemented in DataFrame.reset_index.

Alternative Solutions

None I can think of.

Additional Context

I suspect this was not done in the first place because discussion in #6878 considered pd.Series.reset_index to already have the name argument... which however does something different (renaming the former values, not the former index).

@toobaz toobaz added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Sep 21, 2023
@jfadia
Contributor

jfadia commented Oct 1, 2023

take

@jfadia
Contributor

jfadia commented Oct 1, 2023

My understanding of the issue:

  • Currently, the pandas.core.series.reset_index and pandas.core.frame.reset_index functions' name arg are not aligned
>>> s = pd.Series([1,2,3])
>>> df = s.reset_index(name="test_index")
>>> df
   index  test_index
0      0           1
1      1           2
2      2           3

Column containing original data has been renamed.
>>> df = pd.DataFrame({"A": [1,2,3]})
>>> df
   A
0  1
1  2
2  3
>>> df.reset_index(inplace=True, names="test_index")
>>> df
   test_index  A
0           0  1
1           1  2
2           2  3

Column that was formerly the index has been renamed.

Series:

  • The reset_index function definition is here
  • The name arg is passed into to_frame here which converts the Series to DataFrame
  • I believe that this name arg is then set as the column name here
  • My limited understanding of the internals is that the SingleArrayManager used for the Series gets converted to an ArrayManager in which axes = [self.axes[0], columns] here
  • A DataFrame is then returned by passing the ArrayManager here
  • I believe that this method is defined here and generates a new instance of class DataFrame by passing in the new ArrayManager and ultimately returns it here
  • How this DataFrame is actually constructed from this call (i.e. where are the column names being set) confuses me and I may need some help understanding here
  • This df is then returned and the index is reset but without passing the names arg in here so the name of the index remains unchanged

DataFrame:

  • We would pass names arg into the reset_index function which would actually rename the index/multi index columns themselves (logic starts here)

Some thoughts:

  • The name arg in a Series looks like it is materially different from the index name and I'm not sure that the tradeoff to align these 2 functions (i.e. reducing consistency within the Series object) is worth it
>>> s = pd.Series([1,2,3], name="test_series_name")
>>> s
0    1
1    2
2    3
Name: test_series_name, dtype: int64
>>> s.index.name = "test_series_index_name"
>>> s
test_series_index_name
0    1
1    2
2    3
Name: test_series_name, dtype: int64
  • I do agree that it is confusing to have names in a df refer to the index names versus name refer to the "column name" in a series
  • I think that one alternative solution would be to call pd.Series.reset_index which I believe would yield pd.DataFrame and then call pd.DataFrame.rename
>>> s = pd.Series([1,2,3], name="name")
>>> df = s.reset_index()
>>> df.rename({"index": "index_name"}, inplace=True, axis=1)
>>> df
   index_name  name
0           0     1
1           1     2
2           2     3
  • Another alternative is to give the index itself a name upon creating the series and before resetting the index
>>> s = pd.Series([1,2,3], name="name", index=["idx1", "idx2", "idx3"])
>>> s
idx1    1
idx2    2
idx3    3
Name: name, dtype: int64
>>> s.index.name = "index_name"
>>> s
index_name
idx1    1
idx2    2
idx3    3
Name: name, dtype: int64
>>> df = s.reset_index()
>>> df
  index_name  name
0       idx1     1
1       idx2     2
2       idx3     3

@toobaz
Member Author

toobaz commented Oct 1, 2023

Currently, the pandas.core.series.reset_index and pandas.core.frame.reset_index functions' name arg are not aligned

I don't know if I see a problem of "alignment"... they are just two different things, with different names (no pun intended!). My suggestion is to add names to pd.Series and is unrelated to the already present name parameter - except for the possible confusion due to the two similarly named arguments.

As to implementation:

I think that one alternative solution would be to call pd.Series.reset_index which I believe would yield pd.DataFrame and then call pd.DataFrame.rename

... I guess I would just copy the implementation in pd.DataFrame.reset_index as far as is feasible, even reusing some code if possible.

@jfadia
Contributor

jfadia commented Oct 1, 2023

@toobaz I see what you mean. On second thought, I'm not really sure why we have the name arg in pd.Series.reset_index as a matter of process. It would make more sense to me to change the name of the Series using pd.Series.name = {name} before resetting the index if one wanted to do so. Maybe we could get rid of the name arg altogether and replace it with names. Since pd.Series.reset_index calls pd.DataFrame.reset_index, we would just have to pass the names into that call and it would work the same way. That would also avoid confusion between name and names. Thoughts?
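A rough sketch of the "pass names through" idea, written as a standalone function rather than a patch to pandas. The helper name reset_index_with_names is hypothetical, and the sketch assumes pandas >= 1.5, where DataFrame.reset_index accepts names. It mirrors the steps described earlier in this thread (to_frame, then DataFrame.reset_index) and is not the actual pandas source.

```python
import pandas as pd


def reset_index_with_names(s, names, name=None):
    """Hypothetical helper: reset a Series index and rename the former index.

    `name` plays the role of the existing Series.reset_index(name=...)
    argument (label for the values column); `names` is forwarded to
    DataFrame.reset_index (label(s) for the former index).
    """
    if name is None:
        # Same default the thread describes: fall back to the Series name,
        # or 0 when the Series is unnamed.
        name = 0 if s.name is None else s.name
    return s.to_frame(name).reset_index(names=names)


s = pd.Series([1, 2, 3], name="values")
out = reset_index_with_names(s, names="former_index")
print(out.columns.tolist())
```

Here name and names coexist without interfering: one labels the values column, the other labels the column that used to be the index.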

@toobaz
Member Author

toobaz commented Oct 2, 2023

Maybe we could get rid of the name arg altogether and replace it with names.

They are to my eyes two different things, I can't understand how or why to "replace" one with the other.

(That said, on the idea of removing the name argument, which is unrelated to this issue: I'm not a fan of it. If we had to do it, my suggested replacement would not be pd.Series.name = "NAME", which cannot be inserted in a method chain, but rather pd.Series.rename("NAME"). Still, it really does not seem to me worth the deprecation effort, regardless of what we decide concerning the present issue.)
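To make the chaining point concrete, a small sketch comparing the three spellings. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Attribute assignment is a statement, so it cannot sit inside a chain
# (and it mutates the Series, hence the copy).
s2 = s.copy()
s2.name = "NAME"
via_assignment = s2.reset_index()

# rename("NAME") returns a new Series, so it chains naturally.
via_rename = s.rename("NAME").reset_index()

# The existing name= argument gives the same result in one call.
via_argument = s.reset_index(name="NAME")

print(via_rename.columns.tolist())
```

All three produce the same frame; only rename and the name argument keep everything in a single expression.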

@jfadia
Contributor

jfadia commented Oct 2, 2023

I understand that they are different. I don't think that we should keep both because it will be confusing to have 2 similarly named params that do different things in the same function. Maybe instead of adding a param called name we rename it to index_name instead? That way it is clear

@toobaz
Member Author

toobaz commented Oct 2, 2023

Maybe instead of adding a param called name

[notice I'm suggesting to add names, but I guess we are on the same page]

we rename it to index_name instead? That way it is clear

We could find another name for the parameter, but it should definitely be the same as in pd.DataFrame.reset_index(), so we would have to rename names there too.

As for the options, index_name reminds me of the name of the new index, not the former index.
If we really don't want to have name and names - and hence we need to rename something - my preferred option would probably be that name in pd.Series.reset_index() becomes rename (it's the name of the related method, after all).

That said, if we need to have the deprecation cycle anyway, maybe I'm no longer so strongly against just suppressing name.

@jfadia jfadia removed their assignment Dec 3, 2024
Projects
None yet
Development

No branches or pull requests

2 participants