ENH: enable Series.info() #37320
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -97,6 +97,7 @@ | |
from pandas.core.tools.datetimes import to_datetime | ||
|
||
import pandas.io.formats.format as fmt | ||
from pandas.io.formats.info import SeriesInfo | ||
import pandas.plotting | ||
|
||
if TYPE_CHECKING: | ||
|
@@ -4564,6 +4565,96 @@ def replace( | |
method=method, | ||
) | ||
|
||
@Substitution( | ||
klass="Series", | ||
type_sub="", | ||
max_cols_sub="", | ||
examples_sub=( | ||
""" | ||
>>> int_values = [1, 2, 3, 4, 5] | ||
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon'] | ||
>>> s = pd.Series(text_values, index=int_values) | ||
>>> s.info() | ||
<class 'pandas.core.series.Series'> | ||
Int64Index: 5 entries, 1 to 5 | ||
Series name: None | ||
Non-Null Count Dtype | ||
-------------- ----- | ||
5 non-null object | ||
dtypes: object(1) | ||
memory usage: 80.0+ bytes | ||
|
||
Prints a summary excluding information about its values: | ||
|
||
>>> s.info(verbose=False) | ||
<class 'pandas.core.series.Series'> | ||
Int64Index: 5 entries, 1 to 5 | ||
dtypes: object(1) | ||
memory usage: 80.0+ bytes | ||
|
||
Pipe output of Series.info to a buffer instead of sys.stdout, get the | ||
buffer content, and write it to a text file: | ||
|
||
>>> import io | ||
>>> buffer = io.StringIO() | ||
>>> s.info(buf=buffer) | ||
>>> s = buffer.getvalue() | ||
>>> with open("df_info.txt", "w", | ||
... encoding="utf-8") as f: # doctest: +SKIP | ||
... f.write(s) | ||
260 | ||
|
||
The `memory_usage` parameter allows deep introspection mode, especially | ||
useful for big Series and for fine-tuning memory optimization: | ||
|
||
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6) | ||
>>> s = pd.Series(random_strings_array) | ||
>>> s.info() | ||
<class 'pandas.core.series.Series'> | ||
RangeIndex: 1000000 entries, 0 to 999999 | ||
Series name: None | ||
Non-Null Count Dtype | ||
-------------- ----- | ||
1000000 non-null object | ||
dtypes: object(1) | ||
memory usage: 7.6+ MB | ||
|
||
>>> s.info(memory_usage='deep') | ||
<class 'pandas.core.series.Series'> | ||
RangeIndex: 1000000 entries, 0 to 999999 | ||
Series name: None | ||
Non-Null Count Dtype | ||
-------------- ----- | ||
1000000 non-null object | ||
dtypes: object(1) | ||
memory usage: 55.3 MB""" | ||
), | ||
see_also_sub=( | ||
""" | ||
Series.describe: Generate descriptive statistics of Series. | ||
Series.memory_usage: Memory usage of Series.""" | ||
), | ||
) | ||
@doc(SeriesInfo.to_buffer) | ||
SeriesInfo.to_buffer doesn't have a docstring, so this doesn't render. And it wouldn't have the memory_usage, max_cols, and null_counts parameters anyway? (As an aside, there appear to be a few issues with the DataFrame.info docstring on master, such as the alignment of console output and a rogue data parameter. Not sure if it has always been like this or comes from recent refactors, so if you get time, it would be great if you could check that out.)
Regarding
I noticed, not only here but in a couple of other places, that the indentation gets bad when using this kind of construct:
I never touched the docstring, so @MarcoGorelli can probably comment on the rendering issue.
Please note that I just added dedent in some parameters docs, which make |
||
def info( | ||
self, | ||
verbose: Optional[bool] = None, | ||
buf: Optional[IO[str]] = None, | ||
max_cols: Optional[int] = None, | ||
memory_usage: Optional[Union[bool, str]] = None, | ||
The docstring for DataFrame.info is
I think this should be
Might be able to use Literal here (see #37137) and maybe create an alias in typing. A follow-on is OK too.
I tried to use Literal, but it looks like that is available only starting from Python 3.8. |
||
null_counts: Optional[bool] = None, | ||
) -> None: | ||
if max_cols is not None: | ||
raise ValueError( | ||
"Argument `max_cols` can only be passed " | ||
"in DataFrame.info, not Series.info" | ||
) | ||
return SeriesInfo(self, memory_usage).to_buffer( | ||
It seems odd, IMO, to have parameters other than buf passed to to_buffer(). Would it be better to pass verbose and show_counts to the SeriesInfo constructor, or to rename to_buffer?
Putting the params in the constructor is possible, but in this case in
I renamed
However, I had to make the same function signature for |
||
buf=buf, | ||
verbose=verbose, | ||
show_counts=null_counts, | ||
) | ||
|
||
@doc(NDFrame.shift, klass=_shared_doc_kwargs["klass"]) | ||
def shift(self, periods=1, freq=None, axis=0, fill_value=None) -> "Series": | ||
return super().shift( | ||
|
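The review thread on the `memory_usage` annotation raises `Literal`, which is only in the standard library from Python 3.8. A minimal sketch of one way to guard such an alias by interpreter version follows. The alias name `MemoryUsage` and the helper `describe_memory_mode` are hypothetical and are not part of this PR; the assumption that `"deep"` is the only accepted string comes from the `s.info(memory_usage='deep')` example in the diff.

```python
import sys
from typing import Optional, Union

if sys.version_info >= (3, 8):
    from typing import Literal

    # Hypothetical alias: True/False, or exactly the string "deep".
    MemoryUsage = Optional[Union[bool, Literal["deep"]]]
else:
    # Literal is not in the standard library before Python 3.8,
    # so fall back to the looser annotation used in the diff.
    MemoryUsage = Optional[Union[bool, str]]


def describe_memory_mode(memory_usage: MemoryUsage = None) -> str:
    """Return a label for the requested mode (illustrative helper only)."""
    if memory_usage == "deep":
        return "deep"
    if memory_usage is None:
        return "default"
    return "shallow" if memory_usage else "off"
```

With the stricter alias, a type checker can flag a call such as `describe_memory_mode("dep")`, while the fallback branch keeps older interpreters importable.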
I don't think the Substitution decorator should be necessary with the doc decorator (and I have not seen them used together).
The doc decorator was created to supersede the Appender and Substitution decorators.

I guess that Substitution is still necessary if we use one generic docstring for DataFrame.info and Series.info. I could not figure out how to replace some keywords in the base docstring to make it suitable for both frame and series.
I probably do not know how to use the doc decorator.
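
The question in this thread is how one shared docstring can serve both DataFrame.info and Series.info. As a rough sketch of the general idea only, a single decorator can fill named placeholders in a template with `str.format`. This is not pandas' own `doc` decorator, whose exact behaviour is not shown on this page; `fill_doc`, `INFO_TEMPLATE`, `Frame`, and `Column` are hypothetical names, and the placeholder names merely echo `klass` and `max_cols_sub` from the diff.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])

# Hypothetical shared template; the wording is illustrative only.
INFO_TEMPLATE = """Print a concise summary of a {klass}.

Parameters
----------
verbose : bool, optional
    Whether to print the full summary.{max_cols_sub}
"""

# Extra parameter text that only the frame-like class needs.
MAX_COLS_DOC = (
    "\nmax_cols : int, optional\n"
    "    Column limit for the full summary."
)


def fill_doc(template: str, **params: str) -> Callable[[F], F]:
    """Attach ``template`` to a function after filling its placeholders."""

    def decorator(func: F) -> F:
        func.__doc__ = template.format(**params)
        return func

    return decorator


class Frame:
    @fill_doc(INFO_TEMPLATE, klass="DataFrame", max_cols_sub=MAX_COLS_DOC)
    def info(self) -> None:
        ...


class Column:
    # An empty substitution drops the parameter from this docstring.
    @fill_doc(INFO_TEMPLATE, klass="Series", max_cols_sub="")
    def info(self) -> None:
        ...
```

Here `Column.info.__doc__` mentions "Series" and omits `max_cols`, while `Frame.info.__doc__` keeps it, so one template covers both cases with a single decorator.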