ENH: enable Series.info() #37320

Merged
merged 34 commits into from
Dec 1, 2021
Changes from 8 commits
Commits
a903f32
TST: add series info tests
ivanovmg Oct 7, 2020
e07d6e2
TST: remove test that series has no info
ivanovmg Oct 7, 2020
0990d54
ENH: add method Series.info
ivanovmg Oct 7, 2020
1814795
REF: split tests for frame and series
ivanovmg Oct 7, 2020
4c390a8
REF: param test on frame memory_usage_qualified
ivanovmg Oct 7, 2020
81929e6
Merge branch 'master' into feature/series-info
ivanovmg Oct 21, 2020
824d8d6
ENH: enable series info
ivanovmg Oct 21, 2020
ce68e94
CLN: remove extra parens
ivanovmg Oct 21, 2020
ede6dc4
REF: split series-related tests
ivanovmg Oct 23, 2020
789e03e
DOC: add release note
ivanovmg Oct 23, 2020
f41596d
CLN: merge two lines
ivanovmg Oct 23, 2020
40b71f8
DOC: unify series/frame docstrings, fix indent
ivanovmg Oct 23, 2020
3e71336
REF: to_buffer -> render, unify func signature
ivanovmg Oct 23, 2020
739c62d
Merge branch 'master' into feature/series-info
ivanovmg Oct 23, 2020
e9c5220
DOC: add versionadded tag
ivanovmg Oct 23, 2020
f7cb4f8
Merge branch 'master' into feature/series-info
ivanovmg Nov 4, 2020
def5ed6
Merge branch 'master' into feature/series-info
ivanovmg Nov 5, 2020
dd4205d
Merge branch 'master' into feature/series-info
ivanovmg Nov 7, 2020
e74cdce
DOC: maybe fix empty line problem with df.info
ivanovmg Nov 7, 2020
d41ecf1
DOC: remove trailing period in type
ivanovmg Nov 7, 2020
98d0f55
Merge branch 'master' into feature/series-info
ivanovmg Nov 12, 2020
0d1c5d8
Merge branch 'master' into feature/series-info
ivanovmg Oct 4, 2021
816803e
Fix styling
ivanovmg Oct 4, 2021
9e0198f
Merge branch 'master' into feature/series-info
ivanovmg Nov 29, 2021
1e2aaef
DOC: move whatsnew info to v1.4.0
ivanovmg Nov 29, 2021
688080b
DOC: move docs on Series.info() to io/formats/info.py
ivanovmg Nov 29, 2021
4e87b1a
FIX: newline
ivanovmg Nov 29, 2021
dc999fe
FIX: change versionadded to 1.4.0
ivanovmg Nov 29, 2021
4bb4e40
DOC: extract null_counts_sub for frames only
ivanovmg Nov 29, 2021
f114293
DOC: avoid duplication of kwargs replacement
ivanovmg Nov 29, 2021
16ac96e
DOC: unify newlines/spacing with substitutions
ivanovmg Nov 29, 2021
aac2954
Revert "DOC: unify newlines/spacing with substitutions"
ivanovmg Nov 29, 2021
22303dc
DOC: fix newlines substitutions
ivanovmg Nov 29, 2021
9428a32
DOC: another attempt to fix newline
ivanovmg Nov 30, 2021
91 changes: 91 additions & 0 deletions pandas/core/series.py
@@ -97,6 +97,7 @@
from pandas.core.tools.datetimes import to_datetime

import pandas.io.formats.format as fmt
from pandas.io.formats.info import SeriesInfo
import pandas.plotting

if TYPE_CHECKING:
@@ -4564,6 +4565,96 @@ def replace(
method=method,
)

@Substitution(
Member

I don't think the Substitution decorator should be necessary alongside the doc decorator (and I have not seen them used together).

The doc decorator was created to supersede the Appender and Substitution decorators.

Member Author
ivanovmg Oct 23, 2020

I guess that Substitution is still necessary if we use one generic docstring for both DataFrame.info and Series.info. I could not figure out how to replace some keywords in the base docstring to make it suitable for both frame and series.
I probably just do not know how to use the doc decorator.
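The idea under discussion, one generic docstring whose keywords are filled in per class, can be sketched in plain Python. This is only an illustration: `templated_doc` and `INFO_TEMPLATE` are hypothetical names, not the pandas decorators, and the DataFrame "See Also" line is made up for the example.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical shared template using %-style placeholders, as in the diff.
INFO_TEMPLATE = """Print a concise summary of a %(klass)s.

See Also
--------
%(see_also_sub)s
"""


def templated_doc(template: str, **kwargs: str) -> Callable[[F], F]:
    """Hypothetical decorator: fill the template and attach it as the docstring."""

    def decorator(func: F) -> F:
        func.__doc__ = template % kwargs
        return func

    return decorator


@templated_doc(
    INFO_TEMPLATE,
    klass="Series",
    see_also_sub="Series.describe: Generate descriptive statistics of Series.",
)
def series_info() -> None:
    pass


@templated_doc(
    INFO_TEMPLATE,
    klass="DataFrame",
    see_also_sub="DataFrame.describe: Illustrative entry for the frame variant.",
)
def frame_info() -> None:
    pass
```

Both functions end up with docstrings built from the same template, differing only in the substituted keywords.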

klass="Series",
type_sub="",
max_cols_sub="",
examples_sub=(
"""
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
Series name: None
Non-Null Count  Dtype
--------------  -----
5 non-null      object
dtypes: object(1)
memory usage: 80.0+ bytes

Prints a summary excluding information about its values:

>>> s.info(verbose=False)
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
dtypes: object(1)
memory usage: 80.0+ bytes

Pipe output of Series.info to a buffer instead of sys.stdout, get the
buffer content and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
... encoding="utf-8") as f: # doctest: +SKIP
... f.write(s)
260

The `memory_usage` parameter allows deep introspection mode, especially
useful for big Series and fine-tuning memory optimization:

>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> s = pd.Series(random_strings_array)
>>> s.info()
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  object
dtypes: object(1)
memory usage: 7.6+ MB

>>> s.info(memory_usage='deep')
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  object
dtypes: object(1)
memory usage: 55.3 MB"""
),
see_also_sub=(
"""
Series.describe: Generate descriptive statistics of Series.
Series.memory_usage: Memory usage of Series."""
),
)
@doc(SeriesInfo.to_buffer)
Member

SeriesInfo.to_buffer doesn't have a docstring, so this doesn't render.

>>> help(pd.Series.info)
Help on function info in module pandas.core.series:

info(self, verbose: Union[bool, NoneType] = None, buf: Union[IO[str], NoneType] = None, max_cols: Union[int, NoneType] = None, memory_usage: Union[bool, str, NoneType] = None, null_counts: Union[bool, NoneType] = None) -> None

>>>

and wouldn't have memory_usage, max_cols, and null_counts parameters anyway?

(As an aside, there appear to be a few issues with the DataFrame.info docstring on master, such as the alignment of console output and a rogue data parameter. I'm not sure whether it has always been like this or whether it comes from recent refactors, so if you get time, it would be great if you could check that out.)

Member Author

Regarding null_counts - does it mean that we do not need series info without non-null counts?

Member Author

(As an aside, there appear to be a few issues with the DataFrame.info docstring on master, such as the alignment of console output and a rogue data parameter. I'm not sure whether it has always been like this or whether it comes from recent refactors, so if you get time, it would be great if you could check that out.)

I noticed, not only here but in a couple of other places, that the indentation breaks when using this kind of construct:

        %(max_cols_sub)s

I never touched the docstring, so probably @MarcoGorelli can comment on the rendering issue.

Member Author

Please note that I just added dedent to some of the parameter docs, which makes the info docstrings render better, without extra indentation.
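The indentation problem and the dedent fix mentioned in this thread can be reproduced in isolation. The following is a minimal sketch, assuming a simplified template and a made-up parameter description; `TEMPLATE`, `raw_sub` and `clean_sub` are hypothetical names, not the actual pandas docstring pieces.

```python
from textwrap import dedent

# Simplified stand-in for a shared docstring with a placeholder line.
TEMPLATE = (
    "Parameters\n"
    "----------\n"
    "%(max_cols_sub)s\n"
    "memory_usage : bool or 'deep', optional\n"
)

# A substitution written inside indented source code keeps that
# indentation (and a leading newline) in the string itself.
raw_sub = """
    max_cols : int, optional
        When to switch to the truncated output."""

# dedent removes the common leading whitespace; strip drops the blank
# first line left over from the opening triple quote.
clean_sub = dedent(raw_sub).strip("\n")

rendered_raw = TEMPLATE % {"max_cols_sub": raw_sub}
rendered_clean = TEMPLATE % {"max_cols_sub": clean_sub}
```

In `rendered_raw` the `max_cols` entry is shifted four spaces to the right of its neighbours and preceded by an empty line, while `rendered_clean` lines up with the rest of the parameter list.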

def info(
self,
verbose: Optional[bool] = None,
buf: Optional[IO[str]] = None,
max_cols: Optional[int] = None,
memory_usage: Optional[Union[bool, str]] = None,
Member

the docstring for DataFrame.info is

memory_usage: bool, str, optional

I think this should be

memory_usage: bool or 'deep', optional

We might be able to use Literal here (see #37137) and maybe create an alias in typing. A follow-on is OK too.

Member Author

I tried to use Literal, but it looks like it is available only from Python 3.8 onward.
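For reference, the annotation suggested above could look roughly like this. It is a sketch, not the pandas typing module: the alias name `MemoryUsage` and the helper `check_memory_usage` are hypothetical, and the fallback branch assumes the `typing_extensions` backport is installed on Python versions older than 3.8.

```python
import sys
from typing import Optional, Union

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Assumption: the typing_extensions backport is available here.
    from typing_extensions import Literal

# Hypothetical alias for "bool or 'deep', optional".
MemoryUsage = Optional[Union[bool, Literal["deep"]]]


def check_memory_usage(memory_usage: MemoryUsage = None) -> MemoryUsage:
    """Runtime check mirroring the annotation: a bool, 'deep', or None."""
    if (
        memory_usage is None
        or isinstance(memory_usage, bool)
        or memory_usage == "deep"
    ):
        return memory_usage
    raise ValueError(
        f"memory_usage must be a bool, 'deep' or None, got {memory_usage!r}"
    )
```

A static type checker would then flag a call such as `check_memory_usage("shallow")`, whereas the plain `str` annotation in the diff accepts any string.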

null_counts: Optional[bool] = None,
) -> None:
if max_cols is not None:
raise ValueError(
"Argument `max_cols` can only be passed "
"in DataFrame.info, not Series.info"
)
return SeriesInfo(self, memory_usage).to_buffer(
Member

it seems odd imo to have parameters other than buf passed to to_buffer()

would it be better to pass verbose and show_counts to SeriesInfo constructor or rename to_buffer?

Member Author

Putting the params in the constructor is possible, but then SeriesInfo would have two more attributes that are used in only one method (lower cohesion within the class).
I would prefer renaming the method. I will look into that.

Member Author

I renamed to_buffer -> render.

However, I had to give DataFrameInfo and SeriesInfo the same function signature to avoid typing errors.
Thus, I pass max_cols into render and raise ValueError there instead of in pandas.core.series.info.
How does it look?
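The design described in this comment, a shared `render` signature with the Series variant rejecting `max_cols`, can be sketched as follows. The class names and the summary text are hypothetical stand-ins, not the actual DataFrameInfo and SeriesInfo implementations; only the error message is taken from the diff.

```python
import io
import sys
from abc import ABC, abstractmethod
from typing import IO, Optional


class InfoSketch(ABC):
    """Hypothetical base class: subclasses share one render signature."""

    @abstractmethod
    def render(
        self,
        *,
        buf: Optional[IO[str]],
        max_cols: Optional[int],
        verbose: Optional[bool],
        show_counts: Optional[bool],
    ) -> None:
        ...


class FrameInfoSketch(InfoSketch):
    def render(self, *, buf, max_cols, verbose, show_counts) -> None:
        out = sys.stdout if buf is None else buf
        out.write(f"frame summary (max_cols={max_cols})\n")


class SeriesInfoSketch(InfoSketch):
    def render(self, *, buf, max_cols, verbose, show_counts) -> None:
        # max_cols exists only to keep the signature identical to the
        # frame variant; passing a value is rejected.
        if max_cols is not None:
            raise ValueError(
                "Argument `max_cols` can only be passed "
                "in DataFrame.info, not Series.info"
            )
        out = sys.stdout if buf is None else buf
        out.write("series summary\n")


# Usage: write the summary into a buffer, then trigger the error path.
buffer = io.StringIO()
SeriesInfoSketch().render(
    buf=buffer, max_cols=None, verbose=True, show_counts=True
)
summary = buffer.getvalue()

try:
    SeriesInfoSketch().render(
        buf=buffer, max_cols=5, verbose=None, show_counts=None
    )
except ValueError as err:
    error_message = str(err)
```

Keeping the check inside `render` means the caller in `Series.info` only forwards its arguments, at the cost of a parameter on the Series side that can never be used.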

buf=buf,
verbose=verbose,
show_counts=null_counts,
)

@doc(NDFrame.shift, klass=_shared_doc_kwargs["klass"])
def shift(self, periods=1, freq=None, axis=0, fill_value=None) -> "Series":
return super().shift(