
ENH: add Series.info #31796

Closed
wants to merge 71 commits into from
Changes from 13 commits
2b1e5fc
first draft
Feb 7, 2020
a4ad077
add whatsnew
Feb 7, 2020
c7bfb94
docstring sharing
Feb 7, 2020
01fd802
wip
Feb 7, 2020
abbae9a
add series tests
Feb 19, 2020
1a474fe
formatting
Feb 19, 2020
b30ce1b
formatting
Feb 19, 2020
6d8c765
remove old file
Feb 19, 2020
99411e4
clean
Feb 19, 2020
4651bd7
add test
Feb 19, 2020
7de4703
add test
Feb 19, 2020
99472fd
isort
Feb 19, 2020
2902fe7
remove test
Feb 19, 2020
8b8adfa
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Feb 23, 2020
c6d8a76
use isinstance abcdataframe, disallow max_cols for series.info
MarcoGorelli Feb 23, 2020
d0b2e1f
refactor
MarcoGorelli Feb 23, 2020
2225810
aint no autoformatter gonna unnecessarily split my strings
MarcoGorelli Feb 23, 2020
8c6c6f5
isort
MarcoGorelli Feb 23, 2020
8afcb82
Merge branch 'master' into series-info
MarcoGorelli Feb 26, 2020
71260f3
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Apr 18, 2020
127f84f
fix failing tests due to refactoring, merge conflicts
MarcoGorelli Apr 18, 2020
acae58f
Merge branch 'master' into series-info
MarcoGorelli Apr 20, 2020
9654198
resolve conflicts
MarcoGorelli Apr 20, 2020
c1006a7
replace appender with doc
MarcoGorelli Apr 20, 2020
27e45e1
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Apr 22, 2020
3592e8e
indent series.info subs
MarcoGorelli Apr 22, 2020
af771e6
revert deleted line
MarcoGorelli Apr 22, 2020
317a148
fix indentation in doctests
MarcoGorelli Apr 22, 2020
5082bc5
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli May 12, 2020
ae0065b
reuse col_count
MarcoGorelli May 12, 2020
c36d4c4
reorder to reduce diff size
MarcoGorelli May 12, 2020
751d346
help mypy
MarcoGorelli May 16, 2020
631d914
aftermentioned 'help' should only be applied for DataFrame case
MarcoGorelli May 16, 2020
23bd173
add docstring to _get_ids_and_dtypes
MarcoGorelli May 16, 2020
304f445
correct return type of _get_ids_and_dtypes, as in Series case dtypes …
MarcoGorelli May 19, 2020
a2d6e43
return Series for dtypes in all cases
MarcoGorelli May 19, 2020
f33f0df
black bug
MarcoGorelli May 19, 2020
05c9091
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli May 20, 2020
22de3c5
reduce if/then
MarcoGorelli May 30, 2020
8a58bd6
simplify diff
MarcoGorelli May 30, 2020
9568d03
Merge remote-tracking branch 'origin/series-info' into series-info
MarcoGorelli May 30, 2020
21d263c
factor out memory usage
MarcoGorelli May 30, 2020
cfa8039
clarify docstring
MarcoGorelli May 30, 2020
3811545
initial OOP approach
MarcoGorelli Jun 6, 2020
6bcbef7
space method
MarcoGorelli Jun 6, 2020
a245484
add _verbose_repr method
MarcoGorelli Jun 6, 2020
c04dabf
wip
MarcoGorelli Jun 7, 2020
d9993ee
some typing / removing unnecessary methods
MarcoGorelli Jun 13, 2020
700801b
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Jun 30, 2020
ad39d85
resolve better
MarcoGorelli Jun 30, 2020
cad1391
remove docstrings from inherited class
MarcoGorelli Jun 30, 2020
a53033b
fix typing
MarcoGorelli Jun 30, 2020
f0e2290
:art:
MarcoGorelli Jun 30, 2020
53e8c20
:art:, fix doctests
MarcoGorelli Jul 1, 2020
ee717c8
factor out _get_count_configs
MarcoGorelli Jul 1, 2020
4d7a211
factor _get_count_configs out of Series._verbose_repr as well
MarcoGorelli Jul 1, 2020
6eccf00
fix typing
MarcoGorelli Jul 2, 2020
81d22eb
factor out _display_counts_and_dtypes
MarcoGorelli Jul 2, 2020
f2ca520
fix typing, factor out _get_header_and_spaces
MarcoGorelli Jul 2, 2020
669ff38
document _get_count_configs
MarcoGorelli Jul 4, 2020
6f8f8b1
document _display_counts_and_dtypes and _get_header_and_spaces
MarcoGorelli Jul 4, 2020
97dc73c
remove breakpoints
MarcoGorelli Jul 4, 2020
0707f32
fix docstring substitution
MarcoGorelli Jul 4, 2020
2a2324b
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Jul 5, 2020
0c08335
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Sep 13, 2020
c93f1ad
Merge remote-tracking branch 'upstream/master' into series-info
MarcoGorelli Sep 19, 2020
ddf9efc
fix failing doctest
MarcoGorelli Sep 19, 2020
21d94b2
use CountConfigs namedtuple
MarcoGorelli Sep 19, 2020
a213d9c
:fire:
MarcoGorelli Sep 19, 2020
089ce24
remove trailing comma
MarcoGorelli Sep 19, 2020
4581385
fix failing doctest
MarcoGorelli Sep 19, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
@@ -43,7 +43,7 @@ Other enhancements

- :class:`Styler` may now render CSS more efficiently where multiple cells have the same styling (:issue:`30876`)
- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
-
- :meth:`Series.info` has been added, for compatibility with :meth:`DataFrame.info` (:issue:`5167`)
-

.. ---------------------------------------------------------------------------
104 changes: 103 additions & 1 deletion pandas/core/frame.py
@@ -2224,9 +2224,111 @@ def to_html(
)

# ----------------------------------------------------------------------
@Substitution(
klass="DataFrame",
type_sub=" and columns",
max_cols_sub="""
max_cols : int, optional
When to switch from the verbose to the truncated output. If the
DataFrame has more than `max_cols` columns, the truncated output
is used. By default, the setting in
``pandas.options.display.max_info_columns`` is used.
""",
examples_sub="""
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
... "float_col": float_values})
>>> df
int_col text_col float_col
0 1 alpha 0.00
1 2 beta 0.25
2 3 gamma 0.50
3 4 delta 0.75
4 5 epsilon 1.00

Prints information about all columns:

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 int_col 5 non-null int64
1 text_col 5 non-null object
2 float_col 5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Prints a summary of the column count and dtypes, but not per-column
information:

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get
the buffer content and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
... encoding="utf-8") as f: # doctest: +SKIP
... f.write(s)
260

The `memory_usage` parameter allows deep introspection mode, especially
useful for big DataFrames and for fine-tuning memory optimization:

>>> df = pd.DataFrame({
... 'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
... 'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
... 'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null object
1 column_2 1000000 non-null object
2 column_3 1000000 non-null object
dtypes: object(3)
memory usage: 22.9+ MB

>>> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null object
1 column_2 1000000 non-null object
2 column_3 1000000 non-null object
dtypes: object(3)
memory usage: 188.8 MB""",
see_also_sub="""
DataFrame.describe: Generate descriptive statistics of DataFrame
columns.
DataFrame.memory_usage: Memory usage of DataFrame columns.""",
)
@Appender(info.__doc__)
def info(
- self, verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None
+ self,
+ verbose: Optional[bool] = None,
+ buf: Optional[IO[str]] = None,
+ max_cols: Optional[int] = None,
+ memory_usage: Optional[Union[bool, str]] = None,
+ null_counts: Optional[bool] = None,
) -> None:
return info(self, verbose, buf, max_cols, memory_usage, null_counts)
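Both `info` methods in this PR share one docstring: `@Appender(info.__doc__)` attaches the common template and `@Substitution(...)` fills its placeholders (`klass`, `type_sub`, `max_cols_sub`, `examples_sub`, `see_also_sub`). The stand-alone sketch below mimics that pattern with two hypothetical decorators, `appender` and `substitution`, and a made-up two-sentence template. It assumes %-style named placeholders such as `%(klass)s`; it is a simplification for illustration, not the pandas implementation.

```python
from textwrap import dedent
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def appender(addendum: str) -> Callable[[F], F]:
    """Hypothetical stand-in for an Appender-style decorator."""
    def decorate(func: F) -> F:
        # Append the shared template to whatever docstring already exists.
        func.__doc__ = dedent((func.__doc__ or "") + addendum)
        return func
    return decorate


def substitution(**params: str) -> Callable[[F], F]:
    """Hypothetical stand-in for a Substitution-style decorator."""
    def decorate(func: F) -> F:
        # Fill %(name)s placeholders from the keyword arguments.
        if func.__doc__:
            func.__doc__ = func.__doc__ % params
        return func
    return decorate


# Made-up shared template; the real one is much longer.
INFO_TEMPLATE = """
Print a concise summary of a %(klass)s.

This method prints information about a %(klass)s including
the index dtype%(type_sub)s, non-null values and memory usage.
"""


# Decorators apply bottom-up: the template is appended first, then filled.
@substitution(klass="DataFrame", type_sub=" and columns")
@appender(INFO_TEMPLATE)
def frame_info() -> None:
    pass


@substitution(klass="Series", type_sub="")
@appender(INFO_TEMPLATE)
def series_info() -> None:
    pass


print(series_info.__doc__)
```

The decorator order matters: reversing it would try to substitute into an empty docstring before the template exists.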

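The `max_cols` description and the `verbose=True` / `verbose=False` examples imply a simple precedence rule for choosing between the per-column and the truncated output. A minimal sketch, assuming an explicit `verbose` always wins and that the fallback limit comes from `display.max_info_columns` (taken as 100 here); `use_verbose_repr` is a hypothetical name, not a pandas function:

```python
from typing import Optional

# Assumed default of pandas.options.display.max_info_columns.
DEFAULT_MAX_INFO_COLUMNS = 100


def use_verbose_repr(
    n_columns: int,
    verbose: Optional[bool] = None,
    max_cols: Optional[int] = None,
) -> bool:
    """Return True for the per-column output, False for the truncated one."""
    if verbose is not None:
        # An explicit verbose=True or verbose=False overrides everything.
        return verbose
    if max_cols is None:
        max_cols = DEFAULT_MAX_INFO_COLUMNS
    # Truncate only when there are strictly more than max_cols columns.
    return n_columns <= max_cols


print(use_verbose_repr(3), use_verbose_repr(3, verbose=False))
```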
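The verbose output pairs each column with its non-null count and dtype, then tallies the dtypes in a final `dtypes:` line, listed alphabetically as the docstring output suggests. The sketch below reproduces that bookkeeping for a hypothetical dict-of-lists representation (column name mapped to a dtype name and its values). It treats only `None` and float NaN as missing and skips pandas' column alignment; `summarize` is illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: column name -> (dtype name, values).
Columns = Dict[str, Tuple[str, Sequence[Optional[object]]]]


def _is_missing(value: Optional[object]) -> bool:
    # Simplification: None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize(columns: Columns) -> List[str]:
    """Per-column non-null counts plus the trailing dtypes line."""
    lines = []
    for i, (name, (dtype, values)) in enumerate(columns.items()):
        non_null = sum(not _is_missing(v) for v in values)
        lines.append(f"{i} {name} {non_null} non-null {dtype}")
    # Count how many columns share each dtype, then list them alphabetically.
    counts = Counter(dtype for dtype, _ in columns.values())
    summary = ", ".join(f"{k}({n})" for k, n in sorted(counts.items()))
    lines.append(f"dtypes: {summary}")
    return lines


frame_columns: Columns = {
    "int_col": ("int64", [1, 2, 3, 4, 5]),
    "text_col": ("object", ["alpha", "beta", "gamma", "delta", "epsilon"]),
    "float_col": ("float64", [0.0, 0.25, 0.5, 0.75, 1.0]),
}
print("\n".join(summarize(frame_columns)))
```

A Series fits the same bookkeeping as a single-entry mapping, which is why its examples show one row and `dtypes: object(1)`.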
79 changes: 79 additions & 0 deletions pandas/core/series.py
@@ -15,6 +15,7 @@
Optional,
Tuple,
Type,
Union,
)
import warnings

@@ -92,6 +93,7 @@
from pandas.core.tools.datetimes import to_datetime

import pandas.io.formats.format as fmt
from pandas.io.formats.info import info
import pandas.plotting

if TYPE_CHECKING:
@@ -4137,6 +4139,83 @@ def replace(
method=method,
)

@Substitution(
klass="Series",
type_sub="",
max_cols_sub="",
examples_sub="""
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
Series name: None
# Non-Null Count Dtype
--- -------------- -----
0 5 non-null object
dtypes: object(1)
memory usage: 80.0+ bytes

Prints a summary excluding information about its values:

>>> s.info(verbose=False)
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
dtypes: object(1)
memory usage: 80.0+ bytes

Pipe the output of Series.info to a buffer instead of sys.stdout, get
the buffer content and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> info_str = buffer.getvalue()
>>> with open("series_info.txt", "w",
... encoding="utf-8") as f: # doctest: +SKIP
... f.write(info_str)
260

The `memory_usage` parameter allows deep introspection mode, especially
useful for big Series and for fine-tuning memory optimization:

>>> s = pd.Series(np.random.choice(['a', 'b', 'c'], 10 ** 6))
>>> s.info()
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
# Non-Null Count Dtype
--- -------------- -----
0 1000000 non-null object
dtypes: object(1)
memory usage: 7.6+ MB

>>> s.info(memory_usage='deep')
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
# Non-Null Count Dtype
--- -------------- -----
0 1000000 non-null object
dtypes: object(1)
memory usage: 62.9 MB""",
see_also_sub="""
Series.describe: Generate descriptive statistics of Series.
Series.memory_usage: Memory usage of Series.""",
)
@Appender(info.__doc__)
def info(
self,
verbose: Optional[bool] = None,
buf: Optional[IO[str]] = None,
max_cols: Optional[int] = None,
memory_usage: Optional[Union[bool, str]] = None,
null_counts: Optional[bool] = None,
) -> None:
return info(self, verbose, buf, None, memory_usage, null_counts)
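`Series.info` keeps the same signature as `DataFrame.info` but forwards `None` in place of `max_cols`, since a one-dimensional object has no columns to truncate. A toy model of that delegation, using hypothetical `MiniFrame` / `MiniSeries` classes and a `shared_info` function standing in for the shared `info` implementation (the 100-column default is an assumption):

```python
from typing import List, Optional


class MiniFrame:
    """Hypothetical 2-D container holding only column names."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def info(self, max_cols: Optional[int] = None) -> str:
        return shared_info(self, max_cols)


class MiniSeries:
    """Hypothetical 1-D container with an optional name."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name

    def info(self, max_cols: Optional[int] = None) -> str:
        # Same signature as MiniFrame.info, but max_cols is not forwarded.
        return shared_info(self, None)


def shared_info(data: object, max_cols: Optional[int]) -> str:
    """Single implementation serving both wrappers."""
    if isinstance(data, MiniFrame):
        limit = 100 if max_cols is None else max_cols  # assumed default
        n = len(data.columns)
        return f"Data columns (total {n} columns), verbose={n <= limit}"
    return f"Series name: {data.name}"


print(MiniSeries().info(max_cols=1))
```

Keeping `max_cols` in the Series signature lets code that calls `.info(...)` on either type stay uniform, at the cost of a parameter that has no effect for Series.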

@Appender(generic._shared_docs["shift"] % _shared_doc_kwargs)
def shift(self, periods=1, freq=None, axis=0, fill_value=None) -> "Series":
return super().shift(
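The `buf` examples rely on the summary being writable to any text stream, with `sys.stdout` as the fallback. A minimal sketch of that contract with a hypothetical `write_info` helper. The number echoed after `f.write(s)` in the docstring examples is simply the return value of `write`, which for text streams is the number of characters written.

```python
import io
import sys
from typing import IO, Iterable, Optional


def write_info(lines: Iterable[str], buf: Optional[IO[str]] = None) -> None:
    """Hypothetical helper: write summary lines to buf, defaulting to stdout."""
    if buf is None:
        buf = sys.stdout
    buf.write("\n".join(lines) + "\n")


buffer = io.StringIO()
write_info(["<class 'Example'>", "dtypes: object(1)"], buf=buffer)
text = buffer.getvalue()

# write() on a text stream returns the number of characters written.
n_written = io.StringIO().write(text)
print(n_written == len(text))
```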
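The `memory usage:` lines print a size in 1024-based units, with a trailing `+` when the figure is a shallow lower bound (roughly: object columns whose Python objects were not inspected) and no `+` after `memory_usage='deep'`. A sketch of that formatting with hypothetical `sizeof_fmt` and `memory_usage_line` helpers; it only formats a byte count you already have and does not measure anything.

```python
def sizeof_fmt(num: float, size_qualifier: str = "") -> str:
    """Format a byte count using 1024-based units."""
    for unit in ["bytes", "KB", "MB", "GB", "TB"]:
        if num < 1024.0:
            return f"{num:3.1f}{size_qualifier} {unit}"
        num /= 1024.0
    return f"{num:3.1f}{size_qualifier} PB"


def memory_usage_line(nbytes: int, deep: bool, has_object_columns: bool) -> str:
    # A shallow estimate cannot see inside Python objects, so mark it with "+".
    qualifier = "+" if has_object_columns and not deep else ""
    return "memory usage: " + sizeof_fmt(nbytes, qualifier)


print(memory_usage_line(248, deep=False, has_object_columns=True))
```

For example, `sizeof_fmt(248, "+")` gives `248.0+ bytes`, the same string as in the first DataFrame example, and `sizeof_fmt(24_000_128, "+")` gives `22.9+ MB`.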