BUG-22984 Fix truncation of DataFrame representations #22987

Merged
merged 124 commits into from
Nov 15, 2018

JustinZhengBC
Contributor

When printing a DataFrame to terminal, an extra column's worth of space is added to the calculated width of the DataFrame. This is presumably to help edge cases, but the calculated difference between the DataFrame width and the terminal window width is incremented by 1 a few lines later, seemingly to fix the same problem. Do any more experienced developers know of a reason to pad the DataFrame width even more?
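To make the question concrete, here is a minimal sketch of a fit-to-terminal check. It is not the code in `pandas/io/formats/format.py`. The helper names `adjoin` and `columns_that_fit` and the `extra_padding` parameter are hypothetical, and the sketch drops trailing columns for simplicity, which is an assumption rather than a description of how pandas truncates.

```python
# Illustrative only: a simplified stand-in for a fit-to-terminal check.
# All names here are hypothetical and chosen for this example.

def adjoin(space, *columns):
    """Join equal-length lists of cell strings side by side."""
    # Each column is as wide as its longest cell plus the separator.
    widths = [max(len(cell) for cell in col) + space for col in columns]
    widths[-1] -= space  # no trailing separator after the last column
    rows = zip(*columns)
    return "\n".join(
        "".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows
    )


def columns_that_fit(columns, terminal_width, extra_padding=0):
    """Return how many leading columns fit within terminal_width.

    extra_padding models an additional allowance added to the measured
    width, similar in spirit to the padding questioned above.
    """
    n = len(columns)
    while n > 1:
        text = adjoin(1, *columns[:n]).split("\n")
        max_len = max(len(line) for line in text) + extra_padding
        if max_len <= terminal_width:
            break
        n -= 1  # drop one column and measure again
    return n


# Five columns of 9-character cells render as 49 characters per line.
cols = [["a" * 9, "b" * 9] for _ in range(5)]
print(columns_that_fit(cols, 49))                    # 5
print(columns_that_fit(cols, 49, extra_padding=10))  # 4
```

With no extra allowance the five columns exactly fill a 49-character window. Adding one column's worth of padding to the measured width forces a column to be dropped even though the output would have fit, which is the symptom described in this PR.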

@pep8speaks
pep8speaks commented Oct 4, 2018

Hello @JustinZhengBC! Thanks for updating the PR.

Comment last updated on October 07, 2018 at 01:59 Hours UTC

@codecov
codecov bot commented Oct 4, 2018

Codecov Report

Merging #22987 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22987      +/-   ##
==========================================
- Coverage   92.24%   92.24%   -0.01%     
==========================================
  Files         161      161              
  Lines       51340    51315      -25     
==========================================
- Hits        47361    47336      -25     
  Misses       3979     3979

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.63% <ø> (-0.01%) ⬇️ |
| #single | 42.31% <ø> (-0.04%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/format.py | 97.88% <ø> (-0.01%) ⬇️ |
| pandas/core/arrays/timedeltas.py | 95.08% <0%> (-0.56%) ⬇️ |
| pandas/core/dtypes/concat.py | 96.26% <0%> (-0.41%) ⬇️ |
| pandas/core/arrays/datetimelike.py | 95.92% <0%> (-0.22%) ⬇️ |
| pandas/core/arrays/datetimes.py | 98.44% <0%> (-0.04%) ⬇️ |
| pandas/core/indexes/datetimes.py | 96.12% <0%> (ø) ⬆️ |
| pandas/core/arrays/period.py | 98.49% <0%> (+0.04%) ⬆️ |
| pandas/tseries/offsets.py | 97.07% <0%> (+0.08%) ⬆️ |
| pandas/core/indexes/datetimelike.py | 98.01% <0%> (+0.27%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb4405d...139235a. Read the comment docs.

@@ -194,6 +194,7 @@ Other Enhancements
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Representation of :class:`DataFrame` fills up the terminal window better
Contributor

can you add the issue number here, I would call this more of a bug fix, no?

@@ -616,11 +616,6 @@ def to_string(self):
else:  # max_cols == 0. Try to fit frame to terminal
    text = self.adj.adjoin(1, *strcols).split('\n')
    max_len = Series(text).str.len().max()
    headers = [ele[0] for ele in strcols]
Contributor

we have a number of tests for this, surprised this didn't break anything. can you make a test if we have no coverage for this case now? (e.g. you use option_context to set the width, then check the output string)

@jreback
Contributor

jreback commented Oct 6, 2018

cc @jorisvandenbossche

@jreback added the Output-Formatting (__repr__ of pandas objects, to_string) and Bug labels Oct 6, 2018
@jreback added this to the 0.24.0 milestone Nov 14, 2018
@jreback
Contributor

jreback commented Nov 14, 2018

lgtm. @TomAugspurger a glance if you can.

@@ -1312,6 +1312,7 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
Contributor

This should be :meth:`DataFrame.to_string` right? We don't have a top-level pandas.to_string.

@@ -343,6 +343,16 @@ def test_repr_truncates_terminal_size(self):

assert df2.columns[0] in result.split('\n')[0]

Contributor

nitpick: we have a pytest fixture for mock now, that does the try / except / skip done above.

I think it'd be cleanest to split this into a new test here, and accept the mock parameter.

def test_repr_truncates_terminal_size_full(self, mock):
    ...

And if you're feeling adventurous, you could change the try / except / skip mock import above to use the fixture as well. Not a big deal though.
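As a rough illustration of this kind of test, the sketch below patches the reported terminal size and checks the rendered string at the exact boundary. It is an assumption-laden stand-in: `render_header` is a hypothetical function, not pandas code, and the test uses the standard library's `unittest.mock` directly rather than the pandas `mock` fixture mentioned above.

```python
import os
import shutil
from unittest import mock


def render_header(names):
    """Hypothetical renderer: keep the leading names that fit the terminal."""
    width = shutil.get_terminal_size().columns
    kept = []
    for name in names:
        candidate = " ".join(kept + [name])
        if len(candidate) > width:
            break  # adding this name would overflow the window
        kept.append(name)
    return " ".join(kept)


def test_render_header_fills_terminal():
    names = ["alpha", "beta", "gamma"]
    full = " ".join(names)

    # Window exactly as wide as the full header: nothing should be dropped.
    exact = os.terminal_size((len(full), 24))
    with mock.patch("shutil.get_terminal_size", return_value=exact):
        assert render_header(names) == full

    # One character narrower: the last name no longer fits.
    narrower = os.terminal_size((len(full) - 1, 24))
    with mock.patch("shutil.get_terminal_size", return_value=narrower):
        assert render_header(names) == "alpha beta"


test_render_header_fills_terminal()
```

Testing the exact-width case matters here because an off-by-one in the width comparison only shows up when the output would fill the window completely.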

JustinZhengBC and others added 20 commits November 14, 2018 08:58
* Add documentation line with example for the ambiguous parameter of tz_localize

* Updating 'ambiguous'-param doc + update it on Timestamp, DatetimeIndex and NaT

This is following the discussion at
pandas-dev#23408 (comment)
benoxoft and others added 13 commits November 14, 2018 08:59
* BUG: Identify SparseDataFrame as sparse

The is_sparse function checks to see if
an array-like is sparse by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

* Revert "BUG: Identify SparseDataFrame as sparse"

This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.

* DOC: Revise is_sparse docstring

Clean up the docstring for is_sparse so it conforms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expects a 1-dimensional array-like.

* DOC: Adjust is_sparse docstring.

Responding to pull request comments.
@codecov
codecov bot commented Nov 14, 2018

Codecov Report

Merging #22987 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22987      +/-   ##
==========================================
- Coverage   92.24%   92.24%   -0.01%     
==========================================
  Files         161      161              
  Lines       51339    51336       -3     
==========================================
- Hits        47360    47357       -3     
  Misses       3979     3979

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.64% <ø> (-0.01%) ⬇️ |
| #single | 42.34% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/format.py | 97.88% <ø> (-0.01%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e413c49...82fa50c. Read the comment docs.

@jreback
Contributor

jreback commented Nov 14, 2018

@JustinZhengBC need to merge master

@TomAugspurger
Contributor

Another hypothesis failure on azure. OK to merge @jreback? I can investigate the failing test separately.

@jreback
Contributor

jreback commented Nov 14, 2018

yep

@TomAugspurger
Contributor

TomAugspurger commented Nov 14, 2018

Merged master to fix the conflict. May as well let the CI run.

ping on green.

@JustinZhengBC
Contributor Author

@TomAugspurger green. Also thanks for fixing the merge issue

@TomAugspurger
Contributor

TomAugspurger commented Nov 15, 2018

I caused it by merging my own PR, so it's the least I could do :)

Thanks!

@TomAugspurger merged commit 6920363 into pandas-dev:master Nov 15, 2018
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
* upstream/master:
  BUG: to_html misses truncation indicators (...) when index=False (pandas-dev#22786)
  API/DEPR: replace "raise_conflict" with "errors" for df.update (pandas-dev#23657)
  BUG: Append DataFrame to Series with dateutil timezone (pandas-dev#23685)
  CLN/CI: Catch that stderr-warning! (pandas-dev#23706)
  ENH: Allow for join between two multi-index dataframe instances (pandas-dev#20356)
  Ensure Index._data is an ndarray (pandas-dev#23628)
  DOC: flake8-per-pr for windows users (pandas-dev#23707)
  DOC: Handle exceptions when computing contributors. (pandas-dev#23714)
  DOC: Validate space before colon docstring parameters pandas-dev#23483 (pandas-dev#23506)
  BUG-22984 Fix truncation of DataFrame representations (pandas-dev#22987)
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
* BUG-22984 Fix truncation of DataFrame representations
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* BUG-22984 Fix truncation of DataFrame representations
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

BUG: wrong detection if truncated repr is needed