DataFrame.__repr__ raises TypeError after pd.show_versions() was run #13684

Closed
xflr6 opened this issue Jul 17, 2016 · 14 comments · Fixed by #14126
Labels
Compat pandas objects compatibility with Numpy or Python functions
Milestone

Comments

@xflr6
Contributor

xflr6 commented Jul 17, 2016

Maybe one of the imports in show_versions has unwanted side effects?

>>> import pandas as pd
>>> pd.DataFrame({'spam': range(10)})
   spam
0     0
1     1
2     2
3     3
4     4
5     5
6     6
7     7
8     8
9     9
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 24.0.3
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0rc2
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.14
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None
>>> pd.DataFrame({'spam': range(10)})

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    pd.DataFrame({'spam': range(10)})
  File "C:\Program Files\Python27\lib\site-packages\pandas\core\base.py", line 67, in __repr__
    return str(self)
  File "C:\Program Files\Python27\lib\site-packages\pandas\core\base.py", line 47, in __str__
    return self.__bytes__()
  File "C:\Program Files\Python27\lib\site-packages\pandas\core\base.py", line 59, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "C:\Program Files\Python27\lib\site-packages\pandas\core\frame.py", line 535, in __unicode__
    line_width=width, show_dimensions=show_dimensions)
  File "C:\Program Files\Python27\lib\site-packages\pandas\core\frame.py", line 1488, in to_string
    formatter.to_string()
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 549, in to_string
    strcols = self._to_str_columns()
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 467, in _to_str_columns
    str_index = self._get_formatted_index(frame)
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 746, in _get_formatted_index
    fmt_index = [index.format(name=show_index_names, formatter=fmt)]
  File "C:\Program Files\Python27\lib\site-packages\pandas\indexes\base.py", line 1462, in format
    return self._format_with_header(header, **kwargs)
  File "C:\Program Files\Python27\lib\site-packages\pandas\indexes\base.py", line 1486, in _format_with_header
    result = _trim_front(format_array(values, None, justify='left'))
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 2007, in format_array
    return fmt_obj.get_result()
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 2027, in get_result
    return _make_fixed_width(fmt_values, self.justify)
  File "C:\Program Files\Python27\lib\site-packages\pandas\formats\format.py", line 2394, in _make_fixed_width
    max_len = np.max([adj.len(x) for x in strings])
  File "C:\Program Files\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2293, in amax
    out=out, **kwargs)
  File "C:\Program Files\Python27\lib\site-packages\numpy\core\_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
TypeError: an integer is required
@jorisvandenbossche
Member

I cannot reproduce this using Windows and python 2.7.

Could you try to debug this, to see where the error is coming from? (It seems something is going wrong in np.max([adj.len(x) for x in strings]).)
Or, e.g., create an isolated environment with only the required dependencies (using conda or virtualenv) to see if the problem occurs there as well.

@jorisvandenbossche jorisvandenbossche added the Needs Info Clarification about behavior needed to assess issue label Jul 17, 2016
@xflr6
Contributor Author

xflr6 commented Jul 17, 2016

Tracked it to this import of numpy, which according to the docs does a reload().

Indeed numpy (at least on my machines) seems to dislike being reloaded:

>>> import numpy as np
>>> np.max([42])
42
>>> reload(np)
<module 'numpy' from 'C:\Program Files\Python27\lib\site-packages\numpy\__init__.pyc'>
>>> np.max([42])

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    np.max([42])
  File "C:\Program Files\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2293, in amax
    out=out, **kwargs)
  File "C:\Program Files\Python27\lib\site-packages\numpy\core\_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
TypeError: an integer is required

The line was changed in b4e2d34, so I guess that should fix this for the next version (though I am still curious if others have this numpy issue with reload).

@sinhrks
Member

sinhrks commented Jul 17, 2016

I can't reproduce it using NumPy 1.10.4 and 1.11.1. Can you report it to NumPy?

@xflr6
Contributor Author

xflr6 commented Jul 17, 2016

Seems to be a problem with the binaries here (does not occur with the PyPI wheels): I'll report to the packager, closing.

@xflr6 xflr6 closed this as completed Jul 17, 2016
@sinhrks sinhrks added this to the No action milestone Jul 17, 2016
@xflr6
Contributor Author

xflr6 commented Jul 18, 2016

The numpy issue: numpy/numpy#7844

@jorisvandenbossche
Member

@xflr6 Thanks for tracking it down!

@njsmith

njsmith commented Jul 20, 2016

I think there is a real pandas bug here. The bug is that show_versions calls importlib.import_module, and apparently -- this is not documented anywhere, and may vary between py2 and py3 -- import_module may reload modules. show_versions should not be reloading all these modules. I'd suggest replacing that line with something like

if modname in sys.modules:
    mod = sys.modules[modname]
else:
    mod = importlib.import_module(modname)
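
A minimal sketch of how that suggestion might look as a standalone helper, written for Python 3. The names import_without_reload and module_version are hypothetical and are not taken from the pandas source; the only idea carried over from the comment above is to consult sys.modules before importing.

```python
import importlib
import sys


def import_without_reload(modname):
    """Return the module named modname, reusing sys.modules if it is already loaded."""
    if modname in sys.modules:
        # Already imported: hand back the existing module object untouched.
        return sys.modules[modname]
    # Not imported yet: a plain import is enough, no reload involved.
    return importlib.import_module(modname)


def module_version(modname):
    """Return the module's __version__ attribute, or None if it cannot be imported."""
    try:
        mod = import_without_reload(modname)
    except ImportError:
        return None
    return getattr(mod, "__version__", None)
```

With this shape, collecting versions for a list of optional dependencies never touches a module that the session has already imported, which is the property the report above depends on.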

@jreback
Contributor

jreback commented Jul 20, 2016

@njsmith but seems numpy is not robust to being reloaded.

@charris

charris commented Jul 20, 2016

@jreback I suspect a lot of modules are not robust against reloading. For instance, if you define a class in foo.py, use it in bar.py, and instantiate a = foo.MyClass() in another module, then isinstance(a, foo.MyClass) will no longer hold (it returns False) once foo is reloaded. I suspect what is wanted in many cases is a simple import, but I haven't checked. Numpy also uses load_module in a few places that should probably be audited.

@njsmith The reload property is documented in the imp module documentation. It always happens for existing modules. Note that nonexisting modules get created...
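
The isinstance example above can be reproduced with a throwaway module. This is a sketch for Python 3, where the reload builtin used earlier in the thread lives at importlib.reload; the module name foo_reload_demo and the temporary directory are assumptions made purely for illustration.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module (hypothetical name foo_reload_demo) into a temp directory.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "foo_reload_demo.py"), "w") as f:
    f.write("class MyClass(object):\n    pass\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the new file is visible to the finders

import foo_reload_demo as foo

a = foo.MyClass()
before = isinstance(a, foo.MyClass)
print(before)  # True: a was created from the class currently bound in foo

# Reloading re-executes the module body, so foo.MyClass becomes a new class object.
importlib.reload(foo)
after = isinstance(a, foo.MyClass)
print(after)  # False: a still refers to the old class object
```

The instance keeps a reference to the class object that existed before the reload, so any identity-based check against the freshly created class stops matching.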

@jreback jreback reopened this Jul 20, 2016
@jreback jreback added Compat pandas objects compatibility with Numpy or Python functions and removed Can't Repro Needs Info Clarification about behavior needed to assess issue labels Jul 20, 2016
@jreback jreback modified the milestones: Next Major Release, No action, 0.19.0 Jul 20, 2016
@jreback
Contributor

jreback commented Jul 20, 2016

ok this should be easy to fix then

@charris

charris commented Jul 20, 2016

AFAICT, load_module is useful when you need to use a module that is not installed and not located in the current directory. For instance, during the numpy install process. If numpy is installed you should be able to simply import it.

@charris

charris commented Jul 21, 2016

Maybe __import__?

@charris

charris commented Jul 21, 2016

Or importlib.import_module

@njsmith

njsmith commented Jul 21, 2016

Pandas actually uses importlib.import_module, which isn't documented to reload, but I guess it must eventually call load_module because otherwise we wouldn't have this problem. (I haven't tried tracing the details, and importlib has completely different implementations on different versions of python, so that's something to watch out for if anyone wants to figure out exactly what's happening).
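
For anyone who wants to check a given interpreter, a quick identity test along these lines may help. It is a sketch assuming Python 3, with the standard-library json module as an arbitrary stand-in; it says nothing about the Python 2.7 code path discussed in this thread.

```python
import importlib
import sys

# Import the same module twice through importlib.import_module.
first = importlib.import_module("json")
second = importlib.import_module("json")

# If import_module reused the cached entry, both names point at one object,
# and that object is the one registered in sys.modules.
same_object = first is second
cached = first is sys.modules["json"]
print(same_object, cached)
```

If either value came out False on some interpreter, that would suggest the module is being recreated on each call there, which is the behavior in question.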
