Pandas creates a large number of unnecessary threads #9394

Closed
thomasj02 opened this issue Feb 2, 2015 · 7 comments
Labels
Multithreading Parallelism in pandas

Comments

@thomasj02

Simply importing pandas creates a huge number of threads:

$ gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug//usr/bin/python2.7...done.
done.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
[New Thread 0x7ffff42b6700 (LWP 19093)]
[New Thread 0x7ffff3ab5700 (LWP 19094)]
[New Thread 0x7ffff12b4700 (LWP 19095)]
[New Thread 0x7fffeeab3700 (LWP 19096)]
[New Thread 0x7fffec2b2700 (LWP 19097)]
[New Thread 0x7fffe9ab1700 (LWP 19098)]
[New Thread 0x7fffe72b0700 (LWP 19099)]
[Thread 0x7ffff42b6700 (LWP 19093) exited]
[Thread 0x7fffe72b0700 (LWP 19099) exited]
[Thread 0x7ffff3ab5700 (LWP 19094) exited]
[Thread 0x7fffec2b2700 (LWP 19097) exited]
[Thread 0x7ffff12b4700 (LWP 19095) exited]
[Thread 0x7fffe9ab1700 (LWP 19098) exited]
[Thread 0x7fffeeab3700 (LWP 19096) exited]
[New Thread 0x7fffe72b0700 (LWP 19103)]
[New Thread 0x7fffe9ab1700 (LWP 19104)]
[New Thread 0x7fffec2b2700 (LWP 19105)]
[New Thread 0x7fffeeab3700 (LWP 19106)]
[New Thread 0x7ffff3cd1700 (LWP 19107)]
[New Thread 0x7ffff12b4700 (LWP 19108)]
[New Thread 0x7fffde6e8700 (LWP 19109)]
[New Thread 0x7fffddee7700 (LWP 19110)]
[New Thread 0x7fffdac85700 (LWP 19111)]
[New Thread 0x7fffda484700 (LWP 19112)]
[New Thread 0x7fffd9c83700 (LWP 19113)]
[New Thread 0x7fffd9482700 (LWP 19114)]
[New Thread 0x7fffd8c81700 (LWP 19115)]
[New Thread 0x7fffd8480700 (LWP 19116)]
[New Thread 0x7fffd7c7f700 (LWP 19117)]
[New Thread 0x7fffd747e700 (LWP 19118)]
>>> 

Setting OMP_NUM_THREADS=1 and NUMEXPR_NUM_THREADS=1 reduces the number of threads created, but it looks like a bunch of blosc threads are still being created.
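For anyone who prefers to set these limits from inside a script rather than the shell, here is a minimal sketch. It assumes the libraries read the variables when they initialize, so they have to be set before the first import; `apply_thread_limits` and `LIMITS` are hypothetical names, not part of pandas or any other library.

```python
import os
import sys

# Hypothetical defaults mirroring the two variables used in this report.
LIMITS = {"OMP_NUM_THREADS": "1", "NUMEXPR_NUM_THREADS": "1"}


def apply_thread_limits(limits=LIMITS, watch=("numpy", "numexpr", "pandas")):
    """Set thread-limit environment variables for this process.

    Returns the watched modules that were already imported, for which
    the new settings probably come too late to take effect.
    """
    for name, value in limits.items():
        os.environ[name] = value
    return [module for module in watch if module in sys.modules]


too_late = apply_thread_limits()
if too_late:
    print("already imported, limits may be ignored:", too_late)
```

Calling this at the very top of the entry-point script, before any numeric import, should be equivalent to the `export` lines used in the shell sessions here.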

More seriously, simply importing a small pandas component also creates a ton of threads:

$ export OMP_NUM_THREADS=1
$ export NUMEXPR_NUM_THREADS=1
$ gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug//usr/bin/python2.7...done.
done.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas.tslib import Timestamp
[New Thread 0x7fffecca9700 (LWP 19238)]
[New Thread 0x7fffec4a8700 (LWP 19239)]
[New Thread 0x7fffebca7700 (LWP 19240)]
[New Thread 0x7fffeb4a6700 (LWP 19241)]
[New Thread 0x7fffeaca5700 (LWP 19242)]
[New Thread 0x7fffea4a4700 (LWP 19243)]
[New Thread 0x7fffe9ca3700 (LWP 19244)]
[New Thread 0x7fffe94a2700 (LWP 19245)]

This is kind of wasteful, and it makes it difficult to optimize thread handling on multicore systems using thread pinning or other scheduling techniques.
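A rough way to reproduce the count without gdb is sketched below. It is Linux-only, since it reads the `Threads:` field from `/proc/self/status`, and it runs the import in a fresh interpreter so earlier imports do not skew the result. `threads_added_by_import` and `PROBE` are hypothetical names. Threads that start and exit during the import (as in the first gdb session) are not counted; only the ones still alive afterwards are.

```python
import subprocess
import sys

# Script run in a child interpreter: count OS threads before and after
# importing the module named on the command line, then print the difference.
PROBE = """
import importlib, sys

def n_threads():
    with open('/proc/self/status') as status:
        for line in status:
            if line.startswith('Threads:'):
                return int(line.split()[1])

before = n_threads()
importlib.import_module(sys.argv[1])
print(n_threads() - before)
"""


def threads_added_by_import(module_name):
    """Return how many OS threads remain alive after importing module_name."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case the module prints something on import.
    return int(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(threads_added_by_import("json"))
```

Passing `"pandas"` or `"pandas.tslib"` instead of `"json"` should give a number comparable to the `[New Thread ...]` lines above, depending on which optional dependencies are installed.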

@shoyer
Member

shoyer commented Feb 2, 2015

To help narrow this down: do you see the same behavior if you just import numpy or scipy?

@jreback
Contributor

jreback commented Feb 2, 2015

pandas doesn't create a thread pool at all, but it does create a single global thread-local to handle some caching for date format parsing.
numexpr and blosc (used by pytables and others, IIRC) implicitly use threading/multiple cores to do work.

It is typical to set up a thread pool at import time, as this is a global resource.

Since these threads are not actually doing anything unless the code for them takes over, how does this actually matter? How is this unclean?

@thomasj02
Author

Numpy does create some BLAS threads by default, but if I set OMP_NUM_THREADS=1 and NUMEXPR_NUM_THREADS=1 I don't see extra threads:

$ export NUMEXPR_NUM_THREADS=1
$ export OMP_NUM_THREADS=1
$ gdb python
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy

@thomasj02
Author

Apologies for the huge stacktrace below, but you can see that after setting the environment variables, at least some of the new threads are blosc-related stuff created by what looks like pandas.io:

(gdb) b __pthread_create_2_1
Function "__pthread_create_2_1" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (__pthread_create_2_1) pending.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas.tslib import Timestamp

Breakpoint 1, __pthread_create_2_1 (newthread=newthread@entry=0x7fffec5f2fa0 <threads>, 
    attr=attr@entry=0x7fffec5f2b60 <ct_attr>, start_routine=start_routine@entry=0x7fffec3d17d0 <t_blosc>, 
    arg=arg@entry=0x7fffec5f2ba0 <tids>) at pthread_create.c:466
466 pthread_create.c: No such file or directory.
(gdb) bt
#0  __pthread_create_2_1 (newthread=newthread@entry=0x7fffec5f2fa0 <threads>, attr=attr@entry=0x7fffec5f2b60 <ct_attr>, 
    start_routine=start_routine@entry=0x7fffec3d17d0 <t_blosc>, arg=arg@entry=0x7fffec5f2ba0 <tids>) at pthread_create.c:466
#1  0x00007fffec3d2242 in init_threads () at c-blosc/blosc/blosc.c:1562
#2  blosc_set_nthreads_ (nthreads_new=nthreads_new@entry=8) at c-blosc/blosc/blosc.c:1645
#3  0x00007fffec3d2e45 in blosc_set_nthreads (nthreads_new=8) at c-blosc/blosc/blosc.c:1596
#4  0x00007fffec3d098b in PyBlosc_set_nthreads (self=<optimized out>, args=<optimized out>) at blosc/blosc_extension.c:38
#5  0x000000000052c6d5 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffc030) at ../Python/ceval.c:4020
#6  PyEval_EvalFrameEx (f=f@entry=
    Frame 0x7fffedb55e50, for file /usr/local/lib/python2.7/dist-packages/blosc/toplevel.py, line 95, in set_nthreads (nthreads=8), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
#7  0x000000000052cf32 in fast_function (nk=<optimized out>, na=<optimized out>, n=1, pp_stack=0x7fffffffc170, 
    func=<function at remote 0x7fffec8438c0>) at ../Python/ceval.c:4106
#8  call_function (oparg=<optimized out>, pp_stack=0x7fffffffc170) at ../Python/ceval.c:4041
#9  PyEval_EvalFrameEx (
    f=f@entry=Frame 0x139b050, for file /usr/local/lib/python2.7/dist-packages/blosc/__init__.py, line 52, in <module> (), 
    throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
#10 0x000000000055c594 in PyEval_EvalCodeEx (co=0x7fffec845e30, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3252
#11 0x00000000005b7392 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:667
#12 0x00000000005b744a in PyImport_ExecCodeModuleEx (name=name@entry=0x14dcd7a "blosc", 
    co=co@entry=<code at remote 0x7fffec845e30>, 
    pathname=pathname@entry=0x14ec430 "/usr/local/lib/python2.7/dist-packages/blosc/__init__.pyc") at ../Python/import.c:709
#13 0x0000000000579f0f in load_source_module.39194 (name=name@entry=0x14dcd7a "blosc", 
    pathname=0x14ec430 "/usr/local/lib/python2.7/dist-packages/blosc/__init__.pyc", 
    pathname@entry=0x14eb420 "/usr/local/lib/python2.7/dist-packages/blosc/__init__.py", fp=<optimized out>)
    at ../Python/import.c:1099
#14 0x00000000005b7541 in load_module.39237 (name=name@entry=0x14dcd7a "blosc", fp=<optimized out>, 
    pathname=pathname@entry=0x14eb420 "/usr/local/lib/python2.7/dist-packages/blosc/__init__.py", type=<optimized out>, 
    loader=loader@entry=0x0) at ../Python/import.c:1906
#15 0x000000000046b0e7 in load_package.39273 (name=name@entry=0x14dcd7a "blosc", 
    pathname=pathname@entry=0x14ddd80 "/usr/local/lib/python2.7/dist-packages/blosc") at ../Python/import.c:1166
#16 0x00000000005b75bd in load_module.39237 (name=name@entry=0x14dcd7a "blosc", fp=<optimized out>, 
    pathname=pathname@entry=0x14ddd80 "/usr/local/lib/python2.7/dist-packages/blosc", type=<optimized out>, 
    loader=<optimized out>) at ../Python/import.c:1920
#17 0x000000000055d3a3 in import_submodule.39248 (mod=mod@entry=None, subname=subname@entry=0x14dcd7a "blosc", 
    fullname=fullname@entry=0x14dcd7a "blosc") at ../Python/import.c:2700
#18 0x000000000055d830 in load_next (mod=<module at remote 0x7fffef32ba28>, altmod=None, p_name=p_name@entry=0x7fffffffc560, 
    buf=buf@entry=0x14dcd70 "pandas.io.blosc", p_buflen=p_buflen@entry=0x7fffffffc570) at ../Python/import.c:2519
#19 0x000000000055db37 in import_module_level.isra.3 (level=-1, fromlist=None, globals=<optimized out>, name=0x0)
    at ../Python/import.c:2224
#20 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=None, 
    level=-1) at ../Python/import.c:2288
#21 0x00000000004755e7 in builtin___import__.32997 (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at ../Python/bltinmodule.c:49
#22 0x00000000004da20b in PyObject_Call (kw=0x0, 
    arg=('blosc', {'Index': <type at remote 0x1052620>, 'SparsePanel': <type at remote 0x1350880>, 'make_block': <function at remote 0x7fffee9eb938>, 'PY3': False, 'Panel4D': <type at remote 0x10cc070>, 'DataFrame': <type at remote 0x12ba690>, 'datetime': <type at remote 0x7ffff549ca00>, 'parse': <function at remote 0x7ffff07f4e60>, '_Packer': <type at remote 0x7fffec811e20>, 'SparseSeries': <type at remote 0x1347830>, 'IntIndex': <type at remote 0x7fffeef2dc80>, 'PeriodIndex': <type at remote 0xba9fa0>, 'BlockManager': <type at remote 0x1131170>, 'compat': <module at remote 0x7ffff0b0c2b8>, 'DatetimeIndex': <type at remote 0x10a0b70>, '_Unpacker': <type at remote 0x7fffec811b80>, 'Timestamp': <type at remote 0xe9fbe0>, 'NDFrame': <type at remote 0x114d3c0>, '__package__': 'pandas.io', 'Float64Index': <type at remote 0x1055a70>, 'NaT': <NaTType at remote 0x7ffff0ec2750>, 'np': <module at remote 0x7ffff4318bb0>, '__doc__': '\nMsgpack serializer support for reading and writing pandas data structures\nto disk\n', 'Panel...(truncated), func=<built-in function __import__>) at ../Objects/abstract.c:2529
#23 PyEval_CallObjectWithKeywords (func=func@entry=<built-in function __import__>, 
    arg=arg@entry=('blosc', {'Index': <type at remote 0x1052620>, 'SparsePanel': <type at remote 0x1350880>, 'make_block': <function at remote 0x7fffee9eb938>, 'PY3': False, 'Panel4D': <type at remote 0x10cc070>, 'DataFrame': <type at remote 0x12ba690>, 'datetime': <type at remote 0x7ffff549ca00>, 'parse': <function at remote 0x7ffff07f4e60>, '_Packer': <type at remote 0x7fffec811e20>, 'SparseSeries': <type at remote 0x1347830>, 'IntIndex': <type at remote 0x7fffeef2dc80>, 'PeriodIndex': <type at remote 0xba9fa0>, 'BlockManager': <type at remote 0x1131170>, 'compat': <module at remote 0x7ffff0b0c2b8>, 'DatetimeIndex': <type at remote 0x10a0b70>, '_Unpacker': <type at remote 0x7fffec811b80>, 'Timestamp': <type at remote 0xe9fbe0>, 'NDFrame': <type at remote 0x114d3c0>, '__package__': 'pandas.io', 'Float64Index': <type at remote 0x1055a70>, 'NaT': <NaTType at remote 0x7ffff0ec2750>, 'np': <module at remote 0x7ffff4318bb0>, '__doc__': '\nMsgpack serializer support for reading and writing pandas data structures\nto disk\n', 'Panel...(truncated), kw=kw@entry=0x0) at ../Python/ceval.c:3889
#24 0x000000000052e6d7 in PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7fffee15bd38, for file /usr/local/lib/python2.7/dist-packages/pandas/io/packers.py, line 67, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2333
#25 0x000000000055c594 in PyEval_EvalCodeEx (co=0x7fffec8458b0, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3252
#26 0x00000000005b7392 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:667
#27 0x00000000005b744a in PyImport_ExecCodeModuleEx (name=name@entry=0xf3cf80 "pandas.io.packers", 
    co=co@entry=<code at remote 0x7fffec8458b0>, 
    pathname=pathname@entry=0x14e0db0 "/usr/local/lib/python2.7/dist-packages/pandas/io/packers.pyc")
    at ../Python/import.c:709
#28 0x0000000000579f0f in load_source_module.39194 (name=name@entry=0xf3cf80 "pandas.io.packers", 
    pathname=0x14e0db0 "/usr/local/lib/python2.7/dist-packages/pandas/io/packers.pyc", 
    pathname@entry=0x133b600 "/usr/local/lib/python2.7/dist-packages/pandas/io/packers.py", fp=<optimized out>)
    at ../Python/import.c:1099
#29 0x00000000005b7541 in load_module.39237 (name=name@entry=0xf3cf80 "pandas.io.packers", fp=<optimized out>, 
    pathname=pathname@entry=0x133b600 "/usr/local/lib/python2.7/dist-packages/pandas/io/packers.py", type=<optimized out>, 
    loader=<optimized out>) at ../Python/import.c:1906
#30 0x000000000055d3a3 in import_submodule.39248 (mod=mod@entry=<module at remote 0x7fffef32ba28>, 
    subname=subname@entry=0xf3cf8a "packers", fullname=0xf3cf80 "pandas.io.packers") at ../Python/import.c:2700
#31 0x000000000055d77c in load_next (mod=mod@entry=<module at remote 0x7fffef32ba28>, 
    altmod=altmod@entry=<module at remote 0x7fffef32ba28>, p_name=p_name@entry=0x7fffffffcb30, 
    buf=buf@entry=0xf3cf80 "pandas.io.packers", p_buflen=p_buflen@entry=0x7fffffffcb40) at ../Python/import.c:2515
#32 0x000000000055e054 in import_module_level.isra.3 (level=<optimized out>, fromlist=('read_msgpack', 'to_msgpack'), 
    globals=<optimized out>, name=0x0) at ../Python/import.c:2232
#33 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, 
    fromlist=('read_msgpack', 'to_msgpack'), level=<optimized out>) at ../Python/import.c:2288
#34 0x00000000004755e7 in builtin___import__.32997 (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at ../Python/bltinmodule.c:49
#35 0x00000000004da20b in PyObject_Call (kw=0x0, 
    arg=('pandas.io.packers', {'read_sql': <function at remote 0x7fffec82a488>, 'read_table': <function at remote 0x7fffee162230>, 'read_html': <function at remote 0x7fffec81d668>, 'ExcelWriter': <ABCMeta(engine=<abstractproperty(__doc__='name of engine') at remote 0x7fffee173808>, check_extension=<classmethod at remote 0x7fffee17f328>, __module__='pandas.io.excel', __abstractmethods__=frozenset(['engine', 'save', 'write_cells', 'supported_extensions']), __exit__=<function at remote 0x7fffee1806e0>, _abc_negative_cache=<WeakSet(_remove=<function at remote 0x7fffee180848>, _pending_removals=[], _iterating=set([]), data=set([])) at remote 0x7fffee179950>, path=None, __dict__=<getset_descriptor at remote 0x7fffee17c5f0>, close=<function at remote 0x7fffee180758>, __weakref__=<getset_descriptor at remote 0x7fffee17c638>, __init__=<function at remote 0x7fffee180500>, _abc_cache=<WeakSet(_remove=<function at remote 0x7fffee1807d0>, _pending_removals=[], _iterating=set([]), data=set([])) at remote 0x7fffee1798d0>, supported...(truncated), func=<built-in function __import__>) at ../Objects/abstract.c:2529
#36 PyEval_CallObjectWithKeywords (func=func@entry=<built-in function __import__>, 
    arg=arg@entry=('pandas.io.packers', {'read_sql': <function at remote 0x7fffec82a488>, 'read_table': <function at remote 0x7fffee162230>, 'read_html': <function at remote 0x7fffec81d668>, 'ExcelWriter': <ABCMeta(engine=<abstractproperty(__doc__='name of engine') at remote 0x7fffee173808>, check_extension=<classmethod at remote 0x7fffee17f328>, __module__='pandas.io.excel', __abstractmethods__=frozenset(['engine', 'save', 'write_cells', 'supported_extensions']), __exit__=<function at remote 0x7fffee1806e0>, _abc_negative_cache=<WeakSet(_remove=<function at remote 0x7fffee180848>, _pending_removals=[], _iterating=set([]), data=set([])) at remote 0x7fffee179950>, path=None, __dict__=<getset_descriptor at remote 0x7fffee17c5f0>, close=<function at remote 0x7fffee180758>, __weakref__=<getset_descriptor at remote 0x7fffee17c638>, __init__=<function at remote 0x7fffee180500>, _abc_cache=<WeakSet(_remove=<function at remote 0x7fffee1807d0>, _pending_removals=[], _iterating=set([]), data=set([])) at remote 0x7fffee1798d0>, supported...(truncated), kw=kw@entry=0x0) at ../Python/ceval.c:3889
#37 0x000000000052e6d7 in PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7ffff0832560, for file /usr/local/lib/python2.7/dist-packages/pandas/io/api.py, line 14, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2333
#38 0x000000000055c594 in PyEval_EvalCodeEx (co=0x7fffeefbe2b0, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3252
#39 0x00000000005b7392 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:667
#40 0x00000000005b744a in PyImport_ExecCodeModuleEx (name=name@entry=0xa14b80 "pandas.io.api", 
    co=co@entry=<code at remote 0x7fffeefbe2b0>, 
    pathname=pathname@entry=0x1359f00 "/usr/local/lib/python2.7/dist-packages/pandas/io/api.pyc") at ../Python/import.c:709
#41 0x0000000000579f0f in load_source_module.39194 (name=name@entry=0xa14b80 "pandas.io.api", 
    pathname=0x1359f00 "/usr/local/lib/python2.7/dist-packages/pandas/io/api.pyc", 
    pathname@entry=0x10b4590 "/usr/local/lib/python2.7/dist-packages/pandas/io/api.py", fp=<optimized out>)
    at ../Python/import.c:1099
#42 0x00000000005b7541 in load_module.39237 (name=name@entry=0xa14b80 "pandas.io.api", fp=<optimized out>, 
    pathname=pathname@entry=0x10b4590 "/usr/local/lib/python2.7/dist-packages/pandas/io/api.py", type=<optimized out>, 
    loader=<optimized out>) at ../Python/import.c:1906
#43 0x000000000055d3a3 in import_submodule.39248 (mod=mod@entry=<module at remote 0x7fffef32ba28>, 
    subname=subname@entry=0xa14b8a "api", fullname=0xa14b80 "pandas.io.api") at ../Python/import.c:2700
#44 0x000000000055d77c in load_next (mod=mod@entry=<module at remote 0x7fffef32ba28>, 
    altmod=altmod@entry=<module at remote 0x7fffef32ba28>, p_name=p_name@entry=0x7fffffffd100, 
    buf=buf@entry=0xa14b80 "pandas.io.api", p_buflen=p_buflen@entry=0x7fffffffd110) at ../Python/import.c:2515
#45 0x000000000055e054 in import_module_level.isra.3 (level=<optimized out>, fromlist=('*',), globals=<optimized out>, 
    name=0x0) at ../Python/import.c:2232
#46 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=('*',), 
    level=<optimized out>) at ../Python/import.c:2288
#47 0x00000000004755e7 in builtin___import__.32997 (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at ../Python/bltinmodule.c:49
#48 0x00000000004da20b in PyObject_Call (kw=0x0, 
    arg=('pandas.io.api', {'SparseArray': <type at remote 0x1175460>, 'expanding_median': <function at remote 0x7fffee1902a8>, 'ewmvol': <function at remote 0x7fffee18e230>, 'Categorical': <type at remote 0x10c4830>, 'LooseVersion': <classobj at remote 0x7ffff07e7808>, 'datetime': <type at remote 0x7ffff549ca00>, '__path__': ['/usr/local/lib/python2.7/dist-packages/pandas'], 'computation': <module at remote 0x7fffeef6d6a8>, 'to_timedelta': <function at remote 0x7fffee9eb758>, 'expanding_corr': <function at remote 0x7fffee190848>, 'expanding_quantile': <function at remote 0x7fffee190758>, 'rolling_corr': <function at remote 0x7fffee200cf8>, 'rolling_apply': <function at remote 0x7fffee18ed70>, 'tseries': <module at remote 0x7fffef34ac58>, 'save': <function at remote 0x7ffff04b7e60>, 'match': <function at remote 0x7fffef0bd410>, 'bdate_range': <function at remote 0x7fffef041320>, 'expanding_cov': <function at remote 0x7fffee1907d0>, '__file__': '/usr/local/lib/python2.7/dist-packages/pandas/__init__.pyc', 'util': <modu...(truncated), func=<built-in function __import__>) at ../Objects/abstract.c:2529
#49 PyEval_CallObjectWithKeywords (func=func@entry=<built-in function __import__>, 
    arg=arg@entry=('pandas.io.api', {'SparseArray': <type at remote 0x1175460>, 'expanding_median': <function at remote 0x7fffee1902a8>, 'ewmvol': <function at remote 0x7fffee18e230>, 'Categorical': <type at remote 0x10c4830>, 'LooseVersion': <classobj at remote 0x7ffff07e7808>, 'datetime': <type at remote 0x7ffff549ca00>, '__path__': ['/usr/local/lib/python2.7/dist-packages/pandas'], 'computation': <module at remote 0x7fffeef6d6a8>, 'to_timedelta': <function at remote 0x7fffee9eb758>, 'expanding_corr': <function at remote 0x7fffee190848>, 'expanding_quantile': <function at remote 0x7fffee190758>, 'rolling_corr': <function at remote 0x7fffee200cf8>, 'rolling_apply': <function at remote 0x7fffee18ed70>, 'tseries': <module at remote 0x7fffef34ac58>, 'save': <function at remote 0x7ffff04b7e60>, 'match': <function at remote 0x7fffef0bd410>, 'bdate_range': <function at remote 0x7fffef041320>, 'expanding_cov': <function at remote 0x7fffee1907d0>, '__file__': '/usr/local/lib/python2.7/dist-packages/pandas/__init__.pyc', 'util': <modu...(truncated), kw=kw@entry=0x0) at ../Python/ceval.c:3889
#50 0x000000000052e6d7 in PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7ffff42e5218, for file /usr/local/lib/python2.7/dist-packages/pandas/__init__.py, line 53, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2333
#51 0x000000000055c594 in PyEval_EvalCodeEx (co=0x7ffff4315630, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3252
#52 0x00000000005b7392 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:667
#53 0x00000000005b744a in PyImport_ExecCodeModuleEx (name=name@entry=0xb15530 "pandas", 
    co=co@entry=<code at remote 0x7ffff4315630>, 
    pathname=pathname@entry=0xa0fcc0 "/usr/local/lib/python2.7/dist-packages/pandas/__init__.pyc") at ../Python/import.c:709
#54 0x0000000000579f0f in load_source_module.39194 (name=name@entry=0xb15530 "pandas", 
    pathname=0xa0fcc0 "/usr/local/lib/python2.7/dist-packages/pandas/__init__.pyc", 
    pathname@entry=0xa9ca30 "/usr/local/lib/python2.7/dist-packages/pandas/__init__.py", fp=<optimized out>)
    at ../Python/import.c:1099
#55 0x00000000005b7541 in load_module.39237 (name=name@entry=0xb15530 "pandas", fp=<optimized out>, 
    pathname=pathname@entry=0xa9ca30 "/usr/local/lib/python2.7/dist-packages/pandas/__init__.py", type=<optimized out>, 
    loader=loader@entry=0x0) at ../Python/import.c:1906
#56 0x000000000046b0e7 in load_package.39273 (name=name@entry=0xb15530 "pandas", 
    pathname=pathname@entry=0xa2f6d0 "/usr/local/lib/python2.7/dist-packages/pandas") at ../Python/import.c:1166
#57 0x00000000005b75bd in load_module.39237 (name=name@entry=0xb15530 "pandas", fp=<optimized out>, 
    pathname=pathname@entry=0xa2f6d0 "/usr/local/lib/python2.7/dist-packages/pandas", type=<optimized out>, 
    loader=<optimized out>) at ../Python/import.c:1920
#58 0x000000000055d3a3 in import_submodule.39248 (mod=mod@entry=None, subname=subname@entry=0xb15530 "pandas", 
    fullname=0xb15530 "pandas") at ../Python/import.c:2700
#59 0x000000000055d77c in load_next (mod=None, altmod=None, p_name=p_name@entry=0x7fffffffd720, 
    buf=buf@entry=0xb15530 "pandas", p_buflen=p_buflen@entry=0x7fffffffd730) at ../Python/import.c:2515
#60 0x000000000055db37 in import_module_level.isra.3 (level=-1, fromlist=('Timestamp',), globals=<optimized out>, 
    name=0x7ffff4318333 "tslib") at ../Python/import.c:2224
#61 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, 
    fromlist=('Timestamp',), level=-1) at ../Python/import.c:2288
#62 0x00000000004755e7 in builtin___import__.32997 (self=<optimized out>, args=<optimized out>, kwds=<optimized out>)
    at ../Python/bltinmodule.c:49
#63 0x00000000004da20b in PyObject_Call (kw=0x0, 
    arg=('pandas.tslib', {'__builtins__': <module at remote 0x7ffff7f82b08>, '__name__': '__main__', '__doc__': None, '__package__': None}, {...}, ('Timestamp',)), func=<built-in function __import__>) at ../Objects/abstract.c:2529
#64 PyEval_CallObjectWithKeywords (func=func@entry=<built-in function __import__>, 
    arg=arg@entry=('pandas.tslib', {'__builtins__': <module at remote 0x7ffff7f82b08>, '__name__': '__main__', '__doc__': None, '__package__': None}, {...}, ('Timestamp',)), kw=kw@entry=0x0) at ../Python/ceval.c:3889
#65 0x000000000052e6d7 in PyEval_EvalFrameEx (f=f@entry=Frame 0x7ffff7e2edd0, for file <stdin>, line 1, in <module> (), 
    throwflag=throwflag@entry=0) at ../Python/ceval.c:2333
#66 0x000000000055c594 in PyEval_EvalCodeEx (co=0x7ffff7e71330, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3252
#67 0x00000000005b7392 in PyEval_EvalCode (co=co@entry=0x7ffff7e71330, 

@thomasj02
Author

@jreback The particular use case that's causing me trouble is when I use a thread pinning tool like likwid-pin (https://code.google.com/p/likwid/). Likwid does round-robin allocation of threads onto CPU cores. But since there are a ton of extra threads created by Pandas, I have to try to trick likwid into putting the "real" threads on distinct cores while ignoring the Pandas threads.

I use likwid as an example, but it would be just as much of a pain for any other general-purpose thread pinning tool.
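To illustrate why the extra threads get in the way, here is a minimal sketch of round-robin pinning done by hand. This is not how likwid-pin is implemented; it only shows the same idea using `os.sched_setaffinity`, which is Linux-only. `pin_workers_round_robin` is a hypothetical helper. A tool that pins every thread in creation order would also hand out cores to the idle library threads, which is the problem described above.

```python
import os
import threading


def pin_workers_round_robin(n_workers):
    """Start n_workers threads and pin each one to a single CPU (Linux only).

    Returns a dict mapping worker index to the CPU set it ended up with.
    """
    cpus = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    pinned = {}

    def worker(index):
        cpu = cpus[index % len(cpus)]  # round-robin over the allowed CPUs
        # With pid 0, the affinity call applies to the calling thread.
        os.sched_setaffinity(0, {cpu})
        pinned[index] = os.sched_getaffinity(0)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return pinned


if __name__ == "__main__":
    for index, mask in sorted(pin_workers_round_robin(4).items()):
        print(index, mask)
```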

You could imagine other problems, like increased script run time due to thread creation / destruction when running a very large number of small python tasks.

I think a simple solution would be to have Pandas take XXX_NUM_THREADS settings for all the multithreaded libraries it uses.
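In the meantime, something like the following can be done in user code. It is only a sketch: `cap_library_threads` is a hypothetical helper, and it relies on `numexpr.set_num_threads` and `blosc.set_nthreads` (the latter is the function visible in the backtrace above). Note that importing blosc still creates its default pool first; the call then shrinks it.

```python
def cap_library_threads(n=1):
    """Ask optional threaded dependencies to use at most n threads.

    Libraries that are not installed are skipped. Returns a dict of the
    libraries that were actually adjusted.
    """
    applied = {}

    try:
        import numexpr
    except ImportError:
        pass
    else:
        numexpr.set_num_threads(n)
        applied["numexpr"] = n

    try:
        import blosc
    except ImportError:
        pass
    else:
        blosc.set_nthreads(n)
        applied["blosc"] = n

    return applied


if __name__ == "__main__":
    print(cap_library_threads(1))
```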

A better solution would be for Pandas not to pull in all sorts of extra machinery when you just import a simple class (e.g., Timestamp).
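To see how much gets pulled in, one can diff `sys.modules` around the import. This is a small sketch; `modules_pulled_in` is a hypothetical helper, and the result depends on what is already loaded in the interpreter, so it is best run in a fresh process.

```python
import importlib
import sys


def modules_pulled_in(module_name):
    """Return the names of modules that importing module_name adds to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(set(sys.modules) - before)


if __name__ == "__main__":
    added = modules_pulled_in("json")
    print(len(added), "modules loaded")
    # With the setup from this report one would pass "pandas.tslib" and
    # check whether "blosc" or "numexpr" shows up in the result.
```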

@jreback
Contributor

jreback commented Feb 3, 2015

Well, as I said, pandas doesn't use threads; the dependencies do.

So:
a) we can't do anything about this
b) this is thread pool creation, not actual thread creation

jreback closed this as completed Feb 3, 2015
jreback added Multithreading Parallelism in pandas Won't Fix and removed Won't Fix labels Feb 3, 2015
@thomasj02
Author

Yeah, you're probably right that the solution of not importing the entire world when small classes are imported is the better way to go.
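For reference, deferring a heavy import until first use can be sketched with the standard library's `importlib.util.LazyLoader`. This follows the recipe in the importlib documentation; `lazy_import` is a hypothetical helper and not something pandas provides. `colorsys` is used below only as a stand-in for an optional dependency.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already loaded, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named " + repr(name))
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the deferred load, does not run the module
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")  # module body has not run yet
    print(colorsys.rgb_to_hsv(1, 0, 0))  # first attribute access triggers the real import
```

With this pattern, a package would only trigger a dependency's thread pool setup when the dependency is actually used.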
