#30214 (Parallelized Build / CI) caused a build failure for me #30356


Closed
topper-123 opened this issue Dec 19, 2019 · 14 comments · Fixed by #30585
Labels
Build Library building on various platforms Windows Windows OS
Comments

@topper-123
Contributor

Currently I can't get pandas to build.

I get this message when running python setup.py build_ext --inplace -j 4:

Compiling pandas\_libs/groupby.pyx because it changed.
Compiling pandas\_libs/index.pyx because it changed.
Compiling pandas\_libs/internals.pyx because it changed.
Compiling pandas\_libs/lib.pyx because it changed.
Compiling pandas\_libs/parsers.pyx because it changed.
Compiling pandas\_libs/tslibs/timestamps.pyx because it changed.
Compiling pandas\_libs/window/aggregations.pyx because it changed.
Compiling pandas\io/msgpack/_packer.pyx because it changed.
Compiling pandas\io/msgpack/_unpacker.pyx because it changed.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\TP\Documents\Python\pandasdev\pandasdev\setup.py", line 815, in <module>
    ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
  File "C:\Users\TP\Documents\Python\pandasdev\pandasdev\setup.py", line 543, in maybe_cythonize
    return cythonize(extensions, *args, **kwargs)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\site-packages\Cython\Build\Dependencies.py", line 1073, in cythonize
    nthreads, initializer=_init_multiprocessing_helper)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\TP\Miniconda3\envs\pandas-dev\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

(The same compile messages, traceback, and RuntimeError were repeated and interleaved by each spawned worker process; the duplicates are omitted above.)
I can't really parse what's going on, but the build seems to hit the same RuntimeError repeatedly.

I've triaged the issue to stem from #30214. Any idea what's happening, @WillAyd ?

Workaround

I can work around the issue by building without parallelization, i.e. running python setup.py build_ext --inplace -j 0 instead of python setup.py build_ext --inplace -j 4.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 95e1a63
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.26.0.dev0+1358.g95e1a63dd
numpy : 1.17.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.13
pytest : 5.2.2
hypothesis : 4.28.2
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.6.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.2.2
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

@WillAyd
Member

WillAyd commented Dec 19, 2019

Not sure about Windows, so if you have time/interest in taking a look that would be great. The referenced PR actually enables parallel builds; before it, -j 0 and -j 4 did the same thing (i.e. a fully sequential build), so you can leave the flag off and be in the same place as before.

@simonjayhawkins
Member

Yeah, I have the same problem on Windows and am now using just python setup.py build_ext --inplace. Maybe we should have a doc update in the meantime.

@topper-123
Contributor Author

I'm not too familiar with building Cython and that traceback doesn't give many clues AFAICT.

If this is a Windows problem, maybe we should only pass the parallel arguments to cythonize if we're not on Windows?
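A minimal sketch of that gating (choose_nthreads is a hypothetical helper for illustration, not code from pandas' setup.py):

```python
import sys

def choose_nthreads(requested, platform=sys.platform):
    """Pick the nthreads value to hand to cythonize().

    Hypothetical helper: on Windows, multiprocessing uses the 'spawn'
    start method, which re-imports setup.py in every worker and trips
    the bootstrapping check, so fall back to a serial build there.
    """
    if platform.startswith("win"):
        return 0  # cythonize(nthreads=0) compiles sequentially
    return requested
```

With something like this, -j 4 would still parallelize the Cython step on Linux/macOS while quietly degrading to a sequential build on Windows.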

@jbrockmendel
Member

@scoder does the traceback here mean anything to you?

@jbrockmendel jbrockmendel added Build Library building on various platforms Windows Windows OS labels Dec 19, 2019
@scoder

scoder commented Dec 19, 2019

@jbrockmendel no, never seen this before. Probably specific to MS-Windows. The error message seems misleading, unless there is really a kind of "freezing" (py2exe etc.) going on on the user side, which I very much doubt. But it's some kind of multiprocessing hiccup.

@WillAyd
Member

WillAyd commented Dec 19, 2019 via email

@topper-123
Contributor Author

topper-123 commented Dec 19, 2019

I tried adding from multiprocessing import freeze_support; freeze_support() in various locations in setup.py (top level and inside maybe_cythonize). Neither worked.
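For what it's worth, freeze_support() on its own can't fix this: under the "spawn" start method each worker re-imports the main module, so any process creation that happens at import time (as cythonize's internal Pool does when setup.py runs its build at top level) fires before any guard matters. The canonical spawn-safe shape, sketched here with a toy worker function rather than the real build:

```python
from multiprocessing import Pool, freeze_support

def square(x):
    return x * x

def build():
    # Stands in for cythonize(..., nthreads=...): the Pool must only be
    # created from the guarded entry point, never at import time.
    with Pool(2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    # Workers spawned on Windows re-import this file with
    # __name__ == "__mp_main__", so they skip this block.
    freeze_support()
    print(build())
```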

@alimcmaster1
Member

@topper-123

I can maybe help debug.
Weirdly, on Win10, running python setup.py build_ext --inplace -j 4 worked fine for me.
(Tried several times to check whether it was a random issue, but I can't seem to reproduce it.)

I might try on a PR with the change to see if we can repro on our Azure Windows jobs?

INSTALLED VERSIONS
------------------
commit           : c0f6428b097f6e1a1765d8d07cb695169f442e66
python           : 3.7.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.26.0.dev0+1206.gc0f6428b0
numpy            : 1.17.3
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 19.3.1
setuptools       : 42.0.2.post20191201
Cython           : 0.29.13
pytest           : 5.3.2
hypothesis       : 4.56.3
sphinx           : 2.3.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.7
lxml.etree       : 4.4.2
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.10.2
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : 1.3.1
fastparquet      : 0.3.2
gcsfs            : None
lxml.etree       : 4.4.2
matplotlib       : 3.1.2
numexpr          : 2.7.0
odfpy            : None
openpyxl         : 3.0.1
pandas_gbq       : None
pyarrow          : 0.15.1
pytables         : None
pytest           : 5.3.2
s3fs             : 0.4.0
scipy            : 1.3.1
sqlalchemy       : 1.3.12
tables           : 3.6.1
xarray           : 0.14.1
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.7

@WillAyd
Member

WillAyd commented Dec 27, 2019

@alimcmaster1 can you try upgrading Cython? I think this is a no-op until 0.29.14, so you might not run into it on previous versions.

@WillAyd
Member

WillAyd commented Dec 27, 2019

If it is helpful I believe in theory there is a way for the Windows compiler to perform parallel builds:

https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019

How distutils maps the -j argument back to that I'm not sure, but it's potentially worth investigating for anyone with interest.
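As an illustration of that route (the helper below is hypothetical; pandas' setup.py does not currently do this), one could append MSVC's /MP flag to an extension's compile arguments so that cl.exe parallelizes a single invocation, independently of distutils' -j worker pool:

```python
import sys

def add_msvc_parallel_flag(extra_compile_args, platform=sys.platform):
    """Return a copy of extra_compile_args with /MP added on Windows.

    Hypothetical sketch: /MP tells MSVC's cl.exe to compile the source
    files of one invocation in parallel. Non-Windows compilers don't
    understand the flag, so it is only appended there.
    """
    args = list(extra_compile_args)
    if platform.startswith("win"):
        args.append("/MP")
    return args
```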

Also note that in our own setup file we forward the -j argument to the nthreads argument of cythonize:

https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#Cython.Build.cythonize

That may be the culprit for the issue above; maybe it just needs to be decoupled for Windows users.

@WillAyd
Member

WillAyd commented Dec 27, 2019

So from testing I do think it's the nthreads argument that isn't working on Windows:

pandas/setup.py

Line 523 in 1d36851

nthreads = 0

If this gets ignored on Windows I think the parallel build works, so we could temporarily patch it for now or see if there's an upstream fix for Cython.

@alimcmaster1 and/or @topper-123 are you seeing the same?

@Dr-Irv
Contributor

Dr-Irv commented Dec 31, 2019

Might be related to this issue on cython: cython/cython#3262

FWIW, I always used to use -j 4 and thought it worked. Just tried it again on a year-old pandas dev environment and discovered it was never helping!

@WillAyd
Member

WillAyd commented Dec 31, 2019 via email

@Dr-Irv
Contributor

Dr-Irv commented Dec 31, 2019

Here's more. The way multiprocessing works in Python, you have to use if __name__ == '__main__':, which would be a big change for us. See https://docs.python.org/3.7/library/multiprocessing.html?highlight=multiprocessing#multiprocessing-programming where, if you scroll down, it says "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)", followed by "Instead one should protect the 'entry point' of the program by using if __name__ == '__main__':".

I'll create a patch for Windows.
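The shape that idiom pushes a setup script toward, sketched with stand-ins so the example is self-contained (maybe_cythonize here is a stub; the real one wraps Cython.Build.cythonize, and this is not the actual pandas patch):

```python
def maybe_cythonize(extensions):
    # Stub standing in for the real helper, which decides nthreads and
    # calls Cython.Build.cythonize(extensions, nthreads=...).
    return extensions

def main():
    extensions = ["pandas._libs.lib", "pandas._libs.parsers"]
    return maybe_cythonize(extensions)

if __name__ == "__main__":
    # Spawned workers re-import this module as "__mp_main__" and skip
    # this block, so cythonize's worker Pool can bootstrap safely.
    main()
```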
