Skip to content

explicitly set 'include' to numpy_incls #18112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 16, 2017
Merged

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented Nov 4, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 13, 2017 at 14:19 Hours UTC

@@ -460,6 +460,13 @@ def pxd(name):
return os.path.abspath(pjoin('pandas', name + '.pxd'))


if _have_setuptools:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole setuptools optional biz is a mess. we should just require it (and we do in our recipes, so this is all old code). note nothing to do with this PR.

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

this is ok..

@jreback jreback added the Build Library building on various platforms label Nov 4, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 4, 2017
setup.py Outdated
@@ -570,14 +580,16 @@ def pxd(name):
'_libs.tslibs.timezones': {
'pyxfile': '_libs/tslibs/timezones'},
'_libs.testing': {
'pyxfile': '_libs/testing'},
'pyxfile': '_libs/testing',
'include': []},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference betwee not passing include and an empty list? (from a compilation perspective)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be removed IINM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the 'include' key, it gets set to a default of common_include. This has two unwanted side effects. First: the search path for numpy now includes src/numpy.pxd which we should move away from. Second: making it explicit avoids a problem a year from now when we see that common_include is added to _libs.testing but no one remembers that it isn't actually needed.

@codecov
Copy link

codecov bot commented Nov 5, 2017

Codecov Report

Merging #18112 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18112      +/-   ##
==========================================
+ Coverage   91.25%   91.26%   +<.01%     
==========================================
  Files         163      163              
  Lines       50123    50124       +1     
==========================================
+ Hits        45741    45745       +4     
+ Misses       4382     4379       -3
Flag Coverage Δ
#multiple 89.07% <ø> (+0.02%) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/timedeltas.py 91.19% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.51% <0%> (ø) ⬆️
pandas/plotting/_converter.py 65.2% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3268f4e...37ad0f2. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 5, 2017

Codecov Report

Merging #18112 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18112      +/-   ##
==========================================
+ Coverage   91.38%   91.38%   +<.01%     
==========================================
  Files         164      164              
  Lines       49884    49884              
==========================================
+ Hits        45584    45587       +3     
+ Misses       4300     4297       -3
Flag Coverage Δ
#multiple 89.19% <ø> (+0.02%) ⬆️
#single 39.42% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/plotting/_converter.py 65.25% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f4c960...0274152. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Nov 5, 2017

rebase

@jbrockmendel
Copy link
Member Author

How did this happen in master:

    '_libs.tslibs.parsing': {
        'pyxfile': '_libs/tslibs/parsing',
        'pxdfiles': ['_libs/src/util',
                     '_libs/src/khash']},

?

tslibs.parsing doesnt use either of those.

@jbrockmendel
Copy link
Member Author

    '_libs.tslibs.timedeltas': {
        'pyxfile': '_libs/tslibs/timedeltas',
        'pxdfiles': ['_libs/src/util'],
        'depends': np_datetime_headers,
        'sources': np_datetime_sources},

nope on the depends and sources.

@jbrockmendel
Copy link
Member Author

    '_libs.tslibs.strptime': {
        'pyxfile': '_libs/tslibs/strptime',
        'pxdfiles': ['_libs/src/util',
                     '_libs/tslibs/nattype'],
        'depends': tseries_depends,
        'sources': np_datetime_sources},

nope on depends and sources, missing np_datetime

@jbrockmendel
Copy link
Member Author

    '_libs.index': {
        'pyxfile': '_libs/index',
        'pxdfiles': ['_libs/src/util', '_libs/hashtable'],
        'depends': _pxi_dep['index'],
        'sources': np_datetime_sources},

Nope on the sources

@jreback
Copy link
Contributor

jreback commented Nov 5, 2017

np_datetime is now part of tseries_depends

setup.py Outdated
@@ -576,31 +585,29 @@ def pxd(name):
'_libs/tslibs/conversion']},
'_libs.tslibs.parsing': {
'pyxfile': '_libs/tslibs/parsing',
'pxdfiles': ['_libs/src/util',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls don’t change this; if u look in the source these depend on these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we looking at different source files? tslibs.parsing doesn't cimport from util or khash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right here

setup.py Outdated
'_libs.tslibs.strptime': {
'pyxfile': '_libs/tslibs/strptime',
'pxdfiles': ['_libs/src/util',
'_libs/tslibs/nattype'],
'depends': tseries_depends,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tseries_depends already incldes np_datetime, so no need to add this explicty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this one needs to change back

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the diff it isn't clear which file you're referring to. Best guess is tslibs.strptime?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for tslibs.strptime, still needs to have tseries_depends, np? (e.g. np_datetime is there)

setup.py Outdated
'_libs.tslibs.timedeltas': {
'pyxfile': '_libs/tslibs/timedeltas',
'pxdfiles': ['_libs/src/util'],
'depends': np_datetime_headers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

@jbrockmendel
Copy link
Member Author

It's almost as if dumping a bunch of unnecessary stuff into setup.py made it easier for people in new and exciting ways.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to rebase. some of your comments are on old sources.

setup.py Outdated
'_libs.tslibs.strptime': {
'pyxfile': '_libs/tslibs/strptime',
'pxdfiles': ['_libs/src/util',
'_libs/tslibs/nattype'],
'depends': tseries_depends,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tseries_depends already incldes np_datetime, so no need to add this explicty.

setup.py Outdated
@@ -576,31 +585,29 @@ def pxd(name):
'_libs/tslibs/conversion']},
'_libs.tslibs.parsing': {
'pyxfile': '_libs/tslibs/parsing',
'pxdfiles': ['_libs/src/util',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right here

@jreback
Copy link
Contributor

jreback commented Nov 7, 2017

this looks ok, but why is this necessary? what is wrong with the current?

@jbrockmendel
Copy link
Member Author

this looks ok, but why is this necessary? what is wrong with the current?

Dependencies are over-specified. For extensions that just have include=numpy_incls, it is explicitly saying "this is not tightly coupled"

Also I really don't like having both numpy_incl and src/numpy.pxd on the search path.

@jreback
Copy link
Contributor

jreback commented Nov 7, 2017

Dependencies are over-specified. For extensions that just have include=numpy_incls, it is explicitly saying "this is not tightly coupled"

can you show how you determine this?

@jbrockmendel
Copy link
Member Author

can you show how you determine this?

I look through the module for cimports. If the only non-cython/cpython cimports it uses are numpy, then it has no pandas c-deps and the build only needs np.get_include().

setup.py Outdated
'_libs.tslibs.strptime': {
'pyxfile': '_libs/tslibs/strptime',
'pxdfiles': ['_libs/src/util',
'_libs/tslibs/nattype'],
'depends': tseries_depends,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this one needs to change back

@@ -578,14 +587,12 @@ def pxd(name):
'_libs/tslibs/frequencies']},
'_libs.tslibs.parsing': {
'pyxfile': '_libs/tslibs/parsing',
'pxdfiles': ['_libs/src/util',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note, should parsing depend on nattype? (then for example nat strings are in 1 place).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well nat_strings is defined but not c-defined in nattype, so we could import it into parsing without needing to specify the dependency.

I'd be fine with that, or with having it defined one extra place. parsing has no intra-pandas dependencies at the moment (neither import nor cimport) which makes it really low-stress.

setup.py Outdated
'_libs.tslibs.strptime': {
'pyxfile': '_libs/tslibs/strptime',
'pxdfiles': ['_libs/src/util',
'_libs/tslibs/nattype'],
'depends': tseries_depends,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for tslibs.strptime, still needs to have tseries_depends, np? (e.g. np_datetime is there)

@jbrockmendel
Copy link
Member Author

for tslibs.strptime, still needs to have tseries_depends, np? (e.g. np_datetime is there)

I thought we landed on not specifying recursive dependencies. No?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2017

I thought we landed on not specifying recursive dependencies. No?

you mean transitive I suppose.

well this is a c-include and is needed for compilation. compile it in isolation and see if it works when you don't have these. anywhere we are implicity or explicity including np_datetime which is almost everywhere in tslibs these need to be specified, unless you can prove unambiguously that they are not needed.

@jreback
Copy link
Contributor

jreback commented Nov 11, 2017

I still also dont' understand why this is actually needed. what is broken if this is not included (the status quo)?

@jbrockmendel
Copy link
Member Author

you mean transitive I suppose.

Sure. To be extra-clear: if A cimports from B and B cimports or "cdef extern"s from C, then A transitively/recursively depends on C, but does not directly depend on C. Same page?

@jbrockmendel
Copy link
Member Author

anywhere we are implicity or explicity including np_datetime which is almost everywhere in tslibs these need to be specified,

My understanding was that the policy is to specify only direct dependencies. Otherwise, say, why aren't hashtable's dependencies specified in all of the other extensions that cimport from hashtable?

@jbrockmendel
Copy link
Member Author

well this is a c-include and is needed for compilation. compile it in isolation and see if it works when you don't have these.

Yah this gets back to the original thing of me not understanding why you wanted to add a bunch of junk to these specifications in the first place. As long as direct dependencies are listed it compiles fine. (wouldn't the CI here scoff otherwise?)

@jbrockmendel
Copy link
Member Author

unless you can prove unambiguously that they are not needed.

https://groups.google.com/forum/#!topic/cython-users/YdGpGqi6Qw4

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Nov 11, 2017

I still also dont' understand why this is actually needed. what is broken if this is not included (the status quo)?

The problem is that too much is specified, and it is specified in too many places, which is a recipe for confusion.

Explicitly setting the include key to either [] or numpy_incls where possible pins that down to the minimal viable set of dependencies.

@jbrockmendel
Copy link
Member Author

The numpy_incls part of the PR is largely orthogonal to the discussion about transitive dependencies. How about I revert the changes that affect transitive dependencies and we can address that elsewhere/elsewhen?

@jbrockmendel
Copy link
Member Author

Just pushed a commit reverting the subset of changes on which there has been controversy.

Now this just sets the include key to either [] or numpy_incls where applicable, ignores specification (or possible over-specification) of transitive dependencies.

@jbrockmendel
Copy link
Member Author

Clipboard.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

As long as direct dependencies are listed it compiles fine. (wouldn't the CI here scoff otherwise?)

well the CI won't complain, but building from a previous state, DOES complain. I don't want to have to constantly make clean; the current building is actually nice, if a .pxd changes or a dep changes, then it only recompiles what is necessary.

So back to this PR. I don't see any effect (pos or neg), so what exactly is symptom that this is trying to fix?

@jreback jreback removed this from the 0.22.0 milestone Nov 12, 2017
@jbrockmendel
Copy link
Member Author

So back to this PR. I don't see any effect (pos or neg), so what exactly is symptom that this is trying to fix?

Specifying dependencies (or "includes") which the relevant extensions are not actually dependent on is an anti-pattern. This PR fixes that for the subset of pyx files that have zero intra-pandas dependencies.

@jbrockmendel
Copy link
Member Author

On green let’s either merge or kill, avoid clogging the CI queue. Especially in light of reopening the depends issue, I think this is worthwhile.

@jbrockmendel
Copy link
Member Author

Doing some googling on what affects compile/build times, one thing that came up frequently was unnecessary includes. No idea how applicable it is here, but its plausible that trimming unnecessary includes and sources could cut build time.

@jreback jreback added this to the 0.22.0 milestone Nov 16, 2017
@jreback jreback merged commit 54f2a5e into pandas-dev:master Nov 16, 2017
@jreback
Copy link
Contributor

jreback commented Nov 16, 2017

ok thanks. We do need to refactor setup.py to use cythonize that will reduce the code a lot, while preserving the dependencies (w/o us having to do anything).

@jreback
Copy link
Contributor

jreback commented Nov 19, 2017

so I am pretty sure this broke our wheel building for 2.7, see here: https://travis-ci.org/MacPython/pandas-wheels/jobs/304148523

this is not easy to try a fix, and we have to commit it and see (though you could try forking that repo if you want).

jreback added a commit to jreback/pandas that referenced this pull request Nov 19, 2017
@jbrockmendel
Copy link
Member Author

Huh.

ImportError: No module named _build_utils.apple_accelerate

jreback added a commit that referenced this pull request Nov 19, 2017
@jbrockmendel jbrockmendel deleted the np_deps2 branch December 8, 2017 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants