CI/COMPAT: read_stata failing in numpy dev pipeline #35426

Closed
3 tasks done
AlexKirko opened this issue Jul 27, 2020 · 7 comments · Fixed by #35427
Labels: CI (Continuous Integration); Compat (pandas objects compatibility with Numpy or Python functions); IO Stata (read_stata, to_stata)

Comments

@AlexKirko
Member

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Problem description

It looks like a change in NumPy broke our read_stata code: a number of read_stata tests now fail during CI in the numpy_dev pipeline. Maybe this NumPy PR broke things? I don't think any other PR merged in the last 24 hours could have caused this, although I might be missing something.

Error example:

pandas/tests/io/test_stata.py:52: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/io/stata.py:1928: in read_stata
    data = reader.read()
pandas/io/stata.py:1646: in read
    cols_ = np.where(self.dtyplist)[0]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = ([<class 'numpy.int8'>, <class 'numpy.int16'>, <class 'numpy.int32'>, <class 'numpy.float32'>, <class 'numpy.float64'>, <class 'numpy.float32'>, ...],)
kwargs = {}
relevant_args = ([<class 'numpy.int8'>, <class 'numpy.int16'>, <class 'numpy.int32'>, <class 'numpy.float32'>, <class 'numpy.float64'>, <class 'numpy.float32'>, ...], None, None)

>   ???
E   ValueError: invalid __array_struct__
@AlexKirko AlexKirko added the Bug, Needs Triage (issue that has not been reviewed by a pandas team member), CI (Continuous Integration), and Compat (pandas objects compatibility with Numpy or Python functions) labels and removed the Needs Triage label Jul 27, 2020
@AlexKirko AlexKirko changed the title from "CI/COMPAT: numpy dev pipeline tests failing" to "CI/COMPAT: read_stata failing in numpy dev pipeline" Jul 27, 2020
@AlexKirko AlexKirko removed the Bug label Jul 27, 2020
@jreback
Contributor

jreback commented Jul 27, 2020

cc @bashtage

@bashtage
Contributor

That is a NumPy bug. numpy/numpy#16939

bashtage added a commit to bashtage/pandas that referenced this issue Jul 27, 2020
Avoid creating an array of dtypes to workaround NumPy future change

closes pandas-dev#35426
@bashtage
Contributor

Fixed it anyway. No reason to create an array of dtypes.
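
For illustration, here is a minimal sketch of what avoiding the array of dtypes could look like. It assumes the only thing needed from `np.where(self.dtyplist)` is the positions of the truthy entries. The `dtyplist` contents below are a hypothetical stand-in, and this is not necessarily the exact change made in the referenced commit.

```python
import numpy as np

# Hypothetical stand-in for the reader's dtyplist; the None entry is an
# assumption used only to show that falsy entries are skipped.
dtyplist = [np.int8, np.int16, None, np.float32, np.float64]

# Build a plain list of bools first, so NumPy only ever converts booleans
# and never has to turn the dtype classes themselves into an array.
mask = [bool(dtyp) for dtyp in dtyplist]

# Positions of the truthy entries, as np.where(dtyplist)[0] was meant to give.
cols = np.where(mask)[0]
print(cols.tolist())
```

Since `mask` is an ordinary Python list of booleans, the conversion that raised `ValueError: invalid __array_struct__` in the traceback should not be reached, under the assumption that the error comes from converting the dtype classes.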

@simonjayhawkins simonjayhawkins added the IO Stata read_stata, to_stata label Jul 27, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Jul 27, 2020
@AlexKirko
Member Author

AlexKirko commented Jul 27, 2020

@bashtage, thanks!
Looks like there are also a bunch of other places where we create an array of dtypes and then assign them to a numpy array with something like result[:] = values.
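
A rough sketch of the pattern mentioned above, with hypothetical names and values. The slice assignment asks NumPy to convert the whole list first, while filling a preallocated object array element by element stores each entry as-is. Whether the element-wise form avoids the numpy dev failure is an assumption and was not verified in this thread.

```python
import numpy as np

# Hypothetical list of dtype classes, similar to the one in the traceback.
values = [np.int8, np.int16, np.float32]

# Pattern in question (shown as a comment only): result[:] = values

# Element-wise alternative: each class is stored directly in an object slot.
result = np.empty(len(values), dtype=object)
for i, value in enumerate(values):
    result[i] = value
```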

@simonjayhawkins simonjayhawkins modified the milestones: 1.1, 1.1.1 Jul 27, 2020
@martindurant martindurant mentioned this issue Jul 27, 2020
5 tasks
@AlexKirko
Member Author

AlexKirko commented Jul 28, 2020

Should we alter this issue to incorporate the other test failures (basically anywhere we create a list of dtypes and assign it to a numpy array), or open a new issue?
Never mind, I somehow missed that it's an upstream bug and not intended numpy behavior. numpy/numpy#16941 should fix this. Thanks, @TomAugspurger!

@TomAugspurger
Contributor

We've already opened an issue upstream: numpy/numpy#16939

bashtage added a commit to bashtage/pandas that referenced this issue Jul 29, 2020
Avoid creating an array of dtypes to workaround NumPy future change

closes pandas-dev#35426
@simonjayhawkins simonjayhawkins removed this from the 1.1.1 milestone Jul 31, 2020
@simonjayhawkins
Member

Closing in favour of #35481, as the Stata tests are no longer failing.

bashtage added a commit to bashtage/pandas that referenced this issue Aug 2, 2020
Avoid creating an array of dtypes to workaround NumPy future change

closes pandas-dev#35426