Skip to content

issue #5791, dims & cords inference from xarray #6514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

michaelraczycki
Copy link
Contributor

added dim inference from xarray, deprecation warning and unittest for the new feature

What is this PR about?
Addresses changes requested in Issue #5791

  • Inferring dims from named pandas indexes
  • Inferring dims from DataArrays
  • Inferring coords from DataArrays
  • A unit test
  • Backwards compatible with deprecation warning rename of the export_index_as_coords to infer_dims_and_coords.

New features

  • It's now possible to infer dims & coords from xarray variables passed to pm.Data

Documentation

  • inline documentation added to data.py

@michaelraczycki
Copy link
Contributor Author

Do you have any tips on how to make the build pass? It seems really not descriptive for someone who runs it first time, and in the documentation I can't find anything to pin the reason for it failing. I've run the pre-commit checks and it seems fine:
Zrzut ekranu 2023-02-12 o 13 33 53

@codecov
Copy link

codecov bot commented Feb 13, 2023

Codecov Report

Merging #6514 (4038a7d) into main (28bac77) will decrease coverage by 12.20%.
The diff coverage is 76.47%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #6514       +/-   ##
===========================================
- Coverage   94.74%   82.54%   -12.20%     
===========================================
  Files         146      147        +1     
  Lines       27807    27951      +144     
===========================================
- Hits        26346    23073     -3273     
- Misses       1461     4878     +3417     
Impacted Files Coverage Δ
pymc/data.py 90.17% <76.47%> (-1.60%) ⬇️
pymc/tests/gp/test_gp.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/gp/test_mean.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/gp/test_util.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/smc/test_smc.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/step_methods/test_slicer.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/step_methods/hmc/test_nuts.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/step_methods/test_compound.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/step_methods/test_metropolis.py 0.00% <0.00%> (-100.00%) ⬇️
pymc/tests/gp/test_cov.py 0.00% <0.00%> (-99.82%) ⬇️
... and 31 more

Copy link
Member

@twiecki twiecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        split_key = key.split(".", 1)
        if len(split_key) != 2:
>           raise KeyError(key)
E           KeyError: 'dim'

../../../miniconda3/envs/pymc-test/lib/python3.9/site-packages/xarray/core/dataset.py:185: KeyError

@michaelraczycki
Copy link
Contributor Author

oh, good catch. It seems like I put the quotes around dim in line 241 by mistake

@michaelraczycki
Copy link
Contributor Author

@twiecki , it seems like my expectations on how the coords values should look like aren't good, could you check if I'm going in the right direction? I'm not sure if my initial assumption is correct and there's a bug I need to find, or the assumption is wrong and I should change it

@twiecki
Copy link
Member

twiecki commented Feb 15, 2023

CC @michaelosthege

@michaelosthege
Copy link
Member

Best run mypy locally by

pip install types-cachetools "typing-extensions>=3.7.4" "mypy==0.990"
python scripts/run_mypy.py --verbose

@michaelraczycki
Copy link
Contributor Author

Can I change the type hinting within the Data.coords? It expects to have the keys with type Sequence, but unfortunately ndarray from numpy acts 1:1 like a sequence, but it doesn't inherit from Sequence, therefore it's not of a Sequence type.
If you really want to keep the sequence type check, I suggest using coords: Optional[Dict[str, Union(Sequence, np.ndarray)]], that and adjusting the type hinting for return value for determine_coords locally fixed mypy complaining

@twiecki
Copy link
Member

twiecki commented Feb 22, 2023

@michaelosthege Can we merge this?

@twiecki twiecki merged commit 562e51e into pymc-devs:main Feb 22, 2023
@twiecki
Copy link
Member

twiecki commented Feb 22, 2023

Thanks @michaelraczycki

@michaelraczycki michaelraczycki deleted the infer_dims_coords_from_xarray_issue_5791 branch February 23, 2023 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants