Skip to content

Fixup putting workflow #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 6, 2021
Merged

Conversation

MarcoGorelli
Copy link
Contributor

@MarcoGorelli MarcoGorelli commented Feb 23, 2021

fixing it up to use inference data and ArviZ

Addresses issue #50.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@twiecki
Copy link
Member

twiecki commented Mar 28, 2021

Is it worth to resolve these conflicts? The changes here make the NB better.

@MarcoGorelli
Copy link
Contributor Author

Thanks! I'll fix it up, hope to get back to this soon-ish

@OriolAbril
Copy link
Member

I would rather have this changes than the current ones, if it's easier I think you can just ignore the current changes and use your version.

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 29, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-03-29T01:44:26Z
----------------------------------------------------------------

I would suggest using xarray for computation here, I think it will make calculations more explicit and clear:

t = xr.DataArray(
    np.linspace(0, golf_data.distance.max(), 200),
    dims=["distance"]
)
logit_trace_combined = logit_trace.posterior.stack(sample=("chain", "draw"))
expit = xr.apply_ufunc(
    scipy.special.expit,
    logit_trace_combined["a"] * t + logit_trace_combined["b"],
)

ax.plot(
    t,
    expit.isel(sample=np.random.randint(0, expit.dims["sample"], 50)),
    lw=1,
    color="C1",
    alpha=0.5,
)
ax.plot(t, expit.mean("sample"), color="C2",)

I have not tried the code, but something like that should work, matplotlib can take dataarrays the same way it can take pandas series as if they were numpy arrays, and it accepts 2d arrays to plot multiple lines all at the same time too (maybe a transpose is needed though). Also, better names are probably needed.

And last, expit.dims["sample"] will return an integer, because xarray dimensions have a name and a length only. Then there are the coordinates that have name, values and are linked to a dimension. In most ArviZ cases, dims and coordinates share the name, so the distinction is not really clear, i.e. len(expit.sample) will give the same result, but in general xarray, there can be dimensions without coordinate values, multiple coordinates for the same dimension, coordinates that index multiple dimensions and other crazy combinations.



MarcoGorelli commented on 2021-04-07T14:25:20Z
----------------------------------------------------------------

@OriolAbril I tried this, but expit.dims returns ('sample', 'distance')for me, I didn't see

And last, expit.dims["sample"] will return an integer,

OriolAbril commented on 2021-04-07T15:02:55Z
----------------------------------------------------------------

True, my bad completely. the .dims["dim_name"] only works on datasets, and that is a DataArray. For dataarrays we have to stick to len(expit.sample)

OriolAbril commented on 2021-04-07T15:06:02Z
----------------------------------------------------------------

Actually, side note, the sample in expit.dims["sample"] is a dimension name, whereas the sample in len(expit.sample) is a coordinate. In ArviZ we always index dimensions with coordinates that have the same name, but in xarray dims and the coords that index them don't need to be the same, in fact, dimensions don't even need to have a coordinate that indexs them at all.

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-03-29T01:44:26Z
----------------------------------------------------------------

we could add coordinates when creating t to convert this into a slicing expit problem


@MarcoGorelli MarcoGorelli requested a review from twiecki April 4, 2021 10:00
@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Apr 4, 2021

I would suggest using xarray for computation here

Thanks for your suggestion! I haven't yet really made sense of xarray and its magic, so for now I've just resolved the conflicts (i.e. git merge upstream/main --strategy-option ours - I'd repeated many of the changes from the sd->sigma PR here anyway)

EDIT

hoping to address the xarray comments next week

@MarcoGorelli MarcoGorelli removed the request for review from twiecki April 7, 2021 14:23
Copy link
Contributor Author

@OriolAbril I tried this, but expit.dims returns ('sample', 'distance')for me, I didn't see

And last, expit.dims["sample"] will return an integer,

View entire conversation on ReviewNB

Copy link
Member

True, my bad completely. the .dims["dim_name"] only works on datasets, and that is a DataArray. For dataarrays we have to stick to len(expit.sample)


View entire conversation on ReviewNB

Copy link
Member

Actually, side note, the sample in expit.dims["sample"] is a dimension name, whereas the sample in len(expit.sample) is a coordinate. In ArviZ we always index dimensions with coordinates that have the same name, but in xarray dims and the coords that index them don't need to be the same, in fact, dimensions don't even need to have a coordinate that indexs them at all.


View entire conversation on ReviewNB

@MarcoGorelli MarcoGorelli marked this pull request as draft April 23, 2021 11:23
@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Apr 23, 2021

Something I don't understand about this notebook is that there's a cell which says

The lesson from this is that

$$ \mathbb{E}[f(\theta)] \ne f(\mathbb{E}[\theta]). $$

this appeared here in using

# Right!
scipy.special.expit(logit_trace_combined["a"] * 50 + logit_trace_combined["b"]).mean()

rather than

# Wrong!
scipy.special.expit(logit_trace_combined["a"].mean() * 50 + logit_trace_combined["b"].mean())

to calculate our expectation at 50 feet.

But then, it proceeds to do

ax.plot(t, forward_angle_model(angle_trace["variance_of_shot"].mean(), t))

However, based on the above, I think I would've expected something like

ax.plot(t, forward_angle_model(angle_trace["variance_of_shot"], t).mean())

@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 23, 2021 16:17
@MarcoGorelli MarcoGorelli requested a review from OriolAbril April 23, 2021 16:17
@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Apr 23, 2021

not sure why the diff isn't being picked up nicely by review-nb, will take another look later

EDIT

solved, just had to remove the metadata added when running this on Kaggle. Will check over this again tomorrow

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-04-23T20:12:44Z
----------------------------------------------------------------

IMO the correlation plot is a bit of a distraction here, so I've removed it


@MarcoGorelli MarcoGorelli requested a review from twiecki April 23, 2021 20:28
@OriolAbril
Copy link
Member

not sure why the diff isn't being picked up nicely by review-nb, will take another look later

Don't worry about it, it happens quite often (at least it has happened quite often in PRs I review) reviewnb just misses the point completely from time to time and thinks the whole notebook was updated.

@OriolAbril
Copy link
Member

However, based on the above, I think I would've expected something like

ax.plot(t, forward_angle_model(angle_trace["variance_of_shot"], t).mean())

I don't know what forward_angle_model does, but if this work we should use this and always leave the averaging for last.

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-24T17:22:05Z
----------------------------------------------------------------

The first line doesn't look like it has any effect.


@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-24T17:22:06Z
----------------------------------------------------------------

The uncombined trace can be used here, everything should still work the same. I also believe that the float(xarray_object) can be changed to xarray_object.item().

I prefer using the uncombined trace because then below we can also use the uncombined trace and call mean(("chain", "draw")) to show how to reduce after operating. This should work in all cases, independently of the number and order of dimensions of logit_trace. Precisely because xarray doesn't care about the number or order of dimensions, everything that can be done with chain, draw can also be done with sample as stacked dim with nearly the same code, only reductions need to change the dim they use. But there is an important setback to stacking which is that multiindex can't be stored as netcdf, so I think we should recommend working with chain, draw and stack only when necessary (like above to get random subsamples)


@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-24T17:22:07Z
----------------------------------------------------------------

I don't know if scipy.stats functions act as ufuncs, but if so we can use xarray to broadcast and avoid any loops. It is fine either way. I have added a comment on ArviZ to look into that and try to provide a good interface, maybe wrapping them ourselves with xarray.apply_ufunc.


@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Apr 25, 2021

Thanks @OriolAbril for your really helpful reviews!

@OriolAbril OriolAbril merged commit dda021c into pymc-devs:main May 6, 2021
@MarcoGorelli MarcoGorelli deleted the fixup-putting branch May 6, 2021 15:42
twiecki pushed a commit that referenced this pull request Jan 17, 2023
* Add pre-commit config

Closes #39

* Apply pre-commit formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants