Remove pymc.* and pymc3.* tags from left bar on website #374


Closed
drbenvincent opened this issue Jun 10, 2022 · 14 comments

@drbenvincent
Contributor

Do we have agreement that removing these tags from the left of the site is a good idea?
[Image: Screenshot 2022-06-10 at 13 27 30]

If so, would that be done with a PR that manually removes all such tags from all example notebooks, or with some clever filtering code in https://github.com/pymc-devs/pymc.io?

If it's the former, then I can probably do that. If it's the latter, then I can't, or would need some help.

@OriolAbril
Member

Yes, they all need to be removed. I am not sure about a PR removing them all at once: there are many notebooks with open PRs to update them to v4. Creating git conflicts for this seems like too much trouble for what will be getting fixed.

I'd suggest reviewing all open PRs to make sure that, if they edit a notebook with those tags, the tags get removed, and then opening a PR that removes them from the notebooks without open PRs.

@drbenvincent
Contributor Author

drbenvincent commented Jun 10, 2022

I have been through all open PRs and requested removal of those tags. I will try to find time for more thorough reviews to help push them along.

@twiecki
Member

twiecki commented Jun 10, 2022

We could probably write a small script to do them all.

@drbenvincent
Contributor Author

drbenvincent commented Jun 11, 2022

New at this, but nearly there

import os
import codecs
import re
from copy import copy


path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))
        
        
for file in filelist:
    print(f"\n{file}")
    target = None
    
    # read file (explicit encoding; the context manager closes the handle)
    with codecs.open(file, 'r', encoding='utf-8') as f:
        source = f.read()
    
    # extract tags text
    target = re.findall(r'":tags:(.+?)\\n', source)
    
    if len(target) == 0:
        print("\tno tags found")
        continue
        
    target = target[0] + ","
    print(f"\tTAGS FOUND: {target}")
    
    # extract tags to remove
    kill_these = re.findall(r'(pymc)(.+?,)', target)
    kill_these = [''.join(tups) for tups in kill_these]
    if len(kill_these) == 0:
        print("\tno changes to be made")
        continue
        
    print(f"\tTAGS TO REMOVE: {kill_these}")
    
    new = copy(target)
    for to_kill in kill_these:
        new = new.replace(to_kill, "")
        
    print(f"\t REPLACE WITH: {new}")
    
    # replace
    source = source.replace(target[:-1], new)
    print(source[:500])
    
    # TODO: write to file here

@drbenvincent
Contributor Author

This seems like a better approach, and it should work.

import json
import os
    
path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))

for file in filelist:
    with open(file, 'r+') as openfile:
        json_object = json.load(openfile)

        for i, string in enumerate(json_object['cells'][0]['source']):
            if string.startswith(":tags:"):
                elements = string.split()
                filtered_elements = [token for token in elements if not token.startswith('pymc')]
                new = " ".join(filtered_elements)
                # update the dictionary
                json_object['cells'][0]['source'][i] = new
                # write out to disk
                json.dump(json_object, openfile)
                # end the iteration
                break

But the resulting .ipynb files are not readable. Error when trying to open an edited file in Jupyter lab:

Unreadable Notebook: /Users/benjamv/git/pymc-examples/examples/generalized_linear_models/GLM-binomial-regression.ipynb NotJSONError('Notebook does not appear to be JSON: \'{\\n "cells": [\\n {\\n "cell_type": "m...')

Any ideas what's going on here?
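
One likely explanation, sketched here under the assumption that the script above ran as posted: with mode `r+`, `json.load` reads to the end of the file, so the following `json.dump` appends a second JSON document after the original instead of overwriting it. The helper names and the temporary file below are illustrative only and are not part of the thread's script.

```python
import json
import os
import tempfile


def rewrite_in_place_buggy(path):
    # Mirrors the pattern above: read and write through one "r+" handle.
    with open(path, "r+", encoding="utf-8") as f:
        data = json.load(f)  # reading leaves the file position at end of file
        json.dump(data, f)   # so this appends a second JSON document


def rewrite_in_place_fixed(path):
    with open(path, "r+", encoding="utf-8") as f:
        data = json.load(f)
        f.seek(0)            # return to the start before writing
        json.dump(data, f, indent=1)
        f.truncate()         # drop leftover text if the new content is shorter


def is_valid_json(path):
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
        return True
    except json.JSONDecodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"cells": [], "nbformat": 4}, f, indent=1)

    rewrite_in_place_fixed(path)
    ok_after_fix = is_valid_json(path)  # still a single JSON document

    rewrite_in_place_buggy(path)
    ok_after_bug = is_valid_json(path)  # two documents back to back
```

Opening the file once for reading and again with mode `"w"` for writing avoids the problem as well, since `"w"` truncates the file before anything is written.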

@michaelosthege
Member

Can you post a few lines of before/after JSON?

Just a rough guess, but from looking at the code this could be a lot easier to do with RegEx via re.sub(pattern, replacement, text).

Otherwise, if the JSON looks fine visually, double check the encoding. Jupyter is a little picky about that.

@drbenvincent
Contributor Author

drbenvincent commented Jun 11, 2022

start

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "domestic-remove",
   "metadata": {},
   "source": [
    "(GLM-binomial-regression)=\n",
    "# Binomial regression\n",
    "\n",
    ":::{post} February, 2022\n",
    ":tags: binomial regression, generalized linear model, pymc.Binomial, pymc.ConstantData, pymc.Deterministic, pymc.Model, pymc.Normal, pymc3.Binomial, pymc3.ConstantData, pymc3.Deterministic, pymc3.Model, pymc3.Normal\n",
    ":category: beginner\n",
    ":author: Benjamin T. Vincent\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72588976-efc3-4adc-bec2-bc5b6ac4b7e1",
   "metadata": {},
   "source": [
    "This notebook covers the logic behind [Binomial regression](https://en.wikipedia.org/wiki/Binomial_regression), a specific instance of Generalized Linear Modelling. The example is kept very simple, with a single predictor variable. \n",
    "\n",

end

"source": ["## Watermark"]}, {"cell_type": "code", "execution_count": 11, "id": "sound-calculation", "metadata": {"tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Last updated: Sun Feb 06 2022\n", "\n", "Python implementation: CPython\n", "Python version       : 3.9.9\n", "IPython version      : 7.31.0\n", "\n", "aesara: 2.3.2\n", "aeppl : 0.0.18\n", "\n", "arviz     : 0.11.4\n", "pymc      : 4.0.0b1\n", "pandas    : 1.4.0\n", "matplotlib: 3.4.3\n", "numpy     : 1.21.5\n", "\n", "Watermark: 2.3.0\n", "\n"]}], "source": ["%load_ext watermark\n", "%watermark -n -u -v -iv -w -p aesara,aeppl"]}, {"cell_type": "markdown", "id": "1e4386fc-4de9-4535-a160-d929315633ef", "metadata": {}, "source": [":::{include} ../page_footer.md :::"]}], "metadata": {"kernelspec": {"display_name": "pymc-dev-py39", "language": "python", "name": "pymc-dev-py39"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9"}}, "nbformat": 4, "nbformat_minor": 5}

@drbenvincent
Contributor Author

> Just a rough guess, but from looking at the code this could be a lot easier to do with RegEx via re.sub(pattern, replacement, text).

Quite possibly, but I'm not that experienced with regular expressions.

> Otherwise, if the JSON looks fine visually, double check the encoding. Jupyter is a little picky about that.

Will take a look

@michaelosthege
Member

It might be enough to do .replace("pymc.", "").replace("pymc3.", "") on the entire document here.
In the code we don't usually have pymc. occurrences, and in the end you'll see it in the git diff.

If you open the project with VS Code, you could even do this with the Find & Replace across all files (Ctrl+Shift+H).

If you want to continue the programmatic route, maybe unindent your json.dump lines out of the for loop?
Also pass json.dump(..., indent=2) to get the output nicely formatted.

@drbenvincent
Contributor Author

The formatting at the end of the file seems messed up. The problem seems to happen even if you just write the file with zero changes to it.

@michaelosthege
Member

But then it's definitely an encoding issue. VS Code shows the encoding in the bottom right corner.
You'll probably have to specify the encoding like here:

with open(fp, "r", encoding="utf-8") as file:
    lines = file.read()
for pattern, substitute in REPLACEMENTS.items():
    lines = lines.replace(pattern, substitute)
with open(fp, "w", encoding="utf-8") as file:
    file.write(lines)  # assumed final step; the posted snippet ends at the line above

@drbenvincent
Contributor Author

drbenvincent commented Jun 11, 2022

> It might be enough to do .replace("pymc.", "").replace("pymc3.", "") on the entire document here. In the code we don't usually have pymc. occurrences, and in the end you'll see it in the git diff.

Well we want to remove the whole tag, not just the pymc prefix. And we only want to do that in the :tags: block, not in the model code for example. Will experiment...
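
A minimal sketch of that distinction, assuming tags are comma-separated on a single `:tags:` line as in the excerpt posted earlier. The function name `strip_pymc_tags` and the sample lines are hypothetical; the point is only to contrast removing whole tags with the plain prefix replacement.

```python
import re

# Matches the start of tags such as "pymc.Normal" or "pymc3.Model".
PYMC_TAG = re.compile(r"pymc3?\.")


def strip_pymc_tags(line: str) -> str:
    """Drop whole pymc.* / pymc3.* tags from a ':tags:' line; return other lines unchanged."""
    if not line.startswith(":tags:"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    tags = [tag.strip() for tag in line[len(":tags:"):].split(",")]
    kept = [tag for tag in tags if tag and not PYMC_TAG.match(tag)]
    return ":tags: " + ", ".join(kept) + newline


cell_source = [
    ":::{post} February, 2022\n",
    ":tags: binomial regression, generalized linear model, pymc.Binomial, pymc3.Normal\n",
    ":category: beginner\n",
]
cleaned = [strip_pymc_tags(line) for line in cell_source]

# The plain prefix replacement keeps the tags and only shortens them.
naive = cell_source[1].replace("pymc.", "").replace("pymc3.", "")
```

Because only lines starting with `:tags:` are rewritten, model code such as `pymc.Normal(...)` elsewhere in a notebook would be left alone.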

> If you want to continue the programmatic route, maybe unindent your json.dump lines out of the for loop? Also pass json.dump(..., indent=2) to get the output nicely formatted.

That definitely helps, although I think the issue is something fundamental with writing to the file: if I comment out all the changes, so that the script just reads and writes the file, the problem stays the same.

@drbenvincent
Contributor Author

drbenvincent commented Jun 11, 2022

Got it. The file read/write pattern used in rerun.py works.

import json
import os


path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))

for filename in filelist:

    with open(filename, "r", encoding="utf-8") as file:
        lines = file.read()
        json_object = json.loads(lines)
    
    made_changes = False
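    for i, string in enumerate(json_object['cells'][0]['source']):
        if string.startswith(":tags:"):
            # split on commas so multi-word tags stay intact
            elements = [tag.strip() for tag in string[len(":tags:"):].split(",")]
            filtered_elements = [tag for tag in elements if tag and not tag.startswith('pymc')]
            # keep the trailing newline so ":category:" stays on its own line
            newline = "\n" if string.endswith("\n") else ""
            new = ":tags: " + ", ".join(filtered_elements) + newline
            # update the dictionary
            json_object['cells'][0]['source'][i] = new
            made_changes = True
            break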
            
    if made_changes:
        json_string = json.dumps(json_object, indent=2)
        with open(filename, "w", encoding="utf-8") as file:
            file.write(json_string)
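
A possible refinement, assuming the goal is to keep the git diff limited to the tags line: the "start" excerpt posted earlier is indented with one space per level, so writing with `indent=2` would reindent every line of each notebook. The sketch below serializes with `indent=1`, keeps non-ASCII characters unescaped, and ends with a newline. The helper name `dump_notebook` is hypothetical, and whether this reproduces Jupyter's output byte for byte is an assumption worth checking against a real diff.

```python
import json


def dump_notebook(nb: dict) -> str:
    # indent=1 follows the one-space indentation seen in the "start" excerpt;
    # ensure_ascii=False writes non-ASCII text as-is instead of \uXXXX escapes.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


nb = {"cells": [{"cell_type": "markdown", "source": ["café\n"]}], "nbformat": 4}
text = dump_notebook(nb)
```

In the script above, this would stand in for the `json.dumps(json_object, indent=2)` call, with the rest of the read and write logic unchanged.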

This was referenced Jun 11, 2022
@drbenvincent
Contributor Author

This is mostly done. All we need to do now is ensure all pull requests remove any pymc* tags.
