Remove `pymc.` and `pymc3.` tags from left bar on website #374

drbenvincent · 2022-06-10T12:30:34Z

Do we have agreement that removing these tags from the left of the site is a good idea?

If so, would that be done with a PR which manually removes all such tags from all example notebooks? Or would it be done by some clever filtering web code in https://github.com/pymc-devs/pymc.io ?

If it's the former, then I can probably do that. If it's the latter, then I can't, or would need some help.

OriolAbril · 2022-06-10T13:30:58Z

Yes, they all need to be removed. I am not sure about a PR removing them all at once, though: there are many notebooks with open PRs to update them to v4, and creating git conflicts for this seems like too much trouble for what will be getting fixed.

I'd suggest reviewing all open PRs so that any notebook they edit gets those tags removed as part of that PR, and opening a separate PR that removes them from the notebooks without open PRs.

drbenvincent · 2022-06-10T14:06:17Z

Have been through all open PRs and requested removal of those tags. I will try to find time for more thorough reviews to help push them along.

twiecki · 2022-06-10T16:51:32Z

We could probably write a small script to do them all.

drbenvincent · 2022-06-11T09:13:20Z

New at this, but nearly there

```python
import os
import codecs
import re
from copy import copy


path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))


for file in filelist:
    print(f"\n{file}")
    target = None

    # read file
    f = codecs.open(file, 'r')
    source = f.read()

    # extract tags text
    target = re.findall(r'":tags:(.+?)\\n', source)

    if len(target) == 0:
        print("\tno tags found")
        continue

    target = target[0] + ","
    print(f"\tTAGS FOUND: {target}")

    # extract tags to remove
    kill_these = re.findall(r'(pymc)(.+?,)', target)
    kill_these = [''.join(tups) for tups in kill_these]
    if len(kill_these) == 0:
        print("\tno changes to be made")
        continue

    print(f"\tTAGS TO REMOVE: {kill_these}")

    new = copy(target)
    for to_kill in kill_these:
        new = new.replace(to_kill, "")

    print(f"\t REPLACE WITH: {new}")

    # replace
    source = source.replace(target[:-1], new)
    print(source[:500])

    # TODO: write to file here
```

drbenvincent · 2022-06-11T11:12:55Z

This seems like a better approach, and it should work:

```python
import json
import os

path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))

for file in filelist:
    with open(file, 'r+') as openfile:
        json_object = json.load(openfile)

        for i, string in enumerate(json_object['cells'][0]['source']):
            if string.startswith(":tags:"):
                elements = string.split()
                filtered_elements = [token for token in elements if not token.startswith('pymc')]
                new = " ".join(filtered_elements)
                # update the dictionary
                json_object['cells'][0]['source'][i] = new
                # write out to disk
                json.dump(json_object, openfile)
                # end the iteration
                break
```

But the resulting .ipynb files are not readable. This is the error when trying to open an edited file in JupyterLab:

```
Unreadable Notebook: /Users/benjamv/git/pymc-examples/examples/generalized_linear_models/GLM-binomial-regression.ipynb NotJSONError('Notebook does not appear to be JSON: \'{\\n "cells": [\\n {\\n "cell_type": "m...')
```

Any ideas what's going on here?

michaelosthege · 2022-06-11T12:00:27Z

Can you post a few lines of before/after JSON?

Just a rough guess, but from looking at the code this could be a lot easier to do with RegEx via re.sub(pattern, replacement, text).

Otherwise, if the JSON looks fine visually, double check the encoding. Jupyter is a little picky about that.
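A minimal sketch of the `re.sub` idea, assuming it is applied to the `:tags:` line after the notebook has been parsed as JSON (one element of a cell's `"source"` list), so the surrounding quotes and the escaped `\n` of the raw file are not in the way. The function and pattern names are hypothetical, and the patterns assume tags are comma-separated on a single line, as in the example notebook:

```python
import re

# Hypothetical patterns for "pymc.X" / "pymc3.X" entries in a comma-separated `:tags:` line.
# A pymc tag preceded by a comma: remove it together with that comma.
OTHER_PYMC_TAG = re.compile(r", *pymc3?\.[\w.]+")
# A pymc tag in first position (right after ":tags:"): remove it with the comma after it.
LEADING_PYMC_TAG = re.compile(r"(?<=:tags:) *pymc3?\.[\w.]+ *,?")


def strip_pymc_tags(line: str) -> str:
    """Remove pymc/pymc3 entries from a `:tags:` line; return other lines unchanged."""
    if not line.startswith(":tags:"):
        return line  # leave model code and other markdown untouched
    line = OTHER_PYMC_TAG.sub("", line)
    return LEADING_PYMC_TAG.sub("", line)
```

Applied to the `:tags:` line of the Binomial regression notebook, this leaves `:tags: binomial regression, generalized linear model\n`, keeping the trailing newline so the next line of the cell (`:category: beginner`) is not merged into it.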

drbenvincent · 2022-06-11T12:06:55Z

Start of the file:

```json
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "domestic-remove",
   "metadata": {},
   "source": [
    "(GLM-binomial-regression)=\n",
    "# Binomial regression\n",
    "\n",
    ":::{post} February, 2022\n",
    ":tags: binomial regression, generalized linear model, pymc.Binomial, pymc.ConstantData, pymc.Deterministic, pymc.Model, pymc.Normal, pymc3.Binomial, pymc3.ConstantData, pymc3.Deterministic, pymc3.Model, pymc3.Normal\n",
    ":category: beginner\n",
    ":author: Benjamin T. Vincent\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72588976-efc3-4adc-bec2-bc5b6ac4b7e1",
   "metadata": {},
   "source": [
    "This notebook covers the logic behind [Binomial regression](https://en.wikipedia.org/wiki/Binomial_regression), a specific instance of Generalized Linear Modelling. The example is kept very simple, with a single predictor variable. \n",
    "\n",
```

End of the file:

```json
"source": ["## Watermark"]}, {"cell_type": "code", "execution_count": 11, "id": "sound-calculation", "metadata": {"tags": []}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Last updated: Sun Feb 06 2022\n", "\n", "Python implementation: CPython\n", "Python version       : 3.9.9\n", "IPython version      : 7.31.0\n", "\n", "aesara: 2.3.2\n", "aeppl : 0.0.18\n", "\n", "arviz     : 0.11.4\n", "pymc      : 4.0.0b1\n", "pandas    : 1.4.0\n", "matplotlib: 3.4.3\n", "numpy     : 1.21.5\n", "\n", "Watermark: 2.3.0\n", "\n"]}], "source": ["%load_ext watermark\n", "%watermark -n -u -v -iv -w -p aesara,aeppl"]}, {"cell_type": "markdown", "id": "1e4386fc-4de9-4535-a160-d929315633ef", "metadata": {}, "source": [":::{include} ../page_footer.md :::"]}], "metadata": {"kernelspec": {"display_name": "pymc-dev-py39", "language": "python", "name": "pymc-dev-py39"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9"}}, "nbformat": 4, "nbformat_minor": 5}
```

drbenvincent · 2022-06-11T12:09:50Z

> Just a rough guess, but from looking at the code this could be a lot easier to do with RegEx via re.sub(pattern, replacement, text).

Quite possibly, but I'm not that experienced with regular expressions.

> Otherwise, if the JSON looks fine visually, double check the encoding. Jupyter is a little picky about that.

Will take a look

michaelosthege · 2022-06-11T12:18:02Z

It might be enough to do `.replace("pymc.", "").replace("pymc3.", "")` on the entire document here.
In the code we don't usually have `pymc.` occurrences, and in the end you'll see it in the git diff.

If you open the project with VS Code, you could even do this with the Find & Replace across all files (Ctrl+Shift+H).

If you want to continue the programmatic route, maybe unindent your `json.dump` lines out of the `for` loop?
Also pass `json.dump(..., indent=2)` to get the output nicely formatted.
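Building on the formatting suggestion: to keep the git diff limited to the edited line, it may help to serialize the notebook the way Jupyter itself does rather than with `indent=2`. As far as I know, nbformat writes notebooks with a one-space indent, sorted keys, `ensure_ascii=False` and a trailing newline; treat these settings as an assumption and check `git diff` on one notebook before running over all of them. The helper name `dumps_like_jupyter` is hypothetical:

```python
import json


def dumps_like_jupyter(notebook: dict) -> str:
    """Serialize a notebook dict with the settings nbformat is assumed to use."""
    text = json.dumps(
        notebook,
        indent=1,                # one-space indent, as in notebooks saved by Jupyter
        sort_keys=True,          # keys in alphabetical order
        separators=(",", ": "),  # no trailing spaces after commas
        ensure_ascii=False,      # keep non-ASCII characters as is instead of \u escapes
    )
    return text + "\n"           # notebook files end with a newline
```

Write the result with `open(filename, "w", encoding="utf-8")`: with `ensure_ascii=False` the explicit encoding matters, because the platform default encoding may not be able to represent every character in the notebook.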

drbenvincent · 2022-06-11T12:19:34Z

The formatting at the end of the file seems messed up. The problem seems to happen even if you just write the file back with zero changes to it.

michaelosthege · 2022-06-11T12:23:27Z

But then it's definitely an encoding issue. VS Code shows the encoding in the bottom right corner.
You'll probably have to specify the encoding, like here (pymc-examples/scripts/rerun.py, lines 52 to 58 in c323de9):

```python
with open(fp, "r", encoding="utf-8") as file:
    lines = file.read()
for pattern, substitute in REPLACEMENTS.items():
    lines = lines.replace(pattern, substitute)
with open(fp, "w", encoding="utf-8") as file:
```

drbenvincent · 2022-06-11T12:23:33Z

> It might be enough to do .replace("pymc.", "").replace("pymc3.", "") on the entire document here. In the code we don't usually have pymc. occurences and in the end you'll see it in the git diff.

Well, we want to remove the whole tag, not just the `pymc` prefix. And we only want to do that in the `:tags:` block, not in the model code, for example. Will experiment...

> If you want to continue the programmatic route, maybe unindent your json.dump lines out of the for loop? Also pass json.dump(..., indent=2) to get the output nicely formatted.

That definitely helps, although I think the issue is something fundamental with writing to the file: if I comment out all the changes, so the script just reads and writes the file, the problem stays the same.
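A likely cause, not confirmed in the thread, is the `'r+'` mode in the second script rather than the encoding. `json.load` reads the file to its end, so the following `json.dump` on the same handle writes from the end-of-file position: the new JSON document is appended after the original instead of replacing it. That would explain why the start of the file looks normal, the end is compact single-line JSON, and the problem appears even with zero changes. A small reproduction with a temporary file (the notebook content is a placeholder):

```python
import json
import os
import tempfile

notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

# Write a small, valid notebook-like JSON file to a temporary location
fd, path = tempfile.mkstemp(suffix=".ipynb")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Pattern from the second script: read and write through the same 'r+' handle
with open(path, "r+", encoding="utf-8") as f:
    data = json.load(f)  # reads to the end, leaving the file position at EOF
    json.dump(data, f)   # so this appends a second JSON document

with open(path, encoding="utf-8") as f:
    text = f.read()
try:
    json.loads(text)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False  # "Extra data": two JSON documents back to back

# Pattern from rerun.py: read first, then reopen with 'w', which truncates the file
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=1)
with open(path, encoding="utf-8") as f:
    rewritten = json.load(f)

os.remove(path)
print(appended_is_valid, rewritten == notebook)  # False True
```

Keeping a single `'r+'` handle would also work if it called `f.seek(0)` before `json.dump` and `f.truncate()` after it; reopening with `'w'`, as `rerun.py` does, is simpler.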

drbenvincent · 2022-06-11T12:59:14Z

Got it. The file read/write pattern used in rerun.py works.

```python
import json
import os


path = "examples"
exclude_directories = set(['.ipynb_checkpoints'])
filelist = []
for root, dirs, files in os.walk(path):
    dirs[:] = [d for d in dirs if d not in exclude_directories]
    for file in files:
        file_extension = os.path.splitext(file)[1]
        if file_extension == ".ipynb":
            filelist.append(os.path.join(root,file))

for filename in filelist:

    with open(filename, "r", encoding="utf-8") as file:
        lines = file.read()
        json_object = json.loads(lines)

    made_changes = False
    for i, string in enumerate(json_object['cells'][0]['source']):
        if string.startswith(":tags:"):
            elements = string.split()
            filtered_elements = [token for token in elements if not token.startswith('pymc')]
            # str.split() drops the line's trailing "\n" and the last kept tag keeps its
            # comma, so strip the dangling comma and restore the line ending
            new = " ".join(filtered_elements).rstrip(",")
            if string.endswith("\n"):
                new += "\n"
            # update the dictionary
            json_object['cells'][0]['source'][i] = new
            made_changes = True
            break

    if made_changes:
        json_string = json.dumps(json_object, indent=2)
        with open(filename, "w", encoding="utf-8") as file:
            file.write(json_string)
```

drbenvincent · 2022-06-11T14:30:44Z

This is mostly done. All we need to do now is ensure all pull requests remove any pymc* tags.
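For the remaining step, making sure open pull requests drop the tags, a small checker could help reviewers. This is a sketch, not something used in the thread: `leftover_pymc_tags` and `report` are hypothetical names, and it assumes, like the scripts above, that notebooks live under `examples/` and that the tags sit on a line starting with `:tags:` in a markdown cell (any markdown cell, not only the first):

```python
import json
import re
from pathlib import Path

# Matches a "pymc.X" or "pymc3.X" entry in a comma-separated tag list
PYMC_TAG = re.compile(r"(?:^|,)\s*(pymc3?\.[\w.]+)")


def leftover_pymc_tags(notebook_path):
    """Return the pymc/pymc3 tags still present in a notebook's `:tags:` lines."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    found = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        if isinstance(source, str):  # a cell source may be one string or a list of lines
            source = source.splitlines(keepends=True)
        for line in source:
            if line.startswith(":tags:"):
                found.extend(PYMC_TAG.findall(line[len(":tags:"):]))
    return found


def report(root="examples"):
    """Print each notebook under `root` that still has pymc tags; return how many."""
    bad = 0
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue
        tags = leftover_pymc_tags(path)
        if tags:
            bad += 1
            print(f"{path}: {', '.join(tags)}")
    return bad
```

Calling `report()` from the repository root prints one line per offending notebook; the returned count could be turned into a failing exit code in a CI or pre-commit step, which is an option rather than something set up here.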

drbenvincent closed this as completed Jun 11, 2022

drbenvincent reopened this Jun 11, 2022

This was referenced Jun 11, 2022:

- Batch remove pymc tags #375 (closed)
- Batch remove pymc tags #376 (merged)

drbenvincent closed this as completed Jun 11, 2022
