Skip to content

Enable to use a SQL DB as a storage backend #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

miraculixx
Copy link
Contributor

@miraculixx miraculixx commented May 23, 2022

closes #9

@miraculixx miraculixx force-pushed the enable-sqla-backend branch 2 times, most recently from 2c69170 to ff7570a Compare May 23, 2022 23:49
@miraculixx miraculixx changed the title Enable sqla backend Enable to use a SQL DB as a storage backend May 23, 2022
@miraculixx miraculixx force-pushed the enable-sqla-backend branch from ff7570a to ab93c02 Compare May 23, 2022 23:58
@miraculixx miraculixx force-pushed the enable-sqla-backend branch 7 times, most recently from 2d756f7 to d7417da Compare May 25, 2022 13:40
@AbdealiLoKo
Copy link
Collaborator

I have merged the PR for github-actions - Could you rebase this PR based on that ?
Also, could you help me understand why do you use dataset and not sqlalchemy directly here ?

dataset seems to need a bunch of other dependencies which we could avoid - https://github.com/pudo/dataset/blob/master/setup.py#L32


def __init__(self):
super().__init__()
db_url = os.environ.get("CHP_DATABASE_URL", self.default_db_url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using environment variables ? Can we get the configs for this from the jupyter config file ?
For example how the other jupyterhub proxy configs have it: https://jupyterhub-traefik-proxy.readthedocs.io/en/latest/toml.html#example-setup

Copy link
Contributor Author

@miraculixx miraculixx May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged the PR for github-actions - Could you rebase this PR based on that ? Also, could you help me understand why do you use dataset and not sqlalchemy directly here ?

dataset seems to need a bunch of other dependencies which we could avoid - https://github.com/pudo/dataset/blob/master/setup.py#L32

The rationale for using dataset is that it reduces the effort to implement this feature by about 90%. dataset's core benefit is that it abstracts-away all the sqlalchemy nuts & bolts, in particular we can use Python-dicts as input/output of the DB. Doing the same using sqlalchemy directly would add quite a bit of code for db handling, which dataset allows us to avoid.

Apart from sqlalchemy, dataset adds two dependencies, alembic and banal. To make this feature operationally useful, it needs to support db upgrades (migrations), so alembic is required anyway. The additional dependency on banal seems a very small price to pay for that.

Copy link
Contributor Author

@miraculixx miraculixx May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the configs for this from the jupyter config file ?

IMHO jupyter config files should be a concern of Jupyter, not of chp (if chp depends on jupyter configs there will be a dependency chp => jupyter, which would not be useful in my mind).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the jupyter configs - makes sense
I guess I was more thinking of adding CLI arguments instead of env variables to be consistent with the other configs we use in configurable-http-proxy



class TableTrie:
# A databased URLTrie-alike
Copy link
Collaborator

@AbdealiLoKo AbdealiLoKo May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring here explaining how the trie structure is being saved in the DB with an example ?
I was trying to go through the code but reallized I was unable to figure out the table structure we are using (probably due to my lack of knowledge on dataset).

Copy link
Contributor Author

@miraculixx miraculixx May 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring here explaining how the trie structure is being saved in the DB with an example ?

Done. On a side note, it might be useful to add a docstring to the trie.URLtrie class because it isn't obvious either ;-)

@@ -27,6 +27,7 @@ The following items are supported:
- Customizable storage backends
- PID file writing
- Logging
- Configurable storage backend
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an awesome feature and really useful.

We should highlight it by making 3 sections here:

  1. Features supported (and compatible) with nodejs configurable-http-proxy
  2. Additional features not available in nodejs configurable-http-proxy
  3. Features from nodejs configurable-http-proxy not supported (yet)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the README accordingly

@miraculixx miraculixx force-pushed the enable-sqla-backend branch 3 times, most recently from bc5f0e6 to 7dc0ace Compare May 28, 2022 15:07
@miraculixx
Copy link
Contributor Author

I have merged the PR for github-actions - Could you rebase this PR based on that?

🆗 Done

@miraculixx miraculixx force-pushed the enable-sqla-backend branch from 7dc0ace to dd6e633 Compare June 13, 2022 09:43
@miraculixx miraculixx mentioned this pull request Nov 10, 2022
- add sqlastore.DatabaseStore and unittests
- update API unittests for use in database tests
- update README
@AbdealiLoKo
Copy link
Collaborator

@miraculixx I use rebase merges to maintain a linear commit history instead of merge commits
So, I have rebased your branch on main and pushed

I made some very small modifications in class names in tests (I avoid using _ in the class names)
Also in some places the number of / in sqlite:///chp.sqlite was 2 instead of 3
And modified the readme a little and formatted it using markdown / prettier

Got the tests to pass and I am good to merge it

@AbdealiLoKo AbdealiLoKo merged commit d9f1469 into corridor:main Nov 10, 2022
@AbdealiLoKo
Copy link
Collaborator

Thanks a lot for the contribution - this is super helpful @miraculixx !

@SMHari
Copy link

SMHari commented Jan 19, 2023

Hi @miraculixx @AbdealiLoKo
is this sqla package still valid?
I noticed that the version of sqla is 0.0.0 from PyPI index and there seems to be no active versions being maintained in GitHub

any alternatives that can be used?
https://pypi.org/project/sqla/

@AbdealiLoKo
Copy link
Collaborator

AbdealiLoKo commented Jan 24, 2023

Hi @SMHari The sqla mentioned here is an extra available with configurable-http-proxy
Not the sqla package on pypi

I.e. you need to run pip install configurable-http-proxy[sqla] which installs configurable-http-proxy with sqlalchemy dependencies
https://github.com/corridor/configurable-http-proxy/blob/main/setup.py#L23

Ref: Example 4 in https://pip.pypa.io/en/stable/reference/requirement-specifiers/?highlight=security#examples

SomeProject[foo, bar]
# OR:
requests[security]

@SMHari
Copy link

SMHari commented Jan 24, 2023

Hi @AbdealiLoKo
thanks for checking

But I'm getting the below warning while trying to do the same

WARNING: configurable-http-proxy 0.2.3 does not provide the extra 'sqla'

But I still tried starting the CHP with the storage-backend argument,
ie: configurable-http-proxy --storage-backend configurable_http_proxy.dbstore.DatabaseStore

it is failing because of the same

ModuleNotFoundError: No module named 'configurable_http_proxy.dbstore'

Am I missing anything?

@AbdealiLoKo
Copy link
Collaborator

Hm, looks like we haven't released this to pypi yet.
@indiVar0508 could you release this to pypi

@AbdealiLoKo
Copy link
Collaborator

We just released it @SMHari
To use it in your environment, please do:

pip uninstall configurable-http-proxy
pip install configurable-http-proxy[sql]

NOTE that I have changed it from sqla -> sql to avoid confusion the the pypi library sqla (As the previous name was never released)

@SMHari
Copy link

SMHari commented Jan 25, 2023

Cool
Thanks, @AbdealiLoKo
that is working fine

But I'm using this py CHP with my jupyterhub as an alternative to nodejs CHP
So, whenever, I try to open a notebook, I'm seeing the below exception in proxy logs, which is not the case for the other CHP. As a result, the notebook keeps loading

`Traceback (most recent call last):
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/websocket.py", line 1086, in write_message
    fut = self._write_frame(True, opcode, message, flags=flags)
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/websocket.py", line 1061, in _write_frame
    return self.stream.write(frame)
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/iostream.py", line 530, in write
    self._check_closed()
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/iostream.py", line 1019, in _check_closed
    raise StreamClosedError(real_error=self.error)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/websocket.py", line 635, in _run_callback
    result = callback(*args, **kwargs)
  File "/export/apps/python/3.10/lib/python3.10/site-packages/configurable_http_proxy/handlers.py", line 414, in on_message
    self.ws_client.write_message(message)
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/websocket.py", line 1500, in write_message
    return self.protocol.write_message(message, binary=binary)
  File "/export/apps/python/3.10/lib/python3.10/site-packages/tornado/websocket.py", line 1088, in write_message
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError`

@AbdealiLoKo
Copy link
Collaborator

Hi @SMHari moving this to an issue so we can track it

@SMHari
Copy link

SMHari commented Jan 25, 2023

Sure @AbdealiLoKo
Appreciate your help :)

Whether this will be actively addressed or might take some time ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support a db backend for high availability scenarios
3 participants