Implement base object for handling triggers with basic data types #66

Closed
pragnagopa opened this issue Mar 2, 2018 · 40 comments · Fixed by #409

Labels: func-stack: Python, P1 ([P1] items: Ship blocking)

@pragnagopa
Member

No description provided.

@1st1
Collaborator

1st1 commented Mar 5, 2018

Context

The original motivation for this feature is to allow creating functions for binding types that are not yet fully supported by the worker. The current behaviour is to simply refuse to load them with an error.

The proposed solution is to:

  • add common base datatypes for trigger bindings and regular bindings;
  • expose the underlying gRPC TypedData raw data, as the Node.js worker does.

Pros:

  • This would mean that even if there's no direct implementation of a binding type, users would still be able to write functions that use it.

Cons:

  • The main problem here is that by doing this we would inadvertently expose Python Functions users to a lot of low-level protocol details. In the short term, users would be able to use new bindings/Azure services before they are officially supported by the Python worker, but only by coding against those details directly.

  • This essentially turns the low-level encoding (the TypedData gRPC structs) into an API. That complicates things for binding types and Azure services, as they would have to maintain backwards compatibility with respect to how exactly bindings are encoded on the wire.

  • Moreover, we'll still have to document how exactly those bindings are encoded on the wire, otherwise users won't be able to work with these new binding types.

Therefore, implementing this feature doesn't really make it easier for users to work with unsupported bindings. At best this would allow them to quickly prototype a function before the binding/integration is fully supported, but by doing so, they would be writing code that relies on private undocumented data formats.


Initial attempt to implement the feature

Invariants:

  • We don't want to expose the actual gRPC protocol to the user.
  • We don't want to mirror the TypedData gRPC message to equivalent Python classes—that's not Pythonic.
  • We don't want to assume that all bindings are always encoded as JSON. There might be a new binding in the future that uses TypedData(int=..) gRPC type for encoding its values.

Therefore we want high-level facade types for TypedData. I've prototyped the following ABCs:

import abc
import typing


class RawInputData(abc.ABC):

    @abc.abstractmethod
    def get_str(self) -> typing.Optional[str]:
        pass

    @abc.abstractmethod
    def get_bytes(self) -> typing.Optional[bytes]:
        pass

    @abc.abstractmethod
    def get_int(self) -> typing.Optional[int]:
        pass

    @abc.abstractmethod
    def get_float(self) -> typing.Optional[float]:
        pass

    @abc.abstractmethod
    def get_json(self) -> typing.Any:
        pass

    # XXX: Not complete! Probably need to add support for TypedData(http)


class RawOutputData(RawInputData):

    @abc.abstractmethod
    def set_str(self, val: str):
        pass

    @abc.abstractmethod
    def set_bytes(self, val: bytes):
        pass

    @abc.abstractmethod
    def set_int(self, val: int):
        pass

    @abc.abstractmethod
    def set_float(self, val: float):
        pass

    @abc.abstractmethod
    def set_json(self, val: typing.Any):
        pass


class GenericBinding(abc.ABC):

    @property
    @abc.abstractmethod
    def raw_data(self) -> RawOutputData:
        pass


class GenericTrigger(abc.ABC):

    @property
    @abc.abstractmethod
    def raw_data(self) -> RawInputData:
        pass

    @property
    @abc.abstractmethod
    def raw_metadata(self) -> typing.Mapping[str, RawInputData]:
        pass

The idea is to derive most binding and trigger types from either the GenericBinding or GenericTrigger class. We would then rewrite every type implementation to use raw_data and raw_metadata, so that any modifications to the raw data are always kept in sync.
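
For illustration, here is a minimal sketch of how a rich trigger type could be layered on top of these ABCs; the QueueMessage name, the body and insertion_time properties, and the 'InsertionTime' metadata key are invented for this example, and it assumes the same abc/typing imports as above:

class QueueMessage(GenericTrigger):
    """Hypothetical rich type built on top of the generic facade."""

    def __init__(self, raw_data: RawInputData,
                 raw_metadata: typing.Mapping[str, RawInputData]) -> None:
        self.__raw_data = raw_data
        self.__raw_metadata = raw_metadata

    @property
    def raw_data(self) -> RawInputData:
        return self.__raw_data

    @property
    def raw_metadata(self) -> typing.Mapping[str, RawInputData]:
        return self.__raw_metadata

    @property
    def body(self) -> bytes:
        # Prefer the raw bytes payload; fall back to an encoded string.
        data = self.__raw_data.get_bytes()
        if data is None:
            data = (self.__raw_data.get_str() or '').encode('utf-8')
        return data

    @property
    def insertion_time(self) -> typing.Optional[str]:
        # Rich accessors read through raw_metadata, so they stay in sync
        # with the underlying data.
        value = self.__raw_metadata.get('InsertionTime')
        return value.get_str() if value is not None else None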

The problem is that this increases the overall complexity of the code. That will result in documentation becoming more complex too.


Question

So the real question here is: do we really want this? Maybe the correct approach is to simply add official support for a binding/trigger/service before documenting it and letting users use it?

After all, we made it easy to implement new bindings:

import json
import typing

# azf_abc, meta and protos are aliases for project modules.


class TimerRequest(azf_abc.TimerRequest):

    def __init__(self, *, past_due: bool) -> None:
        self.__past_due = past_due

    @property
    def past_due(self):
        return self.__past_due


class TimerRequestConverter(meta.InConverter,
                            binding='timerTrigger', trigger=True):

    @classmethod
    def check_python_type(cls, pytype: type) -> bool:
        return issubclass(pytype, azf_abc.TimerRequest)

    @classmethod
    def from_proto(cls, data: protos.TypedData, *,
                   pytype: typing.Optional[type],
                   trigger_metadata) -> typing.Any:
        if data.WhichOneof('data') != 'json':
            raise NotImplementedError

        info = json.loads(data.json)
        return TimerRequest(
            past_due=info.get('IsPastDue', False))

/cc @asavaritayal @christopheranderson @brettcannon @elprans

@brettcannon
Member

Could you expose the gRPC TypedData with an underscore-leading name so that those who want to poke at it can, but it isn't an officially supported part of the API?

@1st1
Collaborator

1st1 commented Mar 6, 2018

Yes, we could do that. I'm still not sure why that would be useful though:

  • IMO, exposing raw gRPC messages is only possible for "in" bindings.

    Exposing them for "out" bindings is outright dangerous: the Azure Functions Web Host can easily be crashed by a malformed message. It's the same as with CPython: we allow users to create "broken" code objects, but it's impossible to fully protect the interpreter from segfaulting in all cases.

    Therefore, if we do this only for "in" bindings, users will still need to wait for us to add support for new bindings/services officially.

  • Another problem with gRPC is that its data structures are not pleasant to work with. Without looking at the protobuf spec, it would be hard for users to make sense of the data (I'm speaking from my own experience here). And even if somebody is persistent enough to extract some useful data from raw gRPC messages, it can stop working at any time when some minor change is made to the platform.

  • It seems that we don't expose raw data for C# functions: all bindings are exposed via nice and usable C# types. The Node.js worker looks like the exception here, and I feel that Python should follow the C# path.

@brettcannon
Member

@1st1 neither do I, but you seemed to want to expose something and I was trying to minimize the work to do that. 😉

@1st1
Collaborator

1st1 commented Mar 6, 2018

Yeah :)

So if @asavaritayal and @christopheranderson are OK with this, I'd prefer to close this issue and simply implement the remaining bindings properly, without exposing any of gRPC low-level messaging to the end-users.

@1st1
Collaborator

1st1 commented Mar 8, 2018

As per our conf call with @asavaritayal, @pragnagopa, and @paulbatum, we decided to postpone the final decision on this issue. We'll continue implementing high-level rich Python types for bindings for now. We'll reconsider adding a generic type that exposes raw JSON data based on user experience and our own experience implementing rich types.

@asavaritayal asavaritayal added this to the backlog milestone Mar 27, 2018
@asavaritayal
Contributor

Quick update on this issue -

Per feedback over the last couple of months, the high-level rich Python types are much easier to use than the basic types we have for Node. Since the azure.functions package is now published to PyPI, rich types simplify development via type hints, IntelliSense, etc., making it easy to orchestrate data from the underlying binding sources.

Closing this issue for now. It can be re-opened if we hear otherwise. /cc @paulbatum

@asavaritayal
Contributor

Resurrecting this thread per discussion with @anthonychu.

Is it possible that, while we support rich types for first-class bindings such as HTTP, Blob, Event Grid, etc., we still allow users to bring their own binding (per our extensibility model) and use it as a basic data type object in their function code? We've had a few new bindings spring up lately (SignalR, Event Hubs, etc.), and it seems that Python will start to lose out on scenarios because we're not constantly building support for each of these bindings in the language worker.

@asavaritayal asavaritayal reopened this Sep 18, 2018
@pragnagopa
Member Author

pragnagopa commented Sep 28, 2018

cc: @maiqbal11 @paulbatum

We should look into adding support for using a context object, similar to JavaScript functions. Exposing a context object does not expose the TypedData defined in the protobuf, which is an internal implementation detail.

We should allow the following function signatures:

  1. Without importing azure.functions, code should be able to work directly with the context object, accessing input data via context.bindingData (a sketch follows below):

     def main(context, msg) -> str:

  2. Giving the customer the option to use the azure.functions package:

     def main(context, msg: azf.QueueMessage) -> str:
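
A minimal sketch of what option 1 could look like in user code, assuming a context object that exposes binding metadata via a bindingData mapping; the 'InsertionTime' key is an assumption for illustration:

def main(context, msg) -> str:
    # Raw trigger payload; no azure.functions import required.
    body = msg

    # Hypothetical access to binding metadata through the context object.
    insertion_time = context.bindingData.get('InsertionTime')

    return f'processed {body!r} (inserted at {insertion_time})'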

@asavaritayal
Contributor

asavaritayal commented Oct 3, 2018

We should look into adding support for using a context object, similar to JavaScript functions. Exposing a context object does not expose the TypedData defined in the protobuf, which is an internal implementation detail.

Open to comments on this but I think we should keep the function signature consistent for the end user regardless of whether the binding is of a "basic" or "rich" type.

For example, if req is of type azure.functions.HttpRequest but msg is of type str:

import logging

import azure.functions


def main(req: azure.functions.HttpRequest, msg: str):
    name = req.params.get('name')
    logging.info(f'property from rich type: {name}')

    logging.info(f'basic data type: {msg}')

@elprans
Collaborator

elprans commented Oct 3, 2018

One of the reasons why rich bindings are preferred is that we can hide the actual wire format behind a nice abstraction. Exposing raw host data as a str essentially makes the protocol a public API, and it will have to be maintained for backward compatibility, i.e. breaking changes in the host/extension would not be possible without some sort of versioning on the trigger binding configuration.

@elprans
Collaborator

elprans commented Oct 3, 2018

My point is that I think we shouldn't make it look like a regular binding. Perhaps an explicit func.RawData type would be better, with a big fat warning to use it at own risk.
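
For illustration only, usage of such a type might look like the sketch below; func.RawData is hypothetical (it is the suggestion being discussed, not an existing azure.functions API), and the get_bytes accessor is borrowed from the facade prototype earlier in this thread:

import azure.functions as func


def main(event: 'func.RawData') -> str:  # quoted: the type does not exist yet
    # Use at your own risk: the payload layout is owned by the binding
    # extension and may change without notice.
    body = event.get_bytes() or b''
    return body.decode('utf-8', errors='replace')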

@1st1
Collaborator

1st1 commented Oct 3, 2018

My point is that I think we shouldn't make it look like a regular binding. Perhaps an explicit func.RawData type would be better, with a big fat warning to use it at own risk.

+1.

@asavaritayal asavaritayal added the feedback-needed Tracking the issue in order to gauge user interest or feedback. label Oct 7, 2018
@asavaritayal
Contributor

asavaritayal commented Oct 16, 2018

Summarizing our discussion from today's sync -

We should use the dataType property in function.json (https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-node#binding-data-type) to determine the data type used to expose the contents of a binding. /cc @paulbatum

@paulbatum
Member

paulbatum commented Oct 22, 2018

My point is that I think we shouldn't make it look like a regular binding. Perhaps an explicit func.RawData type would be better, with a big fat warning to use it at own risk.

I am struggling with this a little bit because in the case of JavaScript there is no abstraction, and we do not provide any warning. We expect the customer to code against the API surface area that is provided by the binding extension itself. If the binding makes a change to the format of the message, then the user needs to update their code when they update to the new major version of the extension that has that breaking change. Right now we do not have an extensibility model where bindings "plug in" to the language worker, and the Python worker is kind of cheating by referencing the python library that has "built in" knowledge of a certain set of bindings.

This is a long-winded way of me saying that I'd rather not treat these generic bindings as a special case in Python, because they aren't a special case in our other languages. I look at it as the "base case", and the additional abstraction that the Python worker is doing over some bindings is sugar on top.

In the long term, we need to figure out an appropriate architecture for how that abstraction is provided. If we want to update and ship a new version of the python worker every time there's a new binding to add the sugar (perhaps by referencing an updated version of the library), then bindings are no longer acting as a true extensibility model.

I'll also mention that I chatted with @pragnagopa and confirmed that if we get to the point where we'd like to allow the Python developer to express their datatype through annotations rather than function.json, we'll need to do some additional work in the host to allow that. Today, by the time the host asks the language worker to load a given function, we've already decided on the datatype and it's too late for the language worker to change it.

@asavaritayal Can you reply to this with a proposed example of how using a generic binding in python should look, based on my comments above? Thanks!

@asavaritayal
Contributor

asavaritayal commented Oct 22, 2018

@asavaritayal Can you reply to this with a proposed example of how using a generic binding in python should look, based on my comments above? Thanks!

To define the data type for a standard binding, the user should be able to leverage the dataType property in the function.json binding configuration.

For example:

function.json

{
    "bindings": [
        {
            "name": "myFile",
            "type": "apiHubFileTrigger",
            "direction": "in",
            "path": "samples-workitems",
            "connection": "<name of external file connection>",
            "dataType": "bytes"
        }
    ]
}

__init__.py

import logging


def main(myFile):
    logging.info(f'New file has been added to your Dropbox account: {myFile}')

The supported options for dataType are string, bytes, and bytearray.

If the dataType property is not provided, we'll check whether a first-class type is defined in the worker/library for the binding type (e.g. InputStream for Blob); otherwise, we should default to bytes.
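
A rough sketch of that resolution order; the function and registry names below are illustrative, not the worker's actual code:

import typing

# Hypothetical registry of first-class types; in the real worker this
# knowledge lives in the binding converter classes, not in a dict.
FIRST_CLASS_TYPES: typing.Dict[str, type] = {
    # e.g. 'blob': azure.functions.InputStream
}

ELEMENTARY_TYPES = {'string': str, 'bytes': bytes, 'bytearray': bytearray}


def resolve_binding_type(binding_type: str,
                         data_type: typing.Optional[str]) -> type:
    # 1. An explicit dataType in function.json wins.
    if data_type is not None:
        return ELEMENTARY_TYPES[data_type]
    # 2. Otherwise use the first-class type, if the worker/library has one.
    rich_type = FIRST_CLASS_TYPES.get(binding_type)
    if rich_type is not None:
        return rich_type
    # 3. Otherwise default to bytes.
    return bytes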

@asavaritayal asavaritayal assigned elprans and unassigned paulbatum Oct 24, 2018
@maiqbal11
Contributor

@elprans @asavaritayal @paulbatum

During our sync yesterday, Elvis wanted to know whether bytearray is necessary as a dataType. I believe it is there for parity with the other language workers? I don't have enough context on the implementation of the other language workers (Node in particular) to articulate whether it is needed or whether we are just emphasizing consistency.

@asavaritayal
Contributor

Update from the sync - we don't need to support bytearray.

@joshuapwalton

joshuapwalton commented Feb 5, 2019 via email

@elprans
Collaborator

elprans commented Feb 5, 2019

@asavaritayal @pragnagopa We need to bump the host version used for testing as well. Which version should I use?

@pragnagopa
Member Author

pragnagopa commented Feb 5, 2019

@elprans - the host is not rolled out yet. You can use the build from AppVeyor to unblock development: https://ci.appveyor.com/project/appsvc/azure-webjobs-sdk-script-y8o14/branch/master/job/9u3n8utn7d08farp/artifacts

@joshuapwalton

joshuapwalton commented Feb 5, 2019 via email

@pragnagopa
Member Author

@joshuapwalton - This feature is still in development. This issue will be updated as we make progress.

@elprans
Collaborator

elprans commented Feb 5, 2019

@pragnagopa I am not seeing the data_type field in metadata with that build.

directory: "tests/blob_functions/get_blob_return"
script_file: "tests/blob_functions/get_blob_return/main.py"
name: "get_blob_return"
bindings {
  key: "$return"
  value {
    type: "http"
    direction: out
  }
}
bindings {
  key: "file"
  value {
    type: "blob"
  }
}
bindings {
  key: "req"
  value {
    type: "httpTrigger"
  }
}

The function.json file looks like this:

{
  "scriptFile": "main.py",
  "disabled": false,
  "bindings": [
    {
      "type": "httpTrigger",
      "direction": "in",
      "name": "req"
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "file",
      "dataType": "binary",
      "connection": "AzureWebJobsStorage",
      "path": "python-worker-tests/test-bytes.txt"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}

@pragnagopa
Member Author

@elprans - data_type is a new field added in the protobuf: https://github.com/Azure/azure-functions-language-worker-protobuf/blob/dev/src/proto/FunctionRpc.proto#L318
The Python worker needs to pull in the latest proto file.
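
For reference, once the updated FunctionRpc.proto is pulled in, the Python stubs can be regenerated with grpcio-tools; a sketch, with illustrative directory paths:

# Regenerate the Python gRPC stubs from the updated proto file.
# The 'protos' paths are illustrative and depend on the repo layout.
from grpc_tools import protoc

protoc.main([
    'grpc_tools.protoc',
    '--proto_path=protos',
    '--python_out=protos',
    '--grpc_python_out=protos',
    'protos/FunctionRpc.proto',
])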

@maiqbal11
Contributor

@elprans, I am able to see the data_type field while running locally. The worker dev branch is updated with the latest protobuf so you should be able to pull it to have the same version as the host.

elprans added a commit that referenced this issue Feb 6, 2019
The generic binding infrastructure now allows annotating the function
argument as `str` or `bytes`.  This also adds support for `str` and
`bytes` binding for the blob functions as an example.

The binding data adapter is responsible for checking the declared
`dataType` in function.json against the declared parameter type
annotation.  All existing bindings, except blob, currently refuse to
bind if `dataType` is specified.

Issue: #66.
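
The commit above describes a binding data adapter that checks the declared dataType against the parameter annotation; a rough sketch of that kind of check, with illustrative names:

import typing


def check_data_type(data_type: typing.Optional[str], annotation: type) -> bool:
    # Hypothetical check: dataType 'string' must pair with str and
    # 'binary' with bytes; an absent dataType imposes no constraint.
    if data_type is None:
        return True
    expected = {'string': str, 'binary': bytes}.get(data_type)
    return expected is not None and issubclass(annotation, expected)
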
@elprans
Collaborator

elprans commented Feb 6, 2019

It was my mistake: I needed to recompile the protobuf. I pushed the updated PR, which incorporates the dataType change.

@asavaritayal asavaritayal removed this from the Functions Sprint 43 milestone Mar 18, 2019
elprans added a commit that referenced this issue Mar 19, 2019
The generic binding infrastructure now allows annotating the function
argument as `str` or `bytes`.  This also adds support for `str` and
`bytes` binding for the blob functions as an example.

The binding data adapter is responsible for checking the declared
`dataType` in function.json against the declared parameter type
annotation.  All existing bindings, except blob, currently refuse to
bind if `dataType` is specified.

Issue: #66.
@fabiocav
Member

Closing as resolved.

@asavaritayal
Contributor

asavaritayal commented May 6, 2019

Re-opening as the work does not satisfy the primary requirement, which was to enable support for bindings that don't have rich binding support in the azure.functions library. @elprans please prioritize this.

@asavaritayal asavaritayal reopened this May 6, 2019
elprans added a commit that referenced this issue May 6, 2019
Add a generic binding implementation to allow unknown bindings with
`dataType: string` and `dataType: binary`.

Fixes: #66
Fixes: #408
elprans added a commit that referenced this issue May 15, 2019
Add a generic binding implementation to allow unknown bindings with
`dataType: string` and `dataType: binary`.

Fixes: #66
Fixes: #408
elprans added a commit that referenced this issue Jun 24, 2019
Add a generic binding implementation to allow unknown bindings with
`dataType: string` and `dataType: binary`.

Fixes: #66
Fixes: #408
maiqbal11 pushed a commit that referenced this issue Jul 1, 2019
* Allow arbitrary bindings with elementary dataType

Add a generic binding implementation to allow unknown bindings with
`dataType: string` and `dataType: binary`.

Fixes: #66
Fixes: #408

* Add support for untyped queue trigger binding

* Support unannotated elementary bindings

* Decouple rich binding support from the worker

Per discussion, the worker no longer includes support for rich bindings
by default.  The implementation has been moved to
azure-functions-python-library, and the runtime dependency on it has
been removed.