Known Regression: Activity without a $return binding returned a non-None value #401

Closed
davidmrdavid opened this issue Aug 31, 2022 · 24 comments


@davidmrdavid
Collaborator

davidmrdavid commented Aug 31, 2022

Known regression notice: Activity without a $return binding returned a non-None value

Error description:
We have just been notified that a subset of users are suddenly, and intermittently, experiencing errors in their Durable Functions Python apps with an exception that reads: "( Activity ) without a $return binding returned a non-None value".

This error means that Activity Triggers, which are both a trigger and an output binding, are somehow not being allowed to return a value. In other words, the error complains that the Function has a return statement (the "returned a non-None value" part) even though Activities are, incorrectly, being treated as unable to return data.
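
To make the error message concrete, here is a minimal sketch of the kind of check that would produce it. This is an illustrative model only, not the Python worker's actual code: `FunctionInfo`, `has_return`, and `check_result` are hypothetical names, and the assumption that the failure comes from a lost return binding is just that, an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FunctionInfo:
    """Hypothetical stand-in for the metadata a worker keeps per Function."""
    name: str
    has_return: bool  # True when a $return binding is registered


def check_result(fi: FunctionInfo, result: Any) -> Any:
    """Reject a non-None result when the Function has no $return binding."""
    if result is not None and not fi.has_return:
        raise RuntimeError(
            f"function {fi.name!r} without a $return binding "
            "returned a non-None value"
        )
    return result


# Per the description above, an Activity Trigger doubles as an output
# binding, so a returned value is accepted.
healthy = FunctionInfo(name="Hello", has_return=True)
assert check_result(healthy, "Hello Tokyo!") == "Hello Tokyo!"

# Assumed failure mode: the return binding is not registered, so the very
# same return value is rejected.
broken = FunctionInfo(name="Hello", has_return=False)
try:
    check_result(broken, "Hello Tokyo!")
except RuntimeError as err:
    message = str(err)
```

The point of the sketch is that the Activity code itself is unchanged in both cases; only the metadata about the return binding differs.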

Reproducer:
The error does not seem reproducible locally, but it can be reproduced, intermittently, on Azure.
The simplest reproducer is to have a standard Function-chaining application, such as this one, and to modify the Activity Function to utilize an output binding.

For example, here's a hello-world Activity that writes a hardcoded string to blob storage:

def main(name: str, blob) -> str:
    blob.set("a")
    return f"Hello {name}!"

and its function.json:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "name",
      "type": "activityTrigger",
      "direction": "in"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "blob",
      "path": "test/blobtest",
      "connection": "AzureWebJobsStorage"
    }
  ]
}

After deploying and executing the orchestrator, the orchestrator may fail with the aforementioned exception.

In short - it appears this error triggers when an Activity is paired with an output binding.
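
To check whether one of your own Activities matches this pattern, a small script can inspect its function.json. The following is a minimal sketch: the helper name `activity_has_output_binding` is hypothetical, and the rule it applies (an `activityTrigger` plus at least one other binding with direction `out`) is only a heuristic derived from the reproducer above.

```python
import json


def activity_has_output_binding(function_json: dict) -> bool:
    """Return True if the Function is an Activity with an extra output binding."""
    bindings = function_json.get("bindings", [])
    has_activity_trigger = any(
        b.get("type") == "activityTrigger" for b in bindings
    )
    has_output = any(
        b.get("direction") == "out" and b.get("type") != "activityTrigger"
        for b in bindings
    )
    return has_activity_trigger and has_output


# The function.json from the reproducer above.
EXAMPLE = json.loads("""
{
  "scriptFile": "__init__.py",
  "bindings": [
    {"name": "name", "type": "activityTrigger", "direction": "in"},
    {
      "type": "blob",
      "direction": "out",
      "name": "blob",
      "path": "test/blobtest",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
""")

print(activity_has_output_binding(EXAMPLE))  # True
```

In a real project you would load each Function's function.json from disk with `json.load` instead of using an inline string.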

Root cause theories:
At this time, we are fairly confident this is not a regression in the Durable Functions SDK, nor in the Durable Functions Extension. Instead, it appears that affected applications may have gone through a Functions Host (the base Azure Functions component) update that triggered this behavior.

This leads us to believe that the problem is either caused by some regression in the Functions Host or, most likely, by some regression in the Python worker (the component that allows Azure Functions to run Python code) which was bundled in the Functions Host.

The Python team is currently investigating this, with our help, to understand and patch this regression as soon as possible.

Update 8/31:

Our current understanding is that this error occurs only on Functions V4 apps that are using the latest Host version (4.9.1.1). This latest Host version includes a refactoring in the Python worker that may be to blame for this issue, but the specifics are still being investigated. That said, this is enough to provide a workaround. Please see the update for 8/31 in the "workarounds" section.

Workarounds:

Workarounds are being worked on. We need to better understand the root cause to provide them. I will update this thread as soon as possible.

Update 8/31:

If you are affected by this error, you should be able to circumvent it by reverting back to a previous version of the Host.
The general guidance for doing this on Linux (the only OS for Python support today) can be found here: https://docs.microsoft.com/en-us/azure/azure-functions/set-runtime-version?tabs=portal#manual-version-updates-on-linux

For this, you will need to use the Azure CLI (az). The link above includes a link to download az onto your local machine, but you should also be able to run az CLI commands from the Azure Cloud Shell. Again, this is all in the link above, under the Azure CLI tab.

Just in case you're unfamiliar with how to use the az CLI:
To be able to manipulate your Azure Functions with it, you will need to first log in (you can run az login to do that) and then change the "active subscription" to be the subscription of your target app. You can read about how to change your az-CLI active subscription here: https://docs.microsoft.com/en-us/cli/azure/manage-azure-subscriptions-azure-cli#change-the-active-subscription

How to revert to a previous version of the Azure Functions Host:

You will need to modify your linuxFxVersion to pin your application to a previous Host version. We will be pinning to Host version 4.8.0, where we believe the error should be avoidable or at least much less frequent.

Here's the command you will use:

az functionapp config set \
 -g <resource_group> \
 -n <function_app_name> \
 --subscription <subscription_id> \
 --linux-fx-version <docker_image_with_the_right_host_version>

So, for example, if your resource group is called "myResourceGroup", your app is called "Foo", and your subscription ID is "123", then you'd run the following command (leaving the linux-fx-version parameter as a placeholder for now).

az functionapp config set \
 -g myResourceGroup \
 -n Foo \
 --subscription "123" \
 --linux-fx-version <docker_image_with_the_right_host_version>

The value of the linux-fx-version depends on whether your application is in a Consumption plan, or not.

If you're using the Consumption plan, then you should use: "DOCKER|mcr.microsoft.com/azure-functions/mesh:4.8.0-python3.9" You may change the suffix "python3.9" to "python3.8" or "python3.7" according to your Python interpreter preference.

Please see my latest update on this thread (on 9/2) - there seems to be a blocker preventing manual editing of linuxFxVersion in the Consumption plan. As a result, we are automatically rolling back the default Host version to 4.8.0 on Linux Consumption for Python.

If you are using the App Service plan and/or the Elastic Premium plan, then the docker image is slightly different. It is as follows: "DOCKER|mcr.microsoft.com/azure-functions/python:4.8.0-python3.9-appservice"

  • Again, you may change the suffix "python3.9" to "python3.8" or "python3.7" according to your Python interpreter preference.
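
The plan-dependent image names above can be summarized in a small helper. This is a minimal sketch: the function name `pinned_linux_fx_version` and the plan labels (`"consumption"`, `"appservice"`, `"premium"`) are hypothetical, while the image strings themselves are copied from the guidance above.

```python
def pinned_linux_fx_version(
    plan: str, python_version: str = "3.9", host_version: str = "4.8.0"
) -> str:
    """Build the linuxFxVersion value that pins an app to a given Host version."""
    if python_version not in {"3.7", "3.8", "3.9"}:
        raise ValueError(f"unsupported Python version: {python_version}")
    if plan == "consumption":
        # Note: per the 9/2 update, manually setting this value on the
        # Consumption plan appears to be blocked.
        return (
            "DOCKER|mcr.microsoft.com/azure-functions/mesh:"
            f"{host_version}-python{python_version}"
        )
    if plan in {"appservice", "premium"}:
        return (
            "DOCKER|mcr.microsoft.com/azure-functions/python:"
            f"{host_version}-python{python_version}-appservice"
        )
    raise ValueError(f"unknown plan: {plan}")


print(pinned_linux_fx_version("premium", "3.8"))
```

The returned string is what you would pass to the --linux-fx-version parameter.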

If you are running this command in PowerShell, be aware that the "pipe" (|) symbol in the docker image names will cause issues if you wrap the string in only a single pair of quotes. To get around this, please wrap the name in a pair of double quotes nested inside a pair of single quotes ('"..."'). For instance, this is the full command for our example app, if it were on Linux Elastic Premium:

az functionapp config set  -g myResourceGroup -n Foo --subscription "123" --linux-fx-version '"DOCKER|mcr.microsoft.com/azure-functions/python:4.8.0-python3.9-appservice"'
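
For POSIX shells such as bash, a single pair of single quotes around the image name is enough to protect the pipe symbol. The sketch below assembles the same command as an argument list and lets Python's `shlex.join` apply POSIX quoting; `build_pin_command` is a hypothetical helper name, and the output is not suitable for PowerShell, which needs the nested quoting described above.

```python
import shlex
from typing import List


def build_pin_command(
    resource_group: str, app_name: str, subscription: str, linux_fx_version: str
) -> List[str]:
    """Assemble the az invocation from this thread as an argument list."""
    return [
        "az", "functionapp", "config", "set",
        "-g", resource_group,
        "-n", app_name,
        "--subscription", subscription,
        "--linux-fx-version", linux_fx_version,
    ]


argv = build_pin_command(
    "myResourceGroup",
    "Foo",
    "123",
    "DOCKER|mcr.microsoft.com/azure-functions/python:4.8.0-python3.9-appservice",
)

# shlex.join quotes only the arguments that need it for a POSIX shell;
# here that is the image name, because it contains a pipe symbol.
posix_command = shlex.join(argv)
print(posix_command)
```

Keeping the command as a list also means each value stays a single argument, whatever characters it contains.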

After invoking this command, please give your app enough time to apply the change - a minute or two should suffice. You will know this change got applied successfully if, on your Function app portal view, under the "Essentials" bar, the "Runtime version" field reads "4.8.0.0". If you do not see this, consider restarting your app.

Finally, if this guidance does not work, or you find any typos, please report them in this thread and we'll look to assist you. Thank you, and apologies for the inconvenience. Do note that the guidance here will need to be undone in the future to ensure your Functions Host continues getting regular updates. To revert this change, just set the linuxFxVersion to "python|3.9" (replacing 3.9 with your preferred Python interpreter version).

In the meantime, we're working on a permanent fix. We will update this thread once we have it.

@FaCoffee1984

FaCoffee1984 commented Aug 31, 2022

I can confirm the erratic behaviour of Azure Durable Functions. Please keep us posted.

@KhaoticMind

Thanks for the confirmation, @davidmrdavid. This was driving me crazy.
Please keep this thread updated as you guys continue to investigate.

@davidmrdavid
Collaborator Author

davidmrdavid commented Aug 31, 2022

@FaCoffee1984, @KhaoticMind, @thec0dewriter - please see my update above ^

@KhaoticMind

Will check it tomorrow morning (GMT-3). THANKS!

@FaCoffee1984

To undo the changes suggested here, I think the Host version should be changed back to the newest version, which I think is 4.9.1.1, and all one needs to do is to repeat the steps with Azure CLI and just change:

"DOCKER|mcr.microsoft.com/azure-functions/mesh:4.8.0-python3.9"
into
"DOCKER|mcr.microsoft.com/azure-functions/mesh:4.9.1.1-python3.9".

@davidmrdavid can you please confirm?

@KhaoticMind

Important to know if/how we can keep it "auto-updating".

@FaCoffee1984

I noticed that after changing the runtime to version 4.8.0, the "Runtime" property is missing from the "Essentials" bar, and my functions, which have a Cosmos DB trigger, are not triggering anymore.

@davidmrdavid
Collaborator Author

davidmrdavid commented Sep 1, 2022

@FaCoffee1984 / @KhaoticMind : To revert your application back to "auto-updating", it should suffice to revert linuxFxVersion to "python|3.9" (or 3.8, 3.7, depending on your preferred interpreter version). Note that the value of linuxFxVersion in the auto-upgrading mode does not specify the Host version number, just the language; I believe that's why it is auto-updating :) .
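
The difference between the two modes can be expressed as a small check on the linuxFxVersion value. This is a minimal sketch based only on the values that appear in this thread: `describe_linux_fx_version` is a hypothetical helper, and the rule (a `DOCKER|` prefix means pinned, a `python|` prefix means auto-updating) is a heuristic, not an official classification.

```python
def describe_linux_fx_version(value: str) -> str:
    """Classify a linuxFxVersion value as pinned, auto-updating, or unknown."""
    runtime, _, _ = value.partition("|")
    if runtime.upper() == "DOCKER":
        # An explicit image tag carries a Host version, so the app is pinned.
        return "pinned"
    if runtime.upper() == "PYTHON":
        # Only the language is named, so the platform chooses the Host version.
        return "auto-updating"
    return "unknown"


print(describe_linux_fx_version("python|3.9"))
print(
    describe_linux_fx_version(
        "DOCKER|mcr.microsoft.com/azure-functions/mesh:4.8.0-python3.9"
    )
)
```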

I'll triple check this guidance on reverting back the change, but I'm fairly confident of it at the moment.

@davidmrdavid
Collaborator Author

@FaCoffee1984: your note that the runtimeVersion field was missing after applying the change is surprising. I wonder if something went wrong when applying the guidance.

Two questions:
(1) You said your Function with a cosmos trigger was no longer triggering. Did you mean to say cosmos binding? If it was a binding: is it input or output? My hunch is that you mean to say that an activity trigger with a cosmos binding (most likely output) no longer triggered. Also - is it just one Function that was now unavailable, or was it all Functions?!

(2) Any chance you could share with me your app name, affected Function name, and the time range in UTC where the guidance was applied?

@FaCoffee1984

FaCoffee1984 commented Sep 1, 2022

Hi @davidmrdavid, here are my clarifications:

(1) My functions are Activity Functions with both input and output bindings to Cosmos DB;

(2) I do have a chain of functions, which are supposed to run as soon as one new json is created into the collection referenced by the input binding; as soon as the processing is over, the output is dumped into the collection referenced by the output binding, and the next function in the chain is executed;

(3) How secure is it to share those details here publicly?

(4) Also, when applying the workaround suggested here, what is the expected output? I did this in a Terminal window in VS Code, and I got a json as output. Is there anything in particular I should look for to check whether it went well or not?

@davidmrdavid
Collaborator Author

@FaCoffee1984: I'll respond to those questions as soon as possible, on my way to work.

In the meantime, if the workaround didn't work and now some Functions are not even triggering - can you please confirm if you were able to revert the change so that your Functions are at least invoking again?

@FaCoffee1984

FaCoffee1984 commented Sep 1, 2022

@davidmrdavid Yes, I can confirm that I've reverted the change and that (some of my) functions are now triggered. That "some of my" is problematic, in that the data type coming from the input binding seems to change from list to CosmosDB.Document, and I think this has to do with the Host, in that the interaction with Cosmos DB is unreliable at the moment.

@KhaoticMind

As feedback: we did the version pinning to 4.8.0 and the function ran without any issue.
We will keep testing and I'll update this thread accordingly.

Thanks!

@KhaoticMind

> @FaCoffee1984 / @KhaoticMind : To revert your application back to "auto-updating", it should suffice to revert linuxFxVersion to "python|3.9" (or 3.8, 3.7, depending on your preferred interpreter version). Note that the value of linuxFxVersion in the auto-upgrading mode does not specify the Host version number, just the language; I believe that's why it is auto-updating :) .
>
> I'll triple check this guidance on reverting back the change, but I'm fairly confident of it at the moment.

I think I missed the original text where you said how to revert. Sorry for that and good work!

@davidmrdavid
Collaborator Author

Hi all.
I worked today with the Python group and we identified the source of this regression. Using that, we should have a clearer path towards a prompt resolution. I'll provide updates on this as soon as possible.

I need to talk with the Functions Team on Linux to discuss the mitigation plan. I should be able to update this thread with that tomorrow.

@davidmrdavid
Collaborator Author

davidmrdavid commented Sep 2, 2022

Another update:

Regarding the long-term fix:
The Python group has implemented a fix, which is currently being integration-tested. Once the tests pass, the fix should start deploying at the next available release date. Since the release date is not yet confirmed, I can't discuss it publicly, but we're pushing for it to occur as early as can be. I'll provide an update on this asap.

Regarding mitigations
For linux app service and premium plans: the recommendation to pin your Host version using the guidance above still applies.

For Linux consumption users: we are automatically rolling back your Host version (for Python apps only) right now. In the next few hours, your Host version should automatically return to 4.8.0, provided that your app is back on the "default Host version" setting, that is, provided you have returned your linuxFxVersion to python|3.9 (or 3.8, 3.7, depending on your interpreter of choice). We are doing this rollback because we have found an issue with setting the linuxFxVersion as an end-user, similar to what @FaCoffee1984 reported.

In summary:

  • A patch has been implemented, to be rolled out asap
  • For app service and premium plan users: to prevent this bug, please use guidance above to set your linuxFxVersion to Host 4.8.0
  • For consumption users: we will be reverting back the default Host version to 4.8.0 while we wait for the patch to roll out. So this error should auto-resolve soon.

I will look to provide more details on the regression root-cause at some point. For now, I'll be focusing on following up to make sure these releases are rolled out. Thanks again for your patience.

@KhaoticMind

Hi @davidmrdavid , any news on this issue? Can we "re-upgrade" the host version?

@KhaoticMind

@davidmrdavid since 09/09 I have been running into the same issue again in our production environment.
I checked and we are still using host version 4.8.0. Have you noticed the problem reaching earlier versions as well?

Thanks.

@davidmrdavid
Collaborator Author

Hi @KhaoticMind:

Let me follow-up with the release folks. My understanding is that, on the Consumption plan, this issue should no longer be occurring due to a rollback of the Host. For other plans, I need to triple check.

Can you please confirm if you're in the Consumption plan? If so, how often are you seeing this error since 9/9? I might ask follow-up questions depending on the response. In the meantime, let me check with the release folks.

@KhaoticMind

Our function is running on an Elastic Premium plan (backed by an app).
My function runs on a daily basis, and every day I get the error. I tried running the code in dev right now and I got the same message again.

If you want, I can PM you the resource IDs so you can confirm the information.

@davidmrdavid
Collaborator Author

Hi @KhaoticMind:

I have confirmed that this should have been patched on Host 4.10.4, which should have been fully deployed. I am no longer able to repro this on my Azure app.

It would be good to get your app information. Please follow this guidance (https://github.com/Azure/azure-functions-host/wiki/Sharing-Your-Function-App-name-privately) to share your app information privately with us. In the meantime, I'll continue trying to repro this on my end using Host 4.10.4 (the latest).

@davidmrdavid
Collaborator Author

davidmrdavid commented Sep 15, 2022

@KhaoticMind: To utilize Host 4.10.4, please return your linuxFxVersion to its original value. For most people, that would be "PYTHON|3.9" (assuming your preferred interpreter is 3.9)

@davidmrdavid
Collaborator Author

Closing as this issue was resolved.

@davidmrdavid
Collaborator Author

davidmrdavid commented Oct 6, 2022

Just to be clear: now that the issue is resolved, there should be no need to stay pinned to an older version of the Functions Host. Please consider returning to the default version to continue getting the latest updates. The instructions for this are in this thread. Still, I'm available to assist and clarify any confusion or concerns. Thanks!

@davidmrdavid davidmrdavid unpinned this issue Mar 9, 2023