operator resubmits all spark applications after restart no matter what their status is #457

Closed
maxgruber19 opened this issue Aug 30, 2024 · 3 comments · Fixed by #460
Labels
customer-request priority/high release/24.11.0 release-note Denotes a PR that will be considered when it comes time to generate release notes. type/bug

Comments

@maxgruber19

Affected Stackable version

24.7

Affected Apache Spark-on-Kubernetes version

3.4.2

Current and expected behavior

When the spark-k8s operator restarts (e.g. after a node restart or an operator upgrade), every Spark application is reconciled and resubmitted at once. This leads to chaos for two reasons: resource consumption spikes, and multiple jobs read and write the same file at the same time. Normally that access is kept in order by the Airflow DAGs that do the Spark submits.

Possible solution

Do not submit an application that is in state succeeded / failed / stopped / killed. Maybe even running apps should not be restarted, because they are already running.
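
The guard proposed here could be sketched roughly as follows. This is an illustration only, not the operator's actual code: the `Phase` enum, the `should_submit` helper, and the `skip_running` flag are hypothetical, and the phase names are simply taken from the states listed in this issue.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Hypothetical application phases, named after the states in this issue."""
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    STOPPED = "Stopped"
    KILLED = "Killed"


# Phases after which an application should never be submitted again.
TERMINAL_PHASES = frozenset(
    {Phase.SUCCEEDED, Phase.FAILED, Phase.STOPPED, Phase.KILLED}
)


def should_submit(phase: Optional[Phase], skip_running: bool = True) -> bool:
    """Decide whether a reconcile pass should submit the application.

    `phase` is None when no status has been recorded yet, which is
    assumed here to mean the application was never submitted.
    """
    if phase is None:
        return True
    if phase in TERMINAL_PHASES:
        return False
    # Only RUNNING remains: optionally leave running applications alone.
    return not skip_running
```

With `skip_running=True` (the default in this sketch), an application that is already running is left untouched, which matches the "maybe" in the suggestion.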

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

None

@maxgruber19 maxgruber19 changed the title operator resubmits all spark applications no matter what their status is operator resubmits all spark applications after restart no matter what their status is Aug 30, 2024
@razvan razvan moved this from Next to Development: In Progress in Stackable Engineering Sep 12, 2024
@razvan razvan moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Sep 12, 2024
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: Done in Stackable Engineering Sep 16, 2024
@maxgruber19
Author

@razvan The solution sounds good, thanks for having a look at it.

@lfrancke
Member

@razvan could you please add a short snippet we can use for the release notes?

@razvan
Member

razvan commented Sep 17, 2024

> @razvan could you please add a short snippet we can use for the release notes?

Spark Operator Bugfix: ensure Spark applications are submitted only once. Reconciling applications after the corresponding Job objects have been recycled doesn't lead to the creation of new Job objects. This behavior was triggered by different situations, such as when the operator was restarted.
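
To make the behavior in this note concrete, here is a minimal sketch. It assumes the decision is based on a status recorded on the application rather than on whether the Job object still exists; the thread does not confirm how the fix is implemented. The `SparkApp` class, its `phase` field, the `"Submitted"` marker, and the `reconcile` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SparkApp:
    """Hypothetical stand-in for a Spark application resource."""
    name: str
    phase: Optional[str] = None  # None is assumed to mean "never submitted"


def reconcile(app: SparkApp, existing_jobs: Set[str]) -> bool:
    """Run one reconcile pass; return True if a new Job was created."""
    if app.phase is not None:
        # Already submitted once. Do not resubmit, even if the Job
        # object has since been recycled and no longer exists.
        return False
    existing_jobs.add(app.name)
    app.phase = "Submitted"  # assumed marker value, for illustration only
    return True


jobs: Set[str] = set()
app = SparkApp("daily-report")
first = reconcile(app, jobs)   # first pass creates the Job
jobs.clear()                   # simulate the Job object being recycled
second = reconcile(app, jobs)  # later pass, e.g. after an operator restart
```

In this sketch the second pass creates nothing, because the recorded phase, not the presence of the Job, decides whether to submit.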

@lfrancke lfrancke added release-note Denotes a PR that will be considered when it comes time to generate release notes. release/24.11.0 labels Sep 17, 2024
@lfrancke lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Sep 17, 2024

5 participants