Bugfix: remove usage of "userClassPathFirst" properties [was: Investigate delta.io integration] #354

Closed
2 of 4 tasks
razvan opened this issue Feb 12, 2024 · 0 comments · Fixed by #355

razvan commented Feb 12, 2024

Description

Users have reported that it's not possible to dynamically provision delta.io packages to use with PySpark.
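As a rough illustration of what "dynamically provisioning" Delta means on the PySpark side, the sketch below collects the session properties that Delta Lake documents for Spark. The helper names `delta_session_conf` and `build_session`, the Delta version `3.1.0`, and the Scala version `2.12` are assumptions chosen for the example; they are not taken from the reports or from the operator.

```python
def delta_session_conf(delta_version: str = "3.1.0",
                       scala_version: str = "2.12") -> dict:
    """Return the Spark properties needed to pull in Delta at startup.

    The versions are illustrative assumptions; pick the ones that match
    the Spark image actually in use.
    """
    return {
        # Resolved from Maven when the session starts (dynamic provisioning).
        "spark.jars.packages": f"io.delta:delta-spark_{scala_version}:{delta_version}",
        # Registers Delta's SQL extensions and catalog implementation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name: str = "delta-demo"):
    """Create a SparkSession with the Delta properties applied (hypothetical helper)."""
    # Imported lazily so the configuration helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_session_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same three properties can also be passed as `--conf key=value` pairs to spark-submit.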

The erroneous behavior can be reproduced with this commit.

The error is fixed with this commit: the delta test passes, as do all other tests except logging. The fix is only temporary and cannot be merged in its current form because it breaks the logging tests.

Analysis

The problem is caused by the following two properties, which the operator always adds to spark-submit to support log aggregation with Vector:

--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true

In addition, the user classpath is extended like this:

--conf spark.driver.extraClassPath=/stackable/spark/extra-jars/*
--conf spark.executor.extraClassPath=/stackable/spark/extra-jars/*

The contents of /stackable/spark/extra-jars/ are:

bash-4.4$ ls -l /stackable/spark/extra-jars/
total 1868
-rw-r--r-- 1 stackable stackable  126137 Feb 12 08:54 jackson-dataformat-xml-2.15.2.jar
-rw-r--r-- 1 stackable stackable  195909 Feb 12 08:54 stax2-api-4.2.1.jar
-rw-r--r-- 1 stackable stackable 1586395 Feb 12 08:54 woodstox-core-6.5.1.jar
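The direction named in the issue title, removing the "userClassPathFirst" properties, can be sketched as a filter over a spark-submit argument list. This is only an illustration in Python, not the operator's actual code; the function name `strip_user_classpath_first` and the sample arguments are hypothetical.

```python
# The two properties identified in the analysis above.
USER_CLASSPATH_FIRST_KEYS = frozenset({
    "spark.driver.userClassPathFirst",
    "spark.executor.userClassPathFirst",
})


def strip_user_classpath_first(args: list) -> list:
    """Return a copy of args without the userClassPathFirst --conf pairs."""
    cleaned = []
    i = 0
    while i < len(args):
        if args[i] == "--conf" and i + 1 < len(args):
            # A conf value has the form key=value; compare on the key only.
            key = args[i + 1].split("=", 1)[0]
            if key in USER_CLASSPATH_FIRST_KEYS:
                i += 2  # skip both "--conf" and its value
                continue
        cleaned.append(args[i])
        i += 1
    return cleaned


if __name__ == "__main__":
    submit_args = [
        "spark-submit",
        "--conf", "spark.driver.userClassPathFirst=true",
        "--conf", "spark.executor.userClassPathFirst=true",
        "--conf", "spark.app.name=demo",
        "job.py",
    ]
    print(strip_user_classpath_first(submit_args))
```

Every other option, including the classpath extension, passes through unchanged, so the jars in /stackable/spark/extra-jars/ stay on the classpath but no longer take precedence over Spark's own.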

Acceptance Criteria

Since this is an investigation ticket, the following outcomes are possible:

  • An integration test showcasing Stackable and Delta with PySpark and S3.
  • Updated operator documentation.
  • An update to the Spark images to include Delta dependencies.
  • A new Spark image with Delta dependencies.

Related PRs

Related Issues

@razvan razvan self-assigned this Feb 12, 2024
@razvan razvan moved this to Development: In Progress in Stackable Engineering Feb 12, 2024
@razvan razvan moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Feb 13, 2024
@razvan razvan changed the title Investigate delta.io integration Bugfix: remove usage of "userClassPathFirst" properties [was: Investigate delta.io integration] Feb 13, 2024
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Feb 14, 2024
@sbernauer sbernauer self-assigned this Feb 14, 2024
@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Feb 16, 2024
@lfrancke lfrancke moved this from Development: Done to Done in Stackable Engineering Feb 16, 2024