fix: Remove userClassPathFirst properties #355

Merged
merged 19 commits into from
Feb 15, 2024
Conversation

razvan
Member

@razvan razvan commented Feb 12, 2024

Fixes #354

  • feat(test): added test for delta.io
  • fix: remove userClassPathFirst and extraClassPath properties.
  • feat: remove all refs to the extra-jars folder

This fails with:

```
spark     org/apache/spark/sql/delta/stats/StatisticsCollection$SqlParser$$anon$1.visitMultipartIdentifierList(Lorg/apache/spark/sql/catalyst/parser/SqlBaseParser$MultipartIdentifierListContext;)Lscala/collection/Seq; @17: invokevirtual
spark   Reason:
spark     Type 'org/apache/spark/sql/catalyst/parser/SqlBaseParser$MultipartIdentifierListContext' (current frame, stack[1]) is not assignable to 'org/antlr/v4/runtime/ParserRuleContext'
spark   Current Frame:
spark     bci: @17
spark     flags: { }
spark     locals: { 'org/apache/spark/sql/delta/stats/StatisticsCollection$SqlParser$$anon$1', 'org/apache/spark/sql/catalyst/parser/SqlBaseParser$MultipartIdentifierListContext' }
spark     stack: { 'org/apache/spark/sql/catalyst/parser/ParserUtils$', 'org/apache/spark/sql/catalyst/parser/SqlBaseParser$MultipartIdentifierListContext', 'scala/Option', 'scala/Function0' }
spark   Bytecode:
spark     0000000: b200 232b b200 23b6 0027 2a2b ba00 3f00
spark     0000010: 00b6 0043 c000 45b0
spark
spark     at org.apache.spark.sql.delta.stats.StatisticsCollection$SqlParser.<init>(StatisticsCollection.scala:409)
spark     at org.apache.spark.sql.delta.stats.StatisticsCollection$.<init>(StatisticsCollection.scala:422)
spark     at org.apache.spark.sql.delta.stats.StatisticsCollection$.<clinit>(StatisticsCollection.scala)
spark     at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal(OptimisticTransaction.scala:429)
spark     at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal$(OptimisticTransaction.scala:424)
spark     at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadataInternal(OptimisticTransaction.scala:142)
spark     at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata(OptimisticTransaction.scala:400)
spark     at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata$(OptimisticTransaction.scala:393)
spark     at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadata(OptimisticTransaction.scala:142)
spark     at org.apache.spark.sql.delta.schema.ImplicitMetadataOperation.updateMetadata(ImplicitMetadataOperation.scala:97)
spark     at org.apache.spark.sql.delta.schema.ImplicitMetadataOperation.updateMetadata$(ImplicitMetadataOperation.scala:56)
spark     at org.apache.spark.sql.delta.commands.WriteIntoDelta.updateMetadata(WriteIntoDelta.scala:76)
spark     at org.apache.spark.sql.delta.commands.WriteIntoDelta.write(WriteIntoDelta.scala:162)
spark     at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:105)

```
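To illustrate the change described in the list above, here is a minimal Python sketch that drops the class path properties from a set of Spark settings. It is not the operator's actual code. The four keys are standard Spark configuration properties, but covering both the driver and the executor variants is an assumption, and the function name, the sample paths, and the sample values are hypothetical. One plausible reading of the trace is that, with `userClassPathFirst` enabled, the ANTLR classes end up loaded by two different class loaders, which would explain the "is not assignable to" verification error.

```python
# Spark properties assumed to be affected (standard Spark configuration keys;
# the exact set removed by this PR is an assumption).
CLASSPATH_PROPERTIES = {
    "spark.driver.userClassPathFirst",
    "spark.executor.userClassPathFirst",
    "spark.driver.extraClassPath",
    "spark.executor.extraClassPath",
}


def strip_classpath_properties(conf: dict[str, str]) -> dict[str, str]:
    """Return a copy of conf without the class path related properties."""
    return {k: v for k, v in conf.items() if k not in CLASSPATH_PROPERTIES}


# Hypothetical input: the paths and values are made up for illustration.
submit_conf = {
    "spark.driver.userClassPathFirst": "true",
    "spark.executor.userClassPathFirst": "true",
    "spark.driver.extraClassPath": "/example/extra-jars/*",
    "spark.executor.extraClassPath": "/example/extra-jars/*",
    "spark.executor.instances": "1",
}

cleaned = strip_classpath_properties(submit_conf)
print(cleaned)
```

Unrelated settings such as `spark.executor.instances` pass through unchanged, and the input dictionary is not modified.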
@razvan
Member Author

razvan commented Feb 12, 2024

@razvan
Member Author

razvan commented Feb 13, 2024

@razvan
Member Author

razvan commented Feb 13, 2024

Tests pass on OpenShift 4.13:

--- PASS: kuttl (1670.68s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/spark-pi-private-s3_openshift-true_spark-3.5.0 (167.41s)
        --- PASS: kuttl/harness/spark-history-server_openshift-true_spark-3.5.0_s3-use-tls-true (409.71s)
        --- PASS: kuttl/harness/smoke_openshift-true_spark-3.5.0_s3-use-tls-true (282.40s)
        --- PASS: kuttl/harness/pyspark-ny-public-s3-image_openshift-true_spark-3.5.0_ny-tlc-report-0.1.0 (207.65s)
        --- PASS: kuttl/harness/pyspark-ny-public-s3_openshift-true_spark-3.5.0 (245.33s)
        --- PASS: kuttl/harness/spark-pi-public-s3_openshift-true_spark-3.5.0 (167.31s)
        --- PASS: kuttl/harness/iceberg_spark-3.5.0 (115.25s)
        --- PASS: kuttl/harness/spark-examples_openshift-true_spark-3.5.0 (102.49s)
        --- PASS: kuttl/harness/pod_overrides_openshift-true_spark-3.5.0 (231.25s)
        --- PASS: kuttl/harness/resources_openshift-true_spark-3.5.0 (131.30s)
        --- PASS: kuttl/harness/logging_openshift-true_spark-3.5.0_ny-tlc-report-0.1.0 (544.06s)
        --- PASS: kuttl/harness/delta_spark-delta-3.5.0_delta-3.1.0 (281.07s)
        --- PASS: kuttl/harness/spark-ny-public-s3_openshift-true_spark-3.5.0_s3-use-tls-true (231.64s)
PASS

@soenkeliebau
Member

> Tests pass on OpenShift 4.13:

this is super sexy :)

NickLarsenNZ
NickLarsenNZ previously approved these changes Feb 13, 2024
Member

@NickLarsenNZ NickLarsenNZ left a comment

LGTM

@NickLarsenNZ
Member

NickLarsenNZ commented Feb 13, 2024

re-running the failed job (503 from docker login)

@sbernauer sbernauer self-requested a review February 14, 2024 15:27
Member

@sbernauer sbernauer left a comment

I would be ok with merging this, but I would really like us to solve the classpath problem with the logging libs instead of ripping out logging :)

@sbernauer sbernauer changed the title Remove userClassPathFirst properties fix: Remove userClassPathFirst properties Feb 14, 2024
@razvan
Member Author

razvan commented Feb 14, 2024

> I would be ok with merging this, but I would really like us to solve the classpath problem with the logging libs instead of ripping out logging :)

What do you mean by "ripping out logging"? Logs from applications are not affected by this change.

@razvan
Member Author

razvan commented Feb 14, 2024

sbernauer
sbernauer previously approved these changes Feb 15, 2024
Member

@sbernauer sbernauer left a comment

> What do you mean "ripping out logging"

That the logs of the spark-submit job will not end up in your logging sink, if I understood correctly. Imagine a job that runs every hour did not start properly last night at 02:00. You would have no insight into what happened, e.g. the Kerberos server not being reachable.

But if you did not get it to work, it is what it is, I guess.

@razvan razvan enabled auto-merge February 15, 2024 12:33
@razvan razvan added this pull request to the merge queue Feb 15, 2024
Merged via the queue into main with commit 98299e2 Feb 15, 2024
@razvan razvan deleted the feat/delta branch February 15, 2024 12:38
Development

Successfully merging this pull request may close these issues.

Bugfix: remove usage of "userClassPathFirst" properties [was: Investigate delta.io integration]
5 participants