Don't set experimental userClassPathFirst configuration #296


Closed
razvan opened this issue Oct 16, 2023 · 2 comments

razvan (Member) commented Oct 16, 2023

Affected version

No response

Current and expected behavior

We set the experimental spark.driver.userClassPathFirst and spark.executor.userClassPathFirst configs to true. This causes classpath issues once you pull in Java dependencies.

The problem is that there is no universally correct value: some jobs need child-first class loading enabled, others need it disabled. We might just want to document the current state.
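
For reference, the operator today behaves as if every job carried the sparkConf entries below. This is a sketch of the effective configuration, not the literal injection mechanism, and (per the original issue title) user attempts to override these values do not take effect:

      sparkConf:
        spark.driver.userClassPathFirst: "true"
        spark.executor.userClassPathFirst: "true"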

Possible solution

Don't set these experimental options by default; jobs that need child-first class loading can still opt in through their own sparkConf.

Additional context

When dynamically loading extensions like this:

      deps:
        packages:
          - org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.0
          - org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0
      sparkConf:
        spark.driver.userClassPathFirst: "false"
        spark.executor.userClassPathFirst: "false"

the following error occurs:

:: resolution report :: resolve 5610ms :: artifacts dl 1730ms
        :: modules in use:
        com.google.code.findbugs#jsr305;3.0.0 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        org.apache.commons#commons-pool2;2.11.1 from central in [default]
        org.apache.hadoop#hadoop-client-api;3.3.4 from central in [default]
        org.apache.hadoop#hadoop-client-runtime;3.3.4 from central in [default]
        org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.4.0 from central in [default]
        org.apache.kafka#kafka-clients;3.3.2 from central in [default]
        org.apache.spark#spark-sql-kafka-0-10_2.12;3.4.0 from central in [default]
        org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.4.0 from central in [default]
        org.lz4#lz4-java;1.8.0 from central in [default]
        org.slf4j#slf4j-api;2.0.6 from central in [default]
        org.xerial.snappy#snappy-java;1.1.9.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   12  |   12  |   12  |   0   ||   12  |   12  |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-69012960-2916-4d39-9ea8-c688cb61be81
        confs: [default]
        12 artifacts copied, 0 already retrieved (84990kB/127ms)
SLF4J: A SLF4J service provider failed to instantiate:
org.slf4j.spi.SLF4JServiceProvider: org.apache.logging.slf4j.SLF4JServiceProvider not a subtype
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2720)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:107)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:102)
        at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:451)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:338)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
        at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
        at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:390)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:390)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2714)
        ... 31 more
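
The "class X not Y" pattern in this trace is the classic symptom of the same class being defined by two different classloaders. A minimal, self-contained Java sketch (hypothetical, not taken from Spark or the operator) reproduces the mechanism:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ClassLoaderClash {
        public static void main(String[] args) throws Exception {
            // Where this class was loaded from (a classpath directory or jar).
            URL here = ClassLoaderClash.class.getProtectionDomain()
                    .getCodeSource().getLocation();
            // A loader with a null parent resolves classes itself instead of
            // delegating to the application classloader -- roughly what
            // userClassPathFirst=true does for user-supplied jars.
            try (URLClassLoader childFirst = new URLClassLoader(new URL[] { here }, null)) {
                Class<?> duplicate = childFirst.loadClass("ClassLoaderClash");
                // Same bytecode, different defining classloader: the JVM treats
                // the two as unrelated types, so subtype checks fail just like
                // "JniBasedUnixGroupsMappingWithFallback not GroupMappingServiceProvider".
                System.out.println(duplicate == ClassLoaderClash.class);                 // false
                System.out.println(ClassLoaderClash.class.isAssignableFrom(duplicate));  // false
            }
        }
    }

With userClassPathFirst=true, Hadoop classes pulled in transitively via deps.packages can end up defined by the child-first loader while Spark's own copies live in the parent, which plausibly triggers exactly this check in Configuration.getClass.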

Environment

No response

Would you like to work on fixing this bug?

None

sbernauer changed the title from "userClassPathFirst cannot be overwritten and it causes problems" to "Don't set userClassPathFirst experimental configuration" Oct 18, 2023
sbernauer changed the title from "Don't set userClassPathFirst experimental configuration" to "Don't set experimental userClassPathFirst configuration" Oct 18, 2023
lfrancke moved this from Next to Ideas Backlog in Stackable End-to-End Coordination Nov 8, 2023
sbernauer (Member) commented

@razvan I guess this is a duplicate of #354?

razvan (Member, Author) commented Feb 14, 2024

Yes. Forgot about it.

razvan closed this as completed Feb 14, 2024
lfrancke moved this from Ideas Backlog to Done in Stackable End-to-End Coordination Apr 8, 2024