1

I'm using Spark with MongoDB, and consequently rely on the mongo-hadoop drivers. I got things working thanks to input on my original question here.

My Spark job is running, however, I receive warnings that I don't understand. When I run this command

$SPARK_HOME/bin/spark-submit --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar my_application.py

it works, but gives me the following warning message

Warning: Local jar /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar does not exist, skipping.

When I was trying to get this working, if I left out those paths when submitting the job it wouldn't run at all. Now, however, if I leave out those paths it does run

$SPARK_HOME/bin/spark-submit  my_application.py

Can someone please explain what is going on here? I have looked through similar questions here referencing the same warning, and searched through the documentation.

By setting the options once are they stored as environment variables or something? I'm glad it works, but wary that I don't fully understand why sometimes and not others.

4

2 回答 2

2

问题是CLASSPATH应该用冒号分隔,而JARS应该用逗号分隔:

$SPARK_HOME/bin/spark-submit \
--driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
--jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar,/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar my_application.py
于 2015-11-28T00:27:19.603 回答
1

在 Zero323 答案之上添加

我认为这样做的更好方法是

$SPARK_HOME/bin/spark-submit \
--driver-class-path  $(echo /usr/local/share/mongo-hadoop/build/libs/*.jar | tr ' ' ',') \
--jars $(echo /usr/local/share/mongo-hadoop/build/libs/*.jar | tr ' ' ',') my_application.py

在这种方法中,您不会在类路径中错误地错过任何 jar,因此不会出现警告。

于 2016-04-30T16:07:10.940 回答