PySpark with Aliyun's prebuilt SDK doesn't seem to work.
Env:
- Apache Spark 2.2.0 on virtual machine
- OSS bucket created under the same user account
- Tokyo Region (ap-northeast-1)
This is my snippet to reproduce the symptom.
Command:
/opt/spark/bin/pyspark \
  --master="mesos://${MASTER}" \
  --executor-memory 12g \
  --jars /home/admin/aliyun-emapreduce-sdk/prebuild/emr-core-1.1.3-SNAPSHOT.jar,/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar \
  --conf "spark.hadoop.fs.oss.impl"="com.aliyun.fs.oss.nat.NativeOssFileSystem"
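(Side note: since jars are plain zip archives, the two prebuilt jars above can be inspected from plain Python to see which packages they actually bundle. This is only a diagnostic sketch; the paths are copied from the --jars argument.)

import zipfile

# List the package prefixes of the .class entries in each prebuilt jar.
# Paths are copied from the --jars argument above.
for jar in [
    "/home/admin/aliyun-emapreduce-sdk/prebuild/emr-core-1.1.3-SNAPSHOT.jar",
    "/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar",
]:
    entries = zipfile.ZipFile(jar).namelist()
    prefixes = sorted({"/".join(e.split("/")[:3]) for e in entries if e.endswith(".class")})
    print(jar)
    print(prefixes)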
PySpark code:
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.hadoop.fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")
conf.set("spark.executor.memory", "12g")
conf.set("spark.python.worker.memory", "8g")

spark = SparkSession.builder.config(conf=conf).getOrCreate()
# read from local hdfs
df = spark.read.parquet("hdfs://10.1.185.28:9000/User/admin/nyc/yellow.parquet")
# [failed] write to Aliyun OSS
outPathBase = "oss://MyOSSID:MySecretKey@oss-ap-northeast-1-internal.aliyuncs.com/test"
df.write.parquet(outPathBase+"/yellow.parquet")
Here is the error:
Py4JJavaError: An error occurred while calling o74.parquet.
: java.lang.NoClassDefFoundError: com/aliyun/oss/ServiceException
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.initialize(JetOssNativeFileSystemStore.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
(snip)
Caused by: java.lang.ClassNotFoundException: com.aliyun.oss.ServiceException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
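For what it's worth, the missing class can also be probed from the same PySpark session through Py4J. The snippet below is only a diagnostic sketch, with the class name copied from the stack trace and spark being the session created above.

# Diagnostic: ask the driver JVM whether it can load the class that the
# stack trace reports as missing (class name copied from the error above).
try:
    spark.sparkContext._jvm.java.lang.Class.forName("com.aliyun.oss.ServiceException")
    print("com.aliyun.oss.ServiceException is visible to the driver JVM")
except Exception as e:
    print("driver JVM cannot load com.aliyun.oss.ServiceException:", e)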
Any advice?