1. The SparkCli external library files are located under [installation directory]\esProc\extlib\SparkCli; the core jar of the Raqsoft external library is scu-spark-cli-2.10.jar.
antlr-runtime-3.5.2.jar
antlr4-runtime-4.9.3.jar
arrow-memory-core-12.0.1.jar
arrow-vector-12.0.1.jar
avro-1.12.0.jar
avro-ipc-1.11.2.jar
avro-mapred-1.11.2.jar
breeze_2.12-2.1.0.jar
chill_2.12-0.10.0.jar
chill-java-0.10.0.jar
commons-collections-3.2.2.jar
commons-compiler-3.1.9.jar
commons-lang-2.6.jar
commons-lang3-3.12.0.jar
commons-text-1.10.0.jar
datanucleus-api-jdo-4.2.4.jar
datanucleus-core-4.1.17.jar
datanucleus-rdbms-4.1.19.jar
derby-10.14.2.0.jar
guava-14.0.1.jar
hadoop-aws-3.2.0.jar
hadoop-client-api-3.2.0.jar
hadoop-common-3.2.0.jar
hadoop-client-runtime-3.2.0.jar
hadoop-yarn-server-web-proxy-3.2.0.jar
hive-common-2.3.9.jar
hive-exec-2.3.9-core.jar
hive-jdbc-2.3.9.jar
hive-metastore-2.3.9.jar
hive-serde-2.3.9.jar
hive-shims-0.23-2.3.9.jar
hive-shims-common-2.3.9.jar
hive-storage-api-2.8.1.jar
htrace-core4-4.1.0-incubating.jar
iceberg-spark-runtime-3.5_2.12-1.7.0.jar
jackson-annotations-2.15.2.jar
jackson-core-2.15.2.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.15.2.jar
jackson-datatype-jsr310-2.15.2.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-scala_2.12-2.15.2.jar
jakarta.servlet-api-4.0.3.jar
janino-3.1.9.jar
javax.jdo-3.2.0-m3.jar
jersey-container-servlet-2.40.jar
jersey-container-servlet-core-2.40.jar
jersey-server-2.40.jar
joda-time-2.15.2.jar
json4s-ast_2.12-3.7.0-M11.jar
json4s-core_2.12-3.7.0-M11.jar
json4s-jackson_2.12-3.7.0-M11.jar
json4s-scalap_2.12-3.7.0-M11.jar
jsr305-3.0.0.jar
kryo-shaded-4.0.2.jar
libfb303-0.9.3.jar
libthrift-0.12.0.jar
lz4-java-1.8.0.jar
log4j-1.2-api-2.20.0.jar
metrics-core-4.2.19.jar
metrics-graphite-4.2.19.jar
metrics-jmx-4.2.19.jar
metrics-json-4.2.19.jar
metrics-jvm-4.2.19.jar
minlog-1.3.0.jar
netty-buffer-4.1.96.Final.jar
netty-codec-4.1.96.Final.jar
netty-common-4.1.96.Final.jar
netty-handler-4.1.96.Final.jar
netty-transport-4.1.96.Final.jar
netty-transport-native-unix-common-4.1.96.Final.jar
objenesis-3.2.jar
orc-core-1.9.4-shaded-protobuf.jar
paranamer-2.8.jar
parquet-column-1.13.1.jar
parquet-common-1.13.1.jar
parquet-encoding-1.13.1.jar
parquet-format-structures-1.13.1.jar
parquet-hadoop-1.13.1.jar
parquet-jackson-1.13.1.jar
RoaringBitmap-0.9.47.jar
scala-compiler-2.12.18.jar
scala-library-2.12.18.jar
scala-reflect-2.12.18.jar
scala-xml_2.12-2.1.0.jar
slf4j-api-2.0.7.jar
slf4j-simple-1.7.31.jar
snappy-java-1.1.8.3.jar
spark-catalyst_2.12-3.5.3.jar
spark-common-utils_2.12-3.5.3.jar
spark-core_2.12-3.5.3.jar
spark-graphx_2.12-3.5.3.jar
spark-hive_2.12-3.5.3.jar
spark-hive-thriftserver_2.12-3.5.3.jar
spark-kvstore_2.12-3.5.3.jar
spark-launcher_2.12-3.5.3.jar
spark-mllib_2.12-3.5.3.jar
spark-mllib-local_2.12-3.5.3.jar
spark-network-common_2.12-3.5.3.jar
spark-network-shuffle_2.12-3.5.3.jar
spark-repl_2.12-3.5.3.jar
spark-sketch_2.12-3.5.3.jar
spark-sql_2.12-3.5.3.jar
spark-sql-api_2.12-3.5.3.jar
spark-streaming_2.12-3.5.3.jar
spark-tags_2.12-3.5.3.jar
spark-unsafe_2.12-3.5.3.jar
spark-yarn_2.12-3.5.3.jar
stax2-api-3.1.4.jar
stax-api-1.0.1.jar
stream-2.9.6.jar
transaction-api-1.1.jar
univocity-parsers-2.9.1.jar
woodstox-core-5.0.3.jar
xbean-asm9-shaded-4.23.jar
aws-java-sdk-bundle-1.11.375.jar
hudi-spark3.5-bundle_2.12-0.15.0.jar
zstd-jni-1.5.5-4.jar
Note: The third-party dependency jars listed above are included in the external library package by default; users can adapt the set to their actual environment.
2. Download the following four files from the internet and place them under [installation directory]\bin:
hadoop.dll
hadoop.lib
libwinutils.lib
winutils.exe
Note: The four files above are only needed on Windows, not on Linux, and winutils.exe comes in separate x86 and x64 builds; a quick architecture check is sketched below.
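For example (a quick sanity check, not required by the setup itself), you can confirm which build you need from a command prompt: AMD64 indicates an x64 system, and if the winutils.exe build matches the system it should print its usage text when run without arguments:
echo %PROCESSOR_ARCHITECTURE%
cd /d [installation directory]\bin
winutils.exe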
3. Depending on how the library will be used, place the corresponding .properties configuration file inside scu-spark-cli-2.10.jar (a packaging example follows the list below). The Spark external library currently supports local connections, Spark connections, Spark connections to Hudi/Iceberg-format tables, and Spark connections to Hudi/Iceberg-format tables stored on S3.
Reference configuration files for each connection type are given below and can be adjusted as needed:
(1) For a local connection, no configuration file needs to be placed in the jar;
(2) For a Spark connection, a reference spark.properties is as follows:
fs.default.name=hdfs://localhost:9000/
hive.metastore.uris=thrift://localhost:9083
hive.metastore.local=false
hive.metastore.warehouse.dir=/user/hive/warehouse
(3) For a Spark connection to Hudi-format tables, a reference hudi.properties is as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar
spark.jars.packages=org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0
spark.sql.catalog.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse
spark.io.compression.codec=snappy
hive.metastore.uris=thrift://localhost:9083
master=local[*]
(4) For a Spark connection to Iceberg-format tables, a reference iceberg.properties is as follows:
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=hdfs://localhost:9000/user/hive/warehouse
spark.io.compression.codec=lz4
hive.metastore.uris=thrift://localhost:9083
(5) For a Spark connection to Hudi-format tables on S3, a reference hudi-s3.properties is as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn
spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO
spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP
spark.hadoop.fs.s3a.region=cn-north-1
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.catalog.warehouse.dir=s3a://mytest/hudi
(6) For a Spark connection to Iceberg-format tables on S3, a reference iceberg-s3.properties is as follows:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn
spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO
spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP
spark.hadoop.fs.s3a.region=cn-north-1
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
hive.metastore.schema.verification=false
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=s3a://mytest/iceberg
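The chosen configuration file is added to (or updated inside) the core jar with the JDK's jar tool. A minimal sketch, assuming spark.properties is in the current directory and the library sits in the default extlib path:
cd [installation directory]\esProc\extlib\SparkCli
jar uf scu-spark-cli-2.10.jar spark.properties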
4. SparkCli requires JDK 11 or later. Before use, the startup files startup.bat / startup.sh must be modified manually:
● startup.sh
#!/bin/bash
source [installation directory]/esProc/bin/setEnv.sh
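# read jvm_args from config.txt and launch the designer; the --add-opens options open the JDK modules Spark needs on JDK 11+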
jvm_args=$(sed -n 's/.*jvm_args=\(.*\).*/\1/p' "$START_HOME"/esProc/bin/config.txt)
$EXEC_JAVA $jvm_args -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp "$START_HOME"/esProc/classes:"$START_HOME"/esProc/lib/*:"$START_HOME"/common/jdbc/* -Duser.language="$language" -Dstart.home="$START_HOME"/esProc com.scudata.ide.spl.EsprocEE
● startup.bat
@echo off
call "[安装目录]\esProc\bin\setEnv.bat"
start "dm" %EXECJAVAW% -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp %START_HOME%\esProc\classes;%RAQCLASSPATH% -Duser.language=zh -Dstart.home=%START_HOME%\esProc -Djava.net.useSystemProxies=true com.scudata.ide.spl.EsprocEE
5. When memory usage is high, users can adjust the memory settings themselves. On Windows, modify config.txt when launching with the .exe, or modify the .bat file when launching with the .bat; on Linux, modify the .sh file.
Taking config.txt on Windows as an example:
java_home=C:\ProgramFiles\Java\jdk1.7.0_11;esproc_port=48773;btx_port=41735;gtm_port=41737;jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m -Duser.language=zh
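On Linux the corresponding change is made in startup.sh, and for .bat starts in startup.bat: add heap options to the java command line. A minimal sketch (the 8g figure is only an illustration), inserted right after $EXEC_JAVA in startup.sh or after %EXECJAVAW% in startup.bat:
-Xms256m -Xmx8g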
6. The external library functions available for accessing Spark are spark_open(), spark_query(), spark_hudi(), spark_close(), spark_read(), and spark_shell(). For usage, see [Help] - [Function Reference].