Spark


1. The SparkCli external library files are located in [installation directory]\esProc\extlib\SparkCli. The core Raqsoft external-library jar is scu-spark-cli-2.10.jar. The third-party dependency jars are:

antlr-runtime-3.5.2.jar

antlr4-runtime-4.9.3.jar

arrow-memory-core-12.0.1.jar

arrow-vector-12.0.1.jar

avro-1.12.0.jar

avro-ipc-1.11.2.jar

avro-mapred-1.11.2.jar

breeze_2.12-2.1.0.jar

chill_2.12-0.10.0.jar

chill-java-0.10.0.jar

commons-collections-3.2.2.jar

commons-compiler-3.1.9.jar

commons-lang-2.6.jar

commons-lang3-3.12.0.jar

commons-text-1.10.0.jar

datanucleus-api-jdo-4.2.4.jar

datanucleus-core-4.1.17.jar

datanucleus-rdbms-4.1.19.jar

derby-10.14.2.0.jar

guava-14.0.1.jar

hadoop-aws-3.2.0.jar

hadoop-client-api-3.2.0.jar

hadoop-common-3.2.0.jar

hadoop-client-runtime-3.2.0.jar

hadoop-yarn-server-web-proxy-3.2.0.jar

hive-common-2.3.9.jar

hive-exec-2.3.9-core.jar

hive-jdbc-2.3.9.jar

hive-metastore-2.3.9.jar

hive-serde-2.3.9.jar

hive-shims-0.23-2.3.9.jar

hive-shims-common-2.3.9.jar

hive-storage-api-2.8.1.jar

htrace-core4-4.1.0-incubating.jar

iceberg-spark-runtime-3.5_2.12-1.7.0.jar

jackson-annotations-2.15.2.jar

jackson-core-2.15.2.jar

jackson-core-asl-1.9.13.jar

jackson-databind-2.15.2.jar

jackson-datatype-jsr310-2.15.2.jar

jackson-mapper-asl-1.9.13.jar

jackson-module-scala_2.12-2.15.2.jar

jakarta.servlet-api-4.0.3.jar

janino-3.1.9.jar

javax.jdo-3.2.0-m3.jar

jersey-container-servlet-2.40.jar

jersey-container-servlet-core-2.40.jar

jersey-server-2.40.jar

joda-time-2.15.2.jar

json4s-ast_2.12-3.7.0-M11.jar

json4s-core_2.12-3.7.0-M11.jar

json4s-jackson_2.12-3.7.0-M11.jar

json4s-scalap_2.12-3.7.0-M11.jar

jsr305-3.0.0.jar

kryo-shaded-4.0.2.jar

libfb303-0.9.3.jar

libthrift-0.12.0.jar

llz4-java-1.8.0.jar

log4j-1.2-api-2.20.0.jar

metrics-core-4.2.19.jar

metrics-graphite-4.2.19.jar

metrics-jmx-4.2.19.jar

metrics-json-4.2.19.jar

metrics-jvm-4.2.19.jar

minlog-1.3.0.jar

netty-buffer-4.1.96.Final.jar

netty-codec-4.1.96.Final.jar

netty-common-4.1.96.Final.jar

netty-handler-4.1.96.Final.jar

netty-transport-4.1.96.Final.jar

netty-transport-native-unix-common-4.1.96.Final.jar

objenesis-3.2.jar

orc-core-1.9.4-shaded-protobuf.jar

paranamer-2.8.jar

parquet-column-1.13.1.jar

parquet-common-1.13.1.jar

parquet-encoding-1.13.1.jar

parquet-format-structures-1.13.1.jar

parquet-hadoop-1.13.1.jar

parquet-jackson-1.13.1.jar

RoaringBitmap-0.9.47.jar

scala-compiler-2.12.18.jar

scala-library-2.12.18.jar

scala-reflect-2.12.18.jar

scala-xml_2.12-2.1.0.jar

slf4j-api-2.0.7.jar

slf4j-simple-1.7.31.jar

snappy-java-1.1.8.3.jar

spark-catalyst_2.12-3.5.3.jar

spark-common-utils_2.12-3.5.3.jar

spark-core_2.12-3.5.3.jar

spark-graphx_2.12-3.5.3.jar

spark-hive_2.12-3.5.3.jar

spark-hive-thriftserver_2.12-3.5.3.jar

spark-kvstore_2.12-3.5.3.jar

spark-launcher_2.12-3.5.3.jar

spark-mllib_2.12-3.5.3.jar

spark-mllib-local_2.12-3.5.3.jar

spark-network-common_2.12-3.5.3.jar

spark-network-shuffle_2.12-3.5.3.jar

spark-repl_2.12-3.5.3.jar

spark-sketch_2.12-3.5.3.jar

spark-sql_2.12-3.5.3.jar

spark-sql-api_2.12-3.5.3.jar

spark-streaming_2.12-3.5.3.jar

spark-tags_2.12-3.5.3.jar

spark-unsafe_2.12-3.5.3.jar

spark-yarn_2.12-3.5.3.jar

stax2-api-3.1.4.jar

stax-api-1.0.1.jar

stream-2.9.6.jar

transaction-api-1.1.jar

univocity-parsers-2.9.1.jar

woodstox-core-5.0.3.jar

xbean-asm9-shaded-4.23.jar

zaws-java-sdk-bundle-1.11.375.jar

zhudi-spark3.5-bundle_2.12-0.15.0.jar

zstd-jni-1.5.5-4.jar

Note: The third-party dependency jars listed above are included in the external library package by default; users can adjust them to suit their actual application environment.

2. Download the following four files from online resources and place them in [installation directory]\bin:

hadoop.dll 

hadoop.lib 

libwinutils.lib 

winutils.exe

Note: The four files above are required only in a Windows environment, not on Linux. Also, winutils.exe comes in separate x86 and x64 builds, so use the one that matches your system.

3. Depending on their needs, users can place different .properties configuration files inside scu-spark-cli-2.10.jar. The Spark external library currently supports local connections, Spark connections, Spark connections to Hudi/Iceberg-format data, and Spark connections to Hudi/Iceberg-format data associated with S3.
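A minimal sketch of packing a configuration file into the jar with the JDK's jar tool (assuming jar is on the PATH and the file sits in the current directory; adjust file names and paths to your environment):

jar uf scu-spark-cli-2.10.jar spark.properties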

Reference configurations for each connection type are given below; modify them as needed:

(1) For a local connection, no configuration file needs to be placed in the jar.

(2) Reference spark.properties for a Spark connection:

fs.default.name=hdfs://localhost:9000/

hive.metastore.uris=thrift://localhost:9083

hive.metastore.local=false

hive.metastore.warehouse.dir=/user/hive/warehouse

(3) Reference hudi.properties for a Hudi-format Spark connection:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension

spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar

spark.jars.packages=org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0

spark.sql.catalog.warehouse.dir=hdfs://localhost:9000/user/hive/warehouse

spark.io.compression.codec=snappy

hive.metastore.uris=thrift://localhost:9083

master=local[*]

(4) Reference iceberg.properties for an Iceberg-format Spark connection:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog

spark.sql.catalog.local.type=hadoop

spark.sql.catalog.local.warehouse=hdfs://localhost:9000/user/hive/warehouse

spark.io.compression.codec=lz4

hive.metastore.uris=thrift://localhost:9083

(5) Reference hudi-s3.properties for a Hudi-format Spark connection associated with S3:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn

spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO

spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP

spark.hadoop.fs.s3a.region=cn-north-1

spark.hadoop.fs.s3a.path.style.access=true

spark.hadoop.fs.s3a.connection.ssl.enabled=false

spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

spark.sql.catalog.warehouse.dir=s3a://mytest/hudi

(6) Reference iceberg-s3.properties for an Iceberg-format Spark connection associated with S3:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.hadoop.fs.s3a.endpoint=https://s3.cn-north-1.amazonaws.com.cn

spark.hadoop.fs.s3a.access.key=AKIUNAFNDCXFOIIIACXO

spark.hadoop.fs.s3a.secret.key=aYI3JBZUiG8kU3bck2H698o5O3Fv9hjDhoVQU0yP

spark.hadoop.fs.s3a.region=cn-north-1

spark.hadoop.fs.s3a.path.style.access=true

spark.hadoop.fs.s3a.connection.ssl.enabled=false

spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

hive.metastore.schema.verification=false

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog

spark.sql.catalog.local.type=hadoop

spark.sql.catalog.local.warehouse=s3a://mytest/iceberg
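Note that the Iceberg configurations in (4) and (6) register the catalog under the name local (spark.sql.catalog.local), so SQL sent through the connection addresses tables via that catalog. An illustrative query, where db.orders is a hypothetical namespace and table:

select * from local.db.orders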

4. SparkCli requires JDK 11 or later. Before use, manually modify the startup files startup.bat / startup.sh:

●  startup.sh

#!/bin/bash

source [installation directory]/esproc/esProc/bin/setEnv.sh
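# Read jvm_args from esProc/bin/config.txt and launch the IDE with them plus the --add-opens options that Spark needs on JDK 11 and above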

$EXEC_JAVA $(jvm_args=$(sed -n 's/.*jvm_args=\(.*\).*/\1/p' "$START_HOME"/esProc/bin/config.txt)

echo " $jvm_args") -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp "$START_HOME"/esProc/classes:"$START_HOME"/esProc/lib/*:"$START_HOME"/common/jdbc/* -Duser.language="$language" -Dstart.home="$START_HOME"/esProc  com.scudata.ide.spl.EsprocEE

●  startup.bat

@echo off

call "[安装目录]\esProc\bin\setEnv.bat"

start "dm" %EXECJAVAW% -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED -cp %START_HOME%\esProc\classes;%RAQCLASSPATH% -Duser.language=zh -Dstart.home=%START_HOME%\esProc -Djava.net.useSystemProxies=true  com.scudata.ide.spl.EsprocEE

5. When memory usage is high, users can adjust the memory settings themselves. On Windows, modify config.txt when starting with the .exe, or modify the .bat file when starting with the .bat; on Linux, modify the .sh file.

Taking config.txt in a Windows environment as an example:

java_home=C:\ProgramFiles\Java\jdk1.7.0_11;esproc_port=48773;btx_port=41735;gtm_port=41737;jvm_args=-Xms256m -XX:PermSize=256M -XX:MaxPermSize=512M -Xmx9783m -Duser.language=zh
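When starting with the .bat (or the .sh on Linux), a heap option can instead be added directly to the java command line; an illustrative fragment assuming an 8 GB ceiling, with the remaining options unchanged:

start "dm" %EXECJAVAW% -Xmx8g -XX:+IgnoreUnrecognizedVMOptions ...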

6. The external library functions available for accessing Spark are spark_open(), spark_query(), spark_hudi(), spark_close(), spark_read() and spark_shell(). For usage details, see Help - Function Reference.
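A minimal SPL sketch of the typical call pattern (the argument forms shown here are assumptions, and demo.orders is a hypothetical table; consult the Function Reference for the authoritative signatures):

A1	=spark_open("spark.properties")
A2	=spark_query(A1,"select * from demo.orders")
A3	=spark_close(A1)

A1 opens a connection using the configuration file packed into the jar, A2 runs SQL through that connection and fetches the result, and A3 releases the connection.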