Zeppelin Setup, Configuration, and Usage

(figure: zeppelin_logo)

Apache Zeppelin is an open-source, web-based notebook that makes interactive data analysis practical. It provides data analysis and visualization features, lets you build polished documents that are data-driven, interactive, and collaborative, and supports multiple languages, including Scala (via Apache Spark), Python (Apache Spark), SparkSQL, Hive, Markdown, Shell, and more.

Environment

Zeppelin is deployed here in a Docker environment.

OS: CentOS 7.3.1611
Java: OpenJDK 1.8
Zeppelin: 0.7.2

Dockerfile:

FROM centos:7.3.1611

MAINTAINER zhoub

RUN yum update -y
RUN yum install -y java-1.8.0-openjdk.x86_64
RUN echo "JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.el7_3.x86_64/jre" | tee -a /etc/bashrc
RUN echo "export JAVA_HOME" | tee -a /etc/bashrc

Download and Installation

Download Zeppelin from the official site or from GitHub; the latest release is recommended. Unpack the tarball, then in the prepared environment run zeppelin/bin/zeppelin-daemon.sh start
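A minimal sketch of that step, assuming the 0.7.2 binary release and the Apache archive URL (verify the link against the official download page):

```shell
# Download and unpack the 0.7.2 binary release (the URL is an assumption;
# check the Apache download/archive pages for the current link).
ZEPPELIN_VERSION=0.7.2
curl -LO "https://archive.apache.org/dist/zeppelin/zeppelin-${ZEPPELIN_VERSION}/zeppelin-${ZEPPELIN_VERSION}-bin-all.tgz"
tar -xzf "zeppelin-${ZEPPELIN_VERSION}-bin-all.tgz"
mv "zeppelin-${ZEPPELIN_VERSION}-bin-all" zeppelin

# Start the daemon; the web UI defaults to port 8080.
zeppelin/bin/zeppelin-daemon.sh start
```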

Docker launch script:

#!/bin/sh

cmd="/root/run/entry.sh"
image="centos7.3:java1.8"
net="hadoop_net"

docker run -d --rm -w /root/ -v ${PWD}/run:/root/run -v ${PWD}/zeppelin:/root/zeppelin -v ${PWD}/hbase:/root/hbase -p 8080 --network ${net} -h zeppelin --name zeppelin ${image} ${cmd}
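Note that `-p 8080` publishes the container's port 8080 on a random host port. A quick check like the following finds the mapped port and probes the UI (the parsing assumes `docker port`'s usual `0.0.0.0:PORT` output format):

```shell
# Find the host port Docker mapped to container port 8080, then probe the UI.
port=$(docker port zeppelin 8080 | head -n1 | cut -d: -f2)
curl -sf "http://localhost:${port}/" >/dev/null && echo "Zeppelin UI is up on host port ${port}"
```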

entry.sh:

#!/bin/sh

/root/zeppelin/bin/zeppelin-daemon.sh start

# zeppelin-daemon.sh puts the daemon in the background, so loop forever
# to keep the container's main process alive.
while true
do
    sleep 10s
done

Architecture

(figure: zeppelin_arch)
Zeppelin has a client/server architecture, where the client is typically a browser. The server receives client requests and forwards them over the Thrift protocol to the interpreter group. The interpreter group runs as a separate JVM process that does the actual work of handling client requests and communicates back with the server.

(figure: zeppelin_interpreter_arch)
The interpreter layer is a pluggable architecture: any language or backend data-processing engine can be added to Zeppelin as a plugin, and Zeppelin already ships with many interpreters. This lets users work in Zeppelin with whichever language or data-processing tool they are familiar with. For example, with the %spark interpreter you can run Scala code inside Zeppelin.
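As an illustration (the paragraph body is my own, not from the source), a %spark paragraph in a note might look like:

```
%spark
val nums = sc.parallelize(1 to 100)
println(nums.sum())
```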

Configuration

Configuring Zeppelin mainly means setting up interpreters; notes are then interpreted and executed by the configured interpreters.

JDBC-Hive Configuration

For Zeppelin to connect to Hive over JDBC, all three of Hive, HDFS, and Zeppelin need to be configured.

Hive

The hiveserver2 service needs to be started on the Hive node: nohup /shared-disk/apache-hive-2.1.1-bin/bin/hive --service hiveserver2 &
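Before pointing Zeppelin at it, it is worth confirming that hiveserver2 accepts JDBC connections, e.g. with beeline (assuming beeline from the Hive distribution is on the PATH of the Hive node):

```shell
# Smoke-test hiveserver2 over JDBC before wiring up Zeppelin.
beeline -u jdbc:hive2://localhost:10000 -n root -e "show databases;"
```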

Zeppelin

Copy the jars needed for the Hive JDBC connection

Extract the hive-jdbc-1.1.0+cdh5.9.1+795-1.cdh5.9.1.p0.4.el7.noarch.rpm package and copy the jar files inside it into the zeppelin/lib/interpreter/ directory.
Contents of hive-jdbc-1.1.0+cdh5.9.1+795-1.cdh5.9.1.p0.4.el7.noarch.rpm:

total 51448
-rw-r--r-- 1 root root 62050 Aug 25 10:18 commons-logging-1.1.3.jar
-rw-r--r-- 1 root root 19386631 Aug 25 10:18 hive-exec-1.1.0-cdh5.9.1.jar
lrwxrwxrwx 1 root root 28 Aug 25 10:18 hive-exec.jar -> hive-exec-1.1.0-cdh5.9.1.jar
-rw-r--r-- 1 root root 96598 Aug 25 10:19 hive-jdbc-1.1.0-cdh5.9.1.jar
-rw-r--r-- 1 root root 23635048 Aug 25 10:19 hive-jdbc-1.1.0-cdh5.9.1-standalone.jar
lrwxrwxrwx 1 root root 28 Aug 25 10:19 hive-jdbc.jar -> hive-jdbc-1.1.0-cdh5.9.1.jar
lrwxrwxrwx 1 root root 39 Aug 25 10:19 hive-jdbc-standalone.jar -> hive-jdbc-1.1.0-cdh5.9.1-standalone.jar
-rw-r--r-- 1 root root 5558969 Aug 25 10:19 hive-metastore-1.1.0-cdh5.9.1.jar
lrwxrwxrwx 1 root root 33 Aug 25 10:19 hive-metastore.jar -> hive-metastore-1.1.0-cdh5.9.1.jar
-rw-r--r-- 1 root root 827980 Aug 25 10:19 hive-serde-1.1.0-cdh5.9.1.jar
lrwxrwxrwx 1 root root 29 Aug 25 10:19 hive-serde.jar -> hive-serde-1.1.0-cdh5.9.1.jar
-rw-r--r-- 1 root root 2058121 Aug 25 10:19 hive-service-1.1.0-cdh5.9.1.jar
lrwxrwxrwx 1 root root 31 Aug 25 10:19 hive-service.jar -> hive-service-1.1.0-cdh5.9.1.jar
-rw-r--r-- 1 root root 313702 Aug 25 10:19 libfb303-0.9.3.jar
-rw-r--r-- 1 root root 234201 Aug 25 10:19 libthrift-0.9.3.jar
-rw-r--r-- 1 root root 481535 Aug 25 10:19 log4j-1.2.16.jar
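One way to get at these jars without installing the rpm is rpm2cpio, roughly as follows (the rpm path and the Zeppelin install location are assumptions; adjust them to your layout and to what cpio actually extracts):

```shell
# Extract the jars without installing the rpm (rpm2cpio/cpio ship with CentOS),
# then copy every jar into Zeppelin's interpreter lib directory.
mkdir -p /tmp/hive-jdbc-rpm
cd /tmp/hive-jdbc-rpm
rpm2cpio /path/to/hive-jdbc-1.1.0+cdh5.9.1+795-1.cdh5.9.1.p0.4.el7.noarch.rpm | cpio -idmv
find . -name '*.jar' -exec cp {} /root/zeppelin/lib/interpreter/ \;
```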

Configure the JDBC interpreter

  • default.driver: org.apache.hive.jdbc.HiveDriver
  • default.url: jdbc:hive2://172.26.1.177:10000/default
    (10000 is hiveserver2's default port)
  • default.user: root
    (I use the root user here; the HDFS proxy settings later in this post must also be made for root)
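With the interpreter saved, a cheap way to verify the connection is a note paragraph such as the following (the statement is illustrative; any inexpensive query works):

```
%jdbc
show databases
```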

(figure: zeppelin_hive_jdbc_setting)

HDFS

If Zeppelin fails to connect to Hive over JDBC and reports the following error:

Could not establish connection to jdbc:hive2://192.168.0.51:10000: Required field 'serverProtocolVersion' is unset! Struct:TOpenSessionResp(status:TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive is not allowed to impersonate hive:13:12,
org.apache.hive.service.cli.session.SessionManager:openSession:SessionManager.java:266,
org.apache.hive.service.cli.CLIService:openSessionWithImpersonation:CLIService.java:202,
org.apache.hive.service.cli.thrift.ThriftCLIService:getSessionHandle:ThriftCLIService.java:402,
org.apache.hive.service.cli.thrift.ThriftCLIService:OpenSession:ThriftCLIService.java:297,
org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession:getResult:TCLIService.java:1253,
org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession:getResult:TCLIService.java:1238,
org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285,
java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145,
java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615,
java.lang.Thread:run:Thread.java:745,
*java.lang.RuntimeException:java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive is not allowed to impersonate hive:21:8,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:83,
org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
java.security.AccessController:doPrivileged:AccessController.java:-2,
javax.security.auth.Subject:doAs:Subject.java:415,
org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
com.sun.proxy.$Proxy19:open::-1,
org.apache.hive.service.cli.session.SessionManager:openSession:SessionManager.java:258,
*java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive is not allowed to impersonate hive:26:5,
org.apache.hadoop.hive.ql.session.SessionState:start:SessionState.java:494,
org.apache.hive.service.cli.session.HiveSessionImpl:open:HiveSessionImpl.java:137,
sun.reflect.GeneratedMethodAccessor11:invoke::-1,
sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
java.lang.reflect.Method:invoke:Method.java:606,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
*org.apache.hadoop.ipc.RemoteException:User: hive is not allowed to impersonate hive:45:19,
org.apache.hadoop.ipc.Client:call:Client.java:1427,
org.apache.hadoop.ipc.Client:call:Client.java:1358,
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker:invoke:ProtobufRpcEngine.java:229,
com.sun.proxy.$Proxy14:getFileInfo::-1,
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB:getFileInfo:ClientNamenodeProtocolTranslatorPB.java:771,
sun.reflect.GeneratedMethodAccessor7:invoke::-1,
sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
java.lang.reflect.Method:invoke:Method.java:606,
org.apache.hadoop.io.retry.RetryInvocationHandler:invokeMethod:RetryInvocationHandler.java:252,
org.apache.hadoop.io.retry.RetryInvocationHandler:invoke:RetryInvocationHandler.java:104,
com.sun.proxy.$Proxy15:getFileInfo::-1,
org.apache.hadoop.hdfs.DFSClient:getFileInfo:DFSClient.java:2116,
org.apache.hadoop.hdfs.DistributedFileSystem$22:doCall:DistributedFileSystem.java:1315,
org.apache.hadoop.hdfs.DistributedFileSystem$22:doCall:DistributedFileSystem.java:1311,
org.apache.hadoop.fs.FileSystemLinkResolver:resolve:FileSystemLinkResolver.java:81,
org.apache.hadoop.hdfs.DistributedFileSystem:getFileStatus:DistributedFileSystem.java:1311,
org.apache.hadoop.fs.FileSystem:exists:FileSystem.java:1424,
org.apache.hadoop.hive.ql.session.SessionState:createRootHDFSDir:SessionState.java:568,
org.apache.hadoop.hive.ql.session.SessionState:createSessionDirs:SessionState.java:526,
org.apache.hadoop.hive.ql.session.SessionState:start:SessionState.java:480],
errorCode:0, errorMessage:Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive is not allowed to impersonate hive), serverProtocolVersion:null)

then the following configuration needs to be added to HDFS's core-site.xml:

...
<property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
</property>
...

Then restart the HDFS NameNode for the settings to take effect.
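As a hedged alternative to a full restart, proxy-user settings can usually be reloaded at runtime (assuming the hadoop client on the node you run this from is configured for the cluster):

```shell
# Ask the NameNode to reload superuser/proxy-user mappings without restarting.
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
```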

Usage

(figure: zeppelin_main_ui)

References & Acknowledgements