Phoenix 二级索引

在Hbase中,只有一个单一的按照字典序排序的rowKey索引,当使用rowKey来进行数据查询的时候速度较快,但是如果不使用rowKey来查询的话就会使用filter来对全表进行扫描,很大程度上降低了检索性能。而Phoenix提供了二级索引技术来应对这种使用rowKey之外的条件进行检索的场景。

Global Index

Global indexing适用于多读少写的业务场景。使用Global indexing的话在写数据的时候会消耗大量开销,因为所有对数据表的更新操作(DELETE, UPSERT VALUES and UPSERT SELECT),会引起索引表的更新,而索引表是分布在不同的数据节点上的,跨节点的数据传输带来了较大的性能消耗。在读数据的时候Phoenix会选择索引表来降低查询消耗的时间。 在默认情况下如果想查询的字段不是索引字段的话索引表不会被使用,也就是说不会带来查询速度的提升。

配置

HBase集群的每个regionserver节点的hbase-site.xml中加入配置,并重启HBase集群

1
2
3
4
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>

注: phoenix有两种链接方式JDBC和phoenix-client,phoenix-client可以正常创建索引;zeppelin使用jdbc连接不能正常创建索引,提示如下错误。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
java.sql.SQLException: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=CMP_IDX
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:464)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.schema.MetaDataClient.createIndex(MetaDataClient.java:1293)
at org.apache.phoenix.compile.CreateIndexCompiler$1.execute(CreateIndexCompiler.java:85)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:358)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:341)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:340)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1511)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:581)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

索引操作

在开始创建索引以前先创建一个表,并向其中填充测试数据(此处填充的数据仅用于功能测试)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:phoenix:> create table usertable (id varchar primary key,firstname varchar, lastname varchar);
0: jdbc:phoenix:> !tables
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERAT |
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | |
| | SYSTEM | STATS | SYSTEM TABLE | | | | |
| | | USERTABLE | TABLE | | | | |
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+

0: jdbc:phoenix:> upsert into usertable(id, firstname, lastname) values('1', 'Hello', 'world');
...
0: jdbc:phoenix:> upsert into usertable(id, firstname, lastname) values('5', 'Hello', 'phoenix');
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
base(main):019:0> scan 'USERTABLE'
ROW COLUMN+CELL
1 column=0:FIRSTNAME, timestamp=1508397473229, value=Hello
1 column=0:LASTNAME, timestamp=1508397473229, value=world
1 column=0:_0, timestamp=1508397473229, value=x
2 column=0:FIRSTNAME, timestamp=1508397529852, value=Hello
2 column=0:LASTNAME, timestamp=1508397529852, value=world
2 column=0:_0, timestamp=1508397529852, value=x
3 column=0:FIRSTNAME, timestamp=1508397535836, value=Hello
3 column=0:LASTNAME, timestamp=1508397535836, value=world
3 column=0:_0, timestamp=1508397535836, value=x
4 column=0:FIRSTNAME, timestamp=1508397643594, value=Hello
4 column=0:LASTNAME, timestamp=1508397643594, value=phoenix
4 column=0:_0, timestamp=1508397643594, value=x
5 column=0:FIRSTNAME, timestamp=1508397648938, value=Hello
5 column=0:LASTNAME, timestamp=1508397648938, value=phoenix
5 column=0:_0, timestamp=1508397648938, value=x
5 row(s) in 0.3220 seconds

创建索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0: jdbc:phoenix:> create index idx_name on usertable (lastname) include(firstname);
0: jdbc:phoenix:> 0: jdbc:phoenix:> !tables
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERAT |
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+
| | | IDX_NAME | INDEX | | | | |
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | |
| | SYSTEM | STATS | SYSTEM TABLE | | | | |
| | | USERTABLE | TABLE | | | | |
+------------+--------------+-------------------------+---------------+----------+------------+----------------------------+-------------+
0: jdbc:phoenix:> !index usertable
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+--------------+----+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | AS |
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+--------------+----+
| | | USERTABLE | true | | IDX_NAME | 3 | 1 | 0:LASTNAME | A |
| | | USERTABLE | true | | IDX_NAME | 3 | 2 | :ID | A |
| | | USERTABLE | true | | IDX_NAME | 3 | 3 | 0:FIRSTNAME | |
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+--------------+----+
1
2
3
4
5
6
7
8
9
10
11
12
13
hbase(main):020:0> scan 'IDX_NAME'
ROW COLUMN+CELL
phoenix\x004 column=0:0:FIRSTNAME, timestamp=1508397686228, value=Hello
phoenix\x004 column=0:_0, timestamp=1508397686228, value=x
phoenix\x005 column=0:0:FIRSTNAME, timestamp=1508397686228, value=Hello
phoenix\x005 column=0:_0, timestamp=1508397686228, value=x
world\x001 column=0:0:FIRSTNAME, timestamp=1508397686228, value=Hello
world\x001 column=0:_0, timestamp=1508397686228, value=x
world\x002 column=0:0:FIRSTNAME, timestamp=1508397686228, value=Hello
world\x002 column=0:_0, timestamp=1508397686228, value=x
world\x003 column=0:0:FIRSTNAME, timestamp=1508397686228, value=Hello
world\x003 column=0:_0, timestamp=1508397686228, value=x
5 row(s) in 0.3180 seconds

使用索引

正常的select … where … 是不会用到索引表的,要想用到索引表,必须查询出的字段也是索引字段。(此处的结论需要在后续的性能测试中进行验证)

1
2
3
4
5
6
7
0: jdbc:phoenix:> select firstname from usertable where lastname = 'phoenix';
+------------+
| FIRSTNAME |
+------------+
| Hello |
| Hello |
+------------+

性能测试

Todo…

Todo…

Local Index

Todo…

参考&鸣谢