hbase_filter()


Description:

Filters data using HBase filters.

Syntax:

hbase_filter(filterName,filterArg)  The number and types of arguments differ by filterName, as listed below:

hbase_filter(ColumnCountGetFilter,n)

Returns the first n values of each row.

hbase_filter(ColumnPaginationFilter,limit,columnOffset)

For each row, returns limit column values starting from column columnOffset+1.
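
To make the offset arithmetic concrete, here is a minimal Python sketch that models a single row as a dict of column name to value. It does not call HBase or esProc; the function name `paginate_columns` and the sample row are hypothetical, and it assumes columns are visited in sorted name order.

```python
def paginate_columns(row, limit, column_offset):
    """Model of ColumnPaginationFilter applied to one row.

    Skips the first `column_offset` columns (in sorted name order)
    and keeps at most `limit` of the columns that follow.
    """
    names = sorted(row)  # assumption: columns are visited in sorted order
    kept = names[column_offset:column_offset + limit]
    return {name: row[name] for name in kept}


# Hypothetical sample row, not taken from a real table.
row = {"age": 30, "name": "Tom", "position": "dev", "tel": "139"}
print(paginate_columns(row, limit=2, column_offset=2))
# {'position': 'dev', 'tel': '139'}
```

With limit 2 and columnOffset 2, the first two columns are skipped and the third and fourth are returned.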

hbase_filter(ColumnPrefixFilter,str)

For each row, returns the values of columns whose names start with the prefix str.

hbase_filter(ColumnRangeFilter,minColumn,boolean,maxColumn,boolean)

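For each row, returns the values of columns whose names fall within the specified range. Each boolean parameter states whether the bound before it is inclusive: true makes it a closed bound, false an open bound.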
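
The effect of the two boolean flags can be sketched in plain Python. This is only a model of the comparison, assuming column names compare as ordinary strings; `column_in_range` and the column list are hypothetical names used for illustration.

```python
def column_in_range(name, min_column, min_inclusive, max_column, max_inclusive):
    """Return True if `name` lies between the two bounds.

    A True flag makes that bound closed (the bound itself is kept);
    a False flag makes it open (the bound itself is dropped).
    """
    above_min = name >= min_column if min_inclusive else name > min_column
    below_max = name <= max_column if max_inclusive else name < max_column
    return above_min and below_max


# Hypothetical column names.
columns = ["age", "name", "position", "tel"]
kept = [c for c in columns if column_in_range(c, "age", True, "name", False)]
print(kept)  # ['age']
```

Here "age" is kept because its bound is closed, while "name" is dropped because its bound is open.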

hbase_filter(DependentColumnFilter,family,qualifier)

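Specifies a reference column and filters out of the result any columns whose timestamp differs from that column's timestamp.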

hbase_filter(FamilyFilter,ifequal,my-family)

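Filters by column family. ifequal is a comparison operator such as "=", and my-family is the column family name.

hbase_filter(FirstKeyOnlyFilter)

Returns only the first value of each row.

hbase_filter(FirstKeyValueMatchingQualifiersFilter,column)

Returns only the first value that matches the specified column name.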

hbase_filter(FuzzyRowFilter,fuzzyKeysData)

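Fuzzy matching on the rowkey. The parameter must be a List<Pair<byte[], byte[]>>: in each pair, the first byte array is the pattern to match, and the second is a mask of 0s and 1s marking, byte by byte, whether the pattern must match. 0 means the byte must match; 1 means it need not.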
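
The pattern and mask pairing is easier to see in a small model. The following Python sketch checks one row key against one (pattern, mask) pair; it is not the HBase implementation, `fuzzy_match` is a hypothetical name, and it assumes the pattern and mask have the same length and that bytes beyond the pattern are unconstrained.

```python
def fuzzy_match(row_key: bytes, pattern: bytes, mask: bytes) -> bool:
    """Model of one FuzzyRowFilter (pattern, mask) pair.

    mask[i] == 0: byte i of the row key must equal pattern[i].
    mask[i] == 1: byte i of the row key may be anything.
    """
    if len(row_key) < len(pattern):
        return False  # assumption: a key shorter than the pattern never matches
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))


mask = bytes([1, 1, 0, 0])  # first two bytes are free, last two are fixed
print(fuzzy_match(b"ab12", b"??12", mask))  # True
print(fuzzy_match(b"ab13", b"??12", mask))  # False
```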

hbase_filter(InclusiveStopFilter,stopRowKey)

Scans up to and including the specified row, then stops.
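
As a rough illustration of the inclusive stop, this Python sketch walks a sorted list of row keys and stops after the stop row. It models the behaviour only; `scan_until` and the keys are hypothetical.

```python
def scan_until(row_keys, stop_row_key):
    """Return the row keys up to and including `stop_row_key`."""
    kept = []
    for key in sorted(row_keys):  # a scan visits rows in key order
        if key > stop_row_key:
            break  # past the stop row: end the scan
        kept.append(key)
    return kept


print(scan_until(["row1", "row2", "row3", "row4"], "row3"))
# ['row1', 'row2', 'row3']
```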

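hbase_filter(KeyOnlyFilter)

Returns only the key of each value, not the value itself. The key comprises the rowkey, column family, column name, and timestamp.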

hbase_filter(MultipleColumnPrefixFilter,prefixes1,prefixes2,...prefixesN)

Allows multiple column-name prefixes to be specified.

hbase_filter(PageFilter,n)

Paging filter; returns only n rows of data.

hbase_filter(PrefixFilter,str)

Specifies a rowkey prefix.

hbase_filter(QualifierFilter,cmp,cmpColumn)

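Decides whether to keep a value based on a comparison against its column name. cmp is the comparison operator, such as eq (or =) or ne (or !=); cmpColumn is the column name supplied through a comparator.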

hbase_filter(RowFilter,cmp,cmpRow)

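Decides whether to keep a value based on a comparison against its rowkey. cmp is the comparison operator, such as eq (or =) or ne (or !=); cmpRow is the rowkey supplied through a comparator.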

hbase_filter(ValueFilter,cmp,cmpValue)

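Decides whether to keep a value based on a comparison against the value itself. cmp is the comparison operator, such as eq (or =) or ne (or !=); cmpValue is the value supplied through a comparator.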

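hbase_filter(SingleColumnValueExcludeFilter,family,columnName,ifequal,cmpValue) / hbase_filter(SingleColumnValueFilter,family,columnName,ifequal,cmpValue)

Decides whether to keep a row based on the value of a specified column. family is the column family name, columnName is the column name, ifequal is the comparison operator, and cmpValue is the result value supplied through a comparator. The data returned by SingleColumnValueExcludeFilter does not include the columnName column, whereas the data returned by SingleColumnValueFilter does.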
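
The difference between the two variants can be sketched with rows modelled as Python dicts. This is an illustrative model, not the HBase or esProc implementation: `single_column_value`, the `OPS` table and the sample rows are hypothetical, only one column family is modelled, and keeping rows that lack the tested column is an assumption.

```python
import operator

# Only two comparison operators are modelled here.
OPS = {"eq": operator.eq, "ne": operator.ne}


def single_column_value(rows, column, op, value, exclude=False):
    """Keep rows whose `column` satisfies the comparison.

    With exclude=True the tested column is removed from the kept rows,
    mirroring SingleColumnValueExcludeFilter.
    """
    kept = []
    for row in rows:
        if column in row and not OPS[op](row[column], value):
            continue  # the tested column fails the comparison: drop the row
        # Assumption: a row without the tested column is kept.
        if exclude:
            row = {k: v for k, v in row.items() if k != column}
        kept.append(row)
    return kept


# Hypothetical sample rows.
rows = [{"name": "Tom", "tel": "13"}, {"name": "Ann", "tel": "15"}]
print(single_column_value(rows, "tel", "eq", "13"))
# [{'name': 'Tom', 'tel': '13'}]
print(single_column_value(rows, "tel", "eq", "13", exclude=True))
# [{'name': 'Tom'}]
```

Both calls select the same row; only the second strips the tel column from it.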

hbase_filter(RandomRowFilter,n)

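Filters rows at random, returning each row with a given probability; applying the same RandomRowFilter to the same data set several times returns different result sets. Parameter n is a float: n<=0 filters out all rows, and n>=1 includes all rows.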
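
The boundary behaviour of n follows from comparing a random draw in [0, 1) against n. A minimal Python sketch of that idea, with the hypothetical name `random_row_filter` and no claim to match the actual implementation:

```python
import random


def random_row_filter(rows, chance, rng=None):
    """Keep each row independently with probability `chance`.

    rng.random() returns a float in [0.0, 1.0), so chance <= 0 keeps
    nothing and chance >= 1 keeps everything.
    """
    if rng is None:
        rng = random.Random()
    return [row for row in rows if rng.random() < chance]


rows = ["row1", "row2", "row3", "row4"]
print(random_row_filter(rows, 0))    # []
print(random_row_filter(rows, 1))    # ['row1', 'row2', 'row3', 'row4']
print(random_row_filter(rows, 0.5))  # varies from run to run
```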

hbase_filter(SkipFilter,hbase_filter(ValueFilter,ifequal,cmpValue))

A wrapping filter used together with ValueFilter: if any column in a row fails the condition, the whole row is filtered out.
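
The contrast with an unwrapped value filter is the key point, so the Python sketch below models both on rows represented as dicts. The names `value_filter` and `skip_filter`, the predicate and the sample rows are hypothetical.

```python
def value_filter(rows, cell_passes):
    """Unwrapped filter: drops only the failing values, row by row."""
    return [{k: v for k, v in row.items() if cell_passes(v)} for row in rows]


def skip_filter(rows, cell_passes):
    """Wrapped filter: drops the whole row if any value fails."""
    return [row for row in rows if all(cell_passes(v) for v in row.values())]


# Hypothetical sample rows and condition (value must be non-zero).
rows = [{"a": 1, "b": 2}, {"a": 0, "b": 5}]
non_zero = lambda v: v != 0

print(value_filter(rows, non_zero))  # [{'a': 1, 'b': 2}, {'b': 5}]
print(skip_filter(rows, non_zero))   # [{'a': 1, 'b': 2}]
```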

hbase_filter(WhileMatchFilter,hbase_filter())

A wrapping filter that stops the scan as soon as the wrapped filter's condition is no longer met.
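
The stop-at-first-failure behaviour resembles `itertools.takewhile` in Python. The sketch below is a model under that assumption; `while_match` and the row keys are hypothetical.

```python
from itertools import takewhile


def while_match(rows, row_passes):
    """Return rows until the first one that fails, then stop scanning."""
    return list(takewhile(row_passes, rows))


# Hypothetical row keys in scan order.
keys = ["row1", "row2", "x1", "row3"]
print(while_match(keys, lambda k: k.startswith("row")))
# ['row1', 'row2']
```

Note that "row3" is not returned even though it satisfies the condition, because the scan has already ended at "x1".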

hbase_filter(TimestampsFilter,timeStamp,boolean)

Returns only the values with the specified timestamp(s).

Note:

External library function. Filters data using HBase filters.

Parameter:

filterName

Name of the HBase filter.

filterArg

Filter argument(s).

Return value:

Filter handle

Example:

    A
1   =hbase_open("hdfs://192.168.0.8", "192.168.0.8")
2   =hbase_scan(A1,"emp")                   All data in the emp table
3   =hbase_filter("ColumnCountGetFilter",3)
4   =hbase_scan(A1,"emp";filter:A3)         rowkey is the row name and is not a data column
5   =hbase_filter("ColumnPaginationFilter",2, 2)
6   =hbase_scan(A1,"emp";filter:A5)
7   =hbase_filter("ColumnPrefixFilter","na")
8   =hbase_scan(A1,"emp";filter:A7)
9   =hbase_filter("ColumnRangeFilter","age", true, "name",false)
10  =hbase_scan(A1,"emp";filter:A9)
11  =hbase_filter("DependentColumnFilter","family","age")
12  =hbase_scan(A1,"emp";filter:A11)
13  =hbase_filter("FamilyFilter","=",hbase_cmp@s("family"))
14  =hbase_scan(A1,"emp";filter:A13)
15  =hbase_filter("KeyOnlyFilter")
16  =hbase_scan(A1,"emp";filter:A15)
17  =hbase_filter("FirstKeyOnlyFilter")
18  =hbase_scan(A1,"emp";filter:A17)
19  =hbase_filter("FuzzyRowFilter","row1",[1,1,0,0])
20  =hbase_scan(A1,"emp";filter:A19)
21  =hbase_filter("InclusiveStopFilter","row3")
22  =hbase_scan(A1,"emp";filter:A21)
23  =hbase_filter("MultipleColumnPrefixFilter","na","tel","position")
24  =hbase_scan(A1,"emp";filter:A23)
25  =hbase_filter("PrefixFilter","ro")
26  =hbase_scan(A1,"emp";filter:A25)
27  =hbase_filter("QualifierFilter","eq", hbase_cmp@s("name"))
28  =hbase_scan(A1,"emp";filter:A27)
29  =hbase_filter("RowFilter","eq", hbase_cmp("row1"))
30  =hbase_scan(A1,"emp";filter:A29)
31  =hbase_filter("ValueFilter","=", hbase_cmp("C++"))
32  =hbase_scan(A1,"emp";filter:A31)
33  =hbase_filter("SingleColumnValueFilter","family","tel" ,"eq",hbase_cmp@s("13"))
34  =hbase_scan(A1,"emp";filter:A33)
35  =hbase_filter("SingleColumnValueExcludeFilter","family","tel" ,"eq",hbase_cmp@s("13"))
36  =hbase_scan(A1,"emp";filter:A35)
37  =hbase_filter("SkipFilter",hbase_filter("ValueFilter","=", hbase_cmp@s("aaa")))
38  =hbase_scan(A1,"emp";filter:A37)
39  =hbase_filter("FirstKeyValueMatchingQualifiersFilter","name")
40  =hbase_scan(A1,"emp";filter:A39)
41  =hbase_filter("TimestampsFilter",1488855959195,1488855959219,1488855959145,false)
42  =hbase_scan(A1,"emp";filter:A41)
43  =hbase_filter("RandomRowFilter",0.5)    Randomly filters rows

Related concepts:

hbase_cmp()

hbase_scan()