| --- |
| { |
| "title": "BITMAP_UNION", |
| "language": "en" |
| } |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| |
| ## BITMAP_UNION |
| |
| ### description |
| |
| ### example |
| |
| #### Create table |
| |
| The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is bitmap_union. |
| ``` |
| CREATE TABLE `pv_bitmap` ( |
| `dt` int (11) NULL COMMENT" ", |
| `page` varchar (10) NULL COMMENT" ", |
| `user_id` bitmap BITMAP_UNION NULL COMMENT" " |
| ) ENGINE = OLAP |
| AGGREGATE KEY (`dt`,` page`) |
| COMMENT "OLAP" |
| DISTRIBUTED BY HASH (`dt`) BUCKETS 2; |
| ``` |
| |
| Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union queries |
| |
| ``` |
| ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id); |
| ``` |
| |
| #### Data Load |
| |
| `TO_BITMAP (expr)`: Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap |
| |
| `BITMAP_EMPTY ()`: Generate empty bitmap columns, used for insert or import to fill the default value |
| |
| `BITMAP_HASH (expr)` or `BITMAP_HASH64 (expr)`: Convert any type of column to a bitmap by hashing |
| |
| ##### Stream Load |
| |
| ``` |
| cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = to_bitmap (user_id)" http: // host: 8410 / api / test / testDb / _stream_load |
| ``` |
| |
| ``` |
| cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_hash (user_id)" http: // host: 8410 / api / test / testDb / _stream_load |
| ``` |
| |
| ``` |
| cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_empty ()" http: // host: 8410 / api / test / testDb / _stream_load |
| ``` |
| |
| ##### Insert Into |
| |
| id2's column type is bitmap |
| ``` |
| insert into bitmap_table1 select id, id2 from bitmap_table2; |
| ``` |
| |
| id2's column type is bitmap |
| ``` |
| INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap (1000)), (1001, to_bitmap (2000)); |
| ``` |
| |
| id2's column type is bitmap |
| ``` |
| insert into bitmap_table1 select id, bitmap_union (id2) from bitmap_table2 group by id; |
| ``` |
| |
| id2's column type is int |
| ``` |
| insert into bitmap_table1 select id, to_bitmap (id2) from table; |
| ``` |
| |
| id2's column type is String |
| ``` |
| insert into bitmap_table1 select id, bitmap_hash (id_string) from table; |
| ``` |
| |
| |
| #### Data Query |
| |
| ##### Syntax |
| |
| |
| `BITMAP_UNION (expr)`: Calculate the union of two Bitmaps. The return value is the new Bitmap value. |
| |
| `BITMAP_UNION_COUNT (expr)`: Calculate the cardinality of the union of two Bitmaps, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)). |
| |
| `BITMAP_UNION_INT (expr)`: Count the number of different values in columns of type TINYINT, SMALLINT and INT, return the sum of COUNT (DISTINCT expr) same |
| |
| `INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ...)`: The calculation satisfies |
| filter_column The cardinality of the intersection of multiple bitmaps of the filter. |
| bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values is a list of dimension values. |
| |
| ##### Example |
| |
| The following SQL uses the pv_bitmap table above as an example: |
| |
| Calculate the deduplication value for user_id: |
| |
| ``` |
| select bitmap_union_count (user_id) from pv_bitmap; |
| |
| select bitmap_count (bitmap_union (user_id)) from pv_bitmap; |
| ``` |
| |
| Calculate the deduplication value of id: |
| |
| ``` |
| select bitmap_union_int (id) from pv_bitmap; |
| ``` |
| |
| Calculate the retention of user_id: |
| |
| ``` |
| select intersect_count (user_id, page, 'meituan') as meituan_uv, |
| intersect_count (user_id, page, 'waimai') as waimai_uv, |
| intersect_count (user_id, page, 'meituan', 'waimai') as retention // Number of users appearing on both 'meituan' and 'waimai' pages |
| from pv_bitmap |
| where page in ('meituan', 'waimai'); |
| ``` |
| |
| ### keywords |
| |
| BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT |