Clickhouse Thinking

列存储 https://clickhouse.com/docs/knowledgebase/key-value ch的列存储,每一列是独立的文件存储,select * from test where k = 1 limit 10的查询,如果k是order by的列,那么查找会很快速的定位到文件,然后文件内查找定位到行,然后就得到所在文件的行数,之后再去找其他所有列对应的文件,然后读取对应行的数据,组合起来,才能得到这一行的数据。 ...

January 22, 2024 · notsobad

Clickhouse Projections

确认projection生效: https://clickhouse.com/docs/knowledgebase/projection_example ReadFromPreparedSource (for_column2) 可以看出是走了projection clickhouse-23.12 :) explain indexes = 1 SELECT count() from table1_projections where column2 > 999 and column1 > 1; ┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Filter │ │ ReadFromPreparedSource (for_column2) │ └─────────────────────────────────────────────┘ Using projections to speedup queries in Clickhouse This is important as this means we have to add all columns to projection that are used in the query. This is different from row-based databases indexes, which referrence entire rows. But columnar database doesn’t even know what row is, that’s why projection != index. Projection is just some part of the original table data stored in a different form, which is better suitable for certain queries. ...

January 19, 2024 · notsobad

Clickhouse Optimize

Query cache https://clickhouse.com/docs/en/operations/query-cache https://clickhouse.com/blog/introduction-to-the-clickhouse-query-cache-and-design v23.1 开始支持Query cache, 即可以缓存查询结果,可以选择针对会话启用。 设置项: query_cache_min_query_runs 查询运行过多少次才会启用缓存 query_cache_min_query_duration 查询运行时间超过多少才会缓存 query_cache_ttl cache 过期时间 query_cache_max_entries 往缓存表中存储的缓存结果条数 use_query_cache 此查询是否使用缓存 enable_reads_from_query_cache 此查询是否先从缓存中读 enable_writes_to_query_cache 此查询的结果是否写入缓存 Materialized View 物化视图,看这篇 ...

January 16, 2024 · notsobad

Clickhouse Deduplication

Deduplication Strategies 笔记 去重操作,CH不是在插入阶段去重,而是插入后,速度会很快,影响: 任何时间,数据中可能会有重复数据 分区合并时,才会去重 查询语句需要适配重复数据 几种合并: ReplacingMergeTree: 相同排序键的会被合并 CollapsingMergeTree: 有一个字段标识取消,可以声明取消前面的数据,然后再插入新数据,多线程写的情况不适用。 VersionedCollapsingMergeTree: 带有版本号的去重,可以 Q: 为什么数据没有被去重? ...

January 8, 2024 · notsobad

Clickhouse MV

《Everything you should know about materialized views.》 阅读笔记 原理 materialized views(以下简称MV)在insert的时候触发,并不会读源表,往一个表insert后,MV会从insert的数据中进行创建。 因此,源表甚至可以是个NULL Engine,即实际不写入数据,仅用来在inser时触发MV的动作。 ...

January 4, 2024 · notsobad

Clickhouse Tips

投影projection 官方文档https://clickhouse.com/docs/en/sql-reference/statements/alter/projection 创建这个特性时的讨论:https://github.com/ClickHouse/ClickHouse/issues/14730 ...

September 14, 2023 · notsobad