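First of all, thanks to everyone who answers our questions.

I am stuck on a problem and hope someone can help me get past it.

I have a 6-node Apache Cassandra 2.1 cluster, and I created a table with 3 columns: the first column is of type text and the other 2 are map types. When I insert data into the table and read it back, fetching 1 row takes about 20 ms, but if I create the table with the text type for all 3 columns, it takes only 5 ms. Please tell me if I am missing something. Why does the map type take longer? I am confused about the read latency of map types.

Below are the cfstats output, the table definition, and a query trace: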

Table:

PRODUCT_TYPE
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 81458
Space used (total): 81458
Space used by snapshots (total): 0
Off heap memory used (total): 87
SSTable Compression Ratio: 0.15090414689301526
Number of keys (estimate): 6
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 5
Local read latency: 22.494 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 16
Bloom filter off heap memory used: 8
Index summary off heap memory used: 15
Compression metadata off heap memory used: 64
Compacted partition minimum bytes: 73458
Compacted partition maximum bytes: 105778
Compacted partition mean bytes: 91087
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

CREATE TABLE TEST.PRODUCT_TYPE (
type text PRIMARY KEY,
col1 map<int, boolean>,
timestamp_map map<int, timestamp>
) WITH bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';


 activity                                                                                                              | timestamp                  | source        | source_elapsed
 -----------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                                    Execute CQL3 query | 2015-06-03 21:57:36.841000 | 10.65.133.202 |              0
                                            Parsing SELECT * from location_eligibility_by_type5; [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             54
                                                                             Preparing statement [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |             86
                                                                       Computing ranges to query [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            165
  Submitting range requests on 1537 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-06-03 21:57:36.842000 | 10.65.133.202 |            410
                                                             Enqueuing request to /10.65.137.191 [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7448
                                                                      Message received from /10.65.133.202 [Thread-15] | 2015-06-03 21:57:36.849000 | 10.65.137.191 |             15
                                      Submitted 1 concurrent range requests covering 1537 ranges [SharedPool-Worker-1] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7488
                                                              Sending message to /10.65.137.191 [WRITE-/10.65.137.191] | 2015-06-03 21:57:36.849000 | 10.65.133.202 |           7515
 Executing seq scan across 0 sstables for [min(-9223372036854775808), min(-9223372036854775808)] [SharedPool-Worker-1] | 2015-06-03 21:57:36.850000 | 10.65.137.191 |            105
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.866000 | 10.65.137.191 |          16851
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.882000 | 10.65.137.191 |          33542
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.899000 | 10.65.137.191 |          50206
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.915000 | 10.65.137.191 |          66556
                                                              Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82814
                                                                    Scanned 5 rows and matched 5 [SharedPool-Worker-1] | 2015-06-03 21:57:36.932000 | 10.65.137.191 |          82839
                                                            Enqueuing response to /10.65.133.202 [SharedPool-Worker-1] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          82878
                                                              Sending message to /10.65.133.202 [WRITE-/10.65.133.202] | 2015-06-03 21:57:36.933000 | 10.65.137.191 |          83054
                                                                     Message received from /10.65.137.191 [Thread-151] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102134
                                                         Processing response from /10.65.137.191 [SharedPool-Worker-2] | 2015-06-03 21:57:36.944000 | 10.65.133.202 |         102191
                                                                                                      Request complete | 2015-06-03 21:57:36.948916 | 10.65.133.202 |         107916

Thanks in advance for all your support and answers.

Thanks, John

1 Answer

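Collection types in Cassandra are implemented as blobs under the hood; there is no real magic here.

To measure the difference, you can enable tracing in C* and see it for yourself: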

create table no_collections(id int, value text, primary key (id));
create table with_collections(id int, value set<text>, primary key (id));

cqlsh:stackoverflow> select * from no_collections ;

 id | value
----+-------------
  1 | foo,bar,baz
  2 | xxx,yyy,zzz
  3 | aaa,bbb,ccc

(3 rows)
cqlsh:stackoverflow> select * from with_collections ;

 id | value
----+-----------------------
  1 | {'bar', 'baz', 'foo'}
  2 | {'xxx', 'yyy', 'zzz'}
  3 | {'aaa', 'bbb', 'ccc'}

(3 rows)

Now let's enable tracing to see what happens:

cqlsh:stackoverflow> TRACING ON ;
Now Tracing is enabled
cqlsh:stackoverflow> select * from with_collections where id=3;

 id | value
----+-----------------------
  3 | {'aaa', 'bbb', 'ccc'}

(1 rows)

Tracing session: 7c3d4ed0-09c8-11e5-b4cd-2988e70b20cb

activity                                                                                            | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:13:58.717000 | 127.0.0.1 |              0
                        Parsing select * from with_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |             72
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            218
                      Executing single-partition query on with_collections [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            547
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:13:58.718000 | 127.0.0.1 |            556
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            574
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            636
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            644
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:13:58.719000 | 127.0.0.1 |            673
                                                                                Request complete | 2015-06-03 11:13:58.717847 | 127.0.0.1 |            847

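As you can see, parsing and executing the query that uses a collection takes only about 850 microseconds (the source_elapsed column is reported in microseconds). Without the collection, things look roughly the same: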

cqlsh:stackoverflow> select * from no_collections where id=3;

 id | value
----+-------------
  3 | aaa,bbb,ccc

(1 rows)

Tracing session: 7e9ac6d0-09c8-11e5-b4cd-2988e70b20cb

 activity                                                                                        | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-03 11:14:02.685000 | 127.0.0.1 |              0
                          Parsing select * from no_collections where id=3; [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |             77
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            209
                        Executing single-partition query on no_collections [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            525
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-03 11:14:02.686000 | 127.0.0.1 |            534
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            553
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-03 11:14:02.687000 | 127.0.0.1 |            598
                                Merging data from memtables and 0 sstables [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            606
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-03 11:14:02.688000 | 127.0.0.1 |            630
                                                                                Request complete | 2015-06-03 11:14:02.685789 | 127.0.0.1 |            789

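So I don't see any real difference here.

The timings shown by cqlsh tracing are approximate and not statistically meaningful. To explore the difference, you need to run at least a few dozen experiments and compare their results. Even then, the results may depend on several things: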

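  • Network latency between nodes. It can be the cause of all kinds of latency problems on shared infrastructure such as AWS.
  • Cluster load. If your cluster is not idle, it may be doing work that interferes with your measurements.
  • Background work. If you have a dataset with frequent updates/deletes, C* may run compaction tasks in the background, which also interfere with other queries.
  • Heavy updates/deletes, low memory. If you have (or had in the past) a heavy update/delete workload, your data may be spread across multiple small SSTables that have not been compacted yet. C* has to read most of them to assemble your row, which leads to high query latency.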
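The advice above about repeating the measurement a few dozen times can be sketched roughly as follows. This is only a minimal illustration, not a benchmark tool: `measure_latencies` and `summarize` are hypothetical helper names, the run and warm-up counts are arbitrary, and the commented wiring assumes the DataStax Python driver and a locally reachable node.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(run_query: Callable[[], object],
                      runs: int = 50, warmup: int = 5) -> List[float]:
    """Call run_query repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        run_query()  # warm-up calls are executed but not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return min, median, nearest-rank 95th percentile, and max."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Assumed wiring (requires the cassandra-driver package and a running node):
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("stackoverflow")
#   with_c = measure_latencies(
#       lambda: session.execute("select * from with_collections where id=3"))
#   no_c = measure_latencies(
#       lambda: session.execute("select * from no_collections where id=3"))
#   print(summarize(with_c), summarize(no_c))
```

Comparing the median and the 95th percentile of both runs is less sensitive to a single slow request than comparing two individual traces.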

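So I suggest you run your queries with tracing enabled to see where the problem is, but I bet it has nothing to do with collections at all.

Answered 2015-06-03T08:32:30.123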
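To see where the time goes in a trace, it can help to look at the gaps between consecutive events on the same node rather than at the total. The sketch below assumes the pipe-separated layout that cqlsh prints (activity | timestamp | source | source_elapsed); `parse_trace` and `slowest_steps` are hypothetical helper names, not part of cqlsh or any driver.

```python
from typing import Dict, List, Tuple

TraceRow = Tuple[str, str, int]  # (activity, source, source_elapsed in microseconds)


def parse_trace(text: str) -> List[TraceRow]:
    """Parse pasted cqlsh trace output into (activity, source, elapsed) rows."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip the header, the separator, and non-trace lines
        rows.append((parts[0], parts[2], int(parts[3])))
    return rows


def slowest_steps(rows: List[TraceRow], top: int = 3) -> List[Tuple[int, str, str]]:
    """Return the largest gaps between consecutive events of the same source.

    source_elapsed is counted separately on each node, so gaps are only
    computed between events that share a source address.
    """
    last_seen: Dict[str, int] = {}
    gaps = []
    for activity, source, elapsed in rows:
        if source in last_seen:
            gaps.append((elapsed - last_seen[source], source, activity))
        last_seen[source] = elapsed
    return sorted(gaps, reverse=True)[:top]
```

Pasting a trace into `parse_trace` and passing the result to `slowest_steps` lists the events that were preceded by the longest wait on each node, which is usually a better starting point than the total request time.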