DataStax works on building high-performance data models for Apache Cassandra. Artyom Chebotko, a solutions architect at DataStax, explained what this work involves and how to do it correctly at the Cassandra Day Russia 2021 conference.
Apache Cassandra. DataStax. use cases, . .
. , Cassandra , , . . 3 , . , , .
Cassandra
Cassandra , , KEYSPACE — . . , replication strategy, - replication factors .
DC-WEST is a data center with replication factor 3, and DC-EAST is a data center with replication factor 5. Both are set on the KEYSPACE, so every table in the KEYSPACE uses the same replication strategy.
KEYSPACE . Create Table — .
. SQL: 4 , 4 . primary key — — , , 2 . — year. , partition key, . — name. clustering key, , .
Partition key YEAR , . . YEAR partition key. partition. , 2015 partition, 2015 partition. - .
— Cassandra , , , replication factor. , partition — - 3 , - 5 . 1- partition 3 . partition key Cassandra , , , .
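To make the link between a partition key and its replica nodes concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how Cassandra is implemented: the node names and ring sizes are hypothetical, `replicas_for` is an invented helper, and an MD5-based hash stands in for Cassandra's actual partitioner. It only shows that the same partition key value (2015) always maps to the same nodes, 3 of them in a data center with replication factor 3 and 5 of them in one with replication factor 5.

```python
import hashlib


def replicas_for(partition_key, nodes, replication_factor):
    """Pick `replication_factor` consecutive nodes on a toy ring (illustrative only)."""
    # Hash the partition key value to a position on the ring.
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # A partition cannot have more replicas than there are nodes.
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


# Hypothetical node names for the two data centers.
dc_west = [f"west-{i}" for i in range(6)]
dc_east = [f"east-{i}" for i in range(8)]

west_replicas = replicas_for(2015, dc_west, 3)  # replication factor 3
east_replicas = replicas_for(2015, dc_east, 5)  # replication factor 5
```

Real replica placement also depends on token ranges and rack layout, which this toy ring ignores.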
KEYSPACE, — Cassandra Query Language, Structured Query Language, SQL.
, Create Table, .
partition key, , primary key partition key , , clustering key. , clustering key.
, . , . , , - , . partition, partition.
clustering order by — , partition, . , , clustering key. Cassandra , . , , , .
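The roles of the two key parts can be pictured with a small in-memory model. This is a sketch, assuming a table whose partition key is `year` and whose clustering key is `name`; the helper `insert`, the `note` column, and the sample values are made up for illustration. Rows with the same partition key land in the same partition, stay sorted by clustering key, and a second insert with the same primary key replaces the first.

```python
import bisect
from collections import defaultdict

# partition key (year) -> list of (clustering key, columns), kept sorted by clustering key
table = defaultdict(list)


def insert(year, name, **columns):
    """Insert a row; a row with the same primary key is overwritten."""
    rows = table[year]
    keys = [row[0] for row in rows]
    position = bisect.bisect_left(keys, name)
    if position < len(rows) and rows[position][0] == name:
        rows[position] = (name, columns)        # same primary key: replace the row
    else:
        rows.insert(position, (name, columns))  # keep the clustering order


insert(2015, "beta", note="x")
insert(2015, "alpha", note="y")
insert(2016, "gamma", note="z")
insert(2015, "beta", note="updated")

names_2015 = [name for name, _ in table[2015]]
```

Reading `table[2015]` returns the whole partition already in clustering order, which is why queries that follow this order need no extra sorting step in this model.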
Partitions differ depending on the primary key. If the primary key is just an ID, it is also the partition key, and every partition holds exactly one row: these are Single-Row Partitions. Partitions that hold more than 1 row are Multi-Row Partitions.
, partition key, clustering key, Cassandra, . . . . 10 , . partition partition - .
partition key. Venue year — «» «». DataStax Accelerate. partition key . , — - . title, — . .
Country , partition, . , .
. . ? , 5 , , K — partition key, — clustering key, — ascending descending, , — . S — .
, . , CQL. SQL: select, from, where, group by, order by, limit. allow filtering — .
Select — , from — . Cassandra . . , join — , union — , intersection — . , 2 , . , , , join, , join.
where — , primary key. partition key — . — — clustering key, , /. . use cases, , .
Group by primary key , .
Order by — . Cassandra , . , . , . . .
Limit — .
Allow filtering — , . , . , , , , .
, artefacts_by_venue.
artefacts, venue - , year - , partition key. partition key clustering key — . clustering key. : partition key clustering key.
, .
, venue. partition key, Cassandra , , . partition key, clustering key.
venue, year — partition key, title , primary key, . Country. . , , .
Primary key , . -, , partition key, partition , , partition. .
clustering key ( ). , join, - , , . , , , . .
— . , , . .
— . . — . , — , . , . — , ( ). , .
, , . , — access patterns . . , , , , . . , , .
- — — .
, Cassandra , (consistency) , , . — join . , .
, — , , , . , .
4 :
- .
- , , .
- , .
- .
:
- Conceptual Data Model.
- Application Workflow Model.
- Logical Data Model.
- Physical Data Model.
- : Entity-Relationship Diagram (-), Application Workflow Diagram ( ), Chebotko Diagram Chebotko Diagram&CQL.
. — .
, : « — Conceptual Data Model Application Workflow Model»? . , , . , . , , .
: ? consistency level , ?
: , . . , . ? partition key, Cassandra- , . 100 , replication factor 3, partition key , 3 — . secondary index partition key, 100 , .
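The numbers in this answer reduce to a small piece of arithmetic, sketched below in Python. The function name `nodes_contacted` is hypothetical, and the second branch is an upper bound assumed for illustration: a query without a partition key may have to reach every node in the cluster.

```python
def nodes_contacted(cluster_size, replication_factor, has_partition_key):
    """Upper bound on the nodes that hold data relevant to one query."""
    if has_partition_key:
        # Only the replicas of that single partition store the data.
        return min(replication_factor, cluster_size)
    # Without a partition key the data may be on any node.
    return cluster_size


with_key = nodes_contacted(100, 3, has_partition_key=True)
without_key = nodes_contacted(100, 3, has_partition_key=False)
```

For the cluster of 100 nodes with replication factor 3 mentioned above, `with_key` is 3 and `without_key` is 100.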
?
- partition key
- . , OLTP-, , . Cassandra, -. . - Cassandra — Spark, - . - -, , , .
consistency level . , . .
, , .
DataStax Academy , 2. , . , : , .
— Internet of Things . ? - , , . , , , , . - , , , . .
, . , ?
, - . - — .
, , . , . , , - .
, . — , . — , . , . ID — . , — . — — , , : , , . , .
, , , — . , . ID timestamp - . — timestamp — .
, Entity-Relationship (-), . , . , .
Application Workflow Model — . : , .
Application Workflow . . . : - — , . , - , . . - data access pattern. , , batch.
4 , 4 4 . , , 1 — . ?
- .
- . ? . , . . .
- : .
- : .
. , . ? : . — : /. clustering key, partition key. , . , , ID .
, . , , Application Workflow. — . — , . , , DataStax Academy.
sensors_by_network — . Network — partition key, partition. temperatures_by_sensor — , timestamp. , + . timestamp clustering key, . , . .
, ? , . — . 3 . — . bucket — partition key, name — clustering key. partition . partition. Bucket — , , partition.
: networks — . , partition.
? week — . . partition key. . partition , partition . ? — , . , , . , .
, , 100 000 100 . . , 5 , - 100 . 100 000 - — 10 . - 100 000 — 1 . .
, ? , , — 24 . , . 1 000 — 24 * 1 000 = 24 000 . , , . , . . .
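The estimate above is plain multiplication, sketched here in Python. The reading of the numbers is an assumption: 24 is taken as the readings one sensor contributes to a partition and 1 000 as the number of sensors, while the constant reuses the 100 000 figure mentioned earlier, treated here as a rule-of-thumb ceiling on rows per partition. The function name is hypothetical.

```python
# Assumed rule-of-thumb ceiling on rows in one partition (the 100 000 figure from the talk).
MAX_ROWS_GUIDELINE = 100_000


def rows_per_partition(readings_per_sensor, sensors):
    """Estimate how many rows end up in a single partition."""
    return readings_per_sensor * sensors


rows = rows_per_partition(24, 1_000)  # 24 * 1 000 = 24 000
within_guideline = rows <= MAX_ROWS_GUIDELINE
```

Re-running the same calculation with a larger bucket or more sensors shows quickly when a partition would outgrow the guideline.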
— . — . timestamp — .
: , like - , ?
secondary indexes, , , secondary indexes . , , Cassandra . , , , . , — Solr indexes, Cassandra, .
, — . , CQL. . . , KEYSPACE, . , , , , , partition key, clustering key — . — CQL , , Stargate API — .
2 : , . , , . , partition, .. bucket = all. , , , partition.
. forest-net, , . : network = forest-net, -. - . . .
, , ? ? 2 partition, 2 . , . 2 : , . . , in, . in, , 2 . , .
, . — . , . , — «» «». - . , mutual funds ( ), ETF (Exchange-traded fund). . , .
. keys, username, , , — . . , . , . -, , : , . , .
Workflow — 3 . . , , . — . , . . 5 . , 5 , . , . — . — : . — + + + . — + + . .
, ?
4 3- . 3.1 3.2. , , , . Trade_id — id . , : . partition — , trade_id.
, . ? . — . — . , .
, trades_by_a_d ? ? , — . , . , , 100 000 — . — — . , , , 100 000 .
, — trade_id . Trade_id — TIMEUUID. UUID — . timestamp, . , .
, - . .
? , TIMEUUID? TIMEUUID timestamp .
, , , . TIMEUUID — , .
, — TIMEUUID, . trade_id > maxTIMEUUID — , , . , timestamp. timestamp . .
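A TIMEUUID is a version 1 UUID, so the timestamp can also be recovered from it on the client. The sketch below uses only Python's standard `uuid` module; the helper name `timeuuid_to_datetime` and the variable names are illustrative and not part of any Cassandra driver.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Extract the embedded timestamp from a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # `value.time` counts 100-ns intervals since the UUID epoch.
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


trade_id = uuid.uuid1()                       # time-based UUID, like a TIMEUUID value
executed_at = timeuuid_to_datetime(trade_id)  # when the identifier was generated
```

Because the timestamp occupies the time fields of the identifier, a separate timestamp column is not needed just to know when a trade was recorded.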
: . ?
: ? — update insert . , . : trades — 4 , , -. -. ? batches, . batches , , batches, partition, . .
partition , . insert application retry, - . - — - , - , . Spark , , . join Spark, .