
Cassandra

Terms & Definitions

Infrastructure terminology

| Term | Definition | AKA |
| --- | --- | --- |
| Node | Logical atomic unit of processing | Database server instance |
| Rack | A set of Nodes | Load-balanced database server instances |
| Data Center | A set of Racks | Highly available load-balanced database server instances |
| Cluster | A set of Data Centers | Globally available load-balanced database server instances |

Database terminology

| Term | Definition | AKA |
| --- | --- | --- |
| Keyspace | Contains data related to a single application | Schema |
| Table | Contains data related to a single object (originally known as Column Family) | Database table/Collection |
| Row | Contains data for a single object instance, with a structure defined by the Table (Column Family) it belongs to | Row (duh?) |
| Primary Key | Composed of the Partition Key and Clustering Keys | Nothing like this |
| Partition Key | Non-unique identifier for a Row | Primary key |
| Clustering Key | Identifies and orders Rows within a Partition | A GROUP BY-ish key |
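
As a concrete illustration of the table above, here is a minimal CQL sketch. The sensor_data keyspace, the readings table, and all column names are hypothetical; later snippets on this page reuse them.

```sql
-- Hypothetical single-node dev keyspace so the example is runnable as-is.
CREATE KEYSPACE IF NOT EXISTS sensor_data
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- sensor_id is the Partition Key, reading_time is a Clustering Key;
-- together they form the Primary Key. Rows sharing a sensor_id live in
-- the same Data Partition and are ordered by reading_time within it.
CREATE TABLE IF NOT EXISTS sensor_data.readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
```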

Operational terminology

| Term | Definition | AKA |
| --- | --- | --- |
| Bloom Filter | Reduces the seek-time for read operations | - |
| CQL | Cassandra Query Language | (Yet another) domain-specific SQL |
| Gossip | Protocol used to achieve eventual consistency | The syncing protocol |
| Partition Key Cache | Stores the location of partitions which are commonly queried. Default feature | - |
| Row Cache | A cache reserved for frequently queried data rows. Optional feature | - |
| Seed Node | A node that bootstraps other nodes | Master database instance |
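
The Partition Key Cache and Row Cache from the table above are tuned per table. A hedged sketch, reusing the hypothetical readings table defined earlier:

```sql
-- Key cache entries for all partition keys, row cache for the first 100
-- rows of each partition. The row cache is only used if it is also
-- enabled globally in cassandra.yaml (row_cache_size_in_mb > 0).
ALTER TABLE sensor_data.readings
WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};
```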

Usage

Consistency Check

  • Before serving a query, Cassandra checks whether enough replicas are available to satisfy the requested consistency level; if not, an exception is thrown

Read Path

  • Reading is more expensive than writing because of the Consistency Check and Read Repair

Read Repair

  • When reading, data is read from the fastest responding replica (chosen via the dynamic snitch) and hashed into a digest. Digests are requested from all other replicas. If all digests match, the data is returned to the client. If any do not match, the data is requested from all replicas and the version with the latest timestamp is returned to the client. Read Repair requests are then issued to the replicas whose digests did not match that of the returned (latest-timestamp) data.

Tombstone

  • Cassandra does not delete data immediately; it tags a Row for deletion, and this marker is called a Tombstone (see the CQL sketch below)
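
A small CQL sketch of the above, reusing the hypothetical readings table; the UUID and timestamp values are made up:

```sql
-- The DELETE writes a tombstone rather than removing the row; the data
-- is physically dropped during compaction once gc_grace_seconds has passed.
DELETE FROM sensor_data.readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND reading_time = '2020-09-24 02:10:24';

-- gc_grace_seconds is a per-table option (default 864000 seconds = 10 days).
ALTER TABLE sensor_data.readings WITH gc_grace_seconds = 864000;
```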

Availability Scaling

Consistency

  • Cassandra is eventually consistent

One, Two, Three

  • Specified number of replicas must agree

Quorum

  • A majority of replicas must agree

Local_ (modifier)

  • Local refers to the Data Center of the coordinator node handling the request

  • Local_two - two replicas within the Data Center must agree

  • Local_quorum - majority of replicas within the Data Center must agree

Each_ (modifier)

  • Each refers to every Data Center in the cluster

  • Each_two - every Data Center must have two replicas agree

  • Each_quorum - every Data Center must have a majority of its replicas agree (see the cqlsh sketch below for setting a consistency level)
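
A quick cqlsh sketch of picking a consistency level; the query reuses the hypothetical readings table and a made-up UUID:

```sql
-- CONSISTENCY is a cqlsh command, not CQL proper; drivers expose the same
-- levels per statement. LOCAL_QUORUM = a majority of the replicas in the
-- coordinator's data center must respond.
CONSISTENCY LOCAL_QUORUM;

SELECT reading_time, value
FROM sensor_data.readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;
```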

Data Partition

  • Made up of rows that share a common Partition Key

Data Replication

  • Configured per keyspace, with a replication factor per data center (see the keyspace sketch below)
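
A hedged sketch of what "per keyspace, per data center" looks like in CQL; dc1 and dc2 are hypothetical data center names and must match what the snitch reports:

```sql
-- Switch the hypothetical keyspace to NetworkTopologyStrategy with a
-- replication factor of 3 in each of two data centers.
ALTER KEYSPACE sensor_data
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3
};
```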

Data Centers

  • Simulates real-world data centers

  • Holds multiple racks

  • Enable segregation of workloads that require the same data set

  • Enable lower latency by processing data as close to the source region as possible

  • Enables a disaster-recovery-ready setup

Node

  • Owns set(s) of Data Partitions

  • Responsible for multiple token-ranges via Vnodes

  • Each node is assigned 256 tokens by default (the num_tokens option); see the query sketch below
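
The system tables expose the tokens a node owns (one entry per vnode); a sketch:

```sql
-- Tokens owned by the node cqlsh is connected to.
SELECT tokens FROM system.local;

-- Tokens owned by the other nodes in the cluster.
SELECT peer, tokens FROM system.peers;
```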

Partition

  • Ideally less than 10 MB

  • Definitely less than 100 MB

  • Partition size is a crucial factor in Cassandra performance

Partition Key

  • Each Table contains one

  • Can be standalone or composite

  • Converted to a Token by a partitioner

  • The default partitioner is Murmur3Partitioner (see the token() sketch below)
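
The token() function exposes the partitioner's hash in CQL; a sketch against the hypothetical readings table:

```sql
-- Murmur3Partitioner (the default) hashes sensor_id into a 64-bit token;
-- the node(s) owning that token's range store the partition.
SELECT sensor_id, token(sensor_id), reading_time, value
FROM sensor_data.readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;
```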

Replication

  • Defined by the replication factor

  • One primary replica exists on the Node that owns the associated Token

Replication Factor

  • Ideally an odd number

  • Number of Racks should be in multiples of the Replication Factor

Replication Strategy

  • Defined at the keyspace level

  • Either SimpleStrategy or NetworkTopologyStrategy

  • SimpleStrategy does not consider Racks/Data Centers. Places data replicas on nodes sequentially

  • NetworkTopologyStrategy is Rack- and Data Center-aware when placing replicas

Replica Placement Strategy

  • Defines placement of a Data Partition on other nodes

Snitches

  • Determines the Data Center and Rack a Node belongs to

  • Defined at the Node level

  • Defaults to GossipingPropertyFileSnitch

  • A dynamic snitch layer additionally tracks node latencies at runtime

Token

  • Unique identifier referencing a Data Partition

  • Range: -2^63 to +2^63 - 1; each node owns a portion of this range and primarily owns the data whose tokens fall within it

  • The token range is split further as the cluster grows (e.g. with one node, that node owns the entire range; with X nodes, the range is divided among the X nodes)

VNode

  • A virtual node structure that enables splitting of a node's token-range into smaller ranges

  • Each physical node is assigned a number of virtual nodes

Performance Scaling

Partition sizes

  • Partition size affects JVM heap size

  • Partition size affects GC

  • Large partitions make garbage collection inefficient (the GC has to work over a larger JVM heap)

  • Partition size affects Cassandra Repairs

Querying

  • Use as few Partition Keys per query as possible - ideally just one

  • Cassandra is most efficient when all requested data can be retrieved from a single Partition (see the sketch below)
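
A hedged before/after sketch with the hypothetical readings table (the UUIDs are made up):

```sql
-- Efficient: every requested row lives in one partition on one replica set.
SELECT reading_time, value
FROM sensor_data.readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND reading_time >= '2020-09-01';

-- Less efficient: IN over several partition keys fans the read out to
-- multiple partitions, potentially coordinated across multiple nodes.
SELECT reading_time, value
FROM sensor_data.readings
WHERE sensor_id IN (123e4567-e89b-12d3-a456-426614174000,
                    223e4567-e89b-12d3-a456-426614174000);
```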

Error messages

Data center differs from previous data center

```
ERROR [main] 2020-09-24 02:10:24,030 CassandraDaemon.java:759 - Cannot start node if snitch's data center (dc1) differs from previous data center (ap-southeast-1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
```
  • Run kubectl edit statefulset cassandra and add the flag

Snitch's rack differs from previous rack

```
ERROR [main] 2020-09-24 02:13:42,845 CassandraDaemon.java:759 - Cannot start node if snitch's rack (rack1) differs from previous rack (ap-southeast-1a). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_rack=true.
```

References

  • Apache Cassandra (open-source version)

  • Datastax (commercial Cassandra)

  • Cassandra Data Partitioning