Some interesting snippits on Cassandra now that I am using it in anger.

Read this

MemtableSSTable Its a great guide on how Cassandra stores things and why compaction and so on occurs.

Select column range restrictions - one colunm only!

Cassandra has a primary key which you can search on, which is split into two parts. The partition key determines which Cassandra node can retrieve the data, and the cluster keys allow searching in order (ascending or descending)

You can select using = and IN, but you cannot select using two restrictions (ie < and >)

Example table, with a go at restricting with two columns.

CREATE TABLE jonathaneg (
    forename text,
    surname text,
    race text,
    vt timestamp,
    tt timestamp,
 PRIMARY KEY ((forename), vt, tt))
 WITH CLUSTERING ORDER BY (vt DESC)
 AND CLUSTERING ORDER BY (tt DESC);

select * from jonathaneg where forename='Jon' and vt<'2016-02-01' and tt<'2016-02-01';

InvalidRequest: code=2200 [Invalid query] message="Clustering column "tt" cannot be restricted (preceding column "vt" is restricted by a non-EQ relation)"

Materialised views from Cassandra 3.0.x

We can create a table with a subset of the columns, with different clustering keys.

Inserts and deletes automatically cascade to the views. There is a penalty in performance for inserts and updates as Cassandra is obviously maintaining multiple data stores.

There are restrictions:

  • Only one new column can be added to the marerialised views primary key. Static columns are not allowed.
  • The columnns in the source tables primary key must be part of the views primary key
CREATE MATERIALIZED VIEW IF NOT EXISTS jonathansurname AS
    SELECT forename,surname,race,vt,tt
    FROM jonathaneg
    WHERE forename IS NOT NULL
    AND surname IS NOT NULL
    AND race IS NOT NULL
    AND vt IS NOT NULL
    AND tt IS NOT NULL
 PRIMARY KEY ((forename), surname, vt, tt)
 WITH CLUSTERING ORDER BY (surname DESC);

Note that you can only drop a table once all materialized views have been dropped.

Column timestamp

The writetime can be selected from a column. This writetime cannot be selected for map and lists, but other than that it is the mechanism to use for any temporal select - its as close to cassandra time as you can get and helps to avoid Quorum write and reads crossing over.

INSERT INTO jonathaneg (forename,surname,race,vt,tt)
  VALUES ('Bilbo', 'Baggins', 'Hobbit', '2016-07-02 10:55', '2016-07-02 10:55')
  IF NOT EXISTS;

SELECT writetime(race) from jonathaneg
  WHERE forename='Bilbo';