| = Physical Data Modeling |
| |
| Once you have a logical data model defined, creating the physical model |
| is a relatively simple process. |
| |
| You walk through each of the logical model tables, assigning types to |
| each item. You can use any valid `CQL data type <data-types>`, including |
| the basic types, collections, and user-defined types. You may identify |
| additional user-defined types that can be created to simplify your |
| design. |
| |
| After you’ve assigned data types, you analyze the model by performing |
| size calculations and testing out how the model works. You may make some |
| adjustments based on your findings. Once again let's cover the data |
| modeling process in more detail by working through an example. |
| |
| Before getting started, let’s look at a few additions to the Chebotko |
| notation for physical data models. To draw physical models, you need to |
| be able to add the typing information for each column. This figure shows |
| the addition of a type for each column in a sample table. |
| |
| image::data_modeling_chebotko_physical.png[image] |
| |
| The figure includes a designation of the keyspace containing each table |
| and visual cues for columns represented using collections and |
| user-defined types. Note the designation of static columns and secondary |
| index columns. There is no restriction on assigning these as part of a |
| logical model, but they are typically more of a physical data modeling |
| concern. |
| |
| == Hotel Physical Data Model |
| |
| Now let’s get to work on the physical model. First, you need keyspaces |
| to contain the tables. To keep the design relatively simple, create a |
| `hotel` keyspace to contain tables for hotel and availability data, and |
| a `reservation` keyspace to contain tables for reservation and guest |
| data. In a real system, you might divide the tables across even more |
| keyspaces in order to separate concerns. |
| |
| For the `hotels` table, use Cassandra’s `text` type to represent the |
| hotel’s `id`. For the address, create an `address` user defined type. |
| Use the `text` type to represent the phone number, as there is |
| considerable variance in the formatting of numbers between countries. |
| |
| While it would make sense to use the `uuid` type for attributes such as |
| the `hotel_id`, this document uses mostly `text` attributes as |
| identifiers, to keep the samples simple and readable. For example, a |
| common convention in the hospitality industry is to reference properties |
| by short codes like "AZ123" or "NY229". This example uses these values |
| for `hotel_ids`, while acknowledging they are not necessarily globally |
| unique. |
| |
| You’ll find that it’s often helpful to use unique IDs to uniquely |
| reference elements, and to use these `uuids` as references in tables |
| representing other entities. This helps to minimize coupling between |
| different entity types. This may prove especially effective if you are |
| using a microservice architectural style for your application, in which |
| there are separate services responsible for each entity type. |
| |
| As you work to create physical representations of various tables in the |
| logical hotel data model, you use the same approach. The resulting |
| design is shown in this figure: |
| |
| image::data_modeling_hotel_physical.png[image] |
| |
| Note that the `address` type is also included in the design. It is |
| designated with an asterisk to denote that it is a user-defined type, |
| and has no primary key columns identified. This type is used in the |
| `hotels` and `hotels_by_poi` tables. |
| |
| User-defined types are frequently used to help reduce duplication of |
| non-primary key columns, as was done with the `address` user-defined |
| type. This can reduce complexity in the design. |
| |
| Remember that the scope of a UDT is the keyspace in which it is defined. |
| To use `address` in the `reservation` keyspace defined below design, |
| you’ll have to declare it again. This is just one of the many trade-offs |
| you have to make in data model design. |
| |
| == Reservation Physical Data Model |
| |
| Now, let’s examine reservation tables in the design. Remember that the |
| logical model contained three denormalized tables to support queries for |
| reservations by confirmation number, guest, and hotel and date. For the |
| first iteration of your physical data model design, assume you're going |
| to manage this denormalization manually. Note that this design could be |
| revised to use Cassandra’s (experimental) materialized view feature. |
| |
| image::data_modeling_reservation_physical.png[image] |
| |
| Note that the `address` type is reproduced in this keyspace and |
| `guest_id` is modeled as a `uuid` type in all of the tables. |
| |
| _Material adapted from Cassandra, The Definitive Guide. Published by |
| O'Reilly Media, Inc. Copyright © 2020 Jeff Carpenter, Eben Hewitt. All |
| rights reserved. Used with permission._ |