Data Modeling and CDC
Change Data Capture (CDC) records propagate in one direction: from a source table to a topic in a changelog stream. You can create one stream with one topic for the changed data records, or multiple streams with multiple topics.
One source to one destination topic on one stream
You might use this scenario if a large number of changed data records are being propagated and you want the topic on a separate or isolated volume, so that resources are dedicated to these particular records.
The following graphic shows a source table's changed data records being propagated to one topic on one stream.
One source to multiple destination topics on one stream
You might use this scenario if you want to propagate specific changed data records from one source table to different topics.
When you set up a table changelog for data propagation, you can specify the column parameter to propagate only a specific field or column family. By default, all fields are propagated. See table changelog add for information about adding a table changelog.
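As a rough illustration of the column parameter, the following Python sketch assembles one table changelog add command per destination topic. The `-path`, `-changelog`, and `-columns` option names and the `stream:topic` destination form are assumptions to verify against the table changelog add reference. The table path, stream path, topic names, field names, and the `changelog_add_command` helper are hypothetical. The script only builds and prints the argument lists; it does not run them.

```python
import shlex


def changelog_add_command(table_path, stream_path, topic, columns=None):
    """Build an argument list for one table-to-topic changelog relationship.

    The option names (-path, -changelog, -columns) are assumed here;
    check them against the table changelog add reference before use.
    """
    args = [
        "maprcli", "table", "changelog", "add",
        "-path", table_path,
        "-changelog", f"{stream_path}:{topic}",
    ]
    if columns:
        # Omitting the column list keeps the default: all fields propagate.
        args += ["-columns", ",".join(columns)]
    return args


# Hypothetical example: one source table, two topics on the same stream,
# each topic receiving a different subset of fields.
commands = [
    changelog_add_command("/apps/orders", "/apps/orders_changelog", "billing",
                          columns=["total", "currency"]),
    changelog_add_command("/apps/orders", "/apps/orders_changelog", "shipping",
                          columns=["address"]),
]

for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as argument lists rather than strings makes it straightforward to review them first and pass them to `subprocess.run` later, if that suits your environment.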
The following graphic shows a source table's changed data records being propagated to multiple topics on one stream.
One source to multiple destination topics on multiple streams
You might use this scenario if the changed data records are important and you want an extra copy for backup purposes.
The following graphic shows a source table's changed data records being propagated to topics on multiple streams.
Multiple sources to multiple destination topics on one stream
You might use this scenario if you want to set permissions on a single stream so that a team can access all the topics it needs. For example, if table1 and table2 have changed data records that a team wants to monitor, you would grant permission on the stream to the monitoring team.
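To make the single-stream layout concrete, here is a minimal Python sketch that groups planned changelog destinations by stream. It is an in-memory model only, not a Data Fabric API: the table paths, the stream path, the topic names, and the `topics_by_stream` helper are hypothetical, and the `stream:topic` destination form is assumed.

```python
from collections import defaultdict


def topics_by_stream(changelogs):
    """Group (source table, 'stream:topic') pairs by stream path."""
    grouped = defaultdict(list)
    for table, destination in changelogs:
        # Split on the last colon: everything before it is the stream path.
        stream, _, topic = destination.rpartition(":")
        grouped[stream].append((table, topic))
    return dict(grouped)


# Hypothetical plan: two source tables, each with its own topic,
# both topics living on the same stream.
changelogs = [
    ("/apps/table1", "/apps/monitoring_stream:table1_changes"),
    ("/apps/table2", "/apps/monitoring_stream:table2_changes"),
]

grouped = topics_by_stream(changelogs)
for stream, pairs in grouped.items():
    print(stream, "->", pairs)
```

If the plan groups into a single stream, as it does here, the monitoring team needs permission on only that one stream to reach every topic.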
The following graphic shows three source tables' changed data records being propagated to three topics on the same stream.
Source cluster to destination cluster
If you are propagating changed data from a source table on a source cluster to a destination stream topic on a remote destination cluster, you must set up a gateway. To set up a gateway, install it on the destination cluster and specify the gateway node(s) on the source cluster. See Administering Data Fabric Gateways and Configuring Gateways for Table and Stream Replication.
The following diagram shows a simple CDC data model, with one source table propagating to one destination topic on one stream. Because the destination stream topic in this scenario is on a remote destination cluster, you must set up and configure a gateway.