dbShards
Useful Links


Previous Posts

Archives

Categories

Syndication
Contact Us

And get a unique database solution that grows when you grow and never costs more than it should.

Featured product

 

Use our no-charge dbShards/Analyze driver to identify critical performance issues in your database tier. A dbShards consultant will then coordinate with you to identify key hotspots, helping you to develop a plan for optimizing your database.

Sharding with Hibernate and other ORMs

Tuesday, December 21, 2010

Object Relational Mapping frameworks such as Hibernate (Java), PDO (PHP) and Django (Python) take care of persisting objects without requiring the developer to hand-code SQL for most tasks.

Primary Shard Tables

In most cases these frameworks generate SQL that works with dbShards. Taking a simple example of a top-level primary shard table ‘customer’ table that is sharded by a ‘customer_id’ column, which is also the primary key, these frameworks would generate SQL statements similar to the following examples:

INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)

UPDATE customer SET customer_name = ? WHERE customer_id = ?

DELETE customer WHERE customer_id = ?

SELECT * FROM customer WHERE customer_id = ?

In each case, the SQL contains the ‘customer_id’ shard key and dbShards will be able to determine which shard to execute the SQL against.

Child and Grandchild Tables

For a table that is a direct child of ‘customer’, such as ‘customer_order’, there will already be a ‘customer_id’ column to support the foreign key relationship to the parent table and this column can also be used as the shard key. However, in this case the shard key is different from the primary key (the ‘customer_order’ table would most likely have its own auto increment ‘order_id’ column which would be used as the primary key).

For a table that is a direct child of ‘customer_order’ and therefore the grandchild of ‘customer’, such as ‘customer_order_item’, there would typically not be a ‘customer_id’ column in a normalized relational database design and there would just be a ‘customer_order_id’ column to support the foreign key relationship to the ‘customer_order’ table. To support sharding without adding undue complexity, it makes sense to denormalize the schema and add the ‘customer_id’ column to this table so that we have a natural shard key. It is desirable for performance reasons to keep all data for a single customer (including orders and order items) in a single shard.

ORM Mappings for Child Tables

When a table has a shard key that is different from the primary key, as is the case in the ‘customer_order’ and ‘customer_order_item’ child tables, then the ORM configuration must be updated to force the ORM to include the shard key in WHERE clauses, otherwise the WHERE clause will only contain the primary key and there will not be sufficient information for dbShards to determine the target shard.

Example: UPDATE on child table without shard key

UPDATE customer_order SET quantity = ? WHERE order_id = ?

This UPDATE will fail since there is no shard key (although, if the ORM allows comments to be inserted into the SQL then the dbShards Shard Hints feature can be used to specify the shard key).

The solution to this issue is actually very simple. The ORM configuration can be updated to include the shard key in the primary key definition (note that no schema changes are required since we are just tricking the ORM into generating the correct SQL). With the ORM configured to use ‘order_id’ and ‘customer_id’ as the primary key for the ‘customer_order’ table, the following SQL will now be generated:

UPDATE customer_order SET quantity = ? WHERE order_id = ? AND customer_id = ?

Each query now contains the ‘customer_id’ shard key and dbShards can once again automatically determine which shard or shards to execute the query against without requiring the use of Shard Hints.

Hibernate/JPA Code Example

Here is the original CustomerOrder class using JPA annotations:

@Entity
public class CustomerOrder implements Serializable {
  @Id
  private int orderId;
  private int quantity;
  private long totalPrice;
}

Here is the modified CustomerOrder class to support sharding by ‘customer_id’:

@Entity
public class CustomerOrder implements Serializable {
  @EmbeddedId
  private CustomerOrderPK pk;
  private int quantity;
  private long totalPrice;
}

@Embeddable
public class CustomerOrderPK implements Serializable {
  @Id
  private int orderId;
  private int customerId;
}

Posted in Articles, Tutorial by Andy Grove - No Comments »

Shard Hints

Monday, December 13, 2010

One of the new features in dbShards version 2.2.2 is the ability to provide “shard hints” as a comment before any SQL statement. There are a number of hints that can be supplied to tell the dbShards driver where to execute the query.

For example, an application could use shard hints to tell the driver to run certain queries against a secondary shard to reduce load on the primary shard. Other uses for shard hints include forcing an UPDATE or DELETE to be executed as a “global write” that gets replicated across all primary shards.

So what does this look like? Here is a Java code sample:

String sql = "/*DBS_HINT: dbs_shard_action=global_write */ "
    + "DELETE FROM `user` WHERE last_login < ?";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setDate(1, oldDate);
pstmt.executeUpdate();

Without the shard hint, this DELETE statement would usually fail because the `user` table is sharded on a userID column and the query does not contain a shard key, and by default, dbShards would not allow an UPDATE or DELETE that spans multiple shards as this would not be an ACID-compliant transaction.

This table shows all shard hints that are currently supported.

Hint Name Description
dbs_shard_action Possible values are auto, shard_read, shard_write, go_fish, global_write.
  • auto (driver parses the SQL to determine the shard action)
  • shard_read (SELECT that can be fulfilled from a single shard)
  • shard_write (INSERT, UPDATE, DELETE that needs to run against a single shard).
  • global_write (INSERT, UPDATE or DELETE against a global table)
  • go_fish (SELECT that needs to run against more than one shard)
dbs_vshard Virtual shard number. Used in conjunction with shard_read, shard_write, global_write hints.
dbs_pshard Physical shard number. Used in conjunction with shard_read, shard_write, global_write hints.
dbs_use_secondary Enables or disables query execution against a secondary shard. Valid values are ‘true’ and ‘false’.
Enabling this option forces the driver to execute the query against the secondary shard. An
application may choose to execute some read-only queries against the secondary shard for performance
reasons (to take some load off the primary shard). This feature should be used with caution as there
is no guarantee that the secondary database is online and available at all times.
dbs_auto_incr_support Enables or disables auto increment handling in the driver. Valid values are ‘true’ and ‘false’.
dbs_cache_stmt This hint is specific to the dbShards JDBC driver. By default, parsed queries are cached to
improve performance. This hint can be provided to disable caching of a particular statement. This
is recommended for dynamic SQL where the SQL is different each time due to embedded literal values
and would not benefit from this type of caching.

Examples:

Example 1 – Force the driver to execute a SELECT query against a specific virtual shard:

/*DBS_HINT: dbs_shard_action=shard_read, dbs_vshard=1234 */
SELECT * FROM customer WHERE id = 1234

Example 2 – Force the driver to execute a SELECT query against a specific physical shard:

/*DBS_HINT: dbs_shard_action=shard_read, dbs_pshard=2 */
SELECT * FROM customer WHERE id = 1234

Example 3 – Force the driver to execute an UPDATE query against a specific virtual shard:

/*DBS_HINT: dbs_shard_action=shard_read, dbs_vshard=1234 */
UPDATE customer SET active = 0 WHERE id = 1234

Example 4 – Force the driver to execute an UPDATE query against a specific physical shard:

/*DBS_HINT: dbs_shard_action=shard_read, dbs_pshard=2 */
UPDATE customer SET active = 0 WHERE id = 1234

Posted in Tutorial by Andy Grove - No Comments »

MySQL Replication HOWTO video

Saturday, September 12, 2009

I just posted this short video to YouTube to demonstrate how easy it is to set up MySQL replication and also to demonstrate how transactions are lost if the master database fails.

Posted in Articles, Tutorial by Andy Grove - No Comments »