dbShards
Useful Links


Previous Posts

Archives

Categories

Syndication
Contact Us

And get a unique database solution that grows when you grow and never costs more than it should.

Featured product

 

Use our no-charge dbShards/Analyze driver to identify critical performance issues in your database tier. A dbShards consultant will then coordinate with you to identify key hotspots, helping you to develop a plan for optimizing your database.

Database Sharding. What is it?

Monday, February 7, 2011

What is Database Sharding?

Contrary to popular belief, database sharding is not a new concept. It has been around almost as long as the SQL engine itself. In the days of our ancestors (well at least mine), if your application or database grew past the point of the machine I/O and hardware capabilities, you had no choice but to add more machines and thus more databases. But what is a new concept is the ability to “shard” your database without hiring Stephen Hawking to write the algorithm to distribute and gather data as if it was one database. That is where dbShards comes into play.

Although we do not have Stephen Hawking on our team, we do have some of the most experienced database professionals in the field. Using these resources, we have developed the solution for scaling and replicating high performance databases. In two paragraphs, I will layout the advantages of database sharding in general, not just with dbShards, and I will go over the advantages of using dbShards as your high performance database solution.

Sharding?

Database sharding is simply the process of taking one database and breaking it up among multiple machines to spread out the workload and increase the overall performance of the database. The term “shard” is coined from the idea of breaking something into parts, but when combined, they stand as one. Simply put, when one database is doing the work of ten, it only makes sense to give it some help. Many of the most popular internet applications use database sharding as a performance and scaling solution. Can you say Twitter?

What is dbShards?

dbShards is the industry leading database sharding solution. We have customers scaling to hundreds of millions of transactions per day, and millions of new users per day. Add that with a replication solution that lets your system administrator shower more than once per year, it is no wonder we are considered the sharding solution. Database sharding does not have to be difficult, and with our team of engineers, you can rest assured it is done right.

Posted in Articles by choisington - No Comments »

Sharding with Hibernate and other ORMs

Tuesday, December 21, 2010

Object Relational Mapping frameworks such as Hibernate (Java), PDO (PHP) and Django (Python) take care of persisting objects without requiring the developer to hand-code SQL for most tasks.

Primary Shard Tables

In most cases these frameworks generate SQL that works with dbShards. Taking a simple example of a top-level primary shard table ‘customer’ table that is sharded by a ‘customer_id’ column, which is also the primary key, these frameworks would generate SQL statements similar to the following examples:

INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)

UPDATE customer SET customer_name = ? WHERE customer_id = ?

DELETE customer WHERE customer_id = ?

SELECT * FROM customer WHERE customer_id = ?

In each case, the SQL contains the ‘customer_id’ shard key and dbShards will be able to determine which shard to execute the SQL against.

Child and Grandchild Tables

For a table that is a direct child of ‘customer’, such as ‘customer_order’, there will already be a ‘customer_id’ column to support the foreign key relationship to the parent table and this column can also be used as the shard key. However, in this case the shard key is different from the primary key (the ‘customer_order’ table would most likely have its own auto increment ‘order_id’ column which would be used as the primary key).

For a table that is a direct child of ‘customer_order’ and therefore the grandchild of ‘customer’, such as ‘customer_order_item’, there would typically not be a ‘customer_id’ column in a normalized relational database design and there would just be a ‘customer_order_id’ column to support the foreign key relationship to the ‘customer_order’ table. To support sharding without adding undue complexity, it makes sense to denormalize the schema and add the ‘customer_id’ column to this table so that we have a natural shard key. It is desirable for performance reasons to keep all data for a single customer (including orders and order items) in a single shard.

ORM Mappings for Child Tables

When a table has a shard key that is different from the primary key, as is the case in the ‘customer_order’ and ‘customer_order_item’ child tables, then the ORM configuration must be updated to force the ORM to include the shard key in WHERE clauses, otherwise the WHERE clause will only contain the primary key and there will not be sufficient information for dbShards to determine the target shard.

Example: UPDATE on child table without shard key

UPDATE customer_order SET quantity = ? WHERE order_id = ?

This UPDATE will fail since there is no shard key (although, if the ORM allows comments to be inserted into the SQL then the dbShards Shard Hints feature can be used to specify the shard key).

The solution to this issue is actually very simple. The ORM configuration can be updated to include the shard key in the primary key definition (note that no schema changes are required since we are just tricking the ORM into generating the correct SQL). With the ORM configured to use ‘order_id’ and ‘customer_id’ as the primary key for the ‘customer_order’ table, the following SQL will now be generated:

UPDATE customer_order SET quantity = ? WHERE order_id = ? AND customer_id = ?

Each query now contains the ‘customer_id’ shard key and dbShards can once again automatically determine which shard or shards to execute the query against without requiring the use of Shard Hints.

Hibernate/JPA Code Example

Here is the original CustomerOrder class using JPA annotations:

@Entity
public class CustomerOrder implements Serializable {
  @Id
  private int orderId;
  private int quantity;
  private long totalPrice;
}

Here is the modified CustomerOrder class to support sharding by ‘customer_id’:

@Entity
public class CustomerOrder implements Serializable {
  @EmbeddedId
  private CustomerOrderPK pk;
  private int quantity;
  private long totalPrice;
}

@Embeddable
public class CustomerOrderPK implements Serializable {
  @Id
  private int orderId;
  private int customerId;
}

Posted in Articles, Tutorial by Andy Grove - No Comments »

Database Sharding Configuration

Tuesday, June 15, 2010

Moving from a single database to a sharded database has some implications when it comes to configuring application server thread pools and connection pools. This article explains how to achieve the best performance and scalability from a sharded database and is applicable to any sharding solution.

Read the full article on Database Sharding Configuration.

Posted in Articles by Andy Grove - No Comments »

MySQL Replication HOWTO video

Saturday, September 12, 2009

I just posted this short video to YouTube to demonstrate how easy it is to set up MySQL replication and also to demonstrate how transactions are lost if the master database fails.

Posted in Articles, Tutorial by Andy Grove - No Comments »