Database Sharding

The Problem: Economic Database Scalability

It is not a secret that enterprise databases are growing in size as more data is collected. While many factors contribute to the problem, the primary factors are: a) the amount of information required to run critical systems only seems to grow over time; and b) the number of users and transaction volumes rise in proportion to the number of "on-line", Web-based and Software as a Service (SaaS) applications.

As databases grow in a linear fashion, response times unfortunately increase at an exponential rate. This puts a tremendous burden on overall application performance, with the database often becoming the key bottleneck in mission critical systems.

Database Sharding

Further, as databases grow they become significantly more costly, difficult to manage, and reliability schemes are far slower to "recover."

Many techniques have been tried to overcome these issues, often at great expense and complexity:

In summary, these techniques are complex, often require significant cost and application development changes, and often fall short of expectations.

The Solution: Database Sharding

"Database Sharding" is a new concept in horizontally partitioning databases, not just across disks, but across servers. Many of the largest online service providers, most notably Google, have developed in-house implementations of this architecture to meet application scalability requirements.

In essence, Database Sharding allows databases to be split across servers using application-specific logic. By understanding the dynamics of a specific application, large, transaction-intensive databases can be divided across independent database servers in a way that is compatible with the requirements of the application itself.

When a database is "sharded", it is essentially broken down into multiple smaller databases across multiple (typically low-cost commodity) servers, each yielding far greater performance through this highly distributed model. In addition to delivering better performance, individual "shards" (or database "chunks") can be managed much more easily, including replication, fault tolerance, and cross-data center reliability.

Database Sharding

As with all valid scalability and performance techniques, Database Sharding is not appropriate for every type of database application. However, a large segment of applications can work extremely well in this environment, most often those that are transaction intensive with high volumes and size. Some common target application categories include:

Although Database Sharding is gaining popularity, was up to now a lack of implementation tools. Without such tools, it is often difficult to shard existing applications without extensive re-architecture and re-development. For many companies, re-development is completely infeasible, given the need to continue business operations.

The Origins of Database Sharding

There seems to be general agreement that the term 'sharding' and its derivative Database Sharding was introduced by Google (there are examples of the use of the term prior to its use in Google, but not in the same context). It has been reported that the sharding terminology is in widespread and general use within Google's engineering teams.

Benefits of Database Sharding

There are many potential benefits of Database Sharding, some of which are application-specific and do not apply to all use cases. The most immediate benefit and the primary reason for Database Sharding is to provide horizontal scalability for applications and databases.

Database Sharding White Paper

Read about how Database Sharding helps many major companies to linearly scale their database applications.

Fields marked with an asterisk (*) are required.

E-Mail Address*
First Name*
Last Name*
Job Title*
Phone*

I would like to subscribe to the CodeFutures newsletter. This is a low volume list with roughly one email per month.