dbShards provides the following features to make database sharding simple to implement.
Cross-shard Transactions
One of the challenges of database sharding is how to deal with transactions that need to write data to multiple shards. For example, in an online game that is sharded by user, there could be transactions that involve a user challenging another user to an activity, requiring data to be written to two different shards.
dbShards supports cross-shard transactions directly from the database driver by maintaining one transaction for all shards that are affected by a given set of transactional SQL statements. This is transparent to the application. Once the application issues the commit command, the driver utilizes dbShards patent-pending replication technology (see dbShards/Replcate) to ensure full reliability of the distributed transaction across all affected shards.
Extensive language/framework support
dbShards provides a JDBC driver for Java applications which is compatible with all major persistence APIs and frameworks including Spring JDBC and Hibernate.
dbShards provides a native MySQL-compatible C driver (replacement for libmysqlclient) which is compatible with all major scripting languages including PHP, Python and Ruby and is compatible with popular ORM frameworks such as PDO and Django.
Dynamic Query Sharding
The dbShards driver introspects each SQL statement and automatically determines which shard(s) to run the query against, requiring few changes to existing application code. dbShards provides a JDBC driver for Java applications and a native C driver for applications implemented in C or scripting languages such as PHP, Python, Ruby and Perl.
Shard Hints
Sometimes an application may choose to override the default sharding behavior in the dbShards driver. In these situations a comment can be placed in front of the SQL to tell the dbShards driver how to shard the query.
Virtual Shards
Because the number of shards may increase over time as an application becomes popular and attracts a larger user base, dbShards maps shard keys (e.g. user id) to a virtual shard number. The number of virtual shards is fixed and is usually a relatively high number (typically 100 or 1000). dbShards then maintains a mapping between virtual shards and physical shards that can change over time. For example, assuming there are 100 virtual shards and a ‘user’ table is sharded using the modulus of user ID, user ID 1234 will be mapped to virtual shard 34 and this relationship is fixed. The application could start out with 2 physical shards where virtual shards 1-50 are in physical shard 1 and virtual shards 51-100 are in physical shard 2. In the future the database can easily be re-sharded across 4 shards so that physical shard 1 now contains virtual shards 1-25, physical shard 2 contains virtual shards 26-50, and so on.
Parallel Query
Sometimes a query needs to be executed across multiple shards. For example if a ‘user’ table is sharded then an aggregate query such as “SELECT count(*) FROM `user`” needs to run against all shards. dbShards supports this by sending the query from the dbShards driver to a Query Agent process on one of the shard servers. This query agent then forwards the query in parallel to all other query agents and the query is run in parallel against all shards. The results are combined real-time by the originating query agent and then streamed back to the driver as a single result set.
Because each shard is smaller and faster than the original non-sharded database, each parallel query runs much faster, often an order of magnitude faster.
Reliable Replication & Failover
In a sharded environment there are more databases and therefore more points of failure, so reliable replication is even more important. dbShards uses proprietary patent-pending replication technology to guarantee that no transactions are lost in the case of a primary shard failing. In the case of a primary shard failure, dbShards can quickly failover to the secondary shard with no transaction loss.
Failover for planned maintenance
A common problem for live applications with large databases is the amount of time it takes to perform database maintenance operations such as adding an index to a table or running a ‘check table’ operation. This is often impossible to do on a live application without downtime. Using the failover capabilities in dbShards it is easy to solve this issue. The secondary shards can be taken offline by pausing replication (live transactions are still being replicated to log files but not being applied to the database) and database maintenance can then be performed. Once the database maintenance is complete, replication on the secondary can be resumed. Once replication has caught up, the primary and secondary shards can be switched in a matter of seconds with no downtime (some transactions will fail during the switchover but integrity between primary and secondary is guaranteed). The new secondaries can now be updated using the same process.
Pluggable Replication Agent
The dbShards replication agent had a pluggable architecture, allowing custom components to be injected into the replication stream. This is a powerful feature for asynchronously processing the transaction stream. An example use case would be producing summary information to be inserted into a data warehouse.
Pluggable Query Agent
Custom components can be hosted in the query agent, allowing stored procedure-like functionality to be implemented efficiently in Java. The query agents typically reside on the same host as the database server, making this efficient for certain categories of query.
Sharding Tools
dbShards contains all the necessary tools for the initial sharding of a single database into multiple shards and for re-sharding a sharded database into a different number of shards.

