Shift from Relational Databases to Vertical Databases

A write-up on why relational databases no longer serve today's market needs efficiently.

Relational database management systems have been remarkably successful in capturing the DBMS marketplace, on the premise that one relational engine can cover all DBMS needs. The major vendors sell software that is a quarter century old and has been extended and morphed to meet today’s needs. These legacy systems are at the end of their useful life.

In the data warehouse market, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is that a column store reads only the columns a query needs rather than every column of every row. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores.
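The two effects described above (reading fewer columns, and compressing better) can be illustrated with a toy sketch. The table, the column names, and the helper functions below are made up for illustration; this is not how any particular engine is implemented, and the field counts only stand in for I/O.

```python
# Toy comparison of row and column layouts.
# All names and data here are hypothetical, chosen only for illustration.
rows = [
    {"order_id": 1, "customer": "acme", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "bolt", "region": "EU", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "region": "US", "amount": 30.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Row layout: each whole record is fetched, so all its fields are read."""
    fields_read = 0
    total = 0.0
    for row in rows:
        fields_read += len(row)  # the full record comes off disk together
        total += row[column]
    return total, fields_read


def sum_column_store(columns, column):
    """Column layout: only the one column the query names is read."""
    values = columns[column]
    return sum(values), len(values)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


columns = to_columns(rows)
row_total, row_fields = sum_row_store(rows, "amount")
col_total, col_fields = sum_column_store(columns, "amount")
region_runs = run_length_encode(columns["region"])
```

In this sketch both layouts return the same total, but the row layout touches every field of every record while the column layout touches one value per record. Run-length encoding works on the `region` column because equal values sit next to each other, which is the kind of regularity a column layout exposes.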

In the online transaction processing (OLTP) market, a lightweight main memory DBMS beats a row store by a factor of 50. Using main memory, and relying on the fact that no DBMS application will send a message to a human user in the middle of a transaction, an OLTP DBMS can run each transaction to completion with no resource contention or locking overhead.
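A minimal sketch of the run-to-completion idea, assuming a single thread that owns all the data in memory. The class `InMemoryStore`, the `transfer` transaction, and the snapshot-based undo are hypothetical simplifications, not the design of any real engine.

```python
from collections import deque


class InMemoryStore:
    """Toy single-threaded store: transactions run one at a time, start to finish."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.queue = deque()

    def submit(self, txn, *args):
        """Queue a transaction; nothing executes until run() is called."""
        self.queue.append((txn, args))

    def run(self):
        """Execute queued transactions serially, so no locks are ever taken."""
        results = []
        while self.queue:
            txn, args = self.queue.popleft()
            # Assumption: copying the whole state is an acceptable undo
            # mechanism for a toy; a real system would log changes instead.
            snapshot = dict(self.balances)
            try:
                txn(self.balances, *args)
                results.append("committed")
            except ValueError:
                self.balances = snapshot
                results.append("aborted")
        return results


def transfer(balances, src, dst, amount):
    """Move funds between accounts; never waits on a user mid-transaction."""
    if balances[src] < amount:
        raise ValueError("insufficient funds")
    balances[src] -= amount
    balances[dst] += amount


store = InMemoryStore({"a": 100, "b": 0})
store.submit(transfer, "a", "b", 60)
store.submit(transfer, "a", "b", 60)
outcomes = store.run()
```

Because only one transaction is ever active, there is nothing to contend with and no lock manager is needed. The second transfer aborts cleanly because account `a` no longer holds enough funds.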

For storing Resource Description Framework (RDF) data, which is popular in the bio community and elsewhere, “Scalable Semantic Web Data Management Using Vertical Partitioning” points out that column stores are very good at certain RDF workloads. In addition, other ideas, such as “RDF-3X: A Risc-style engine for RDF,” will beat conventional DBMSs in other situations. Lastly, native RDF engines (e.g., Virtuoso, Sesame, and Jena) may well gain traction.
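The vertical partitioning idea can be sketched as follows: instead of one wide triples table, each property gets its own two-column (subject, object) table, kept sorted by subject. The sample triples and the function names are assumptions made for illustration; this is a toy, not the implementation from the cited paper.

```python
from collections import defaultdict

# Hypothetical (subject, property, object) triples.
triples = [
    ("book1", "type", "Book"),
    ("book1", "title", "Dune"),
    ("book1", "author", "herbert"),
    ("book2", "type", "Book"),
    ("book2", "title", "Emma"),
    ("herbert", "type", "Person"),
]


def vertically_partition(triples):
    """Build one sorted (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    return {prop: sorted(pairs) for prop, pairs in tables.items()}


def titles_of_type(tables, rdf_type):
    """Join the 'type' and 'title' tables on subject; other tables are never read."""
    subjects = {s for s, o in tables.get("type", []) if o == rdf_type}
    return sorted(o for s, o in tables.get("title", []) if s in subjects)


tables = vertically_partition(triples)
book_titles = titles_of_type(tables, "Book")
```

A query that mentions only `type` and `title` touches just those two tables, so the `author` data is skipped entirely, much as a column store skips columns a query does not name.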

Text applications have never used relational DBMSs. An attempt to use a relational DBMS to store the results of Web crawling found the RDBMS to be two orders of magnitude slower than a home-brew system. All the major Web-search engines use home-brew text software to serve search results.

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, specialized engines such as Mark Logic or Tamino reportedly run circles around the major vendors, according to a private communication from Dave Kellogg.

If the user’s data is naturally something other than tables, and simulating that natural data model on top of tables is awkward, then chances are that a native implementation of the natural data model will significantly outperform a conventional RDBMS. This is certainly true of scientific data.

If something other than a row store accelerates the user’s queries, then a direct implementation of the relational model using non-row store technology will run circles around a conventional RDBMS. This is true in the data warehouse marketplace.

Current row stores offer a single transaction implementation that is meant to suit every workload. It can be radically beaten if a user has lesser requirements or if the system can take advantage of workload-specific features. This is true in the OLTP marketplace.

The replacement will be a collection of engines, each specific to a vertical market, with much higher performance. If performance is not a priority, then use the open source relational DBMSs. They are mature, reliable, and, best of all, free.

References

Michael Stonebraker et al., “C-Store: A Column-oriented DBMS,” Proc. 2005 VLDB Conference, Trondheim, Norway, Sept. 2005.

Michael Stonebraker et al., “The End of an Architectural Era (It’s Time for a Complete Rewrite),” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Dan Abadi et al., “Scalable Semantic Web Data Management Using Vertical Partitioning,” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Thomas Neumann et al., “RDF-3X: A Risc-style engine for RDF,” Proc. VLDB Endowment, 1(1): 647-659 (2008).
