As an expert in the object database technology segment, I am often asked, "Where are the industry standard benchmarks that show how an ODB performs relative to competing products in the RDB space?"
I do think it would be great to have some kind of apples-to-apples comparison. It's one thing to say, "10 of the 10 largest telecommunications equipment providers use an object database to manage their network" (a true statement, though I'm borrowing a line from Oracle's marketing department), and another thing to have a piece of code that anyone can run to prove the point that an ODB is faster and more scalable than a relational database for applications with complex models.
In the past, none of the standard RDB benchmarks made any sense for the object database, because the models were so simple that they did not highlight the ODB advantage. When the application model is simple, most systems will perform similarly, at roughly the speed of a b-tree index lookup on a primary key. It's only when the model becomes complex, as it is in the real world, that true differences in performance and scalability become apparent.
So, I was recently encouraged when the TPC released the TPC-E benchmark for OLTP systems, proclaiming it was "a reflection of the true complexity found in today's software systems". Indeed, the TPC-E is centered around a model for online financial trading, an application area bountiful with object database implementations, especially in areas such as options trading and analytics where models are especially complex.
So, I was looking at the latest TPC-E benchmarks and planning to implement the TPC-E for the Versant Object Database. This is where I discovered the inequity of the TPC benchmarks and came to the realization that, somewhere along the way, the software industry has become so biased toward relational technology that it is tailoring the "industry standard benchmarks" specifically to RDB technology. This is despite a growing number of obvious cases where relational is proving not to hold up and alternative technologies are being used, including object databases, distributed transactional caching, BigTable, Teradata, Greenplum and similar tech.
The TPC benchmarks require the implementer to use a "driver" program which is provided and cannot be modified. You then implement the data layer interfaces that drive the transactions through to the database, and performance is measured in transactions per second. Using these numbers, predictions are made and total cost of ownership numbers are produced for theoretical systems serving millions of users.
The driver is written using C++ classes to model the trade information: Account, Customer, Holding, Holding History, Broker, Charge, Trade, etc., over what the benchmark defines as 33 "tables". This is where TPC loses touch with reality: the driver expects raw scalar data in and out, not objects. Transactions per second are measured at the input and output of the driver at the data layer, neglecting the driver's translation into C++ objects. So, even though in the real world anyone building such an application will need the processing power to create C++ objects (translating into reduced transactions per second, an increased hardware footprint and a higher total cost of ownership), these costs are expressly removed from the TPC-E (and TPC in general) results.
Now, as an alternative database technology vendor, it gets even more interesting. There is a desire to use this "industry standard OLTP benchmark", and so the transaction layer under the driver is implemented to derive results. Well, since an object database works with C++ objects natively, it now must unmap and map C++ objects into raw data in order to feed the driver layer. Not only does the TPC-E eliminate the cost of object-to-relational mapping from the benchmark for a relational database, and so misstate the true total cost of ownership of the system, but at the same time it forces an object database to add the overhead of mapping and unmapping. That overhead is totally unnecessary with an object database, even for the driver layer, yet the benchmark integration adds this workload to the object database's performance numbers.
In short, the TPC-E removes the real-world overhead associated with object-relational mapping from the performance/TCO results, and adds that overhead to the object database, a technology whose very design goal is to eliminate such overhead.
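To make the mapping overhead concrete, here is a minimal C++ sketch of the kind of translation a scalar-only interface forces on code that already works with objects. The types and function names (Account, Holding, HoldingRow, unmap_account_to_rows, map_rows_to_account) are my own illustrations and are not taken from the TPC-E kit or from any vendor's API; the real driver interfaces are more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical object model, as an application (or an ODB) would hold it.
struct Holding {
    std::string symbol;
    int quantity;
    double price;
};

struct Account {
    long id;
    std::string owner;
    std::vector<Holding> holdings;
};

// Hypothetical flat row of scalars, the shape a scalar-only
// data layer interface would exchange.
struct HoldingRow {
    long account_id;
    std::string owner;
    std::string symbol;
    int quantity;
    double price;
};

// "Unmap": flatten one object graph into scalar rows.
// Every field is copied, and parent data is repeated per row.
std::vector<HoldingRow> unmap_account_to_rows(const Account& account) {
    std::vector<HoldingRow> rows;
    rows.reserve(account.holdings.size());
    for (const Holding& h : account.holdings) {
        rows.push_back(HoldingRow{account.id, account.owner,
                                  h.symbol, h.quantity, h.price});
    }
    return rows;
}

// "Map": rebuild the object graph from scalar rows.
// Assumes all rows belong to the same account.
Account map_rows_to_account(const std::vector<HoldingRow>& rows) {
    Account account{0, "", {}};
    if (!rows.empty()) {
        account.id = rows.front().account_id;
        account.owner = rows.front().owner;
    }
    for (const HoldingRow& r : rows) {
        account.holdings.push_back(Holding{r.symbol, r.quantity, r.price});
    }
    return account;
}

// Round trip: objects -> rows -> objects, the work an object
// database would not otherwise need to do.
inline void check_round_trip(const Account& original) {
    Account copy = map_rows_to_account(unmap_account_to_rows(original));
    assert(copy.id == original.id);
    assert(copy.holdings.size() == original.holdings.size());
}
```

Every call to these two functions is allocation and copying that a relational application pays in the middle tier, outside the measured boundary, and that an object database would only pay in order to fit under the driver.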
Now I ask: does it not seem that the TPC benchmarks have lost touch with the real cost of computing and the real TCO of today's applications?
Another question: when you run your application, where are most of the machines, in the business logic middle tier (the driver) or in the database layer? If an object database optimizes the middle tier by removing mapping overhead and causes, say, a 30% reduction in the footprint of the largest layer, why would anyone even care whether the TCO results of TPC-E are true or false, when they represent such a small percentage of the overall system?
Seems to me that, as an industry, we've at times put the cart before the horse.
Wednesday, August 5, 2009
