Oracle’s Public Cloud is currently the only cloud platform that provides the dedicated hardware and software required to scale its database in a cluster as a service. However, Oracle’s cloud doesn’t yet have the scale and breadth of third-party software to support the same variety of Web-scale applications growing up on AWS, Azure, and Google. We believe that for the next two years, customers wanting to extend their Oracle-based Systems of Record into Web-scale Systems of Intelligence have three options:
- Build new applications in public clouds such as AWS, Azure, or Google and maintain what is likely to be a very high-latency link to business critical transaction processing on Oracle’s cloud.
- Install a private cloud using their own hardware/software or Oracle “Engineered Systems” such as Exadata in mega data centers such as Equinix that have low-latency direct connections to the major cloud providers. That setup should provide the tightest integration until Oracle’s cloud either matures or provides direct, low-latency connections to the major public cloud providers.
- Design and develop a hybrid approach to extending the function of existing transactional systems of record using a simplified table look-up approach. These tables can be fed from the systems of intelligence either locally or from cloud-based solutions.
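The third option above can be sketched in a few lines of Python. This is a minimal, illustrative model only: the table name, customer IDs, and scores are hypothetical, and SQLite stands in for whatever transactional store the system of record actually uses. The point is the pattern: the analytics side periodically pushes a compact table of results, and the transactional application consults it with a cheap local look-up instead of a high-latency call to the cloud.

```python
import sqlite3

# Hypothetical scores produced by a cloud-based system of intelligence,
# e.g. churn-risk ratings. In practice this feed would arrive via a
# scheduled export, replication job, or message queue.
intelligence_feed = [
    ("cust-001", "HIGH_VALUE", 0.92),
    ("cust-002", "AT_RISK", 0.35),
    ("cust-003", "AT_RISK", 0.28),
]

# Local store standing in for the system of record (SQLite for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_scores (
        customer_id TEXT PRIMARY KEY,
        segment     TEXT NOT NULL,
        score       REAL NOT NULL
    )
""")

def refresh_lookup_table(conn, feed):
    """Insert-or-replace each row so repeated refreshes stay idempotent."""
    conn.executemany(
        "INSERT OR REPLACE INTO customer_scores VALUES (?, ?, ?)", feed
    )
    conn.commit()

refresh_lookup_table(conn, intelligence_feed)

# The transactional application enriches its processing with a local
# look-up rather than a high-latency call out to the analytics cloud.
row = conn.execute(
    "SELECT segment, score FROM customer_scores WHERE customer_id = ?",
    ("cust-002",),
).fetchone()
print(row)  # ('AT_RISK', 0.35)
```

Because the refresh is idempotent, the feed can be replayed on failure without corrupting the look-up table, which keeps the integration simple on the transactional side.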
New Web-scale and consumer-facing applications have largely been growing up on Amazon AWS and, more recently, on Microsoft Azure and Google Cloud Platform. At the same time, large-scale traditional Systems of Record have been anchored at their core by Oracle, IBM, and to some extent Microsoft SQL Server DBMSs. Discussions with practitioners confirm that these databases aren’t the most “cloud-friendly” because they don’t yet support the elastic scaling necessary for true Web-scale applications.
Rather than creating new silos of data, we believe successful customers will integrate their systems of record with systems of intelligence. The new applications ultimately have to leverage and augment the core business processes customers have been automating for several decades. The key issues currently preventing Oracle from scaling on AWS, Azure, and Google are likely to lessen over time. Specifically:
- Oracle currently requires a highly tuned, dedicated hardware infrastructure to run in a cluster.
- Oracle’s infrastructure requirements are likely to relax and public cloud infrastructure is likely to move closer to Oracle’s requirements over time, though this may take a couple of years.
Oracle has put forth its strategy and is making a large commitment to the cloud, but it still needs to mature in several aspects.
- First, it needs to continue to build out data centers in more countries to satisfy local regulations.
- Second, Oracle has to cultivate a significant breadth of third-party software offered by other vendors as managed services. This approach gives customers the choice to consume the software as services rather than having to manage it all the way down to the metal.
- Third, Oracle has to locate its services in mega datacenters that offer very low-latency connections to internet data and data-aggregation services.
Oracle currently requires highly tuned, dedicated hardware infrastructure to run in a cluster
On the first issue, Oracle’s current need for highly tuned, dedicated infrastructure is in conflict with the norm of most public cloud infrastructure. Cloud platforms are about shared infrastructure by definition. Mega clouds use shared infrastructure to separate the software from the underlying hardware infrastructure. That’s what allows the management software to configure the hardware and software separately for maximum utilization and automated administration.
Origin of Oracle infrastructure requirements
Oracle’s infrastructure requirements for scalable computing took shape in the late ’90s and extended well into the 2000s. The gold standard for clustered databases involved Oracle, Sun servers, Veritas for a clustered file system, and EMC for shared storage. The cluster actually required three dedicated networks.
One network connected just the Oracle database nodes. Another connected the database nodes to the shared storage so that each node could communicate with maximum bandwidth and minimal interference. A third connected clients to the database. While Oracle ran with great resilience for high availability and scalability in this configuration, it took a lot of specialized expertise to set up.
Oracle started lifting the burden of building scalable clusters for customers with its Exadata engineered appliances starting in 2007. Crucially, Oracle didn’t just configure cluster hardware and pre-install the software, it took a more integrated approach by engineering hardware and software in tandem.
Dedicated, optimized infrastructure
Oracle optimized both the hardware and software to leverage the capabilities of each other. For example, the storage servers in the SAN contained some of the database software that made it possible to filter out data that didn’t need to be sent back across the storage network to the database servers. Pushing some of the filtering down to the storage level greatly accelerated performance by saving time and bandwidth.
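The filtering technique described above is often called predicate pushdown. The following sketch models the idea in plain Python; it is not Oracle’s implementation, just an illustration of why applying the filter at the storage layer shrinks the traffic crossing the storage network.

```python
# Illustrative model of storage-level predicate pushdown (the idea behind
# Exadata's smart storage servers), not Oracle's actual implementation.

rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US", "amount": i * 10}
        for i in range(1_000)]

def scan_without_pushdown(storage_rows):
    # Naive model: storage ships every row; the database node filters.
    shipped = list(storage_rows)  # all 1,000 rows cross the storage network
    return [r for r in shipped if r["region"] == "EU"], len(shipped)

def scan_with_pushdown(storage_rows, predicate):
    # Pushdown model: the storage server applies the predicate first,
    # so only matching rows cross the storage network.
    shipped = [r for r in storage_rows if predicate(r)]
    return shipped, len(shipped)

naive_result, naive_shipped = scan_without_pushdown(rows)
pushed_result, pushed_shipped = scan_with_pushdown(
    rows, lambda r: r["region"] == "EU"
)

assert naive_result == pushed_result  # same answer either way
print(naive_shipped, pushed_shipped)  # 1000 vs 250 rows shipped
```

The query returns identical results in both models; only the number of rows shipped across the (simulated) storage network changes, which is where the time and bandwidth savings come from.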
Those same networks from the original Oracle clusters also exist in Exadata, only in this case they run with even better performance in the form of the very high-performance InfiniBand network. For example, the “hot” data that each database node keeps in memory is deeply integrated with the InfiniBand network so that a cluster can scale out to include more nodes without bottlenecking. There are dozens of other integrations and optimizations that make Exadata far and away the most scalable and lowest-admin-cost platform for running the full Oracle stack (from infrastructure through apps) in a clustered environment.
The only problem is that no public cloud other than Oracle’s contains this hardware. Even setting aside the Exadata optimizations, the other clouds don’t offer sufficiently dedicated hardware to run Oracle in a clustered environment. On the one hand, this makes Oracle’s cloud more attractive for customers; on the other, it limits their current options.
Oracle’s cluster infrastructure requirements are likely to relax and public cloud infrastructure is likely to move closer to Oracle’s requirements over time
This constraint of limiting Web-scale Oracle applications to the Oracle Public Cloud isn’t likely to be a permanent issue. It’s worth noting that the Oracle Public Cloud offers the option of running the database in a cluster without specialized hardware such as the Exadata appliances. But there are performance caveats. Specifically, this approach scales, but it doesn’t have all the optimizations and hence minimizes Oracle’s integrated advantage.
Understanding these specialized hardware requirements is relevant to identifying where other public clouds currently come up short in their infrastructure. When they ultimately support Oracle clusters, it will likely be on less optimized infrastructure than Exadata, but they will still have to address three limitations:
- Shared storage
- Virtual machines
- Noisy neighbor networks
Shared storage
Public clouds don’t have shared storage. Their storage connects to individual servers when the servers spin up. That allows customers to separate their storage needs from their compute needs; compute can sit idle while data remains stored for later use. However, AWS announced it will offer shared NFS storage later this year. Current specs call for EC2 instances that can sustain throughput as high as 1Gbps to shared storage volumes. Oracle engineers Exadata appliances with more than an order of magnitude more bandwidth. We will need to see how Oracle performs under these conditions, but it is a significant step in the right direction.
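A back-of-envelope calculation makes the bandwidth gap concrete. The 1 Gbps figure comes from the AWS specs cited above; the 40 Gbps figure is an assumption, a representative InfiniBand link rate, since actual Exadata aggregate bandwidth varies by model. The 100 GB table size is likewise hypothetical.

```python
# Back-of-envelope transfer times for a full-table scan. 1 Gbps is the
# shared-storage figure cited above; 40 Gbps is an assumed representative
# InfiniBand link rate (actual Exadata bandwidth varies by model).

def transfer_seconds(gigabytes, link_gbps):
    bits = gigabytes * 8  # gigabytes -> gigabits
    return bits / link_gbps

table_gb = 100  # hypothetical 100 GB table
print(transfer_seconds(table_gb, 1))   # 800.0 s over 1 Gbps shared storage
print(transfer_seconds(table_gb, 40))  # 20.0 s over a 40 Gbps link
```

Even this rough arithmetic shows why an order-of-magnitude bandwidth difference matters: a scan that is interactive on the dedicated network becomes a coffee break on the shared one.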
Virtual machines
Virtual machines are a very heavyweight way of sharing compute, memory, and network infrastructure. Each database running on each server node needs a hypervisor and a guest operating system, in addition to the shared host operating system. Starting with 12c, Oracle uses lightweight containers, which allow multiple pluggable databases to share one container database without each database requiring its own hypervisor and operating system. As containers gain acceptance in cloud environments, we expect Oracle won’t be penalized by the overhead of virtual machines. But for now, the use of VMs as shared infrastructure carries a significant performance penalty.
Noisy neighbor networks
Noisy neighbor networks refer to traffic coming from multiple VMs on a single server instance. Unless each VM or container can have its own network quality of service (QoS), the cluster will often spin its gears waiting for data to flow between servers and between servers and shared storage.
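The per-tenant QoS the paragraph above calls for is commonly built on a token-bucket rate limiter. The sketch below is a minimal, illustrative model of that mechanism; the class, rates, and sizes are all hypothetical and do not correspond to any cloud provider’s real API.

```python
# Minimal sketch of per-tenant network QoS via a token bucket: the kind of
# mechanism needed to keep one "noisy" VM from starving its neighbors.
# All names and rates are illustrative, not any cloud's actual API.

class TokenBucket:
    def __init__(self, rate_mbps, burst_mb):
        self.rate = rate_mbps      # tokens (MB) replenished per second
        self.capacity = burst_mb   # maximum burst size in MB
        self.tokens = burst_mb     # start with a full budget

    def tick(self, seconds=1.0):
        # Replenish tokens as time passes, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + self.rate * seconds)

    def try_send(self, mb):
        # Allow the transfer only if this tenant has budget; else defer it.
        if mb <= self.tokens:
            self.tokens -= mb
            return True
        return False

noisy = TokenBucket(rate_mbps=100, burst_mb=100)
assert noisy.try_send(100)     # an initial burst succeeds
assert not noisy.try_send(50)  # bucket drained: neighbors are protected
noisy.tick(0.5)                # half a second passes -> 50 MB replenished
assert noisy.try_send(50)      # traffic resumes at the sustained rate
```

With one bucket per VM or container, a tenant that exhausts its budget simply waits for replenishment instead of flooding the links that the database cluster depends on.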
Getting to critical mass
Collectively, the AWS, Azure, and Google public clouds have data centers in so many countries that customers can store data just about wherever local regulatory rules require. Their economies of scale have also reached another form of critical mass. Customers used to be able to justify on-premises infrastructure if their capacity requirements were predictable; variable capacity was the need most easily justified in moving to the cloud. Now, customers say that’s not necessarily true anymore with these vendors. The billions of dollars per year these vendors spend on capex and management automation tools have gotten them to the point where, by their own account, even their largest customers can no longer have an intelligent discussion with them about data center design.
Oracle positions its cloud as having a rich enough offering to satisfy customers with multi-vendor platform requirements. But even if customers can install their favorite software on Oracle’s IaaS, there are dozens or even hundreds of third-party managed services on AWS. Rather than a customer having to install and manage Cassandra, Hadoop, and other software on AWS, other vendors will run them as if they were standard IaaS services.
These third-party managed services don’t yet exist on Oracle’s Public Cloud. They make AWS and Azure richer platforms than a count of their native services alone would suggest. Oracle’s Public Cloud will grow in breadth over time, no doubt. But customers building sophisticated applications requiring highly specialized services need to consider primarily AWS, Azure, and Google for now.
We believe Oracle will announce a version of 12c this summer that will run in a cluster without shared storage. Given historical practice, it will likely be another year before customers, and cloud providers, are comfortable putting it into production. It will likely still require a high performance network without noisy neighbors, but it should embrace a greater portion of public cloud infrastructure.
Although most customers who have built applications on AWS, Azure, and Google Public Cloud have typically used scale-out, clustered databases that don’t need highly tuned, dedicated infrastructure, many Oracle customers still have business critical OLTP and data warehousing needs.
For now they have to run their critical Oracle workloads in Oracle’s cloud or in a facility with a direct connection to AWS, Azure, or Google. They need to pay closest attention to AWS as the cloud most likely to support Oracle database clusters first. Until then, Oracle customers will need some patience as the Oracle Public Cloud reaches a critical mass of scale and matures in its third-party software support.