Many clients moving to public cloud computing want the same performance they get from their on-premise HPC clusters, and sometimes assume they need “bare metal” servers to achieve it.
TotalCAE supports a wide variety of hardware infrastructure, including on-premise HPC clusters, IaaS cloud vendors, and bare metal clouds, with our TotalCAE platform and HPC management services. This post explains how engineering applications on the cloud can achieve good performance without bare metal, and describes three risks to avoid when choosing a bare metal cloud provider.
Bare Metal Public Cloud vs. Cluster vs. Public Cloud Performance
Advances in computing technology have resulted in negligible performance differences between “bare metal,” where the operating system has direct access to the hardware, and “not bare metal,” where a thin virtualization layer sits between the operating system and the raw hardware and uses hardware-assisted technologies such as SR-IOV.
Below is the performance of a standard engineering benchmark run across bare metal on-premise systems, multiple standard cloud systems, and bare metal cloud systems.
Can you guess which runs were on-premise, vs. cloud, vs. bare metal cloud? No!
The performance is comparable across all of them: modern public cloud HPC systems and bare metal HPC systems have achieved performance parity.
With a properly configured and tuned cloud HPC system, a “bare metal” cloud is not a mandatory requirement for good engineering application performance.
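If you want to check this on your own workload, one simple approach is to run the same benchmark on each platform and compare wall-clock times against the fastest run. The sketch below shows that arithmetic. The function name and system labels are hypothetical, and the timings are placeholder values for illustration only, not measurements from the benchmark above.

```python
def relative_slowdown(wall_times):
    """Return each system's slowdown versus the fastest run, in percent.

    wall_times maps a system label to the benchmark wall-clock time in seconds.
    """
    if not wall_times:
        raise ValueError("no timings supplied")
    fastest = min(wall_times.values())
    if fastest <= 0:
        raise ValueError("wall-clock times must be positive")
    return {
        name: 100.0 * (seconds - fastest) / fastest
        for name, seconds in wall_times.items()
    }


# Placeholder timings (seconds). These are NOT real benchmark results.
example_times = {"system_a": 100.0, "system_b": 102.0, "system_c": 101.0}

for name, pct in sorted(relative_slowdown(example_times).items()):
    print(f"{name}: {pct:.1f}% slower than the fastest run")
```

A spread of a few percent between platforms is typically within normal run-to-run variation, so it is worth repeating each run several times before drawing conclusions.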
Top Three Risks to Avoid in Bare Metal Cloud
Because customers on many bare metal clouds have low-level, direct access to the hardware, there are three major risks to avoid if you need to use one:
1. “Double Dip” Risk
2. DIY Data Protection Risk
3. Trust Model Risk
Bare metal cloud vendors vary widely in how they mitigate these three risks, which we discuss in the following sections.
Double Dip Risk
A “double dip risk” is the risk that a malicious customer infected the bare metal server before you rented it. This is like someone double dipping in the chip dip at a party before you got there: it is hard to know whether a previous renter’s bad server hygiene could infect you.
This is not a theoretical concern. Researchers at Eclypsium published earlier this year that they were able to write to the management processor of a server at a major bare metal cloud provider, and that the change persisted across later rentals to other customers. This provider was not properly sanitizing its bare metal cloud servers between renters.
Some bare metal providers, such as AWS and OCI, have implemented mitigations for this risk. You can see a technical talk on how OCI created custom hardware to overcome the challenge of securing servers that were not natively designed to be shared between hostile renters. The AWS bare metal offering uses custom hardware and custom ASICs that prevent clients from reprogramming the underlying hardware on the AWS bare *.metal instances, which removes this double dip risk.
DIY Data Protection Risk
Often, the power of bare metal access on the cloud means you need to handle things such as encryption at rest and in transit yourself, to ensure that your data can’t be recovered when the hardware is “re-rented.” A study from a few years ago showed that 78 percent of discarded, unencrypted drives contained residual data that could be recovered. The same is true on the cloud: if you don’t encrypt data on hard drives, the next tenant may be able to recover it from the local disks of the bare metal compute node, even if the data has been erased.
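As a starting point for auditing encryption at rest on a Linux bare metal node, you can inspect the block device tree that `lsblk --json -o NAME,TYPE,FSTYPE` reports and flag any filesystem that does not sit on top of a dm-crypt (for example LUKS) mapping. The following is a minimal sketch of that idea, not a complete audit: the function name and the sample device layout are hypothetical, and it assumes the JSON shape produced by util-linux `lsblk`. It says nothing about encryption in transit or about how the provider wipes disks.

```python
import json

# Signatures for containers that hold other devices rather than user data.
CONTAINER_SIGNATURES = {"crypto_LUKS", "LVM2_member", "linux_raid_member"}


def unencrypted_filesystems(lsblk_json):
    """Return names of devices whose filesystem is not under a dm-crypt mapping."""
    found = []

    def walk(node, under_crypt):
        # A device of type "crypt" is an opened dm-crypt mapping; everything
        # stacked on top of it is encrypted at rest.
        under_crypt = under_crypt or node.get("type") == "crypt"
        fstype = node.get("fstype")
        if fstype and fstype not in CONTAINER_SIGNATURES and not under_crypt:
            found.append(node["name"])
        for child in node.get("children", []):
            walk(child, under_crypt)

    for device in json.loads(lsblk_json).get("blockdevices", []):
        walk(device, False)
    return found


# Hand-written sample in the shape of `lsblk --json -o NAME,TYPE,FSTYPE` output.
SAMPLE = """
{"blockdevices": [
  {"name": "nvme0n1", "type": "disk", "fstype": null, "children": [
    {"name": "nvme0n1p1", "type": "part", "fstype": "ext4"},
    {"name": "nvme0n1p2", "type": "part", "fstype": "crypto_LUKS", "children": [
      {"name": "scratch", "type": "crypt", "fstype": "xfs"}
    ]}
  ]}
]}
"""

print(unencrypted_filesystems(SAMPLE))  # prints ['nvme0n1p1']
```

On a real node you would feed the function the actual `lsblk` output and review every device it reports, since any of them could leave recoverable data behind for the next tenant.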
Trust Model Risk
Some bare metal providers have trust model risk, which means they trust that users will not gain administrative privileges on a server that is shared between tenants. In this shared bare metal model, a hostile client that gained administrative access could snoop on other tenants’ network traffic or files. The highest-risk providers are those that share infrastructure between tenants with only logical controls separating them. It is not safe to assume that a determined hostile tenant will fail to gain administrative privileges on shared infrastructure.
In summary, achieving good engineering application performance on the cloud does not require bare metal. If you do choose to go the bare metal route, be aware that not all bare metal cloud offerings are created equal. Be sure to understand what your bare metal cloud provider is doing to mitigate these top three risks.