Amazon's Elastic Compute Cloud (EC2) lets businesses rent scalable servers and host their services and applications remotely instead of buying and managing the infrastructure themselves. The service, which entered beta more than ten years ago, has focused primarily on CPUs, but that is changing thanks to a partnership with Nvidia.
Amazon now offers P2 instances built around Nvidia's K80 accelerators, which use the older Kepler architecture. That may surprise anyone who follows the graphics market, since Maxwell has been available since 2014. But Maxwell was designed explicitly as a consumer and workstation product; it was never intended to be an HPC part.
Amazon's new P2 instances offer as many as eight K80 GPUs, each with 12 GB of RAM and 2,496 CUDA cores. All of the K80s support ECC memory protection and deliver 240 GB/s of memory bandwidth.
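The per-card figures above scale with instance size. A quick back-of-the-envelope calculation, using only the numbers quoted in this article, gives the aggregate resources of the largest eight-GPU configuration:

```python
# Aggregate resources of the largest P2 instance, computed from the
# per-GPU figures quoted above (8 GPUs, each with 2,496 CUDA cores,
# 12 GB RAM, and 240 GB/s of memory bandwidth).
GPUS = 8
CORES_PER_GPU = 2_496
RAM_GB_PER_GPU = 12
BANDWIDTH_GBS_PER_GPU = 240

total_cores = GPUS * CORES_PER_GPU               # 19,968 CUDA cores
total_ram_gb = GPUS * RAM_GB_PER_GPU             # 96 GB of GPU memory
total_bandwidth = GPUS * BANDWIDTH_GBS_PER_GPU   # 1,920 GB/s aggregate

print(total_cores, total_ram_gb, total_bandwidth)
```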
One of the reasons Amazon chose to do this is the von Neumann bottleneck, which limits how much benefit additional CPU power actually delivers.
This is an oversimplification of the actual problem, though. In 1945, John von Neumann described a computer in which data and program instructions were stored in a single memory pool and accessed over a single bus.
In systems that use this model, the CPU can fetch data or program instructions, but only one at a time. It can't copy data and instructions simultaneously, and it can't move data out of main memory as quickly as it can do work once the information has been loaded. Because CPU clock speeds have risen faster than memory performance, CPUs spent a great deal of time waiting to retrieve data. This wait state is called the von Neumann bottleneck, and it was already a significant problem by the 1970s.
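A rough roofline-style calculation shows why the bottleneck matters. The numbers below are illustrative assumptions, not measurements of any real chip: we assume a core that can sustain 50 GFLOP/s of arithmetic but only 20 GB/s of memory bandwidth, and ask how fast it can sum a large array, which performs one addition for every 8-byte value read:

```python
# Illustrative roofline-style estimate. All figures are assumptions
# chosen for the example, not measurements of any real hardware.
PEAK_FLOPS = 50e9        # assumed compute peak: 50 GFLOP/s
MEM_BANDWIDTH = 20e9     # assumed memory bandwidth: 20 GB/s

# Summing a large float64 array: 1 addition (FLOP) per 8 bytes read.
flops_per_byte = 1 / 8

# Achievable throughput is the smaller of the compute roof and what
# the memory system can feed the core.
achievable = min(PEAK_FLOPS, MEM_BANDWIDTH * flops_per_byte)

print(achievable / 1e9)          # 2.5 GFLOP/s -- memory-bound
print(achievable / PEAK_FLOPS)   # only 5% of peak compute is usable
```

Under these assumed numbers the core spends 95% of its potential idle, waiting on memory, which is exactly the wait state described above.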
An alternative design, the Harvard architecture, offered a solution: data and instructions get separate physical storage and separate buses. Most chips, including those AMD and Intel build today, can't be cleanly described as either von Neumann or Harvard. Like RISC and CISC, the labels are mostly historical; modern CPUs are best described as modified Harvard architectures.
Modern chips from AMD, ARM, and Intel all use a split L1 cache, with data and instructions stored separately. They use branch prediction to determine which code paths are most likely to execute, and they cache both instructions and data in case the information is needed again.
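Branch prediction can be illustrated with a toy model. The sketch below is a two-bit saturating-counter predictor, a classic textbook scheme and far simpler than the logic in real AMD, ARM, or Intel chips: it takes two consecutive mispredictions to flip its prediction, so a stable loop branch is predicted almost perfectly.

```python
# Toy two-bit saturating-counter branch predictor (textbook scheme,
# much simpler than the predictors in real CPUs).
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # states 0-1 predict "not taken", 2-3 "taken"

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturating counter: nudge toward the observed outcome,
        # clamped to the range 0..3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

# A typical loop branch: taken 99 times, then not taken at loop exit.
outcomes = [True] * 99 + [False]
predictor = TwoBitPredictor()
hits = sum(
    predictor.predict() == taken or predictor.update(taken) or False
    for taken in outcomes
    if predictor.update(taken) is None
)
```

A cleaner way to run the loop, counting correct predictions before each update:

```python
predictor = TwoBitPredictor()
hits = 0
for taken in outcomes:
    hits += predictor.predict() == taken
    predictor.update(taken)

print(hits, "/", len(outcomes))  # 99 / 100 -- only the exit mispredicts
```

When the CPU guesses right, it keeps its pipeline full; the single misprediction at loop exit is the only stall.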
Hardware has made real progress against the von Neumann bottleneck, but the consensus remains that the accompanying changes to programming models never really took root.
There's no clear indication of why Amazon chose to go down this route, but including GPUs in its EC2 service makes sense. Deep learning, self-driving cars, and AI are all hot topics of late, with smaller companies and corporate money alike trying to stake out positions in this nascent market.