So in 2006, Nvidia introduced the CUDA platform. CUDA allows programmers to create “kernels,” which are short programs designed to run on a single execution unit. Kernels allow you to break large computing tasks into bite-sized chunks that can be processed in parallel. This allows certain types of calculations to be completed much faster than using only the CPU.
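To make the kernel idea concrete, here is a minimal sketch of a CUDA kernel that adds two large vectors. Each GPU thread handles exactly one element, so the big task is split into the "bite-sized chunks" described above. (This is an illustrative example, not code from the article; it uses unified memory via `cudaMallocManaged`, a convenience added in a later CUDA release.)

```cuda
#include <cstdio>

// Kernel: each GPU thread computes one element of the output.
__global__ void vector_add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's chunk
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;  // one million elements
    size_t bytes = n * sizeof(float);

    float *a, *b, *out;
    cudaMallocManaged(&a, bytes);    // memory visible to both CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n
    vector_add<<<blocks, threads>>>(a, b, out, n);
    cudaDeviceSynchronize();  // wait for the GPU to finish

    printf("out[0] = %f\n", out[0]);

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Because the million additions are independent, the GPU can run thousands of them at once; a CPU loop would do them one (or a few) at a time.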
However, there was little interest in CUDA when it was first introduced. Writing in The New Yorker last year, Stephen Witt recounted:
When CUDA was released in late 2006, Wall Street reacted with dismay. Mr. Huang was bringing supercomputing to the masses, but the public showed no signs of wanting it.
“They were spending a lot of money on this new chip architecture,” said Ben Gilbert, co-host of the popular Silicon Valley podcast Acquired. “They were spending billions of dollars targeting an obscure corner of academic and scientific computing that wasn’t a huge market at the time. It was certainly less than the billions they were putting into it.”
Huang argued that the mere existence of CUDA would expand the supercomputing field. This view was not widely accepted, and by the end of 2008, Nvidia’s stock price had fallen 70%.
CUDA downloads peaked in 2009 and then declined for three years. Board members were concerned that Nvidia’s low stock price would make it a target for corporate raiders.
Huang didn’t have AI or neural networks in mind when he created the CUDA platform. But Hinton’s backpropagation algorithm can easily be broken down into bite-sized chunks, so training neural networks turned out to be a killer app for CUDA.
According to Witt, Hinton recognized CUDA’s potential right away.
In 2009, Hinton’s research group used Nvidia’s CUDA platform to train a neural network to recognize human speech. He was surprised by the quality of the results, which he presented at a conference later that year. He then contacted Nvidia. “I sent out an email saying, ‘Look, I told 1,000 machine learning researchers that they should go buy Nvidia cards. Can you send them to me for free?’” Hinton told me. “They said no.”
Despite the snub, Hinton and his graduate students Alex Krizhevsky and Ilya Sutskever used a pair of Nvidia GTX 580 GPUs for the AlexNet project. Each GPU had 512 execution units, allowing Krizhevsky and Sutskever to train their neural network hundreds of times faster than would have been possible with a CPU. This speed allowed them to train a larger model, and to train it on many more images. And they needed all that extra computing power to tackle the massive ImageNet dataset.