Matrix multiplication and AI: A short primer
Last summer, I spent a week at a conference dedicated to graphics-processing units—GPUs. It was presented by GPU big name Nvidia, a brand that is largely associated with gaming hardware. At the conference, however, gaming was a sideshow. For that matter, graphics themselves (excluding VR) were a sideshow, despite being in the actual name. In general, this was a machine learning conference, and, to most of the attendees, of course it was.
First, understand that there’s no magic to machine learning. It’s just math. And in the grand scheme of math, the basic ideas behind machine learning are even kind of simple, at least conceptually. Machine learning is optimization. Given some very long equation with a lot of variables in it, can we come up with a reliable way of tweaking those variables so that the equation spits out accurate predictions? The question is conceptually simple, but actually computing the specific tweaks is labor-intensive.
To get some intuition about this sort of optimization, start just by thinking of cause and effect. The air outside is cold. Why? We might look at things like where the jet stream is; what the air pressure is; whether it’s cloudy or sunny out; how much moisture is in the air; and what season it is. I’m no meteorologist, but those seem like things that might reasonably predict the air temperature outside. So if, say, we didn’t know the air temperature ahead of time, but we knew all of this other stuff, we might be able to predict the temperature reliably.
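Suppose we give each of those observations a weight reflecting how much it matters to the temperature, then add the weighted observations together. That weighted equation is what’s normally called a model. It models relationships that exist in the world and so it has predictive utility. The hard math is in how we come up with the model, or how we figure out how important each of those different observations is relative to the other ones.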
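As a rough illustration, a model of this kind can be sketched as a weighted sum. Everything specific here is an assumption made for the example: the function name predict, the observation names, the weights, and the constant offset (bias) are all invented, and real weather models are far more involved.

```python
def predict(observations, weights, bias=0.0):
    """Return the weighted sum of the observations plus a constant offset."""
    return bias + sum(weights[name] * value for name, value in observations.items())


# Made-up observations for one day, each already scaled to a 0-1 range.
today = {"pressure": 0.6, "cloud_cover": 0.2, "humidity": 0.4, "season": 0.9}

# Made-up weights: a larger magnitude means that observation matters more.
weights = {"pressure": 4.0, "cloud_cover": -3.0, "humidity": 1.5, "season": 12.0}

# The model's guess at the temperature for these observations.
print(predict(today, weights, bias=5.0))
```

The interesting part is not this function, which is trivial, but where the numbers in weights come from. Finding them is the optimization problem.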
We do this by taking a lot of observations and doing a lot of optimizations one after another. Each one would then look something like the following:
For the resulting weights to be meaningful, we have to do this a lot, with a lot of observations. Training a real-life machine learning model might involve doing this same thing millions of times, with each iteration tweaking those weights just a little bit to better optimize the resulting model.
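One common way to do this kind of repeated tweaking is gradient descent on the squared prediction error. The sketch below assumes that technique; the article does not say which method it has in mind. The function names, the learning rate, the number of passes, and the tiny data set are all hypothetical, chosen so the example runs instantly.

```python
def predict(features, weights):
    """Weighted sum of one observation's features."""
    return sum(w * x for w, x in zip(weights, features))


def update(weights, features, target, learning_rate):
    """One optimization step: nudge each weight to shrink the error on one observation."""
    error = predict(features, weights) - target
    return [w - learning_rate * error * x for w, x in zip(weights, features)]


def train(observations, targets, learning_rate=0.01, passes=2000):
    """Repeat the small update over every observation, many times over."""
    weights = [0.0] * len(observations[0])
    for _ in range(passes):
        for features, target in zip(observations, targets):
            weights = update(weights, features, target, learning_rate)
    return weights


# Toy data generated from known weights, so we can check what gets learned.
observations = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [0.5, 3.0]]
true_weights = [2.0, -1.0]
targets = [predict(features, true_weights) for features in observations]

learned = train(observations, targets)
print(learned)  # should land very close to true_weights
```

Each call to update changes the weights only slightly, which is why so many repetitions are needed. Here that means 8,000 small steps for four observations and two weights; real models have vastly more of both.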
Obviously, that’s a big reduction, but the thing to understand is that what we wind up doing in machine learning is crunching together big matrices of numbers. Graphics processing works the same way, except that the matrices represent pixels. Computing graphics is all about doing computations across big matrices of pixel data, updating each one. That’s what GPUs exist for: doing computations across big matrices. Massive parallelization.
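To make the matrix connection concrete, here is a minimal sketch of matrix multiplication in plain Python, used to compute predictions for several observations in one go. The matmul helper and the numbers are assumptions for illustration; real code would hand this to an optimized library or a GPU rather than use nested loops.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Three observations with two features each: a 3 x 2 matrix.
observations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# One weight per feature, written as a 2 x 1 matrix.
weights = [[0.5], [0.25]]

# One prediction per observation: a 3 x 1 matrix.
predictions = matmul(observations, weights)
print(predictions)
```

Notice that each entry of the result is computed from one row of the first matrix and one column of the second, without reference to any other entry of the result.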
How is this different from normal computing? The key thing is parallel computation. Generally, in a CPU, we imagine things happening sequentially. This makes sense for computations that depend on each other, where one computation has to wait for an earlier one to complete because it needs that result. For that sort of work, adding more and more cores doesn’t wind up adding all that much computing power. Matrix computations are different: each element can be worked out independently of the others, so a GPU can spread the work across hundreds or even thousands of cores and get through it many times faster.
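The difference between dependent and independent computations can be sketched in a few lines. This example runs on a single core and only illustrates the dependency structure, not actual GPU execution; the function names and the order parameter are hypothetical.

```python
def running_total(values):
    """Dependent work: each step needs the previous step's result."""
    totals = []
    total = 0
    for value in values:
        total += value  # cannot start until the previous total is known
        totals.append(total)
    return totals


def scale_all(values, factor, order=None):
    """Independent work: each element can be handled in any order, or all at once."""
    order = range(len(values)) if order is None else order
    result = [None] * len(values)
    for i in order:
        result[i] = values[i] * factor  # uses nothing but element i
    return result


values = [1, 2, 3, 4]

# Processing the elements front to back or back to front gives the same answer,
# which is what makes it safe to hand each element to a different core.
forward = scale_all(values, 10)
backward = scale_all(values, 10, order=reversed(range(len(values))))
print(forward == backward)

print(running_total(values))
```

The running total is stuck waiting on itself no matter how many cores are available, while the scaling job splits cleanly into as many pieces as there are elements.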
We can make machine learning algorithms work faster simply by adding more and more processor cores within a GPU. That tends to be an easier engineering problem than those faced by conventional CPUs, where parallelism can help with performance sometimes, but finding and implementing that utility is pretty hard. That’s why GPUs are so important to machine learning, and, increasingly, vice versa.