Posted by randfish
Imagine a world where even the high-up Google engineers don’t know what’s in the ranking algorithm. We may be moving in that direction. In today’s Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.
Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google’s future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They’re not actually that tough of topics to understand. In fact, they’re simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don’t do anything with deep learning and a lot of neural networks. We might be going that direction.
But I found an article that was published in January, absolutely fascinating and I think really worth reading, and I wanted to extract some of the contents here for Whiteboard Friday because I do think this is tactically and strategically important to understand for SEOs and really important for us to understand so that we can explain to our bosses, our teams, our clients how SEO works and will work in the future.
The article is called “Google Search Will Be Your Next Brain.” It’s by Steven Levy. It’s over on Medium. I do encourage you to read it. It’s a relatively lengthy read, but just a fascinating one if you’re interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says: “He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs.”
This sounds complex, but basically what we’re saying is we’re trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider and how to process those inputs and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, so that when you give it a bunch of photos it can say, “Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person.” Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, “Is this your friend so and so?” And you’re like, “God, that’s a terrible shot of my friend. You can barely see most of his face, and he’s wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?”
That’s what they use these neural networks, these deep machine learning processes, for. So I’ll give you a simple example. Here at Moz, we do machine learning very simplistically for Page Authority and Domain Authority. We take all the inputs (numbers of links, number of linking root domains, every single metric that you could get from Moz on the page level, on the sub-domain level, on the root-domain level, all these metrics) and then we combine them together and we say, “Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here’s a bunch of pages that Google has ranked.” I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system and the system pumps out the little algorithm that says, “Here you go. This will give you the best correlating metric with how Google ranks pages.” That’s how you get Page Authority and Domain Authority.
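To make the "hey machine, build us the algorithm" idea concrete, here is a minimal sketch in Python. It is not Moz's actual model, which the talk does not describe in detail. It assumes a plain linear fit by gradient descent, two made-up inputs per page (scaled link count and scaled linking root domains), and a made-up target score standing in for how well Google ranked each page. All names and numbers are hypothetical.

```python
# Toy illustration only: the features, targets, and the choice of a linear
# model are assumptions, not Moz's real Page Authority pipeline.

def fit_linear(features, targets, lr=0.1, epochs=20000):
    """Fit weights and a bias by gradient descent on mean squared error."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(features, targets):
            pred = bias + sum(w * x for w, x in zip(weights, row))
            err = pred - y
            for j, x in enumerate(row):
                grad_w[j] += 2 * err * x / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Each row: [scaled link count, scaled linking root domains] for one page.
FEATURES = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5],
            [0.3, 0.7], [0.9, 0.1], [0.6, 0.6]]
# Hypothetical "how well did Google rank it" score (higher is better).
# Built here as 2 * links + 1 * domains so the fit has a known answer.
TARGETS = [1.1, 1.0, 2.1, 1.3, 1.9, 1.8]

weights, bias = fit_linear(FEATURES, TARGETS)

def authority_score(row):
    """Combine a page's inputs into one score using the learned weights."""
    return bias + sum(w * x for w, x in zip(weights, row))
```

The humans still pick the inputs and the training pages here. The machine only decides how much each input should count, which is the limitation the rest of the talk contrasts with deeper, layered systems.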
Cool, really useful, helpful for us to say like, “Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important.” Very cool. But it’s not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they’re designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don’t worry. The article does talk about how there’s a board of scientists who make sure Terminator 2 doesn’t happen, or Terminator 1 for that matter. Apparently, no one’s stopping Terminator 4 from happening? That’s the new one that’s coming out.
So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google’s looking at the features of all the websites and web pages, your websites and pages, to try and consider like, “What are the elements I could pull out from there?”
Well, there’s the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we’re going to classify types of pages, types of searches, and then we’re going to extract the features or metrics that predict the desired result, that a user gets a search result they really like. We have an algorithm that can consistently produce those, and then neural networks are hopefully designed — that’s what Geoff Hinton has been working on — to train themselves to get better. So it’s not like with PA and DA, our data scientist Matt Peters and his team looking at it and going, “I bet we could make this better by doing this.”
This is standing back and the guys at Google just going, “All right machine, you learn.” They figure it out. It’s kind of creepy, right?
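The layering described above can be sketched as a tiny two-layer forward pass in Python. This is only an illustration under stated assumptions: the three input signals, the weights, and the two output classes are all invented, and a real network would learn its weights from data rather than having them typed in.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw signals for one page: [links, on-page, user engagement].
page = [0.8, 0.3, 0.6]

# Layer 1 (feature layer): 3 raw signals -> 2 intermediate features.
W1 = [[0.5, -0.2, 0.9], [-0.4, 0.7, 0.1]]
b1 = [0.0, 0.1]

# Layer 2 (classification layer): 2 features -> 2 hypothetical classes,
# e.g. "likely satisfies the searcher" vs. "likely does not".
W2 = [[1.2, -0.6], [-1.0, 0.8]]
b2 = [0.0, 0.0]

hidden = relu(dense(page, W1, b1))
probs = softmax(dense(hidden, W2, b2))
```

Notice that the intermediate values in `hidden` have no human-readable name. They are just whatever mix of the raw signals the first layer produces, which is the point the talk makes about engineers not knowing what is in the algorithm.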
In the original system, you needed those people, these individuals here to feed the inputs, to say like, “This is what you can consider, system, and the features that we want you to extract from it.”
Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you’d have anchor text, PageRank and you’d have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, “Hey algorithm, let’s consider the location of the searcher. Hey algorithm, let’s consider some user and usage data.” They’re tossing new things into the bucket that the algorithm might consider, and then they’re measuring it, seeing if it improves.
But you get to the algorithm today, and gosh there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, “Hey, we only want to look and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this.” What does that even mean? But that’s what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that they can see.
Then, over time, the idea is that in the future even the inputs aren’t given by human beings. The machine is getting to figure this stuff out itself. That’s weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, “Hey, does it matter if I get more links,” they might be like, “Well, maybe.” But they don’t know, because they don’t know what’s in this algorithm. Only the machine knows, and the machine can’t even really explain it. You could go take a snapshot and look at it, but (a) it’s constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.
So what does that mean for SEOs? What do we have to care about from all of these systems and this evolution and this move towards deep learning? By the way, Jeff Dean, who is, I think, a senior fellow over at Google (he’s the dude that everyone mocks for being the world’s smartest computer scientist over there), has basically said, “Hey, we want to put this into search. It’s not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search.” That for SEOs in the future is going to mean much less distinct universal ranking inputs, ranking factors. We won’t really have ranking factors in the way that we know them today. It won’t be like, “Well, they have more anchor text and so they rank higher.” That might be something we’d still look at and we’d say, “Hey, they have this anchor text. Maybe that’s correlated with what the machine is finding, the system is finding to be useful, and that’s still something I want to care about to a certain extent.”
But we’re going to have to consider those things a lot more seriously. We’re going to have to take another look at them and decide and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that’s what the system is looking for, and that’s what it’s trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and you ranking in the top means Google searchers are happier, well you know what? The algorithm will catch up to you. That’s kind of a nice thing. It does mean a lot less info from Google about how they rank results.
So today you might hear from someone at Google, “Well, page speed is a very small ranking factor.” In the future they might say, “Well, page speed is like all ranking factors, totally unknown to us.” Because the machine might say, “Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small.” But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something that, across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, “What servers do you run on?” Well, that seems ridiculous. Why would Google ever grade you on that?
If human beings are putting factors into the algorithm, they never would. But the neural network doesn’t care. It doesn’t care. It’s a honey badger. It doesn’t care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.
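The "it only cares about successful searches" idea can be sketched with a plain Pearson correlation in Python. Everything here is an assumption for illustration: the six pages, the success rates, and the server flag are invented numbers, and a real learning system would do far more than compute pairwise correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented toy values for six pages.
page_speed_seconds = [0.8, 1.2, 2.5, 3.1, 4.0, 5.5]
search_success_rate = [0.92, 0.90, 0.81, 0.77, 0.70, 0.58]
runs_on_server_x = [1, 0, 1, 0, 0, 1]  # hypothetical server flag

speed_corr = pearson(page_speed_seconds, search_success_rate)
server_corr = pearson(runs_on_server_x, search_success_rate)

# Rank the inputs purely by how strongly they track the outcome,
# with no notion of whether a human would find them sensible.
ranked_inputs = sorted(
    [("page_speed_seconds", speed_corr), ("runs_on_server_x", server_corr)],
    key=lambda item: abs(item[1]),
    reverse=True,
)
```

In this toy data the server flag happens to be only weakly related to success, so it sorts last. Had the numbers come out the other way, the same code would have ranked it first without complaint.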
This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it’s something that we’re really going to have to consider as search marketers.
All right everyone, I hope you’ll join me again next week for another edition of Whiteboard Friday. Take care.