All Things Digital

Digital Daily

Google and the Evolution of Search I: Human Evaluators

The goal is to enable Google users to be able to ask the question such as ‘What shall I do tomorrow?’ and ‘What job shall I take?’…We are very early in the total information we have within Google. The algorithms [software] will get better and we will get better at personalization.

– Google CEO Eric Schmidt

For many years, Google (GOOG), on its Explanation of Our Search Results page, claimed that “a site’s ranking in Google’s search results is automatically determined by computer algorithms using thousands of factors to calculate a page’s relevance to a given query.”

Then in May of 2007, that statement changed: “A site’s ranking in Google’s search results relies heavily on computer algorithms using thousands of factors to calculate a page’s relevance to a given query.”

A slight adjustment in wording, but an important comment on the supremacy of the algorithm that Google had touted for years. Google had finally acknowledged that its search results were no longer solely and automatically determined by the company’s vaunted algorithms. Now they simply “relied heavily” on them. Why the sudden change?

Google claims it was arbitrary, unrelated to any sudden philosophical shifts within the company. But it seems far too specific an adjustment to chalk up to a random brand-management edit. We are, after all, talking about the company’s official explanation of its search results. And indeed, sources say the language was changed to account for the continual calibration of the algorithm, which these days is done with a bit of human help.

Google, for example, employs a vast team of human search “Quality Raters” (you’ll find a copy of an old training manual here). Spread out around the world, these evaluators, mostly college students, review search returns against established criteria, testing different algorithms to see which works “best” in predicting the quality of a site (though not directly judging the quality of any individual site itself).

They’re aided by Google’s own registered users, who can now, when logged into their Google accounts, promote and delete sites from their own search returns according to their preferences. These data too are used to tweak and further optimize the algorithm. So Google’s objective evaluation and ranking of Web sites is to some extent defined by subjective reasoning of a collective human intelligence. And so it must be if Google is to continue returning search results that we perceive to be the “best” answers to our search queries.

In interviews serialized over the next three days, key Google engineers with central roles in managing the company’s search engine discuss resources and techniques they use to optimize the system for users world-wide. The series kicks off below with Engineering director Scott Huffman, who oversees the company’s search evaluation team. Senior Google software engineer Matt Cutts appears tomorrow. And Google Fellow Amit Singhal wraps up the series on Friday.

Google and the Evolution of Search

  1. Human Evaluators — Google Engineering director Scott Huffman
  2. Cheating the System — Google software engineer Matt Cutts
  3. What’s Next in Search? Much, Much Better Search — Google Fellow Amit Singhal

Part I: Scott Huffman

John Paczkowski: How do you maintain quality in search ranking?

Scott Huffman: We are constantly evaluating the quality of our results in something like a hundred different locales and language tiers all around the world. So every day, we are looking at a random sample of queries that we think represent the queries we get from users. Evaluators look at the quality of each result relative to those queries. We are constantly tracking a pretty wide array of different kinds of quality signals that come through our tests.

JP: Talk a bit more about the human element here. You’ve hired people to evaluate pages?

SH: Yes, we have folks around the world who are trained to evaluate the quality of results. We like them to be in-country so they understand the culture and that type of thing. And then we have a work flow system that feeds them different kinds of evaluation tasks. Things like “tell us how good you think this result is for this query.” And then out of the data, we produce a set of aggregate metrics that we look at and that we can track over time.
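To make the idea of "aggregate metrics that we can track over time" concrete, here is a minimal sketch of how per-result ratings might be rolled up by day and locale. It is an illustration only, not Google's system: the record layout, the 0-to-4 score scale, and the function name `aggregate_by_day_and_locale` are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating records: (date, locale, query, score on an assumed 0-4 scale).
ratings = [
    ("2009-06-01", "en-US", "olympics", 3),
    ("2009-06-01", "en-US", "weather", 4),
    ("2009-06-01", "zh-CN", "olympics", 2),
    ("2009-06-02", "en-US", "olympics", 4),
    ("2009-06-02", "zh-CN", "olympics", 3),
]


def aggregate_by_day_and_locale(records):
    """Average the rater scores for each (date, locale) pair."""
    buckets = defaultdict(list)
    for date, locale, _query, score in records:
        buckets[(date, locale)].append(score)
    # Sorting the keys keeps the output in date order, then locale order.
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


metrics = aggregate_by_day_and_locale(ratings)
for (date, locale), average in metrics.items():
    print(date, locale, average)
```

Comparing the same locale across dates gives the kind of trend line an engineer could watch, without any single rating changing how a particular page is ranked.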

JP: So how many of these evaluators are there?

SH: How many? I don’t think we can talk about the exact number, unfortunately.

JP: Ballpark? I’ve heard 10,000.

SH: Well, the number actually is pretty large and that’s for a couple of reasons. One is that, like I mentioned, we try to do an evaluation pretty broadly across all of the locales Google is in, and there are a lot of them. So you’re already talking about a pretty large group of people. Secondly, we prefer a larger group to a narrow one because we want to use our evaluations to give us an independent picture of our quality. We get a lot of queries from all over the world so we need a broad base of people to help us understand how good our results are for them.

JP: So are these raters college students or random folks responding to a job post? What are the requirements?

SH: It’s a pretty wide range of folks. The job requirements are not super-specific. Essentially, we require a basic level of education, mainly because we need them to be able to communicate back and forth with us, give us comments and things like that in writing.

JP: And how are they trained?

SH: The training is pretty simple. There are manuals and video training and, ultimately, participation in the rating program. We help them understand what it means for search results to be highly relevant and useable for the viewer. Is there a dominant result for a particular query today? If so, it should be right there at the top. Take a broad-based query like…“Olympics.” If a user searches for “Olympics,” the results from the 1996 Olympics are not as interesting as the ones from the 2008 Olympics.

JP: So how do you vet data provided by the raters? Is there any quality control?

SH: Well, the raters work in-country, so we don’t see them every day. And we don’t typically talk to them on the phone. We have some automated measures that account for things like, say, evaluators who consistently say two sites in a side-by-side comparison are about the same. We also have moderators. But ultimately, the real quality control is done by the folks who are working on ranking and search UI. They’re the ones who understand why we are better today in China than we were a week ago or a month ago. What changed? What are we doing better? The evaluation program really just gives our engineers an aggregate measure of how good their algorithms are so they can improve them.
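One of the automated measures Huffman mentions, catching evaluators who consistently call side-by-side comparisons "about the same," could be sketched roughly as follows. This is a guess at the general shape of such a check, not a description of Google's tooling: the verdict labels, the 90 percent threshold, the minimum task count, and the function name `flag_indifferent_raters` are all assumptions.

```python
from collections import Counter


def flag_indifferent_raters(judgments, threshold=0.9, min_tasks=5):
    """Return rater IDs whose share of "same" verdicts is at or above threshold.

    judgments: iterable of (rater_id, verdict) pairs, where verdict is one of
    the assumed labels "left", "right", or "same".
    Raters with fewer than min_tasks judgments are skipped as too sparse to judge.
    """
    totals = Counter()
    same = Counter()
    for rater_id, verdict in judgments:
        totals[rater_id] += 1
        if verdict == "same":
            same[rater_id] += 1
    return sorted(
        rater_id
        for rater_id, count in totals.items()
        if count >= min_tasks and same[rater_id] / count >= threshold
    )


# Hypothetical data: rater "a" always answers "same"; rater "b" varies.
sample = [("a", "same")] * 5 + [
    ("b", "left"), ("b", "right"), ("b", "same"), ("b", "left"), ("b", "right"),
]
flagged = flag_indifferent_raters(sample)
print(flagged)
```

A flag like this would only mark a rater for review by a moderator; it says nothing about whether any individual judgment was wrong.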

JP: So you’re describing a process in which these evaluators are going to specific Web pages and rating them according to specific criteria. Do these data have any effect on those sites’ page ranks or pay-per-click and AdWords bids?

SH: We don’t use any of the data we gather in that way. I mean, it is conceivable you could. But the evaluation site ratings that we gather never directly affect the search results that we return. We never go back and say, “Oh, we learned from a rater that this result isn’t as good as that one, so let’s put them in a different order.” Doing something like that would skew the whole evaluation by-and-large. So we never touch it.

JP: Let’s backtrack a little bit. How did this project begin? Who came up with it? What were its origins?

SH: Well, from the earlier days of Google, of course, we have always been interested in measuring how well our search algorithms are doing. I wasn’t here, but what I understand is that way back when, there was a set of Sergey’s favorite 10 queries. People would run those and they would make sure that any change they made to the ranking algorithms would make those work. Obviously, as Google grew in traffic and reach, it needed a broader set of queries, and there was a realization that we really needed to have evaluators in the countries we service who understand the culture to do that well. We needed a team that could evaluate results from the users’ perspective.

Comments

  1. So Google finally talks about their Quality Raters system!

    yrs,
    andreas
    andreas.com

    Posted by andreas ramos at June 3rd, 2009 at 8:54 am
  2. All I know is that my Google search results are no better today than they were five years ago. I get good results on garden-variety searches that virtually any search engine could handle, but am frequently frustrated by the poor quality of my search results when I’m looking for something a little more esoteric. Come to think of it, why is it that I still can’t sort my search results chronologically after all these years???

    Posted by Alan Sanders at June 3rd, 2009 at 9:02 am
  3. Alan: Yes, you can now filter your searches chronologically. Log into your Google account and search for something. At the top of the page, there is a “Show Options” link. Click that. It opens a new view of search results, where you can view by time frame — andreas

    Posted by andreas ramos at June 3rd, 2009 at 11:08 am
  4. Re: last two comments. It’s a start. But the results are often nonsensical. The date of the information (when the page was created) isn’t explicitly available and instead any date contained within the page (such as Henry Ford’s birthday) may cause an article on the Ford company’s latest car to be placed back in the 1800s.

    Posted by Mac Beach at June 3rd, 2009 at 2:54 pm
  5. PS: This looks like a good series.

    Posted by Mac Beach at June 3rd, 2009 at 2:56 pm
  6. Was that old training manual a real one? I’ve read it before, it makes not much sense.

    Posted by Marcis Gasuns at June 5th, 2009 at 7:00 pm

About John

John Paczkowski has been poking fun at the tech industry and the personalities that drive it since 1997. From 1999 to 2007, he wrote the award-winning tech news Web log Good Morning Silicon Valley for the San Jose Mercury News, Silicon Valley's daily newspaper.
