Wednesday, March 01, 2006

** The Search - How Google rewrote the rules of business and transformed our culture by John Battelle

I recently interviewed with Google and declined their offer of a position. The details of that episode are worthy of a separate blog entry. While deliberating over the offer, I studied this book carefully to learn about the unique nature of Google. If you're curious about the search industry, and Google in particular, you'll find it very interesting.

As the amount of information available to us explodes, search has become the user's interface metaphor. There is now all this information that is possible to get into your hands. Search is our attempt to make sense of it. - Raymie Stata, SV Engineer p4

The Database of Intentions is simply this: the aggregate results of every search ever entered, every result ever tendered, and every path taken as a result… Taken together, this information represents a real-time history of post-Web culture - a massive clickstream DB of desires, needs, wants that can be discovered, subpoenaed, archived, and exploited for all sorts of ends. p6

We're willing to trade some of our privacy for convenience, service, and power… From a consumer's point of view, there are also very simple and compelling reasons for this shift: services like email, search, and recommendation networks make our lives easier, faster, and more convenient. p12

As we move our data to the servers of Microsoft (Hotmail), Yahoo, and Google (Gmail) we are making an implicit bargain… That bargain is this: we trust you to not do evil things with our information. We trust that you will keep it secure, free from unlawful govt or private search and seizure, and under control at all times. We understand that you might use our data in aggregate to provide us better and more useful services, but we trust you will not identify individuals… But imagine the disorientation you might feel if search becomes self-aware - capable of watching you as you interact with it. p15

What does the world want? Build a company that answers this question in all of its shades of meaning, and you've unlocked the most intractable riddle of marketing and arguably of human culture itself. And over the past few years, Google seems to have built just that company. p17

How might a search engine understand concepts like a biography rather than just key words? One way is by the use of cue words that tip the engine off to the context of a particular search… A good query engine will link this cue word to clusters of results that have a chance of fulfilling the concept of biography - pages that have been tagged as biographical. Adding that new metadata often dramatically improves results. (Other examples are stock quotes, movie reviews, weather reports, etc.) p23

Search engines are still working on the problem of how to best match searches for soda with results for pop, tennis shoes with sneakers, or feline with cat. p24

Search has the potential to get better and better the more people use it. A good example is the spell checker found at Google - its suggestions are culled from watching vast numbers of misspellings and correlating them to the properly spelled word. p24
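
To make the spell-checker idea concrete, here is a minimal sketch of learning suggestions from usage. The book does not describe Google's actual method; the assumption here is that we have a log of queries paired with what the searcher retyped immediately afterwards, and we simply suggest the most common retyping. The log entries and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical log: (query as first typed, query retyped right after).
REWRITES = [
    ("britny spears", "britney spears"),
    ("britny spears", "britney spears"),
    ("britny spears", "brittany spears"),
    ("recieve", "receive"),
]

def build_suggestions(rewrites):
    """Count every observed rewrite and keep the most frequent one per query."""
    counts = defaultdict(Counter)
    for typed, retyped in rewrites:
        counts[typed][retyped] += 1
    return {typed: c.most_common(1)[0][0] for typed, c in counts.items()}

def suggest(suggestions, query):
    """Return the learned correction, or None if the query was never rewritten."""
    return suggestions.get(query)

suggestions = build_suggestions(REWRITES)
print(suggest(suggestions, "britny spears"))
```

The more searches the log holds, the more misspellings it covers, which is the sense in which search gets better the more people use it.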

There are 3 pieces of search, and all 3 must scale to the size and continued growth of the Web: they must crawl (spider), they must index (DB), they must serve results (query). Google alone has 175,000 servers dedicated to this job. That's more than existed on Earth in the 1970s. p24
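
The index and serve pieces can be sketched in a few lines. This is a toy illustration only, not how Google does it: the crawl is faked with a hypothetical dictionary of pages, the index is a plain inverted index (word to set of URLs), and serving returns pages that contain every query word. All names and URLs are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for the crawl step: URL -> page text.
PAGES = {
    "a.example/cats": "cats are small feline pets",
    "b.example/dogs": "dogs are loyal pets",
    "c.example/shoes": "tennis shoes and sneakers",
}

def build_index(pages):
    """Index step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def serve(index, query):
    """Serve step: return URLs containing every query word, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(PAGES)
print(serve(index, "pets"))
print(serve(index, "feline pets"))
```

Scaling this to the whole Web is where the 175,000 servers come in: each of the three steps has to be spread across many machines.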

We type in a few words at most, then expect the engine to bring back the perfect results. More than 95% of us never use the advanced search features, and most search experts agree that the chances of ever getting that number lower are slim to none. p25

Pew Internet & American Life Project Results (2004): p25
85% of all American internet users search; 2/3 of these are active searchers (twice a week). This is 31% of the US population.
The younger you are or the higher your educational attainment, the more you search.
Avg number of searches per visit is 5.

Long tail: Google claims that 50% of the searches on a given day - 100M - are unique.
Piper Jaffray says that 20% of searches are for entertainment, 15% are commercial, 65% are informational. As much as 25% are local (with mostly commercial interests it is suspected). p28

Approx. Customer Acquisition Cost – Piper Jaffray p35
Direct Mail - $70
Email - $60
Online Ad Display - $50
Yellow Pages - $20
Search - $8.50

The yellow pages is a $14B business in the US ripe for the picking… Their bet: that soon local dentists, restaurants, dry cleaners, etc. might best spend their $500 on search rather than the yellow pages. P36

Google alone boasts more than 225,000 unique advertiser relationships. Try that with network television. P35

Behavioral targeting seeks to track your search and browsing history and display advertisements that might be contextually relevant based upon your online behaviors… If for example you seem to be looking for Lincoln quite a bit lately and tend to click not on results related to the former president, but rather on the automobile, 2nd generation engines will display ads for cars. P37

Pagerank is named after Larry Page, not the ranking of web pages per se. p75

Google often glosses over the fact, but the truth is that the company lacked a viable plan for making money until early 2001… “We really couldn’t figure out a business model” Michael Moritz (Sequoia). P92

Bill Gross [founder of the company that became Overture and was purchased by Yahoo for $1.6B] was convinced that any algorithm-driven approach to search would ultimately be outsmarted by spammers… So Gross turned to his original idea: to kill spam, one must add the friction of money to the equation… Gross’s core insight, the one that now drives the entire search economy, is that the search term is inherently valuable – it can be priced. ‘I realized that when someone types in Princess Diana, they want in effect to go to a Princess Diana store – where all possible items, information, and goods about Princess Diana are laid out for them to see’. P106

In the fall of 2004, Gross has a new breed of search engine (SNAP) that ranks sites by factors such as how many times they have been clicked on by prior searchers… Advertisers can sign up to pay only when a customer converts – in other words, when the customer actually buys a product or performs a specific action deemed valuable… “The relevance is going down on Google, mainly because of gaming… I think I have a search engine spam solution.” P121

Google developed its own OS (on Linux), and even customized and patented its approach to designing, cooling and stacking its components… This approach would become a core defensible asset… Google’s other major asset – the PageRank patent – is owned by Stanford, but licensed exclusively to Google until 2011. p130

Instead of top-down projects, Brin and Page created a more dynamic structure in which small teams of engineers tackled 100s of projects all at once. Brin, Page and other Sr Mgrs reviewed each project on a regular basis and the best projects received further resources. A top 100 list was soon developed… The company launched Google Labs where new projects – the best of the top 100 – could have an early public view… This approach to mgmt was generally well liked inside the company, but it also rankled quite a few employees. ‘It became a very political place’ says a former Sr Mgr ‘Nobody had the authority to do anything w/o Larry and Sergey’s approval.’ p141

In February 2002, the company launched a new version of AdWords that included auction and pay per click (CPC), but with this service advertisers – unlike Overture – couldn’t just buy their way to the top listing. Instead, Google incorporated the notion of an ad’s popularity – its click through rate – into its overall ranking... Thus a lower paying ad with a higher click through rate would float to the top… It only makes economic sense to give the lower price the top spot – because it makes Google, which gets a percentage of every click, more money. P142
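
A small worked example shows why the cheaper ad can earn more. The book only says that click through rate is factored into the ranking; the sketch below assumes the simplest possible formula, bid multiplied by click through rate, which is the expected revenue per impression. The ad names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float  # price paid per click, in dollars
    ctr: float  # click through rate, between 0 and 1

def expected_revenue(ad):
    """Expected revenue per impression under the assumed bid x CTR formula."""
    return ad.bid * ad.ctr

def rank_ads(ads):
    """Order ads by expected revenue per impression, highest first."""
    return sorted(ads, key=expected_revenue, reverse=True)

ads = [
    Ad("high_bidder", bid=1.00, ctr=0.01),  # pays more, rarely clicked
    Ad("popular", bid=0.50, ctr=0.05),      # pays less, clicked 5x as often
]
print([ad.name for ad in rank_ads(ads)])
```

Per 1,000 impressions the high bidder yields about $10 (10 clicks at $1.00) while the popular ad yields about $25 (50 clicks at $0.50), so the lower bid takes the top spot.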

Google has one of the largest datacenters in the world, and one of the largest collections of bandwidth in the world. I get to ask ‘What would you like to do with it? What are the technological possibilities of that platform?’ Eric Schmidt p144

Google provides free meals (breakfast, lunch, dinner), snacks, beverages, massages, dentist and doctors on site, free drycleaning, lavish parties and outings like an annual company ski trip. While such a display might have motivated some engineers to apply for jobs at Google, chances are it alienated a few as well. After all, Google didn’t invent the freewheeling geek culture it espoused – it was simply the only company that was capable of affording it. P146

Google is going to have a major fall in the next couple of years, echoed a well known VC in late 2003. ‘They’ve pissed off too many people.’ P147

1000s of resumes streamed into Google each week… Legions of talented geeks never got so much as an acknowledgement… 100s got interviews, but were never hired, and many of those felt spurned by a fickle and mysterious process that no one seemed capable of explaining. When 100s of smart people feel poorly treated, the negative buzz starts to build. In late 2004, Brin acknowledges ‘It’s something we have to fix’. P147 They haven’t fixed it yet, and it’s early 2006.

Many senior execs at Google operate with an alienating and unnecessary secrecy and isolation. P149

PageRank rewarded sites with high ranking inbound links and relevant anchor text, so spammers began to create link farms and doorway pages – essentially pages that did nothing more than link to other pages – so as to trick Google’s index into assigning their pages a higher ranking… Google retaliated with more sophisticated algorithms, and the spammers counterstruck… Google banned certain IP addresses and spammers simply set up new ones. P161
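
To see why link farms work, here is a simplified sketch of the published PageRank idea (a page's rank is shared out among the pages it links to, repeated until it settles). It is not Google's production algorithm: the damping value of 0.85 is the conventional textbook choice, anchor text is ignored, and the tiny graphs and page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no outbound links: spread its rank evenly.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# A small web where nobody links to the spam page.
honest = {"a": ["b"], "b": ["a"], "spam": ["a"]}

# The same web plus five farm pages that do nothing but link to spam.
farmed = dict(honest)
for i in range(5):
    farmed[f"farm{i}"] = ["spam"]

before = pagerank(honest)["spam"]
after = pagerank(farmed)["spam"]
print(f"spam rank without farm: {before:.3f}, with farm: {after:.3f}")
```

Even though the farm pages have no inbound links of their own, each one passes most of its baseline share to the spam page, which is exactly the trick described above.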

Standing at around $15B [the yellow pages are huge] in the US alone… Within one generation, the yellow pages will be viewed as a dead industry. P175

The same goes for the classified industry, which also stands at around $15B in the US. P176

Purveyors of click fraud take advantage of the syndicated nature of Google’s and Yahoo’s advertising networks. They sign up as Google AdSense publishers… But instead of running real content, they run only AdSense ads on their sites… Then they run robots (or low wage workers) mechanically clicking on every single ad, earning a cut for themselves and a cut for Google. The unwary advertiser pays the freight… Many advertisers claim that 25 to 30% of their budgets is lost to click fraud. P187

If Google has your email address, it could potentially tie your IP address to your identity, creating an opening for all sorts of potential privacy abuses. Theoretically anyway, Google could now track your entire web usage, not just your email. P195

Google’s privacy policy allows the company to review your personal information: “We may share private information if we conclude that we are required by law or have a good faith belief that access, preservation or disclosure of such information is reasonably necessary to protect the rights, property or safety of Google, its users or the public.” P203

Google has never known anything but success. The only thing Google has failed to do, so far, is fail. P236

You can sum up Google’s ambitions in the commercial world as this: the company would like to provide a platform that mediates supply and demand for pretty much the entire world economy… In a perfect market, where demand is simply one computable bit of information and supply another, matching the two is an extremely lucrative business… From Eric Schmidt: “Google’s addressable market, if you include the large and small companies throughout the world, is the world’s gross domestic product.” P248

The search engine of the future isn’t really a search engine… It’s more like an intelligent agent or as Larry Page says – a reference librarian with complete mastery of the entire corpus of human knowledge. P252

In 2002 Paul Ford wrote a piece called “August 2009: How Google beat Ebay and Amazon to the semantic web.”

The consensus view is that search is in the early days. The really hard problems – natural language queries – have yet to be solved… Search technology still has no idea what a document actually means – in the human sense. P268

Using IBM’s WebFountain, a customer can posit a theoretical query such as this: ‘Give me all the documents on the web that have at least 1 page of content in Arabic, are located in the Midwest, and are connected to at least 2 similar documents but are not connected to the official Al Jazeera web site, and mention anyone on a specified list of suspected terrorists.’ Not the kind of query you’d punch into Google. P270

So how does WebFountain make answers to such complex queries possible? Short answer: with lots of hardware. Longer answer: WebFountain does more than index the Web (like Google). WF goes several steps beyond Google by classifying those web pages across any number of semantic categories. WF basically restructures the web for each client query… WF customers can create an entirely new tagging scheme, and IBM can crank out the entire database – that’d be the entire web – through those custom filters on the fly [more like 24 hours]. P271

Moore’s law has not caught up to the computing demands of WF. All of that annotation takes a lot of cycles and a lot of software, and the whole process must happen in a particular order. You can’t throw more Linux boxes at the problem the way Google does. Imagine if Google had to re-index the whole web for each new searcher… WF is your classic supercomputer application. It runs on one of the world’s top 50 supercomputers. P272

A UCB study reported that humankind creates 5 exabytes of stored data [print, film, optical media, etc.] each year, the equivalent of 500,000 Libraries of Congress in paper. P276

The web has no memory… But at some point in the not too distant future we’ll have live continuous historical copies of the web that will be searchable – creating if you will a time axis for the web… In our lifetimes we’ll see our cultural digital memory become contiguous, available, always there… You could ask questions like ‘Tell me what were the most popular results for GW Bush on 5/3/04?’ or ‘Show me every reference to my great grandfather during 2016.’ P277

In the spring of 2001, Yahoo felt that it should own search, and buying the wildly popular but revenue deficient Google seemed a perfect way to do it. But there was no chemistry between Terry [Semel, CEO of Yahoo] and Larry & Sergey… Yahoo ultimately purchased Inktomi, AltaVista and Overture. P291

Page and Brin acknowledged that Google’s approach to mgmt has caused strains for some employees, and in 2004 they began to add add’l layers of mgmt. P294

But as of mid-2005, Google still had a long way to go in this dept, according to several midsize advertisers who spent from $50k to $150k. Google rarely answered the phone and responded slowly, if at all, to their complaints of click fraud. P295

While Google does have an extraordinary infrastructure, it is not limitless. In early 2005, it introduced a beta product called Web Accelerator, which used Google’s own servers as proxies for the internet… A source at the company who is close to the program said ‘We ran out of bandwidth. It’s as simple as that.’ [Akamai beware!] p298