Friday, September 16, 2016
So, you want to search for pictures of puppies. Easy, you just type into Google "pics of cute puppies" and then it gives you tons of results, showing you the cutest puppies you've ever seen in your life. But how does Google work?
Well, Google uses an algorithm to scan through information and find keywords. The programs that do this are called spiders (lol Go Spiders!) or crawlers. Search engines in general use these spiders to build indexes of keywords: a spider scans a page, records the words it finds, then follows the page's links to other pages and repeats the process. That link-following process is called web crawling, and the list of words it produces is the index. Indexes are often built with a method called hashing, a formula that assigns a numerical value to each word so it can be looked up quickly. The spiders start at popular sites, then branch out from there to other links.
Search engines like Google index hundreds of millions of pages and respond to tens of millions of queries every day. What makes Google stand out is how it ranks the results, using factors like how often keywords show up on a page and how many other pages link to it. Early on, Google's system used three spiders at once, each of which could keep about 300 connections to web pages open at one time. At peak performance, using four spiders, it could crawl roughly 100 pages per second, generating about 600 kilobytes of data each second. Incredible! Because content on the internet is always changing, the spiders are always crawling. Computers are rad!
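Here's a rough sketch of the simplest ranking factor mentioned above: counting how many times the query words appear on each page. (Google's real ranking is way more sophisticated; the page names and text here are made up.)

```python
def score(text, query_words):
    """Score a page by how many times the query words appear in it."""
    words = text.lower().split()
    return sum(words.count(q) for q in query_words)

# Invented pages for illustration.
pages = {
    "puppy-blog": "puppies puppies everywhere so many cute puppies",
    "cat-corner": "cats are great but here is one puppy",
    "dog-park":   "cute puppies playing with other puppies",
}

query = ["puppies"]
ranked = sorted(pages, key=lambda p: score(pages[p], query), reverse=True)
print(ranked)  # puppy-blog ranks first: it says "puppies" three times
```

Sorting by score is all a ranker really does; the hard part is choosing a scoring function that matches what people actually want to see.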
References:
http://computer.howstuffworks.com/internet/basics/search-engine1.htm
http://computer.howstuffworks.com/internet/basics/google1.htm


Computers are rad!! This is kind of similar to my blog post on IBM Watson! Watson scans all the information possible and provides results. So fun that the programs are called spiders!! Does it process everything in a similar way from maps to shopping? Thanks for the cool blog post!!