Have you ever wondered how a search engine works? A search engine is a software program on the internet that helps users find documents, files, links and other content based on the keywords they enter.
Popular examples of search engines are Google, Yahoo!, Baidu, Bing and AOL. In brief, search engines use automated applications called robots, bots or spiders to crawl the web, following every link on a website and building a searchable index from what they find.
How a Search Engine Works
A typical search engine operates in the following order:
- Web crawling
- Indexing
- Searching
Step 1: Web Crawling
Web crawling is carried out by a specialised program called a spider, which builds a list of the words found on a website. It reads every page of a website word by word, and to build a useful searchable index the spider needs to visit a large number of webpages.
The spider first crawls very popular websites and webpages, indexing the words on each page and recording every link found there. It then moves on to the other pages. Words in the title, subtitles, meta tags and other places that are relevant to the page's content are recorded and indexed for future searches by users.
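The crawl loop described here can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine implements its spider: `FAKE_WEB`, `LinkExtractor` and `crawl` are made-up names, and the pages live in an in-memory dictionary so the example runs without network access. A real spider would fetch pages over HTTP and would start from popular sites.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML (an assumption for this sketch).
FAKE_WEB = {
    "/home": '<title>Home</title><a href="/about">About</a><a href="/blog">Blog</a>',
    "/about": '<title>About</title><a href="/home">Home</a>',
    "/blog": '<title>Blog</title><a href="/about">About</a><a href="/missing">Gone</a>',
}


def crawl(seed_urls, fetch):
    """Breadth-first crawl: visit each reachable page once, in discovery order."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(["/home"], FAKE_WEB.get))  # ['/home', '/about', '/blog']
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, as `/about` and `/home` do here.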
Different search engines use different algorithms for their spiders in an attempt to make them work faster. For example, the Google spider leaves out the articles “a,” “an” and “the”, whereas AltaVista indexes every single word found on a page, including the articles.
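To make the difference concrete, here is a small Python sketch of the two approaches. The `tokenize` function and the `STOP_WORDS` set are illustrative names, and the list contains only the three articles mentioned above. It is not the actual word list used by any search engine.

```python
import re

# Illustrative stop-word list: just the three articles named in the text.
STOP_WORDS = {"a", "an", "the"}


def tokenize(text, skip_stop_words=True):
    """Split text into lowercase words, optionally dropping the articles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if skip_stop_words:
        return [word for word in words if word not in STOP_WORDS]
    return words


sentence = "The spider crawls a page"
print(tokenize(sentence))                         # ['spider', 'crawls', 'page']
print(tokenize(sentence, skip_stop_words=False))  # ['the', 'spider', 'crawls', 'a', 'page']
```

Skipping the articles means fewer words to store and process, which is one way a spider can be made to work faster.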
Step 2: Indexing
The spider never truly finishes crawling, because the web keeps expanding rapidly and the spiders keep revisiting pages. As it gathers information from web pages, it generates a list of keywords, and the next task is to make that list searchable for users. Indexing is mainly responsible for maintaining, in a searchable database, a list of the URLs associated with each keyword.
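A keyword-to-URL list of this kind is commonly called an inverted index. The following Python sketch shows the idea under simple assumptions: the `PAGES` dictionary and its URLs are invented sample data, and a plain in-memory dictionary stands in for the searchable database.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (sample data for this sketch).
PAGES = {
    "example.com/spiders": "Spiders crawl the web",
    "example.com/index": "The index maps words to pages",
    "example.com/web": "The web keeps growing",
}


def build_index(pages):
    """Map every keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


index = build_index(PAGES)
print(sorted(index["web"]))  # ['example.com/spiders', 'example.com/web']
```

Looking up a keyword is now a single dictionary access instead of a scan over every page.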
But this simple approach does not tell the search engine which link or result to display first; in other words, the results are not ranked. So the index assigns a weight to each word depending on the number of times it appears and its relevance to the webpage, and then derives a relative rank for each page, called its page rank. Every search engine uses a different algorithm for assigning weights to the words in its index, which is why different search engines produce different results for the same query.
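As a rough illustration of weighting by word count, the Python sketch below scores each page by how often a keyword appears relative to the page's length (simple term frequency) and sorts the results by that score. This formula is an assumption chosen for clarity. Real search engines combine many more signals, and the names `term_weights`, `rank` and `PAGES` are made up for the example.

```python
import re
from collections import Counter

# Hypothetical pages: URL -> page text (sample data for this sketch).
PAGES = {
    "a.example": "search engines rank search results",
    "b.example": "a search engine crawls the web",
    "c.example": "spiders follow links",
}


def term_weights(text):
    """Weight of each word = its count divided by the total word count."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def rank(pages, keyword):
    """Return URLs containing the keyword, highest weight first."""
    scored = []
    for url, text in pages.items():
        weight = term_weights(text).get(keyword, 0.0)
        if weight > 0:
            scored.append((weight, url))
    # Sort by descending weight; break ties alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


print(rank(PAGES, "search"))  # ['a.example', 'b.example']
```

Here "search" makes up 2 of 5 words on the first page and 1 of 6 on the second, so the first page ranks higher. Changing the weighting formula changes the order, which is why engines with different algorithms return different results.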
Once the index is built and stored in a database with all this information, it is encoded to save storage space. Because the main objective of the index is to find information as quickly as possible, most search engines build a hash table to store the indexed information. Hashing uses a formula to attach a number to each word, and then attaches a pointer from that number to the actual data. This combination of efficient indexing and efficient storage helps the user get results quickly.
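The hashing idea can be sketched as follows. This is a toy version under stated assumptions: the formula in `word_to_number`, the bucket count and the `HashIndex` class are all invented for illustration, and production systems use far larger tables and more careful hash functions.

```python
NUM_BUCKETS = 8  # small fixed size, chosen only for illustration


def word_to_number(word):
    """Toy hash formula: fold the character codes into a bucket number."""
    number = 0
    for ch in word:
        number = (number * 31 + ord(ch)) % NUM_BUCKETS
    return number


class HashIndex:
    """Each bucket holds (word, urls) entries; the number points to the bucket."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def add(self, word, url):
        bucket = self.buckets[word_to_number(word)]
        for entry_word, urls in bucket:
            if entry_word == word:
                if url not in urls:
                    urls.append(url)
                return
        bucket.append((word, [url]))

    def lookup(self, word):
        # Only one bucket is inspected, however many words are stored.
        for entry_word, urls in self.buckets[word_to_number(word)]:
            if entry_word == word:
                return urls
        return []


index = HashIndex()
index.add("web", "example.com/a")
index.add("web", "example.com/b")
index.add("spider", "example.com/a")
print(index.lookup("web"))  # ['example.com/a', 'example.com/b']
```

Because two different words can receive the same number, each bucket stores the word alongside its data so that a lookup can tell them apart.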
Step 3: Searching
Searching is done by the user, who builds a search query and submits it through the search engine. A query can be a single word, a combination of words or an entire sentence. To build a complex search, the user can use Boolean operators and modifiers such as AND, OR, NOT, NEAR, FOLLOWED BY, TYPE and quotation marks, or other keywords prescribed by the particular search engine, to find exact results. These types of searches are called literal searches. Advanced search engines nowadays can also process natural language queries. Google Voice Search and Image Search are good examples of this.
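A minimal Python sketch of how the three basic Boolean operators could be evaluated against a keyword index is shown below. The `INDEX` data and the `search` function are hypothetical, the query is evaluated strictly left to right with no precedence or parentheses, and NOT is treated as "exclude pages containing the next word". Real query parsers are considerably more sophisticated.

```python
# Hypothetical keyword index: word -> set of page identifiers (sample data).
INDEX = {
    "spider": {"page1", "page2"},
    "web": {"page1", "page3"},
    "index": {"page2", "page3"},
}


def search(query, index):
    """Evaluate 'word OP word OP word ...' left to right; OP is AND, OR or NOT."""
    tokens = query.split()
    if not tokens:
        return set()
    # Copy the first word's matches so the index itself is never modified.
    result = set(index.get(tokens[0].lower(), set()))
    for op, word in zip(tokens[1::2], tokens[2::2]):
        matches = index.get(word.lower(), set())
        if op == "AND":
            result &= matches   # pages containing both words
        elif op == "OR":
            result |= matches   # pages containing either word
        elif op == "NOT":
            result -= matches   # drop pages containing the word
        else:
            raise ValueError(f"unsupported operator: {op}")
    return result


print(sorted(search("spider AND web", INDEX)))    # ['page1']
print(sorted(search("spider OR web", INDEX)))     # ['page1', 'page2', 'page3']
print(sorted(search("spider NOT index", INDEX)))  # ['page1']
```

Each operator maps directly onto a set operation, which is why Boolean queries are cheap to answer once the keyword index exists.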
This is the basic mechanism of a search engine. In practice, each of these processes has become far more complex in order to give users a better search experience. For more information on search engines, you can always build a search query like “How search engines work”.
Abhisek Panda is the co-founder of PractoMind Solutions and Dresm House Publications and is a Marketing and Media professional, Blogger, Social Media Specialist and Web Developer from Cuttack, Odisha. He completed his PGDM (Marketing) at the Institute of Management and Information Science, BBSR, and a bachelor's degree in Information Technology and Management (ITM) at Ravenshaw University, Cuttack. Abhisek has more than 3 years of experience in the field of technology, commercial operations, sales and marketing. He has also worked for more than 1 year as a webmaster for www.Techulator.com and as an editor and tutorial author at www.DotNetSpider.com. You can find out more about him at his blog, abhisekpanda.com.