Introduction to search engines

How search engines work

Search engines rely on a short list of critical operations to provide relevant web results when searchers use them to find information:

Crawling the Web - Search engines run automated programs called 'bots' or 'spiders' that use the hyperlink structure of the web to 'crawl' the pages that make up the World Wide Web. It is estimated that of the approximately 20 billion existing pages, search engines have crawled between 8 and 10 billion. If your pages can't be crawled or don't get crawled for any reason, you simply won't be found.

Indexing documents - once a page has been crawled, its contents can be indexed. This is the process of storing pages in huge databases that make up a search engine's index. These indexes are carefully managed so that queries can be processed in fractions of a second.

Processing queries - when a request for information comes into the search engine, the index is queried for suitable matches. A page is a match when the terms or phrase are found on it in the manner specified by the user. For example, a search for car and driver magazine at Google returns 8.25 million results, but a search for the same phrase in quotes, "car and driver magazine", returns only 166 thousand results. In the first example, called 'Findall' mode, Google returned all documents containing the terms 'car', 'driver' and 'magazine' (the term 'and' is ignored as it doesn't narrow the results), while in the second search only those pages with the exact phrase 'car and driver magazine' were returned. Other advanced options are available which will affect the search engine results for any given query.

Ranking results - once the search engine has determined which results are a match for the query, its algorithm (the complex mathematical formula used for ranking results) runs calculations on each of the matches to determine which is most relevant to the given query. The engine then sorts the matches on the results pages in order from most to least relevant, using a number of 'ranking factors'.
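
The crawling, indexing and query-processing steps above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the PAGES dictionary is a made-up four-page 'web' standing in for real HTTP fetches, and the function names (crawl, build_index, match_all, match_phrase) and the one-word stopword list are assumptions chosen for the example.

```python
import re
from collections import deque
from html.parser import HTMLParser

# Hypothetical miniature web: URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "/home": '<a href="/review">review</a> <a href="/about">about</a> '
             'Car and Driver magazine subscriptions',
    "/review": '<a href="/home">home</a> The driver praised the car in a magazine',
    "/about": 'About our magazine',
    "/orphan": 'Car and Driver magazine archive',  # nothing links here
}
STOPWORDS = {"and"}  # assumed stopword list, for illustration only


class PageParser(HTMLParser):
    """Collects outgoing links and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(seed):
    """Breadth-first crawl that follows links; returns URL -> list of words."""
    seen, queue, docs = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(PAGES[url])
        parser.close()
        docs[url] = tokenize(" ".join(parser.text))
        for link in parser.links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return docs


def build_index(docs):
    """Inverted index: word -> {url: [positions of the word on that page]}."""
    index = {}
    for url, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index.setdefault(term, {}).setdefault(url, []).append(pos)
    return index


def match_all(index, query):
    """Pages containing every query word, ignoring stopwords."""
    terms = [t for t in tokenize(query) if t not in STOPWORDS]
    if not terms:
        return set()
    return set.intersection(*(set(index.get(t, {})) for t in terms))


def match_phrase(index, query):
    """Pages containing the query words consecutively, in order."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set()
    for url, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + i in index.get(t, {}).get(url, [])
                   for i, t in enumerate(terms)):
                hits.add(url)
                break
    return hits


docs = crawl("/home")
index = build_index(docs)
print(sorted(match_all(index, "car and driver magazine")))
print(sorted(match_phrase(index, '"car and driver magazine"')))
```

With this toy data the all-terms search matches both /home and /review, while the phrase search matches only /home. The /orphan page is never found, because no crawled page links to it.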

Ranking factors

As the description above suggests, SEO revolves around understanding ranking factors and ensuring they are all pulling in the right direction. A note of caution though - the precise emphasis search engines place on particular ranking factors at any given time is unknown, varies from engine to engine and is likely to change without notice. This is why we regard a holistic approach, one that considers as many factors as possible, as vital. Ranking factors fall into two main groups:

Page factors - search engine algorithms rely heavily on factors that have nothing to do with the query entered by the user. These factors include the strength of links on the page, the number of visits to the page, the depth of the page within a site's structure, and many more. These factors boost a page in the rankings for searches on any word that occurs on that page.

Query factors - as you might expect, the actual query the user enters weighs heavily in the matches returned. These factors include keywords found on the page, the number of times they appear, the context within which the keywords are found, and so on.
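
One way to picture how the two groups combine is a weighted sum of a query-independent score and a query-dependent score. The sketch below is purely illustrative: the formulas, the weights w_page and w_query, the choice of inbound link count and page depth as page factors, and the sample data are all assumptions, since the engines do not publish how they weigh these factors.

```python
import math


def page_score(inbound_links, depth):
    """Page factor (assumed form): links help, deeper pages are discounted."""
    return math.log1p(inbound_links) / (1 + depth)


def query_score(tokens, terms):
    """Query factor (assumed form): share of the page's words that are query terms."""
    if not tokens:
        return 0.0
    hits = sum(tokens.count(term) for term in terms)
    return hits / len(tokens)


def rank(pages, query, w_page=0.5, w_query=0.5):
    """Return URLs ordered from most to least relevant under the assumed weights."""
    terms = query.lower().split()
    scored = []
    for page in pages:
        score = (w_page * page_score(page["inbound_links"], page["depth"])
                 + w_query * query_score(page["tokens"], terms))
        scored.append((score, page["url"]))
    scored.sort(reverse=True)
    return [url for _, url in scored]


# Hypothetical sample pages
SAMPLE_PAGES = [
    {"url": "/a", "tokens": ["car", "loans", "car", "rates"],
     "inbound_links": 1, "depth": 2},
    {"url": "/b", "tokens": ["car", "news", "today", "weather"],
     "inbound_links": 50, "depth": 0},
]

print(rank(SAMPLE_PAGES, "car"))
print(rank(SAMPLE_PAGES, "car", w_page=0.0, w_query=1.0))
```

With equal weights, the well-linked top-level page /b outranks /a even though /a mentions the keyword more often. With the page factor switched off, the order reverses, which shows why the unknown weighting matters so much.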

Indications of things to come

As already intimated, search engines are constantly evolving their capabilities and the way they operate. The drivers for this are competitive - in the first instance to stay ahead of the pack in offering the best possible service, in the second to combat SEO practitioners using 'black hat' techniques to sway search results. To date, search engines have evolved from relying on meta tags stuffed with keywords and keyword-rich pages (pre-1998) to giving increasing weight to the page factors mentioned above.

The latest innovations revolve around semantics (the study of meaning in language) and understanding a user's intent when making a search. Rather than simply recognising and retrieving exact matches for query terms, search engines use their knowledge of semantics to carry out intelligent matching. An example might be a search for 'car loans' that also returns results that do not contain the specific phrase but do contain the term 'lenders'. As search engines' understanding of language grows, queries will increasingly return more intelligent results. Natural Language Processing (NLP) will help the engines better understand the meaning and intent behind their users' queries. Over the long term, users can expect this work to produce more relevant SERPs (Search Engine Results Pages) and more accurate guesses from the engines as to the intent of a query.
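
The 'car loans' example can be approximated, very crudely, by expanding each query term with related terms before matching. The sketch below assumes a small hand-written RELATED table and a hypothetical matches function; real engines derive such relationships from far richer language models, so this only shows the idea of matching on meaning rather than on exact words.

```python
import re

# Hypothetical table of related terms, written by hand for illustration
RELATED = {
    "car": {"auto", "vehicle"},
    "loans": {"lenders", "financing"},
}


def matches(query, text):
    """True if every query term, or one of its related terms, occurs in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(
        words & ({term} | RELATED.get(term, set()))
        for term in query.lower().split()
    )


print(matches("car loans", "Compare auto lenders near you"))
print(matches("car loans", "Car wash prices"))
```

The first page matches although it contains neither 'car' nor 'loans', while the second is rejected because nothing on it relates to 'loans'.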