How Search Engines Work

Date:2013-09-01 Source: Shangpin China Type: website encyclopedia

When implementing a search engine strategy, an enterprise needs to carry out targeted SEO (website optimization) based on the search engines' retrieval principles and ranking rules, so that its pages rank higher in the search results and attract users to click through to the website. Understanding how today's mainstream search engines work is therefore an important basis for improving a search engine strategy, and is of great practical significance.

What is a search engine

A search engine is a system that collects and organizes information resources on the Internet and then makes them available for users to query. It consists of three parts: information collection, information organization and user query. Its main task is to gather information from other websites, classify it, build an index, and store the index in a database. When a user submits a search request, the search engine finds matching information in the database and returns it to the user, who then visits the corresponding websites to find the information needed.

Search engine classification

According to how they collect data, search engines fall into three main categories: directory index search engines, full-text search engines and meta search engines.

1. Directory index search engine

The data in a directory index search engine (Search Index/Directory) is submitted by the websites themselves. It works like a telephone directory: website URLs are grouped by the nature of each site, with major categories divided into subcategories, down to the address of each individual website, usually accompanied by a short description of the site. Users can query it without keywords; by locating the relevant category they find the relevant websites (note: websites, not individual pages within a website). Such engines often provide keyword queries as well, but a query is matched only against the site's name, URL, description and similar fields, so the results are website URLs only and cannot point to specific pages. Because the data is generally supplied by the websites, the results are not fully accurate, and strictly speaking a directory index is not a true search engine.
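
The keyword limitation described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of submitted sites; the entries, the field names and the function `keyword_query` are illustrative only and are not taken from any real directory.

```python
# Hypothetical site entries as submitted by website owners (illustrative data).
SITES = [
    {"name": "Example Marketing", "url": "http://marketing.example",
     "profile": "Online marketing services"},
    {"name": "Example Hosting", "url": "http://hosting.example",
     "profile": "Web hosting for small firms"},
]

def keyword_query(keyword, sites=SITES):
    """Match the keyword against name, URL and profile only, never page content."""
    kw = keyword.lower()
    return [
        site["url"]
        for site in sites
        if any(kw in site[field].lower() for field in ("name", "url", "profile"))
    ]
```

A query such as `keyword_query("hosting")` returns site addresses only. A word that appears on one of a site's pages but not in its submitted name, URL or profile finds nothing.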

2. Full-text search engine

A full-text search engine (Full Text Search Engine) uses a program called a "spider" to automatically extract information from websites across the network, builds its own database from it, and provides query services to users. It is a search engine in the true sense. Examples include AltaVista, Google, Excite, HotBot and Lycos.

The data in a full-text search engine's database comes from two sources. The first is regular crawling: at set intervals the search engine sends out its "spider" programs to crawl websites within a certain range of IP addresses, and whenever a new website is found, its information and address are extracted automatically and added to the database. The second is submission by websites: the site owner submits the site's address to the search engine, which sends a "spider" within a certain period to crawl the submitted website and stores the information in its own database. In both cases the data consists of the actual content of the web pages crawled by the "spider", so search results can point to specific pages.

In practice, search engines and directory indexes have begun to merge, and full-text search engines also provide directory services. For example, Yahoo, originally a directory index, began cooperating with Google and other search engines in the late 1990s to provide full-text search.

3. Meta search engine

When a meta search engine (META Search Engine) receives a user's query, it searches several other engines at the same time and returns their results to the user. Well-known metasearch engines include InfoSpace, Dogpile and Vivisimo, and representative Chinese metasearch engines include the Star search engine. As for how results are arranged, some metasearch engines list results directly by source engine, such as Dogpile, while others re-rank and combine them according to their own rules, such as Vivisimo.
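
The two ways of arranging results can be sketched as follows. This is a minimal illustration under stated assumptions: the engine names and URLs are made up, and the reciprocal-rank scoring in `reranked` is just one possible combining rule, not the method of any engine named above.

```python
from collections import defaultdict

def grouped_by_source(results_by_engine):
    """List results engine by engine, keeping each engine's own order."""
    merged = []
    for engine, urls in results_by_engine.items():
        merged.extend((engine, url) for url in urls)
    return merged

def reranked(results_by_engine):
    """Combine results with reciprocal-rank scoring (an assumed rule)."""
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties broken by URL for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists returned by two underlying engines.
results = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "d.example"],
}
```

With this data, `grouped_by_source(results)` keeps duplicates and source labels, while `reranked(results)` merges duplicates and places `b.example` first because both engines returned it.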

How Search Engines Work

In terms of working principle, no existing search engine actually searches the live Internet; its search scope is limited to a web page index database prepared in advance. According to the data cited, the number of pages that Google, the top-ranked search engine worldwide, can retrieve does not exceed 4% of all pages on the global Internet. Users should therefore keep two points in mind when using a search engine. First, the range of pages retrieved is limited: failing to find a page through a search engine does not mean the page does not exist on the Internet. Second, a website built by an enterprise is not necessarily included in search engines.

1. Full-text search engine

A search engine in the true sense usually means a full-text search engine: one that collects tens of millions to billions of web pages from the Internet, indexes every word (i.e. keyword) in those pages, and builds an index database. When a user searches for a keyword, every page whose content contains that keyword is retrieved as a result. The results are then sorted by a complex algorithm and presented in order of relevance to the search keyword.

Search engines today generally use hyperlink analysis. Besides analyzing the content of the indexed page itself, they also analyze and index the URLs and anchor text (AnchorText) of all links pointing to the page, and even the text surrounding those links. As a result, even if page A does not itself contain a term such as "online marketing", users searching for "online marketing" can still find page A if another page B links to it with the anchor text "online marketing". Moreover, the more pages (C, D, E, F...) link to page A with the anchor text "online marketing", or the higher the quality of the source pages (B, C, D, E, F...) providing those links, the more relevant page A is considered for that search, and the higher it ranks.
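
The anchor text effect can be sketched in a few lines. This is a simplified illustration, assuming a hypothetical four-page collection: each inbound link counts equally, whereas real engines also weigh the quality of the linking page. The names `build_anchor_index` and `search` are illustrative.

```python
from collections import defaultdict

def build_anchor_index(pages):
    """Index each page under its own words and under the anchor text of inbound links."""
    index = defaultdict(lambda: defaultdict(int))
    for url, page in pages.items():
        for word in page["text"].lower().split():
            index[word][url] += 1
        for target, anchor in page["links"]:
            for word in anchor.lower().split():
                index[word][target] += 1  # credit goes to the link target
    return index

def search(index, query):
    """Return pages matching every query word, highest total count first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], {}))
    for word in words[1:]:
        hits &= set(index.get(word, {}))
    return sorted(hits, key=lambda url: (-sum(index[w][url] for w in words), url))

# Hypothetical pages: A never mentions the phrase, but B and C link to it with that anchor.
pages = {
    "A": {"text": "welcome to our company", "links": []},
    "B": {"text": "resources", "links": [("A", "online marketing")]},
    "C": {"text": "more resources", "links": [("A", "online marketing")]},
    "D": {"text": "online marketing tips", "links": []},
}
index = build_anchor_index(pages)
```

Here `search(index, "online marketing")` returns page A ahead of page D, even though only D contains the phrase in its own text, because two pages link to A with that anchor text.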

A full-text search engine works in three steps: crawling web pages from the Internet, building the index database, and searching the index database and ranking the results.

(1) Crawl web pages from the Internet

A spider is a program that automatically accesses the Internet and collects web pages. Starting from any page, it follows every hyperlink to reach other pages, repeats the process on each of them, and brings back all the pages it has crawled.
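
The link-following process can be sketched as a breadth-first traversal. To keep the example runnable without network access, it assumes a hypothetical in-memory link graph `WEB` in place of real HTTP requests; the `limit` parameter and the `fetch_links` callback are likewise illustrative.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page URL -> URLs it links to.
WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": [],
    "http://d.example/": ["http://c.example/"],
    "http://e.example/": ["http://a.example/"],  # nothing links to this page
}

def crawl(seed, fetch_links, limit=1000):
    """Breadth-first crawl from a seed URL, visiting each page at most once."""
    seen = {seed}
    queue = deque([seed])
    collected = []
    while queue and len(collected) < limit:
        url = queue.popleft()
        collected.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

pages = crawl("http://a.example/", lambda url: WEB.get(url, []))
```

In this sketch the page at `http://e.example/` is never collected, because no crawled page links to it. This is one reason a page can exist on the Internet without appearing in a search engine.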

(2) Build index database

The indexing program analyzes the collected pages and extracts relevant information from each one, including the page URL, encoding, the keywords contained in the page content and their positions, generation time, size, and link relationships with other pages. It then runs a large number of complex calculations according to a relevance algorithm to obtain the relevance (or importance) of each page for each keyword in its content and in its hyperlinks, and uses this information to build the page index database.
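
A minimal sketch of such an index, assuming two made-up pages: each keyword maps to the pages containing it and to its positions in each page. The `relevance` function uses plain term frequency as a stand-in; real relevance algorithms are far more complex and are not described in this article.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each keyword to {url: [word positions]} and record page lengths."""
    index = defaultdict(dict)
    lengths = {}
    for url, text in pages.items():
        words = re.findall(r"\w+", text.lower())
        lengths[url] = len(words)
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index, lengths

def relevance(index, lengths, word, url):
    """Toy relevance score: the share of the page's words equal to the keyword."""
    positions = index.get(word, {}).get(url, [])
    return len(positions) / lengths[url] if lengths.get(url) else 0.0

# Hypothetical collected pages (URL -> page text).
pages = {
    "p1": "search engines index pages",
    "p2": "engines crawl pages and index pages",
}
index, lengths = build_index(pages)
```

Storing positions alongside each keyword allows later stages to weigh where a keyword appears, and precomputing the scores means they do not have to be recalculated for every query.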

(3) Search the index database and rank the results

When a user enters a keyword, the search program finds all pages matching that keyword in the page index database. Because the relevance of each page for the keyword has already been calculated, the pages only need to be sorted by these stored relevance values: the higher the relevance, the higher the ranking. Finally, the page generation system assembles the link address and a content summary for each result and returns them to the user.
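
The query step can be sketched as a lookup followed by a sort. The index contents, relevance values and summaries below are hypothetical, and summing the per-keyword scores for multi-word queries is an assumption made for illustration.

```python
# Hypothetical precomputed index: keyword -> {url: relevance value}.
INDEX = {
    "search": {"http://a.example/": 0.9, "http://b.example/": 0.4},
    "engine": {"http://a.example/": 0.7, "http://b.example/": 0.1,
               "http://c.example/": 0.8},
}
# Hypothetical content summaries stored at indexing time.
SUMMARIES = {
    "http://a.example/": "How search engines work",
    "http://b.example/": "Search tips",
    "http://c.example/": "Engine design notes",
}

def search(query, index=INDEX, summaries=SUMMARIES):
    """Look up pages matching every keyword and sort by stored relevance."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    ranked = sorted(matches, key=lambda url: (-sum(index[w][url] for w in words), url))
    # Pair each link with its summary, as the result page would show it.
    return [(url, summaries.get(url, "")) for url in ranked]
```

No relevance calculation happens at query time in this sketch. `search("engine")` only reads the stored values and orders the three matching pages by them.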

Search engine spiders generally revisit all web pages on a regular basis. The cycle differs between search engines: it may be days, weeks or months, and pages of different importance may be refreshed at different frequencies. On each visit the spider updates the page index database to reflect changes in page content, adds new page information and removes dead links, and the engine re-ranks pages according to the changes in content and link relationships. In this way, the current content of a web page and any changes to it are reflected in users' query results.
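
One revisit pass can be sketched as follows. The sketch assumes a hypothetical `fetch` callback that returns the current page content, or `None` for a dead link; it leaves out scheduling by page importance and the discovery of new pages.

```python
def refresh_index(index, fetch):
    """Revisit every indexed URL, update changed pages and drop dead links."""
    updated, removed = [], []
    for url in list(index):       # copy the keys so entries can be deleted safely
        content = fetch(url)      # assumed to return None when the link is dead
        if content is None:
            del index[url]
            removed.append(url)
        elif content != index[url]:
            index[url] = content
            updated.append(url)
    return updated, removed

# Hypothetical stored index and current state of the live pages.
index = {"a": "old text", "b": "same text", "c": "vanished page"}
live = {"a": "new text", "b": "same text"}
updated, removed = refresh_index(index, live.get)
```

After the pass, page `a` holds its new content, page `b` is untouched, and page `c` has been removed from the index.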

Although there is only one Internet, each search engine has different capabilities and preferences, so the pages they crawl differ, and so do their ranking algorithms. The databases of large search engines store indexes for hundreds of millions to billions of web pages, and the amount of data reaches thousands or even tens of thousands of megabytes. Yet even if the largest search engine builds an index database of more than 2 billion pages, it covers less than 40% of the ordinary pages on the Internet, and the overlap in page data between different search engines is generally below 70%. The main reason to use several search engines is that each can find content the others miss. Beyond that, there is a great deal of content on the Internet that search engines cannot index and that therefore cannot be found through them.
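
The benefit of querying more than one engine can be shown with simple set arithmetic. The article does not define how the overlap rate is measured, so the definition in `overlap_rate` (shared pages divided by the size of the smaller index) is an assumption, and the page identifiers are made up.

```python
def overlap_rate(index_a, index_b):
    """Share of the smaller index that also appears in the other (assumed definition)."""
    a, b = set(index_a), set(index_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def combined_coverage(*indexes):
    """Pages reachable when several engines are used together."""
    return set().union(*indexes)

# Hypothetical sets of pages indexed by two engines.
engine_a = {"p1", "p2", "p3", "p4"}
engine_b = {"p3", "p4", "p5"}
```

In this example the two engines share two pages, yet together they cover five pages, more than either covers alone.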

2. Directory index search engine

A directory index, as the name implies, stores websites in directories by category. When looking for information, users can therefore either search by keyword or browse down through the directory level by level. A keyword search returns results much like those of a search engine, ranked by relevance, but with more human factors involved. When browsing the hierarchical directory, websites in a category are ordered alphabetically by title (with some exceptions).
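
Browsing level by level can be sketched with a nested structure. The category names, site titles and the function `list_category` are hypothetical, and the sketch ignores the exceptions to alphabetical ordering mentioned above.

```python
# Hypothetical directory tree: a category holds subcategories (dict)
# or a list of (title, url) site entries.
TREE = {
    "Business": {
        "Marketing": [
            ("Zeta Ads", "http://zeta.example"),
            ("Alpha SEO", "http://alpha.example"),
        ],
        "Finance": [],
    },
}

def list_category(tree, *path):
    """Walk down the category path; list subcategories or sites alphabetically."""
    node = tree
    for name in path:
        node = node[name]
    if isinstance(node, dict):
        return sorted(node)  # names of the subcategories at this level
    return sorted(node, key=lambda entry: entry[0].lower())
```

Calling `list_category(TREE, "Business")` shows the next level of categories, and `list_category(TREE, "Business", "Marketing")` lists the sites in that category ordered by title.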

Compared with full-text search engines, directory index search engines work differently in the following ways:

First, a search engine retrieves websites automatically, while a directory index depends entirely on manual work. After a website is submitted, a directory editor visits it in person and decides whether to accept it according to the directory's own criteria, and even the editor's subjective impression.

Second, when a search engine collects websites, a site that does not violate the relevant rules will generally be included. A directory index sets much higher requirements, and a website may be rejected even after repeated submissions. For a very large index such as Yahoo, getting listed is even harder.

Third, when submitting a website to a search engine, its classification generally does not need to be considered. When submitting to a directory index, the website must be placed in the most appropriate directory (Directory).

Finally, the information a search engine holds about each website is extracted automatically from the site's own pages, so from the site owner's perspective there is more autonomy. A directory index requires the website information to be filled in manually and imposes various restrictions. Moreover, if the directory staff consider the directory or website information you submitted inappropriate, they can change it at any time, without consulting you in advance.

At present, search engines and directory indexes are tending to merge. Some search engines that were originally pure full-text engines now also provide directory search; for example, Google uses the Open Directory to provide category-based queries. Established directory indexes such as Yahoo have expanded their search scope through cooperation with search engines such as Google. In their default search mode, some directory search engines first return matching websites from their own directories, such as Sohu, Sina and NetEase, while others default to web page search, such as Yahoo.

This article is published by Shangpin China, a UEO marketing website construction company: //ihucc.com/

