Retrieval Using WWW Search Engines
Source: Shangpin China
Type: website encyclopedia
Time: July 7, 2014
The WWW, also known as the Web or the World Wide Web, is a hypertext-based information network developed and named in 1989 by Tim Berners-Lee, a British scientist then working at CERN, the European particle physics laboratory. It opened a new chapter for the Internet. Tim Berners-Lee is therefore known as the father of the World Wide Web and won the world's first Millennium Technology Prize. The WWW and the Internet are not the same concept: the WWW is one of the services provided over the Internet. Thanks to the WWW, network users no longer have to face tedious machine instructions; the exponentially growing body of text, images, and multimedia information on the network can be accessed intuitively and conveniently through browsers and hyperlinks.
The WWW search engine is likewise an indispensable tool for collecting the information users are interested in on the Internet. A search engine is a platform that provides information search services on the Internet, and it is the most widely used network service tool. The search engines in common use today run almost entirely on the WWW, so they can also be called WWW search engines. As network information reaches further into the lives of ordinary people, search engines have become a popular and key Internet technology, and competition in their research and development has never stopped. We can reach hundreds of millions of Internet pages with a click because thousands of search engines on the Internet find, fetch, store, and index pages and provide retrieval services for them. These engines are moving toward specialization, localization, and everyday-life applications. By mode of operation, WWW search engines fall into three types: directory websites, full-text search engines, and meta search engines.
1、 Directory Websites
A directory website is an early WWW information search tool. It works by collecting and sorting network information manually and presenting it for browsing under classified topics. Because it depends heavily on manual labor and has relatively little technical content, it is not, in essence, a true search engine, and it has received little attention. Almost all directory websites have since developed their own new-generation search engines and evolved into the familiar keyword search form, as with Sina, Sohu, and Yahoo China. Traces of the original directory browsing style are now hard to find; only a few sites still retain the original classified website search. The best-known website directories, in order of appearance, are Yahoo's Chinese website directory, Sohu, NetEase, and Sina in China, and LookSmart and About abroad. Directory websites have the following characteristics.
① Browsing network information through a tree directory is simple and easy to use. Information organized in a tree directory structure is strictly systematic and easy to extend. The directory incorporates human judgment, shields users from the complexity of the underlying network, and can improve the accuracy of information and the quality of navigation.
② The resource classification directory is not detailed enough. Network information resources are so complex that it is difficult to define a category system comprehensive enough to cover them all as the basis of the topic tree. To keep topics usable and the structure clear, the category system cannot have too many categories. As a result, some specialized categories have no place in it, and a large number of Web pages are ignored because they are not included in the directory. As the Web grows, this problem will become more and more serious. Clustering and other automatic classification methods (including natural language processing, related-topic extraction, and so on) are still unsatisfactory, and machine classification results can also differ from those of manual classification.
③ Because of the manual intervention, heavy maintenance workload, relatively small amount of information, and slow updates, a directory website often forwards queries to other search engines that search the entire Web, so that users get more information. Today's directory websites and full-text search engines have merged to the point that users can hardly tell them apart. For example, Yahoo used Google's search engine to provide page search, while Google used the Open Directory to provide classified queries, and the two search interfaces were almost the same.
2、 Full-Text Search Engines
A full-text search engine is what is usually meant by a true search engine. It differs from a website directory in that it no longer collects and classifies information manually, but uses software to collect, index, and retrieve network information. A full-text search engine is generally described as having four parts; the three covered here are the collector, the indexer, and the retriever.
(1) Collector. The collector, or network robot, is automatic Web search software, usually called a "spider", crawler, or robot. Its only job is to roam the Web finding and collecting information. It can crawl about 10 million pages a day and picks up new information of all types as quickly as possible. Because information on the Web changes very quickly, pages that have already been collected must also be revisited regularly to avoid dead and invalid links. There are two strategies for collecting information. First, start from a set of URLs (uniform resource locators), follow the hyperlinks in those pages, and recursively extract information from the Web in breadth-first or depth-first order. The starting URLs are often very popular sites with many links, such as Yahoo's category nodes. Second, provide an "Add URL" form that lets page authors submit their addresses to the search engine directly. This channel is often flooded with spam pages, and almost 95% of the addresses submitted through it are rejected. Differences in collection strategy, such as crawl frequency and crawl targets, lead to differences in the results and quality of each search engine.
(2) Indexer. The indexer analyzes the information gathered by the collector, indexes it automatically, and stores each document in the index library in a form convenient for retrieval, that is, it builds an inverted file. Each index entry in the inverted file contains a set of pointers to the pages in which it appears.
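As a rough illustration of the collector and indexer described above, the following Python sketch crawls a tiny in-memory set of pages breadth-first from a seed URL and builds an inverted file from the pages it reaches. The page data, the `TOY_WEB` name, and both function names are invented for this example; a real collector would fetch pages over the network, and a real indexer would do far more text analysis.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# This stands in for real HTTP fetching, which is not shown here.
TOY_WEB = {
    "a.example": ("web search engine", ["b.example", "c.example"]),
    "b.example": ("full text search", ["c.example"]),
    "c.example": ("meta search engine", ["a.example"]),
    "d.example": ("unlinked page", []),
}


def crawl_breadth_first(seed_urls, web):
    """Visit pages reachable from the seeds in breadth-first order."""
    visited = []
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to collect
            continue
        visited.append(url)
        _text, links = web[url]
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


def build_inverted_index(urls, web):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in urls:
        text, _links = web[url]
        for word in text.lower().split():
            index[word].add(url)
    return index


pages = crawl_breadth_first(["a.example"], TOY_WEB)
index = build_inverted_index(pages, TOY_WEB)
print(pages)
print(sorted(index["engine"]))
```

Replacing `queue.popleft()` with `queue.pop()` turns the same loop into a depth-first traversal. In this toy data, `d.example` is never collected because no crawled page links to it, which is one case where an "Add URL" submission would help.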
To give the user information about each retrieved document, the index also contains a brief description of each page, such as its creation date, size, title, subtitles, and summary.
(3) Retriever. The retriever, or retrieval software, quickly finds relevant documents in the index library for the user's query, evaluates the relevance of each document to the query, sorts the results for output, and implements a relevance feedback mechanism (that is, it can keep revising the retrieval strategy). The retriever is regarded as the most complex part of a search engine, because it embodies the important question of how to rank search results. Researchers have found that users do not patiently browse tens of thousands of results but look only at the first few pages. Simple ranking methods based on click-through rate and word frequency are clearly inadequate.
3、 Meta Search Engines
Meta search engines are also called multi-search engines. They do not maintain massive databases of their own; instead, they submit the user's query to several search engines at once, sort the returned results, and return them to the user. By search mechanism, they can be divided into parallel and serial types. A parallel meta search engine sends the query to every independent search engine at the same time and then presents the results to the user in a specific order. A serial meta search engine sends the query to one independent search engine first and sends it to the next only after the first has returned its results.
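The difference between the parallel and serial mechanisms can be sketched in Python as follows. This is only an illustration under stated assumptions: `engine_a` and `engine_b` are hypothetical stand-ins for independent search engines, their scores are made up, and merging by best score is one possible way to order results, not a method taken from any particular meta search engine.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for independent search engines. Each returns
# (url, score) pairs; real engines would be queried over the network.
def engine_a(query):
    return [("a.example/1", 0.9), ("shared.example", 0.5)]


def engine_b(query):
    return [("shared.example", 0.8), ("b.example/1", 0.4)]


ENGINES = [engine_a, engine_b]


def merge(result_lists):
    """Deduplicate by URL, keep the best score, sort by score descending."""
    best = {}
    for results in result_lists:
        for url, score in results:
            best[url] = max(score, best.get(url, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def parallel_meta_search(query, engines):
    """Send the query to every engine at the same time, then merge."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return merge(result_lists)


def serial_meta_search(query, engines):
    """Query one engine, wait for its results, then query the next."""
    result_lists = []
    for engine in engines:
        result_lists.append(engine(query))
    return merge(result_lists)


parallel_results = parallel_meta_search("search engine", ENGINES)
serial_results = serial_meta_search("search engine", ENGINES)
print(parallel_results)
```

With these toy engines both functions return the same merged list; in practice the parallel form mainly reduces waiting time, since the total delay is roughly that of the slowest engine rather than the sum of all of them.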