Principles of Computer Information Retrieval
Source: Shangpin China |
Type: website encyclopedia |
Time: 2015-06-19
By the definition of computer retrieval, storage and retrieval are the two cores of information retrieval. The principle of computer information retrieval can therefore be stated as follows. With the goal of full exchange and effective use of information, indexing personnel collect a large body of scattered information, describe each document or item of information, and extract or select identifiers that express its characteristics and main content. These identifiers are organized in an orderly manner to form document or information databases and to build various retrieval systems. With storage and retrieval following a unified process, the identifiers of the search terms a user enters are compared with the identifiers of document content and formal features held in the retrieval system. If the two sides agree, the documents or information carrying those identifiers are output from the retrieval system. The output may be the final information the user needs, or it may be intermediate clues (such as index entries) that the user follows to obtain the final documents.
Computer information retrieval comprises two processes: information storage and information retrieval. In the storage process, the collected original documents are analyzed for their subject concepts; subject terms, classification numbers, and other features are extracted and marked according to a given retrieval language, or a summary of the content is written. These pre-processed data are then entered into the computer in a set format for storage.
The computer processes the data under the control of program instructions to form a machine-readable database, which is stored on a storage medium (such as tape, magnetic disk, or optical disc). This completes the processing and storage of the information.
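The storage process described above can be sketched as building an inverted index from indexer-assigned terms. This is only a minimal illustration: the sample records, the field names, and the build_index function are assumptions made for this example and do not come from any particular retrieval system.

```python
from collections import defaultdict

# Hypothetical pre-processed records: each carries an id and the
# subject terms an indexer assigned to it (illustrative data only).
records = [
    {"id": 1, "title": "Library automation", "terms": ["library", "automation"]},
    {"id": 2, "title": "Information networks", "terms": ["information", "network"]},
    {"id": 3, "title": "Network libraries", "terms": ["library", "network"]},
]


def build_index(records):
    """Map each index term to the sorted list of record ids that carry it."""
    index = defaultdict(set)
    for record in records:
        for term in record["terms"]:
            index[term.lower()].add(record["id"])
    return {term: sorted(ids) for term, ids in index.items()}


index = build_index(records)
print(index["library"])
```

Looking up a term in the resulting dictionary returns the ids of the records indexed under it, which is the data the matching step later works against.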
In the retrieval process, the user analyzes the retrieval topic and clarifies the scope of the search, identifies the subject concepts, forms search identifiers and a search strategy, and enters them into the computer. The computer converts the search strategy into a series of queries according to the user's requirements, performs logical operations under the control of a dedicated program, and selects and outputs the information that meets the requirements. Computer retrieval is therefore a process of comparison and matching, in which the computer takes the place of manual searching. On one side the computer receives the retrieval queries (i.e. query expressions); on the other it receives document records from the database. It then matches the two. In other words, computer information retrieval turns the question of relevance between queries and document records into a similarity calculation between search terms and indexing terms. At present, general computer information retrieval systems can use the following methods to perform similarity operations between search terms and indexing terms.
(1) Comparison of a single whole word. For example, if the search term is psychology and the index word is also psychology, the two are identical and the record is a hit. Exact equality is a special case of similarity. For terms that contain numerical values, such as publication year or abstract number, greater-than (>) and less-than (<) comparisons can also be performed.
(2) Comparison of word fragments (mainly word roots). For example, if the search word is written with a truncation symbol (such as @), as in Psycho@, every word that begins with the characters before the truncation symbol is a hit, such as Psychology. This is called truncation retrieval. Truncation includes right truncation, left truncation, simultaneous left and right truncation, and middle masking. The number of characters allowed in place of the truncation symbol can be unlimited or limited to a fixed number, giving two modes: infinite truncation and finite truncation. Whichever kind is used, the basic principle is to compare parts of words. This comparison does not require the search term and the indexing term to be equal, only partially equal or similar. This level of comparison is therefore a typical similarity operation.
(3) Comparison of fixed phrases. For example, the search term is Library and information science, and the index term is also Library and information science. This is a phrase made up of several whole words, but the comparison is still an exact-match operation.
(4) Comparison of positional logic between several whole words. Two whole words can be specified together with the maximum number of words allowed between them (other words may be inserted there and are ignored). The order in which the two words occur can be specified as interchangeable or not interchangeable. For example, the query Information (2W) Retrieval can hit Information Storage and Retrieval. This level of operation can be regarded as a flexible phrase comparison, one that admits a certain degree of similarity. It is called adjacency (proximity) retrieval.
(5) Comparison of a defined logical combination of several independent search words or phrases. The goal is not to retrieve each search word or phrase on its own, but to retrieve the complete combination, in which the words or phrases join and constrain one another in meaning. For example, "information" and "network" are two separate words, and "information network" is a logical combination of them. "Information network" is not a loose mixture of "information" and "network"; it combines the two concepts to form a new one.
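The five comparison methods above can be illustrated with a small sketch. It makes several assumptions that are not in the original text: all function names are hypothetical, text is split into words on whitespace, matching ignores case, only right truncation is shown for method (2), and method (5) is modeled as a simple logical AND.

```python
def tokenize(text):
    """Lowercase the text and split it into words on whitespace."""
    return text.lower().split()


def exact_match(query, term):
    """(1) Whole-word comparison: a hit only when both are identical."""
    return query.lower() == term.lower()


def right_truncation(stem, term, max_extra=None):
    """(2) Right truncation: the term must begin with the stem.

    max_extra=None models infinite truncation; an integer models
    finite truncation (at most that many extra characters).
    """
    stem, term = stem.lower(), term.lower()
    if not term.startswith(stem):
        return False
    extra = len(term) - len(stem)
    return max_extra is None or extra <= max_extra


def phrase_match(phrase, text):
    """(3) Fixed phrase: the words must appear consecutively, in order."""
    p, t = tokenize(phrase), tokenize(text)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def within(first, second, text, max_between, ordered=True):
    """(4) Adjacency: at most max_between words between the two terms."""
    t = tokenize(text)
    first, second = first.lower(), second.lower()
    for i, a in enumerate(t):
        for j, b in enumerate(t):
            if a == first and b == second and i != j:
                gap = abs(j - i) - 1
                if gap <= max_between and (j > i or not ordered):
                    return True
    return False


def boolean_and(terms, text):
    """(5) Logical combination: every term must occur in the text."""
    words = set(tokenize(text))
    return all(term.lower() in words for term in terms)


title = "Information Storage and Retrieval"
print(within("information", "retrieval", title, max_between=2))
```

With max_between=2 the title is a hit because exactly two words ("Storage" and "and") separate the two terms; lowering the limit to 1 would reject it. Real systems typically evaluate such operators against a positional index rather than scanning the text, so this sketch shows only the matching logic.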