Which Robots Meta tags do search engines support

Source: Shangpin China | Type: website encyclopedia | Time: October 18, 2019

Search engines support the nofollow and noarchive Robots Meta values.

   Methods for prohibiting search engine inclusion

1. What is the robots.txt file? Search engines use spider programs to automatically visit web pages on the Internet and obtain web page information. When a spider visits a website, it first checks whether a plain text file called robots.txt exists under the root of the site's domain. This file specifies the scope of what the spider may fetch on your website. You can create a robots.txt file for your website and declare in it the parts of the site that you do not want search engines to include, or specify that search engines may only include specific parts.

Please note that you need a robots.txt file only if your website contains content that you do not want search engines to include. If you want search engines to include everything on the website, do not create a robots.txt file.

2. Where is the robots.txt file located? The robots.txt file must be placed in the root directory of the website. For example, when a spider visits a website such as //www.abc.com, it first checks whether //www.abc.com/robots.txt exists. If the spider finds this file, it determines the scope of its access permissions from the file's contents.

Website URL and the corresponding robots.txt URL:

   //www.w3.org/            //www.w3.org/robots.txt

   //www.w3.org:80/         //www.w3.org:80/robots.txt

   //www.w3.org:1234/       //www.w3.org:1234/robots.txt

   //w3.org/                //w3.org/robots.txt

3. I configured robots.txt to prohibit search engines from including my website's content. Why does it still appear in search results? If other websites link to pages that your robots.txt bans from inclusion, those pages may still appear in search results, but the content on them will not be crawled, indexed, or displayed. What appears in the results is only how other websites describe your pages.

4. Prohibit search engines from following the links on a page while still indexing the page. If you do not want search engines to follow the links on this page or to pass link weight through them, place this meta tag in the <head> section of the page:
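A minimal sketch of the standard tag for this purpose:

  <meta name="robots" content="nofollow">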

If you do not want search engines to follow one specific link, and the search engine supports this more precise control, write the attribute directly on the link itself:
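A sketch of the standard per-link form using rel="nofollow"; the href value is a hypothetical placeholder, and "sign in" is the link text from the example:

  <!-- href is a placeholder -->
  <a rel="nofollow" href="//www.example.com/login">sign in</a>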

To allow other search engines to follow the links, but prevent only Baiduspider from following the links on your page, place this meta tag in the <head> section of the page:
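A sketch of the crawler-specific form; naming Baiduspider here is an inference from this article's Baidu focus:

  <meta name="Baiduspider" content="nofollow">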

5. Prohibit search engines from displaying snapshots of a web page in search results while still indexing the page. To prevent all search engines from displaying snapshots of your website, place this meta tag in the <head> section of the page:
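A minimal sketch of the standard noarchive tag:

  <meta name="robots" content="noarchive">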

To allow other search engines to display snapshots, but prevent only Baiduspider from displaying them, use the following tag:
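A sketch of the crawler-specific noarchive form, again assuming Baiduspider as the target:

  <meta name="Baiduspider" content="noarchive">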

Note: this tag only prohibits the search engine from displaying a snapshot of the page. The search engine will continue to index the page and display a summary of it in the search results.

6. I want to prohibit Baidu Image Search from including some images. How can I set this up? To prevent Baiduspider from grabbing all pictures on the website, or to prohibit or allow Baiduspider to grab picture files of a specific format, you can configure robots.txt accordingly; see examples 10, 11, and 12 under "Examples of robots.txt usage" below.

7. The format of the robots.txt file. The "robots.txt" file contains one or more records separated by blank lines (terminated by CR, CR/NL, or NL). Each record has the following format: "<field>:<optional space><value><optional space>"

You can use # for comments in this file, following the same convention as in UNIX. Records in this file usually start with one or more User-agent lines, followed by several Disallow and Allow lines. The details are as follows:

  User-agent:

The value of this field is the name of a search engine robot. If the "robots.txt" file contains multiple User-agent records, then multiple robots are restricted by the file; there must be at least one User-agent record. If the value is set to *, the record is valid for any robot, and there can be only one "User-agent: *" record in the file. If "User-agent: SomeBot" plus several Disallow and Allow lines are added to the file, then the robot named "SomeBot" is constrained only by the Disallow and Allow lines that follow "User-agent: SomeBot".

  Disallow:

The value of this field describes a group of URLs that must not be accessed. The value can be a complete path or a non-empty path prefix; any URL that begins with the value of a Disallow field will not be visited by the robot. For example, "Disallow: /help" prevents the robot from accessing /help.html, /helpabc.html, and /help/index.html, while "Disallow: /help/" lets the robot access /help.html and /helpabc.html but not /help/index.html. "Disallow:" with an empty value means the robot may access all URLs of the website. There must be at least one Disallow record in the "/robots.txt" file. If "/robots.txt" does not exist or is an empty file, the website is open to all search engine robots.

  Allow:

The value of this field describes a group of URLs that may be accessed. Like the Disallow value, it can be a complete path or a path prefix; any URL that begins with the value of an Allow field may be visited by the robot. For example, "Allow: /hibaidu" allows the robot to access /hibaidu.htm, /hibaiducom.html, and /hibaidu/com.html. All URLs of a website are allowed by default, so Allow is usually used together with Disallow to permit access to some pages while blocking all other URLs.
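Putting the three fields together, a record might look like this (an illustrative sketch; the robot name ExampleBot and the paths are hypothetical):

  # ExampleBot may fetch /public/ but nothing else
  User-agent: ExampleBot
  Allow: /public/
  Disallow: /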

Use "*" and "$":

Baiduspider supports using the wildcards "*" and "$" to fuzzy-match URLs.

"$" matches the line terminator.

"*" matches 0 or more arbitrary characters.

8. URL matching examples

   Allow or Disallow value    URL             Match result

   /tmp                       /tmp            yes
   /tmp                       /tmp.html       yes
   /tmp                       /tmp/a.html     yes
   /tmp/                      /tmp            no
   /tmp/                      /tmphoho        no
   /Hello*                    /Hello.html     yes
   /He*lo                     /Hello,lolo     yes
   /Heap*lo                   /Hello,lolo     no
   html$                      /tmpa.html      yes
   /a.html$                   /a.html         yes
   htm$                       /a.html         no

9. Examples of robots.txt usage

Example 1. Prohibit all search engines from accessing any part of the website

  User-agent: *

  Disallow: /

Example 2. Allow all robots to access the entire website

(Alternatively, create an empty "/robots.txt" file.)

  User-agent: *

  Allow: /

Example 3. Prohibit only Baiduspider from visiting your website

  User-agent: Baiduspider

  Disallow: /

Example 4. Allow only Baiduspider to visit your website

  User-agent: Baiduspider

  Allow: /

  User-agent: *

  Disallow: /

Example 5. Allow only Baiduspider and Googlebot to access

  User-agent: Baiduspider

  Allow: /

  User-agent: Googlebot

  Allow: /

  User-agent: *

  Disallow: /

Example 6. Forbid spiders from accessing specific directories

In this example, the website has three directories that are off-limits to search engines; the robot will not visit these three directories. Note that each directory must be declared separately, not written as "Disallow: /cgi-bin/ /tmp/".

  User-agent: *

  Disallow: /cgi-bin/

  Disallow: /tmp/

  Disallow: /~joe/

Example 7. Allow access to some URLs in a specific directory

  User-agent: *

  Allow: /cgi-bin/see

  Allow: /tmp/hi

  Allow: /~joe/look

  Disallow: /cgi-bin/

  Disallow: /tmp/

  Disallow: /~joe/

Example 8. Use "*" to restrict access to URLs

Access to any URL with the suffix ".htm" under the /cgi-bin/ directory (including subdirectories) is prohibited.

  User-agent: *

  Disallow: /cgi-bin/*.htm

Example 9. Use "$" to restrict access to URLs

Only URLs with the suffix ".htm" are allowed to be accessed.

  User-agent: *

  Allow: /*.htm$

  Disallow: /

Example 10. Prohibit access to all dynamic pages on the website

  User-agent: *

  Disallow: /*?*

Example 11. Prohibit Baiduspider from grabbing any pictures on the website

Only web pages may be crawled; pictures may not be crawled.

  User-agent: Baiduspider

  Disallow: /*.jpg$

  Disallow: /*.jpeg$

  Disallow: /*.gif$

  Disallow: /*.png$

  Disallow: /*.bmp$

Example 12. Allow Baiduspider to grab only web pages and .gif images

(Grabbing .gif images is allowed; images in other formats may not be grabbed.)

  User-agent: Baiduspider

  Allow: /*.gif$

  Disallow: /*.jpg$

  Disallow: /*.jpeg$

  Disallow: /*.png$

  Disallow: /*.bmp$

Example 13. Prohibit only Baiduspider from grabbing .jpg images

  User-agent: Baiduspider

  Disallow: /*.jpg$

Source statement: This article was written or edited by Shangpin China's editors. If you need to reproduce it, please indicate that it comes from Shangpin China. Some of the above content (including pictures and text) comes from the Internet; in case of infringement, please contact us promptly (010-60259772).