The technical evolution and strengths of well-known websites

Source: Shangpin China | Type: website encyclopedia | Time: May 31, 2012
Technological development history of well-known websites
Google currently ranks No. 1 on Alexa. It was born in 1997 as a research project. At that time it built its index once a month. The built index was distributed across multiple servers (Index Servers) by sharding (shard by doc), and the web page data itself was likewise sharded across multiple servers (Doc Servers). When a user submitted a request, a front-end server forwarded it to the Index Servers to obtain the scored inverted-index results, and then fetched the page details (such as the page title and the snippets matching the search keywords) from the Doc Servers before presenting the results to the user.
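The request flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: the class names, the shard count, the modulo placement rule, and the term-count score are all hypothetical.

```python
from collections import defaultdict

NUM_SHARDS = 3  # assumed shard count, for illustration only


def shard_of(doc_id: int) -> int:
    """Shard by document: every document lives on exactly one shard."""
    return doc_id % NUM_SHARDS


class IndexServer:
    """Holds the inverted index for the documents of one shard."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, term):
        # The score is just the term count; a real engine ranks very differently.
        return list(self.postings.get(term.lower(), {}).items())


class DocServer:
    """Holds the page data for the documents of one shard."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, title, text):
        self.docs[doc_id] = (title, text)

    def title(self, doc_id):
        return self.docs[doc_id][0]


index_servers = [IndexServer() for _ in range(NUM_SHARDS)]
doc_servers = [DocServer() for _ in range(NUM_SHARDS)]


def add_page(doc_id, title, text):
    shard = shard_of(doc_id)
    index_servers[shard].add(doc_id, text)
    doc_servers[shard].put(doc_id, title, text)


def search(term, limit=10):
    """Front end: query every index shard, merge by score, then fetch titles."""
    hits = []
    for server in index_servers:
        hits.extend(server.search(term))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [(doc_id, doc_servers[shard_of(doc_id)].title(doc_id))
            for doc_id, _score in hits[:limit]]


add_page(1, "Sharding basics", "sharding splits an index across servers")
add_page(2, "Index design", "an inverted index maps terms to documents index")
add_page(3, "Caching", "a cache keeps hot results in memory")
print(search("index"))  # [(2, 'Index design'), (1, 'Sharding basics')]
```

Because each document belongs to exactly one shard, adding capacity means adding Index Servers and Doc Servers, while every query still has to visit all index shards.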

As the number of indexed pages grows, this structure can hold more index and web page data by adding Index Servers and Doc Servers, but it still faces many other problems. Over the following ten years, Google therefore made many improvements to the architecture.

In 1999, Google added a Cache Cluster to cache query index results and document snippet information, and turned the Index Servers and Doc Servers into clusters through replication. These two changes improved the site's response speed, supportable traffic, and availability, but they also increased costs. Google's style has always been to avoid expensive high-end hardware and instead to guarantee reliability and high performance at the software level. So in the same year, Google began to use self-designed servers to reduce costs.

In 2000, Google began to design its own data centers, using various methods (such as replacing air conditioning with other cooling methods) to optimize PUE (power usage effectiveness). It also made many changes to its self-designed servers.

In 2001, Google changed the index format and put the entire index in memory. This greatly improved the site's response speed and the traffic it could support.

In 2003, Google published the article "Google Cluster Architecture". The cluster consists of hardware load balancers + Index Cluster + Doc Cluster + a large number of low-cost servers (IDE hard disks, cost-effective CPUs, etc.). Parallel processing plus sharding keeps responses fast while lowering the hardware requirements. In the same year, Google published its paper on the Google File System (GFS had been in production since 2000), which largely reflects Google's style of not using expensive hardware: GFS plus a large number of inexpensive servers can store a large amount of data.

In 2004, Google changed the index format again, further improving response speed. In the same year, Google published its paper on MapReduce. With MapReduce plus a large number of cheap servers, computing tasks that previously required expensive minicomputers, midrange computers, or even mainframes can be completed quickly, which obviously helps Google build its index quickly.

In 2006, Google published its paper on BigTable (which went online in 2003). BigTable made the analysis of massive data fast enough to meet the requirements of online systems, which greatly helped Google improve the response speed of its website.
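To make the link between MapReduce and index building concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to an inverted index. It only illustrates the programming model: the function names are hypothetical, and a real MapReduce run distributes these phases across many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: turn the doc ids for one term into a sorted posting list."""
    return term, sorted(doc_ids)


def build_inverted_index(docs):
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


docs = {1: "cheap servers scale", 2: "cheap hardware", 3: "servers fail"}
index = build_inverted_index(docs)
print(index["cheap"], index["servers"])  # [1, 2] [1, 3]
```

Each map call touches only one document and each reduce call only one term, which is why the work can be spread over many inexpensive servers.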

These three papers completely changed how the industry stores, analyzes, and retrieves massive data (rumor has it that Google has already replaced GFS, MapReduce, and BigTable internally), and they established Google's technical leadership in the industry.

In some scenarios, Google also uses MySQL to store data, and it has made many changes to MySQL as well. Information about the MySQL version it uses can be found at //code.google.com/p/google-mysql/.

In 2007, Google shortened the time needed to build the index to minutes, so a new page could be found in Google a few minutes after it appeared. At the same time, the Index Cluster exposed its services through Protocol Buffers to Google's various searches (web pages, pictures, news, books, etc.). Besides the Index Cluster there were many other services, such as advertising and lexical checking. A single Google search needs to call more than 50 internal services, most of them written in C++ or Java. In 2009, the article "How Google uses Linux" revealed that Google had also put a lot of effort into improving machine utilization, for example by deploying applications with different resource consumption profiles on the same machine.

Later, Google developed Colossus (the next-generation GFS-like file system), Spanner (the next-generation BigTable-like architecture for massive storage and computing), and real-time search (based on Colossus), mainly to make search more real-time and to store more data. Beyond its innovations in massive-data technologies, Google also keeps innovating on the industry's traditional technologies, for example by increasing the initial TCP congestion window, by creating SPDY as an improvement on HTTP, and by introducing the new image format WebP.

Throughout Google's development, its technical changes have focused on four aspects: scalability, performance, cost, and availability. Google's style of not using expensive hardware, together with a data volume far ahead of other websites, means that its technical changes are largely reinventions of traditional hardware and software technologies.

Facebook is currently ranked No. 2 by Alexa. It was built on LAMP, and as the business grew it also made many technical changes.

As the first step, Facebook added Memcached to the LAMP stack to cache all kinds of data, which significantly improved the system's response time and the traffic it could support. It then added a Services layer, which exposes more general functions such as News Feed and Search as services to the front-end PHP system; the PHP system accesses these services through Thrift. Facebook writes its services in a variety of languages, choosing the appropriate one for each scenario, such as C++, Java, and Erlang.
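The effect of putting a cache in front of the database can be shown with the common cache-aside pattern. The sketch below is an assumption-laden illustration, not Facebook's code: `FakeCache` and `FakeDatabase` are hypothetical in-memory stand-ins for a Memcached client and the MySQL tier, and the key format and TTL are invented for the example.

```python
import time


class FakeCache:
    """In-memory stand-in for a Memcached client (get/set/delete with a TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class FakeDatabase:
    """Stand-in for the database tier; counts reads so the effect is visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def load_user(self, user_id):
        self.reads += 1
        return self.rows.get(user_id)

    def save_user(self, user_id, row):
        self.rows[user_id] = row


def get_user(cache, db, user_id, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    row = cache.get(key)
    if row is None:
        row = db.load_user(user_id)
        if row is not None:
            cache.set(key, row, ttl_seconds)
    return row


def update_user(cache, db, user_id, row):
    """Write to the database, then invalidate the cached copy."""
    db.save_user(user_id, row)
    cache.delete(f"user:{user_id}")


cache, db = FakeCache(), FakeDatabase({1: {"name": "alice"}})
get_user(cache, db, 1)
get_user(cache, db, 1)
print(db.reads)  # 1: the second read was served from the cache
```

Repeated reads of hot data never reach the database, which is where the gains in response time and supportable traffic come from.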

The heavy use of Memcached and the growing number of visits generated so much Memcached network traffic that the switches could no longer cope. Facebook modified its setup to access Memcached over UDP in order to reduce the network traffic on a single connection. There were other modifications as well; details can be found at //on.fb.me/8R0C

As a scripting language, PHP has the advantages of being simple to develop in and easy to use, while its disadvantage is that it consumes more CPU and memory. Once Facebook's traffic grew to a certain scale, this disadvantage became more prominent. Starting in 2007, Facebook tried many ways to solve the problem. In the end HipHop, a product born at a Facebook Hackathon, stood out.

HipHop automatically converts PHP into C++ code. With HipHop, Facebook can serve 6 times as many requests on machines of the same configuration, and CPU utilization drops by 50% on average, which saves Facebook a large number of hosts. Facebook plans to improve HipHop further: HipHop will compile PHP into bytecode and run it in the HipHop VM, which will then compile it into machine code in a JIT-like way.

In 2009, Facebook developed BigPipe, which made the website twice as fast. As Facebook's traffic grew, collecting execution logs from its many servers also became a challenge, so Facebook developed Scribe to solve that problem. For the data stored in MySQL, Facebook splits databases vertically and tables horizontally to support the growing data volume. Because MySQL is an important part of Facebook's technology stack, Facebook has also made many optimizations and improvements to it, such as Online Schema Change. More information is available at //www.facebook.com/MySQLAtFacebook
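Vertical and horizontal splitting can be illustrated with a small routing function. This is a hedged sketch only: the database names, the table count, and the CRC32-based placement rule are assumptions made for the example and do not describe Facebook's actual scheme.

```python
import zlib

# Vertical split: each business area gets its own database (hypothetical names).
DATABASE_FOR = {"users": "db_user", "messages": "db_message"}

# Horizontal split: each entity is spread over several tables (assumed count).
TABLES_PER_ENTITY = 4


def route(entity, key):
    """Return the (database, table) pair that holds the row for this key."""
    database = DATABASE_FOR[entity]
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(str(key).encode("utf-8")) % TABLES_PER_ENTITY
    return database, f"{entity}_{bucket:02d}"


print(route("users", 42))
print(route("messages", "thread-7"))
```

The same key always maps to the same table, so the application, or a data access layer in front of it, can find a row without a central lookup. Changing `TABLES_PER_ENTITY` moves most keys, so real systems plan resharding carefully.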

In its early days, Facebook used high-end storage (such as NetApp and Akamai) to store pictures. As the number of pictures grew, the cost also rose significantly. So in 2009, Facebook developed Haystack to store pictures. Haystack can run on inexpensive PC servers, which greatly reduces the cost.

Besides storing data in MySQL, Facebook has also begun to explore new approaches in recent years. In 2008, Facebook developed Cassandra as a new storage system for Message Inbox Search. In 2010, however, Facebook abandoned Cassandra and switched to HBase for message storage. In 2011, it applied HBase to more Facebook projects (such as Puma and ODS). Facebook is said to be trying to migrate its user and relationship data from MySQL to HBase.

Since 2009, Facebook has been designing its own data centers and servers to reduce operating costs, and it has opened up the related technology for a data center with a PUE of only 1.07. Facebook's basic technical principle is: "Use open source products where they are available, optimize them as the situation requires, and contribute the changes back to the community." Facebook's technical history shows that this principle has been followed all along. Facebook's technical changes have likewise centered on four aspects: scalability, performance, cost, and availability.

Twitter is currently ranked 8th by Alexa. When it was born in 2006, it was built with Ruby on Rails + MySQL. In 2007, Memcached was added as a cache layer to improve response speed. Ruby on Rails gave Twitter the ability to develop rapidly, but as traffic grew, its CPU and memory consumption became unbearable. So Twitter made many changes and efforts, such as writing an optimized version of the Ruby GC.

In 2008, Twitter decided to migrate gradually to Java and chose Scala as its main development language (the reason given was that "it is hard to sell Java to a room full of Ruby programmers"). It adopted Thrift as its main communication framework and developed Finagle as its service framework. Finagle exposes the various back-end functions as services to the front-end systems, so the front end does not need to care about the different communication protocols (for example, a caller can access Memcache, Redis, and Thrift servers in the same way). Twitter also developed Kestrel as its message middleware, replacing the earlier Starling, which was written in Ruby.

Twitter has always stored its data in MySQL. A small episode along the way: when Facebook open-sourced Cassandra, Twitter planned to use it, but eventually gave up and stayed with MySQL. Twitter's MySQL version has been open-sourced (//github.com/twitter/mysql). Twitter also splits databases and tables to support large amounts of data. Memcached is used to cache tweets, and timeline information has been migrated to Redis for caching.

In 2010, Twitter got its first self-built data center in Salt Lake City, mainly to increase controllability. Looking at Twitter's development, its technical changes over the past six years have mainly focused on scalability and availability.

As an employee of an e-commerce website, please allow me to introduce the technical evolution of eBay, the famous e-commerce website ranked 21st by Alexa.

eBay was born in 1995. It was written as CGI programs and used GDBM as its database, which could support at most 50,000 online items. In 1997, eBay migrated its operating system from FreeBSD to Windows NT and its database from GDBM to Oracle. In 1999, eBay turned its front-end system into a cluster (there had been only one host before), used Resonate as the load balancer, and upgraded its back-end Oracle machine to a Sun E1000 minicomputer. In the same year, eBay added a standby database machine to improve availability. The front-end machines could still cope with the increasing traffic, but the database machine hit its limit in November 1999 (no more CPU or memory could be added), so in that month the database was split into multiple databases by business.

From 2001 to 2002, eBay split its data tables horizontally, for example by storing items by category, and replaced the Oracle minicomputer with the Sun A3500. In 2002, the entire website was migrated to Java. In this stage, a DAL framework was built to hide the effects of the database and table splitting from applications, and a development framework was designed to help developers get started on feature development. Looking at eBay's whole development, its technical changes have mainly focused on scalability and availability.

Tencent currently ranks 9th on Alexa. At first, QQ IM used a single access server to handle user login and status maintenance, but once one million users were online at the same time, this server could no longer cope. So QQ IM turned every single server into a cluster and added a status synchronization server to synchronize status within the cluster. User information is stored in MySQL, with split databases and tables, and friend relationships are stored in a self-implemented file store. To improve the efficiency of interprocess communication, Tencent implemented user-mode IPC. Later, Tencent turned the status synchronization server into a synchronization cluster to support the growing number of online users. After these changes, Tencent could basically support tens of millions of simultaneous online users, but availability was relatively poor. So Tencent restructured QQ IM again, implementing disaster recovery across IDCs in the same city and strengthening its monitoring and operations systems. After that, Tencent decided to rewrite the QQ IM architecture completely (roughly from 2009 to the present), mainly to improve flexibility, support IDCs in different cities, and support tens of millions of friends. In this major overhaul, Tencent's data is no longer stored in MySQL; all of it is stored in systems designed by Tencent itself.

Judging from the technical evolution of QQ IM, its technical changes have mainly focused on scalability and availability.

Taobao was born in 2003. It directly purchased a commercial software package, phpAuction, and modified it to create Taobao. In 2004, the system was migrated from PHP to Java and from MySQL to Oracle (on minicomputers and high-end storage), with WebLogic as the application server. From 2005 to 2007, JBoss replaced WebLogic, the database was split, a distributed cache was built on BDB, the distributed file system TFS was developed in-house to store small files, and Taobao built its own CDN. From 2007 to 2009, the application system was split vertically, with the resulting systems exposing their functions as services, and the data was split both vertically and horizontally.

After the data was split vertically and horizontally, the cost of Oracle kept rising. So in the following years, Taobao began to migrate data gradually from Oracle to MySQL. At the same time, it began to try new data storage solutions, such as using HBase to store and retrieve historical transaction orders. In recent years, Taobao has begun to modify and customize the Linux kernel, the JVM, Nginx, and other software. It has also designed its own low-power servers and optimized software and hardware together to reduce costs further.

Looking at Taobao's whole development, its technical changes first focused on scalability and availability and have gradually shifted toward performance and cost. Taobao currently ranks 14th on Alexa.
Summary
The technical histories of these highly ranked Alexa websites show that each site, because of its different business, team members, and working style, adopts different methods at different stages to support the growth of its business. The basic focus, however, is always on scalability, availability, performance, and cost. Once they reach a relatively large scale, the sites have much in common in their technical architecture, and these architectures will continue to evolve.
Lin Hao, the original author, works at Taobao. From 2007 to 2010, he was responsible for the design and implementation of Taobao's service framework, which is widely used at Taobao and handles more than 15 billion requests every day. Since 2011, he has been responsible for bringing HBase into production at Taobao. At present, more than 20 online projects at Taobao use HBase.
Others: Cassandra
When designing Cassandra storage, the book suggests modeling around queries rather than modeling the data first. Some people object that query types change too quickly. The author retorts that both the query types and the data itself change. Cassandra's most fundamental model is a simple KV store, so it is necessary to model around queries as much as possible. How to balance the two is a challenge.
Cassandra's column family is like a table structure, and changing it requires a restart. Each column family has its own file, and a row of data can span multiple column families. For example, user is one family, user_ext is another, and the row key is uid. In use, Cassandra feels more like a database.
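Modeling around queries can be illustrated without any database at all. In the sketch below, plain Python dictionaries stand in for column families, and each one is shaped to answer exactly one query. The table names, fields, and favorites use case are hypothetical, and no real Cassandra API is used.

```python
from collections import defaultdict

# One "table" per query, each a plain key-value map standing in for a
# column family.
favorites_by_user = defaultdict(dict)  # query 1: what has this user favorited?
users_by_item = defaultdict(dict)      # query 2: who has favorited this item?


def add_favorite(uid, item_id, added_at):
    """Write the same fact twice, once per query it must serve."""
    favorites_by_user[uid][item_id] = added_at
    users_by_item[item_id][uid] = added_at


def favorites_of(uid):
    """Query 1: a single lookup by row key, newest favorite first."""
    rows = favorites_by_user.get(uid, {})
    return sorted(rows, key=rows.get, reverse=True)


def fans_of(item_id):
    """Query 2: a single lookup by row key, in uid order."""
    return sorted(users_by_item.get(item_id, {}))


add_favorite("u1", "a", 1)
add_favorite("u1", "b", 2)
add_favorite("u2", "a", 3)
print(favorites_of("u1"))  # ['b', 'a']
print(fans_of("a"))        # ['u1', 'u2']
```

The price of this approach is duplicated writes and a new table whenever a new query appears, which is the tension the objection above points at.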

Kevin Allocca, trends manager at YouTube, explains the three things the hottest YouTube videos have in common. 1. Tastemakers: recommended by influential people. 2. Community: like-minded groups. 3. Surprise: the unexpected.
Cassandra at 360: online businesses such as user favorites, image hosting, and vertical search have large storage needs. Because MySQL cannot meet those needs and HBase has weaknesses in availability, Cassandra was selected. The current scale is 600 to 700, and it is expected to reach around 1500 by the end of the year. There has been no major failure so far. It is estimated to be the largest Cassandra cluster in the world.

Can anyone explain the differences and connections between the three concepts of key-value, column-oriented, and document-oriented stores? After seeing the division in the picture, I was completely confused. I always thought the three were the same thing.
Message queue transmission: kafka, timetunnel, kestrel, subscribe; Column-store database: hbase; KV database: cassandra, riak, voldemort, tair; Document database: mongodb, couchdb; Graph database: neo4j, pregel, flockdb; Stream computing: Storm, iprocess; Real-time computing: prom; Graph computing: pregel, apache hama; Offline computing: hive, spark.

Facebook solves local KV expiration with a complex middle layer.
It runs no join queries on MySQL and uses expensive Fusion IO. Nginx is better than lighttpd. Scribe is a good tool.
This article was published by Beijing website production company Shangpin China //ihucc.com/
Source Statement: This article is original or edited by Shangpin China's editors. If it needs to be reproduced, please indicate that it is from Shangpin China. The above contents (including pictures and words) are from the Internet. If there is any infringement, please contact us in time (010-60259772).
Disclaimer

Thank you very much for visiting our website. Please read all the terms of this statement carefully before you use this website.

1. Part of the content of this site comes from the network, and the copyright of some articles and pictures involved belongs to the original author. The reprint of this site is for everyone to learn and exchange, and should not be used for any commercial activities.

2. This website does not assume any form of loss or injury caused by users to themselves and others due to the use of these resources.

3. For issues not covered in this statement, please refer to relevant national laws and regulations. In case of conflict between this statement and national laws and regulations, the national laws and regulations shall prevail.

4. If it infringes your legitimate rights and interests, please contact us in time, and we will delete the relevant content at the first time!

Contact: 010-60259772
E-mail: [email protected]
